Fugu-MT: Translations of arXiv Papers

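This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.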

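The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).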

Papers with a publication date of 2024-05-12.

Title	Authors	Abstract	Publication date / Translation date
# Dual-Domain Deep D-bar Method for Solving Electrical Impedance Tomography ( http://arxiv.org/abs/2407.03335v1 ) License: see link	Xiang Cao, Qiaoqiao Ding, Xiaoqun Zhang	The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems due to its efficiency and simplicity. It provides a direct approach by applying low-pass filtering to the scattering data in the non-linear Fourier domain, thereby yielding a smoothed conductivity approximation. However, D-bar images often have low contrast and low resolution due to the absence of accurate high-frequency information and the ill-posedness of the problem. In this paper, we propose a dual-domain neural network architecture to retrieve high-contrast D-bar image sequences from low-contrast D-bar images. To further accentuate the spatial features of the conductivity distribution, the widely adopted U-net is tailored to calibrate the conductivity image from the predicted D-bar image sequences. We call this hybrid approach the Dual-Domain Deep D-bar method because it considers both scattering data and image information. Compared to the single-scale structure, our proposed multi-scale structure exhibits superior capabilities in reducing artifacts and refining the conductivity approximation. Additionally, solving discrete D-bar systems using the GMRES algorithm entails significant computational complexity, which is extremely time-consuming on CPU-based devices. To remedy this, we design a surrogate GPU-based Richardson iterative method to accelerate the D-bar data enhancement process. Numerical results are presented for simulated EIT data from the KIT4 and ACT4 systems to demonstrate notable improvements in absolute EIT imaging quality compared to existing methodologies.	Translated: 2024-07-22 22:09:05 Published: 2024-05-12
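The Richardson iteration mentioned in the D-bar abstract above can be illustrated on a toy system. The following is a minimal sketch only, not the authors' GPU implementation: the 2x2 matrix, the step size `omega`, and the function names are assumptions chosen for illustration. The update is x <- x + omega * (b - A x), which needs nothing beyond matrix-vector products.

```python
def richardson(matvec, b, omega, tol=1e-10, max_iter=10_000):
    """Solve A x = b with the Richardson update x <- x + omega * (b - A x)."""
    x = [0.0] * len(b)
    for _ in range(max_iter):
        # Residual of the current iterate.
        residual = [bi - ai for bi, ai in zip(b, matvec(x))]
        if max(abs(r) for r in residual) < tol:
            break
        x = [xi + omega * ri for xi, ri in zip(x, residual)]
    return x


# Toy symmetric positive definite system (an assumption, not a D-bar system).
A = [[4.0, 1.0], [1.0, 3.0]]


def matvec(v):
    """Multiply the toy matrix A by the vector v."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


# omega must satisfy |1 - omega * lambda| < 1 for every eigenvalue lambda of A.
solution = richardson(matvec, [1.0, 2.0], omega=0.25)
print(solution)
```

The exact solution of this toy system is (1/11, 7/11). Because each step is a single matrix-vector product plus vector additions, this kind of iteration tends to map well onto batched GPU arithmetic, which is presumably why it is attractive as a GMRES substitute.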
# Disentangling Specificity for Abstractive Multi-document Summarization ( http://arxiv.org/abs/2406.00005v1 ) License: see link	Congbo Ma, Wei Emma Zhang, Hu Wang, Haojie Zhuang, Mingyu Guo	Multi-document summarization (MDS) generates a summary from a document set. Each document in a set describes topic-relevant concepts, while each document also has its own unique content. However, document specificity receives little attention from existing MDS approaches. Neglecting the specific information of each document limits the comprehensiveness of the generated summaries. To solve this problem, we propose to disentangle the specific content from the documents in a document set. The document-specific representations, which are encouraged to be distant from each other via a proposed orthogonal constraint, are learned by the specific representation learner. We provide an extensive analysis and report the interesting finding that specific information and document-set representations contribute distinctive strengths, and that their combination yields a more comprehensive solution for MDS. We also find that the common (i.e., shared) information does not contribute much to the overall performance under the MDS setting. The implementation code is available at https://github.com/congboma/DisentangleSum.	Translated: 2024-06-09 16:19:21 Published: 2024-05-12
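One common way to encourage representations to be "distant from each other via an orthogonal constraint" is to penalize their pairwise cosine similarity. The sketch below is a hypothetical illustration of that idea; the abstract does not give the exact loss, so the function name `orthogonality_penalty` and the mean-squared-cosine form are assumptions, not the authors' formulation.

```python
import math


def orthogonality_penalty(representations):
    """Mean squared cosine similarity over all pairs of (nonzero) vectors.

    The value is 0 when every pair is orthogonal and 1 when all vectors are
    parallel, so minimizing it pushes the representations apart.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    count = len(representations)
    pairs = [(i, j) for i in range(count) for j in range(i + 1, count)]
    if not pairs:
        return 0.0
    total = sum(cosine(representations[i], representations[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


orthogonal_case = orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]])
parallel_case = orthogonality_penalty([[1.0, 2.0], [2.0, 4.0]])
print(orthogonal_case, parallel_case)
```

In a training setup, a term like this would typically be added to the summarization loss with a weighting coefficient; that weighting is likewise an assumption here.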
# Adaptation of XAI to Auto-tuning for Numerical Libraries ( http://arxiv.org/abs/2405.10973v1 ) License: see link	Shota Aoki, Takahiro Katagiri, Satoshi Ohshima, Masatoshi Kawai, Toru Nagai, Tetsuya Hoshino	Concerns have arisen regarding the unregulated use of artificial intelligence (AI) outputs, which could lead to various societal issues. While humans routinely validate information, manually inspecting the vast volume of AI-generated results is impractical. Therefore, automation and visualization are imperative. In this context, Explainable AI (XAI) technology is gaining prominence, aiming to streamline AI model development and alleviate the burden of explaining AI outputs to users. Simultaneously, software auto-tuning (AT) technology has emerged, aiming to reduce the man-hours required for performance tuning in numerical calculations. AT is a potent tool for cost reduction during parameter optimization and high-performance programming for numerical computing. The synergy between AT mechanisms and AI technology is noteworthy, with AI finding extensive applications in AT. However, applying AI to AT mechanisms introduces challenges in AI model explainability. This research focuses on XAI for AI models integrated into two different processes for practical numerical computations: performance parameter tuning of accuracy-guaranteed numerical calculations and sparse iterative algorithms.	Translated: 2024-05-27 03:08:05 Published: 2024-05-12
# Describing the critical behavior of the Anderson transition in infinite dimension by random-matrix ensembles: logarithmic multifractality and critical localization ( http://arxiv.org/abs/2405.10975v1 ) License: see link	Weitao Chen, Olivier Giraud, Jiangbin Gong, Gabriel Lemarié	Due to their analytical tractability, random matrix ensembles serve as robust platforms for exploring exotic phenomena in systems that are computationally demanding. Building on a companion letter [arXiv:2312.17481], this paper investigates two random matrix ensembles tailored to capture the critical behavior of the Anderson transition in infinite dimension, employing both analytical techniques and extensive numerical simulations. Our study unveils two types of critical behavior: logarithmic multifractality and critical localization. In contrast to conventional multifractality, the novel logarithmic multifractality features eigenstate moments that scale algebraically with the logarithm of the system size. Critical localization, characterized by eigenstate moments of order $q>1/2$ converging to a finite value indicating localization, exhibits characteristic logarithmic finite-size or time effects, consistent with the critical behavior observed in random regular and Erdős-Rényi graphs of effective infinite dimensionality. Using perturbative methods, we establish the existence of logarithmic multifractality and critical localization in our models. Furthermore, we explore the emergence of novel scaling behaviors in the time dynamics and spatial correlation functions. Our models provide a valuable framework for studying infinite-dimensional quantum disordered systems, and the universality of our findings enables broad applicability to systems with pronounced finite-size effects and slow dynamics, including the contentious many-body localization transition, which is akin to the Anderson transition in infinite dimension.	Translated: 2024-05-27 03:08:05 Published: 2024-05-12
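The distinction drawn in the abstract above can be written out schematically. The notation below (moments $P_q$ of an eigenstate $\psi$ on $N$ sites, exponents $\tau_q$ and $\tau_q^{*}$) is an assumed, conventional choice and is not taken from the paper itself; it only restates "scaling algebraically with the logarithm of the system size" in symbols.

```latex
P_q = \sum_{i=1}^{N} |\psi_i|^{2q}, \qquad
\langle P_q \rangle \sim N^{-\tau_q} \quad \text{(conventional multifractality)}, \qquad
\langle P_q \rangle \sim (\ln N)^{-\tau_q^{*}} \quad \text{(logarithmic multifractality)}.
```

Critical localization, in the same notation, corresponds to $\langle P_q \rangle$ tending to a finite nonzero constant as $N \to \infty$ for $q > 1/2$, with logarithmically slow finite-size corrections.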
# Charge-transport forecasted via deep learning in the photosystem II reaction center ( http://arxiv.org/abs/2405.12232v1 ) License: see link	Zi-Ran Zhao, Shun-Cai Zhao, Yi-Meng Huang	Predicting future physical behavior from the limited theoretical simulation data available is an emerging research paradigm resulting from the integration of artificial intelligence technology and quantum physics. In this work, the charge-transport (CT) behavior in the photosystem II reaction center (PSII-RC) was forecasted over a long time by a deep learning model, the long short-term memory (LSTM) network, trained with an error-threshold method. The theoretical simulation data within 8 fs was fed to the modified LSTM network for training, which yields a clear prediction, with a difference on the order of $10^{-4}$, over a time period that is long compared to the collection time of the training set. The results indicate the potential of employing LSTM, in addition to quantum physical methods, to reveal the physics governing CT. The implications of this work warrant further investigation to fully elucidate the scope and efficacy of LSTM for advancing our understanding of photosynthesis at the molecular scale.	Translated: 2024-05-27 03:08:05 Published: 2024-05-12
# Large Language Models for Education: A Survey ( http://arxiv.org/abs/2405.13001v1 ) License: see link	Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu	Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, and legal affairs. As powerful auxiliary tools, LLMs incorporate various technologies such as deep learning, pre-training, fine-tuning, and reinforcement learning. The use of LLMs for smart education (LLMEdu) has been a significant strategic direction for countries worldwide. While LLMs have shown great promise in improving teaching quality, changing education models, and modifying teacher roles, the technologies still face several challenges. In this paper, we conduct a systematic review of LLMEdu, focusing on current technologies, challenges, and future developments. We first summarize the current state of LLMEdu and then introduce the characteristics of LLMs and education, as well as the benefits of integrating LLMs into education. We also review the process of integrating LLMs into the education industry, as well as the introduction of related technologies. Finally, we discuss the challenges and problems faced by LLMEdu, as well as prospects for its future optimization.	Translated: 2024-05-27 03:08:05 Published: 2024-05-12
# DuetRAG: Collaborative Retrieval-Augmented Generation ( http://arxiv.org/abs/2405.13002v1 ) License: see link	Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang	Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generations. To address this issue, we propose a novel Collaborative Retrieval-Augmented Generation framework, DuetRAG. Our bootstrapping philosophy is to simultaneously integrate domain fine-tuning and RAG models to improve the knowledge retrieval quality, thereby enhancing generation quality. Finally, we demonstrate that DuetRAG matches expert human researchers on HotPot QA.	Translated: 2024-05-27 03:08:05 Published: 2024-05-12
# A Survey on Recent Advances in Conversational Data Generation ( http://arxiv.org/abs/2405.13003v1 ) License: see link	Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi	Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components such as seed data creation, utterance generation, and quality filtering methods, and introduce a general framework that outlines the main principles of conversation data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities for further research in this area.	Translated: 2024-05-27 02:58:21 Published: 2024-05-12
# MathDivide: Improved mathematical reasoning by large language models ( http://arxiv.org/abs/2405.13004v1 ) License: see link	Saksham Sahai Srivastava, Ashutosh Gandhi	Large language models have been proven capable of handling complex linguistic and cognitive tasks. Therefore, their use has been extended to tasks requiring logical reasoning ability, such as mathematics. In this paper, we propose a prompting technique called MathDivide that breaks down a mathematical problem into simpler subproblems. Each subproblem is formulated as an algebraic expression whose value is evaluated by Python code that the LLM generates for that expression. The values fed to the Python code are the numerical values provided in the problem statement. The solutions to the subproblems are composed to obtain the final answer to the problem statement. Finally, the final answer is compared to the correct answer. If the final answer matches the correct answer, it is produced as output; otherwise, a refinement prompt is fed to the LLM. We experiment with this prompting technique on both closed-source and open-source LLMs using the GSM8K dataset. The results demonstrate that MathDivide significantly outperforms the leading prompting technique, Math-prompter.	Translated: 2024-05-27 02:58:21 Published: 2024-05-12
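The decompose, evaluate, compose, and refine loop described in the MathDivide abstract above can be sketched without a real model. Everything in the following is hypothetical: `scripted_llm` stands in for the LLM and simply returns a wrong decomposition first and a corrected one after feedback, composition by summation is an assumed choice for this toy problem, and the wording of the refinement prompt is invented.

```python
def evaluate(expression, values):
    """Evaluate one algebraic expression using the numbers from the problem.

    eval with builtins disabled is acceptable for a toy example but is not a
    safe sandbox for untrusted model output.
    """
    return eval(expression, {"__builtins__": {}}, dict(values))


def math_divide(values, propose, expected, max_rounds=3):
    """Loop: get subproblem expressions, evaluate, compose, check, refine."""
    feedback = None
    for round_number in range(1, max_rounds + 1):
        expressions = propose(feedback)  # stand-in for the LLM call
        answer = sum(evaluate(expr, values) for expr in expressions)
        if answer == expected:
            return answer, round_number
        feedback = f"The answer {answer} is incorrect; revise the subproblems."
    return None, max_rounds


def scripted_llm(feedback):
    """Hypothetical model: wrong on the first try, right after refinement."""
    if feedback is None:
        return ["pens + pen_price", "notebooks * notebook_price"]
    return ["pens * pen_price", "notebooks * notebook_price"]


# Toy problem: 4 pens at 3 each and 2 notebooks at 5 each; total cost is 22.
values = {"pens": 4, "pen_price": 3, "notebooks": 2, "notebook_price": 5}
result = math_divide(values, scripted_llm, expected=22)
print(result)
```

Note that, as in the abstract, the check relies on a known correct answer, so a loop of this shape applies to benchmark settings such as GSM8K rather than to problems whose answer is unknown.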
# Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data ( http://arxiv.org/abs/2405.13005v1 ) License: see link	Nan Miles Xi, Hong-Long Ji, Lin Wang	Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of LLMs in accurately identifying sarcoidosis-related content. We discovered a wide array of symptoms reported by patients, with fatigue, swollen lymph nodes, and shortness of breath as the most prevalent. Prednisone was the most prescribed medication, while infliximab showed the highest effectiveness in improving prognoses. Notably, our analysis revealed disparities in prognosis based on age and gender, with women and younger patients experiencing good and polarized outcomes, respectively. Furthermore, unsupervised clustering identified three distinct patient subgroups (phenotypes) with unique symptom profiles, prognostic outcomes, and demographic distributions. Finally, sentiment analysis revealed a moderate negative impact on patients' mental health post-diagnosis, particularly among women and younger individuals. Our study represents the first application of LLMs to understand sarcoidosis through social media data. It contributes to understanding the disease by providing data-driven insights into its manifestations, treatments, prognoses, and impact on patients' lives. Our findings have direct implications for improving personalized treatment strategies and enhancing the quality of care for individuals living with sarcoidosis.	Translated: 2024-05-27 02:58:21 Published: 2024-05-12
# Triple-CFN: Restructuring Conceptual Spaces for Enhancing Abstract Reasoning process ( http://arxiv.org/abs/2403.03190v7 ) License: see link	Ruizhuo Song, Beiming Yuan	Abstract reasoning poses significant challenges to artificial intelligence algorithms, demanding a higher level of cognitive ability than that required for perceptual tasks. In this study, we introduce the Triple-CFN method to tackle the Bongard-Logo problem, achieving remarkable reasoning accuracy by implicitly reorganizing the conflicting concept spaces of instances. Furthermore, with necessary modifications, the Triple-CFN paradigm has also proven effective on the RPM (Raven's Progressive Matrices) problem, yielding competitive results. To further enhance Triple-CFN's performance on the RPM problem, we have upgraded it to the Meta Triple-CFN network, which explicitly constructs the concept space of RPM problems, ensuring high reasoning accuracy while achieving conceptual interpretability. The success of Meta Triple-CFN can be attributed to its paradigm of modeling the concept space, which is tantamount to normalizing reasoning information. Based on this idea, we have introduced the Re-space layer, boosting the performance of both Meta Triple-CFN and Triple-CFN. This paper aims to contribute to the advancement of machine intelligence and pave the way for further breakthroughs in this field by exploring innovative network designs for solving abstract reasoning problems.	Translated: 2024-05-16 15:45:06 Published: 2024-05-12
# D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts ( http://arxiv.org/abs/2403.03452v7 ) License: see link	Ruizhuo Song, Beiming Yuan	This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstract reasoning problems through distributions. This perspective enhances the performance of both Lico-Net and a baseline model excelling in Bongard-Logo tasks. To bolster the computational efficiency of D3C, we present the D3C-cos variant, offering a streamlined yet precise solution. Furthermore, we propose the D2C method, redefining conceptual boundaries within these domains and bridging the divide between high-level abstractions and their lower-dimensional counterparts. Finally, we extend our methodology to D4C, employing adversarial techniques to refine conceptual boundaries further and demonstrate substantial improvements in both RPM and Bongard-Logo challenges. Overall, our contributions present a fresh outlook and practical advancements in the field of abstract reasoning.	Translated: 2024-05-16 15:45:06 Published: 2024-05-12
# Thermodynamic limit in learning period three ( http://arxiv.org/abs/2405.08825v1 ) License: see link	Yuichiro Terasaki, Kohei Nakajima	A continuous one-dimensional map with period three includes all periods. This raises the following question: Can we obtain any type of periodic orbit solely by learning three data points? We consider learning period three with random neural networks and report a universal property associated with it. We first show that the trained networks have a thermodynamic limit that depends on the choice of target data and network settings. Our analysis reveals that almost all learned periods are unstable and that each network has its characteristic attractors (which can even be untrained ones). Here, we propose the concept of characteristic bifurcation, expressing embeddable attractors intrinsic to the network, in which the target data points and the scale of the network weights function as bifurcation parameters. In conclusion, learning period three generates various attractors through characteristic bifurcation, due to stability changes in the numerous latent unstable periods of the system.	Translated: 2024-05-16 15:24:45 Published: 2024-05-12
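The opening claim of the abstract above (a continuous one-dimensional map with a period-three orbit has orbits of every period, a consequence of Sharkovskii's theorem, popularized by Li and Yorke) can be checked by hand on a simple map. The tent map used below is an illustrative choice and does not come from the paper; exact rational arithmetic avoids floating-point drift.

```python
from fractions import Fraction


def tent(x):
    """Tent map on [0, 1]: 2x on the left half, 2(1 - x) on the right half."""
    return 2 * x if x < Fraction(1, 2) else 2 * (1 - x)


def minimal_period(x0, max_period=20):
    """Smallest n >= 1 with tent^n(x0) == x0, or None if none is found."""
    x = x0
    for n in range(1, max_period + 1):
        x = tent(x)
        if x == x0:
            return n
    return None


# 2/7 -> 4/7 -> 6/7 -> 2/7 is a period-three orbit of the tent map.
periods = {
    start: minimal_period(start)
    for start in (Fraction(2, 3), Fraction(2, 5), Fraction(2, 7),
                  Fraction(2, 15), Fraction(2, 31))
}
print(periods)
```

Starting points of the form 2/(2^n - 1) give orbits of period n here, which makes the coexistence of periods one through five easy to verify exactly.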
# A Data-Mining Based Study of Security Vulnerability Types and Their Mitigation in Different Languages ( http://arxiv.org/abs/2405.08025v1 ) License: see link	Gábor Antal, Balázs Mosolygó, Norbert Vándor, Péter Hegedüs	The number of people accessing online services is increasing day by day, and with new users comes a greater need for effective and responsive cyber-security. Our goal in this study was to find out whether there are common patterns within the most widely used programming languages in terms of security issues and fixes. In this paper, we showcase some statistics based on the data we extracted for these languages. Analyzing the more popular ones, we found that the same security issues might appear differently in different languages, and as such the provided solutions may vary just as much. We also found that projects of similar size can produce extremely different results and have different common weaknesses, even if they provide a solution to the same task. These statistics may not be entirely indicative of the projects' standards when it comes to security, but they provide a good reference point for what one should expect. Given a larger sample size, they could be made even more precise, and a better understanding of the security-relevant activities within projects written in a given language could be achieved.	Translated: 2024-05-15 18:03:09 Published: 2024-05-12
# ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis ( http://arxiv.org/abs/2405.08026v1 ) License: see link	Mohammad Amaz Uddin, Muhammad Nazrul Islam, Leandros Maglaras, Helge Janicke, Iqbal H. Sarker	SMS, or short messaging service, is a widely used and cost-effective communication medium that has sadly turned into a haven for unwanted messages, commonly known as SMS spam. With the rapid adoption of smartphones and Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have taken notice of the significance of SMS for mobile phone users. Consequently, with the emergence of new cybersecurity threats, the volume of SMS spam has grown significantly in recent years. The unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully fight spam attacks in the cybersecurity domain. In this work, we employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection. We use a benchmark SMS spam dataset for this spam detection, apply several preprocessing techniques to obtain clean and noise-free data, and solve the class imbalance problem using a text augmentation technique. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant, RoBERTa, achieved a high accuracy of 99.84%. We also use Explainable Artificial Intelligence (XAI) techniques to calculate positive and negative coefficient scores, which explore and explain the transparency of the fine-tuned model in this text-based spam SMS detection task. In addition, traditional Machine Learning (ML) models were examined to compare their performance with the transformer-based models. This analysis describes how LLMs can have a positive impact on complex text-based spam data in the cybersecurity field.	Translated: 2024-05-15 18:03:09 Published: 2024-05-12
# Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v1 ) License: see link	Tian Xie, Xueru Zhang	As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples that incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.	Translated: 2024-05-15 18:03:09 Published: 2024-05-12
# PHUDGE: スケーラブルな審査員としてのPhi-3 PHUDGE: Phi-3 as Scalable Judge ( http://arxiv.org/abs/2405.08029v1 ) ライセンス: Link先を確認	Mahesh Deshwal, Apoorva Chawla,	(参考訳) 本稿では，Phi-3をファインチューニングしたモデルPHUDGEを提案する。PHUDGEはFeedback Test、Feedback OOD、MT Human、Preference Testの4つのタスクでSOTAを達成し、レイテンシとスループットにおいて既存のすべてのモデルを上回る。 GPT-4だけでなく人間のアノテータとも、未知のデータにおいて、また絶対的および相対的な採点タスクの両方において、非常に強い相関を示す。我々は、コスト効率のよい運用グレードシステムにおける小さなLMの使用に対処しただけでなく、Causalモデリングが本質的に遅いだけでなく、モデルの学習能力を阻害する場合があり、システム全体をより速く、より良くするためには、可能な限りより簡単なタスクに置き換えるべきであることを示した。我々は、体系的なML実験、思慮深いデータ拡張、問題自体の再定義に従えば、より少ないトレーニングデータでも10倍大きなモデルを上回ることができることを示した。我々の知る限り、損失の平滑化を制御するペナルティ付きのミンコフスキー距離を用いたアースムーバー距離(別名ワッサースタイン距離)の一般化版を、クロスエントロピーの代わりに損失関数として用い、採点タスクにおいて安定した学習とより良い結果が得られることを実験し示したのは我々が初めてである。 In this paper cum technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieves SOTA results on four tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput. It shows very strong correlation not only with GPT-4 but also with human annotators, on unseen data and in both absolute and relative grading tasks. We not only address the use of small LMs for cost-effective production-grade systems but also show that causal modelling is not only slow by nature but can sometimes hinder a model's learning capabilities, and should be replaced by simpler tasks whenever possible to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation, and repurposing the problem itself, we can beat 10x bigger models even with less training data. 
To the best of our knowledge, we are the first to experiment with and showcase a generalised version of Earth Mover's Distance (a.k.a. Wasserstein distance) that uses Minkowski distance with a penalty to control loss smoothing and can be used as a loss function instead of cross-entropy to get stable training and better results for grading tasks.	翻訳日:2024-05-15 18:03:09 公開日:2024-05-12
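The PHUDGE abstract describes replacing cross-entropy with an Earth Mover's Distance style loss for grading. The sketch below is one common form of such a loss for ordered grades, not necessarily the authors' formulation: it compares the cumulative distributions of the prediction and a one-hot target under a Minkowski exponent `p`. The function name `emd_loss` and its parameters are hypothetical, and the penalty term mentioned in the abstract is not modelled here.

```python
def emd_loss(probs, target_index, p=2):
    """Minkowski-p distance between the predicted CDF and a one-hot target CDF.

    probs        -- predicted probabilities over ordered grades (sums to 1)
    target_index -- index of the true grade
    p            -- Minkowski exponent (assumption: p=1 gives the classic 1-D EMD)
    """
    cdf_pred = 0.0
    total = 0.0
    for k, q in enumerate(probs):
        cdf_pred += q                                   # running predicted CDF
        cdf_true = 1.0 if k >= target_index else 0.0    # CDF of the one-hot target
        total += abs(cdf_pred - cdf_true) ** p
    return total ** (1.0 / p)
```

Unlike cross-entropy, this loss grows with how far the predicted grade lies from the true one, so predicting grade 0 when the truth is grade 2 costs more than predicting grade 1.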
# HGTDR:不均質グラフ変換器による薬物再精製の促進 HGTDR: Advancing Drug Repurposing with Heterogeneous Graph Transformers ( http://arxiv.org/abs/2405.08031v1 ) ライセンス: Link先を確認	Ali Gharizadeh, Karim Abbasi, Amin Ghareyazi, Mohammad R. K. Mofrad, Hamid R. Rabiee,	(参考訳) モチベーション(Motivation): 薬物再資源化は、薬物開発に関連する時間とコストを削減するための有効な解決策である。しかし、これまでのところ、提案されている薬物再資源化アプローチは依然として期待に応える必要がある。したがって、コスト削減と人命向上のために、医薬品再資源化のための体系的なアプローチを提供することが不可欠である。近年, 生物学的ネットワークを用いた薬物再資源化法は, 有望な結果を生んでいる。しかし、これらの方法には制限がある。主に、これらの手法の範囲は、彼らが効果的に扱えるデータのサイズと多様性に制限される。もう一つの問題は、均質なデータに対処または変換する必要がある異質なデータを扱うことで起こり、情報の喪失につながる。重大な欠点は、これらのアプローチのほとんどはエンドツーエンドの機能がなく、手動による実装と特定の段階でのエキスパートの知識を必要としていることです。結果: 薬物再資源化に伴う課題に対処するため, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing) を提案する。 HGTDRは知識グラフに基づく薬物再資源化のための3段階のアプローチである。 1)異種知識グラフの構築 2ヘテロジニアスグラフトランス網の利用、及び 3) 完全に接続されたネットワークを用いて, 計算関係のスコアを算出した。 HGTDRを利用することで、ユーザは入力グラフを操作し、多様なエンティティから情報を抽出し、所望の出力を得ることができる。評価ステップでは,HGTDRが従来の手法と相容れない性能を示す。さらに,本手法の薬品再資源化提案の上位10点を検証するため,医療研究をレビューし,有望な結果が得られた。また,HGTDRは,薬物タンパク質や疾患タンパク質の相互関係などの数値的および実験的検証を通じて,他の種類の関係を予測する能力も実証した。 Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, thus far, the proposed drug repurposing approaches still need to meet expectations. Therefore, it is crucial to offer a systematic approach for drug repurposing to achieve cost savings and enhance human lives. In recent years, using biological network-based methods for drug repurposing has generated promising results. Nevertheless, these methods have limitations. Primarily, the scope of these methods is generally limited concerning the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which needs to be addressed or converted into homogeneous data, leading to a loss of information. 
A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge in certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrate HGTDR's capability to predict other types of relations, such as drug-protein and disease-protein inter-relations, through numerical and experimental validation.	翻訳日:2024-05-15 18:03:09 公開日:2024-05-12
# エージェント型社会シミュレーションモデル設計のための会話型AIサポートの可能性を探る Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design ( http://arxiv.org/abs/2405.08032v1 ) ライセンス: Link先を確認	Peer-Olaf Siebers,	(参考訳) AIで動くチャットボットChatGPTは、数億のユーザベースを持つが、今やグローバルな現象だ。しかし、社会シミュレーション分野の研究におけるChatGPTのような会話型AIシステム(CAIS)の利用は依然として限られている。具体的には,エージェント・ベース・ソーシャル・シミュレーション (ABSS) モデル設計におけるその使用の証拠はない。新たなものに対する懐疑論は人間の本質に固有のものであるが、我々はABSSモデル設計をサポートするためにこの革新的な技術の使用を開始することが不可欠であると強く信じている。本稿では,CAISが簡潔な時間枠と必要最小限のケースベース知識で,革新的なABSSモデルの開発をいかに促進できるかを実証する概念実証について述べる。先進的なプロンプト技術を採用し,エンジニアリングABSSフレームワークを定着させることで,CAISによるABSSモデルの設計を可能にする包括的なプロンプトスクリプトを構築した。本書の有効性は,美術館における適応型建築の利用に関する実証的な事例研究を通じて実証される。会話における不正確さや相違点が時々あったにもかかわらず、CAISはABSSモデラーにとって貴重なパートナーであることが判明した。 ChatGPT, the AI-powered chatbot with a massive user base of hundreds of millions, has become a global phenomenon. However, the use of Conversational AI Systems (CAISs) like ChatGPT for research in the field of Social Simulation is still limited. Specifically, there is no evidence of its usage in Agent-Based Social Simulation (ABSS) model design. While scepticism towards anything new is inherent to human nature, we firmly believe it is imperative to initiate the use of this innovative technology to support ABSS model design. This paper presents a proof-of-concept that demonstrates how CAISs can facilitate the development of innovative conceptual ABSS models in a concise timeframe and with minimal required upfront case-based knowledge. By employing advanced prompt engineering techniques and adhering to the Engineering ABSS framework, we have constructed a comprehensive prompt script that enables the design of ABSS models with or by the CAIS. The effectiveness of the script is demonstrated through an illustrative case study concerning the use of adaptive architecture in museums. Despite occasional inaccuracies and divergences in conversation, the CAIS proved to be a valuable companion for ABSS modellers.	
翻訳日:2024-05-15 18:03:09 公開日:2024-05-12
# いくら食べたか? スプーン上の食品ポーション推定 How Much You Ate? Food Portion Estimation on Spoons ( http://arxiv.org/abs/2405.08717v1 ) ライセンス: Link先を確認	Aaryam Sharma, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong,	(参考訳) 食事摂取のモニタリングは健康な生活を促進する重要な側面である。近年、コンピュータビジョン技術の進歩により、画像と深度カメラを用いて食事摂取監視が進められている。しかし、現在の最先端画像に基づく食品部分推定アルゴリズムでは、ユーザーは食事の画像を1、2回撮影すると仮定しており、これは不便であり、シチューに浸した材料など、トップダウンの視点では見えない食品を捉えることができない。これらの制約に対処するため,我々は,固定式のユーザ向けカメラを用いて,設置後のカメラ視点の変更を必要とせず,食器上の食品を追跡する革新的なソリューションを導入した。深さの浅い食器は、食品を撮影するのに有利な角度を与え、食器の表面でそれらを追跡することは、食事後のイメージキャプチャを必要とせず、食事の摂取量をはるかに正確に推定する。本システムは,スープやシチューなどの液状固形不均一混合物の栄養含量の推定に信頼性が高い。非侵襲的で、ユーザフレンドリで、高精度な食事摂取モニタリングツールとして、我々は一連の実験を通して、我々の方法の優れた可能性を実証した。 Monitoring dietary intake is a crucial aspect of promoting healthy living. In recent years, advances in computer vision technology have facilitated dietary intake monitoring through the use of images and depth cameras. However, the current state-of-the-art image-based food portion estimation algorithms assume that users take images of their meals one or two times, which can be inconvenient and fail to capture food items that are not visible from a top-down perspective, such as ingredients submerged in a stew. To address these limitations, we introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils, not requiring any change of camera perspective after installation. The shallow depth of utensils provides a more favorable angle for capturing food items, and tracking them on the utensil's surface offers a significantly more accurate estimation of dietary intake without the need for post-meal image capture. The system is reliable for estimation of nutritional content of liquid-solid heterogeneous mixtures such as soups and stews. Through a series of experiments, we demonstrate the exceptional potential of our method as a non-invasive, user-friendly, and highly accurate dietary intake monitoring tool.	翻訳日:2024-05-15 13:28:19 公開日:2024-05-12
# 構造モチーフを用いた分子スキャッホールドの拡張学習 Learning to Extend Molecular Scaffolds with Structural Motifs ( http://arxiv.org/abs/2103.03864v5 ) ライセンス: Link先を確認	Krzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt,	(参考訳) 深層学習に基づく分子モデリングの最近の進歩は、シリコ薬物発見の加速を約束している。多くの生成モデルが利用可能であり、原子・バイ・原子・ボンド・バイ・フラグメント・バイ・フラグメント・バイ・フラッグメントのいずれでも分子を構築することができる。しかし、多くの薬物発見プロジェクトでは、生成した分子に固定された足場が必要であり、その制約を組み込むことは、最近になって研究されたばかりである。本稿では,生成過程の初期シードとして自然に足場をサポートするグラフベースモデルであるMoLeRを提案する。実験の結果,MoLeRは非拘束の分子最適化タスクにおいて最先端の手法と相容れない性能を示し,既存の手法よりも訓練やサンプルの処理が格段に速く,足場ベースのタスクでは性能が向上することがわかった。さらに, 外観が微妙な設計選択が全体的な性能に与える影響も示す。 Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.	翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
# 分類のためのルール生成:スケーラビリティ、解釈可能性、公正性 Rule Generation for Classification: Scalability, Interpretability, and Fairness ( http://arxiv.org/abs/2104.10751v4 ) ライセンス: Link先を確認	Tabea E. Röber, Adia C. Lumadjeng, M. Hakan Akyüz, Ş. İlker Birbil,	(参考訳) 制約付き分類のためのルールベースの新しい最適化手法を提案する。提案手法は,線形プログラミングに列生成を利用するため,大規模データセットに対してスケーラブルである。その結果の価格サブプロブレムはNP-Hardであることが示されている。我々は決定木に基づくヒューリスティックに言及し、アクセラレーションのためのプロキシ価格サブプロブレムを解決する。この方法は、学習における各ルールの重要性を示す最適な重みとともに、一連のルールを返します。ルールにコスト係数を割り当て、追加制約を導入することにより、解釈可能性と公正性に対処する。特に、局所的解釈可能性に着目し、複数の属性やクラスに対して公平に分離基準を一般化する。本稿では,提案手法の性能をデータセットの集合上で検証し,その異なる側面について詳しく述べる。ルールに基づく学習手法は,一方の局所的解釈可能性と一方の公平性,他方の精度との間に良好な妥協関係を示す。 We introduce a new rule-based optimization method for classification with constraints. The proposed method leverages column generation for linear programming, and hence, is scalable to large datasets. The resulting pricing subproblem is shown to be NP-Hard. We recourse to a decision tree-based heuristic and solve a proxy pricing subproblem for acceleration. The method returns a set of rules along with their optimal weights indicating the importance of each rule for learning. We address interpretability and fairness by assigning cost coefficients to the rules and introducing additional constraints. In particular, we focus on local interpretability and generalize separation criterion in fairness to multiple sensitive attributes and classes. We test the performance of the proposed methodology on a collection of datasets and present a case study to elaborate on its different aspects. The proposed rule-based learning method exhibits a good compromise between local interpretability and fairness on the one side, and accuracy on the other side.	翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
# 最適輸送写像のエントロピー推定 Entropic estimation of optimal transport maps ( http://arxiv.org/abs/2109.12004v3 ) ライセンス: Link先を確認	Aram-Alexandre Pooladian, Jonathan Niles-Weed,	(参考訳) 我々は,厳密な有限サンプル保証付きで$\mathbb{R}^d$上の2つの分布間の最適写像を推定する計算可能手法を開発した。ブレニエの定理のエントロピック版を利用することで、最適エントロピック計画の「emph{barycentric projection}」という推定器がシンクホーンのアルゴリズムを用いて容易に計算できることが示される。その結果, サンプルの次元や数が大きい場合, 評価が遅い現在の地図推定手法とは異なり, 大規模データセットにおいても並列化が可能であり, 極めて効率的であることがわかった。最適写像上の滑らかさの仮定の下では、我々の推定器は文学における他の推定器と同等の統計的性能を享受するが、計算コストははるかに低い。提案した推定器の数値例による有効性を示す。 Lepskiの手法により、基礎となる最適輸送写像の滑らかさに適応する推定器の修正版を提案する。我々の証明は、エントロピー最適輸送のための修正双対原理と、Pal (2019) による最適エントロピー計画の近似法に基づいている。 We develop a computationally tractable method for estimating the optimal map between two distributions over $\mathbb{R}^d$ with rigorous finite-sample guarantees. Leveraging an entropic version of Brenier's theorem, we show that our estimator -- the \emph{barycentric projection} of the optimal entropic plan -- is easy to compute using Sinkhorn's algorithm. As a result, unlike current approaches for map estimation, which are slow to evaluate when the dimension or number of samples is large, our approach is parallelizable and extremely efficient even for massive data sets. Under smoothness assumptions on the optimal map, we show that our estimator enjoys comparable statistical performance to other estimators in the literature, but with much lower computational cost. We showcase the efficacy of our proposed estimator through numerical examples, even ones not explicitly covered by our assumptions. By virtue of Lepski's method, we propose a modified version of our estimator that is adaptive to the smoothness of the underlying optimal transport map. Our proofs are based on a modified duality principle for entropic optimal transport and on a method for approximating optimal entropic plans due to Pal (2019).	翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
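The entropic map estimator in the abstract above, the barycentric projection of the optimal entropic plan computed with Sinkhorn's algorithm, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: both samples carry uniform weights, the cost is half the squared Euclidean distance, and the map is evaluated only at the source sample points. The names `sinkhorn_plan` and `barycentric_projection` and the parameters `eps` and `n_iter` are hypothetical.

```python
import math

def sinkhorn_plan(xs, ys, eps=0.1, n_iter=500):
    """Entropic OT plan between uniform empirical measures on xs and ys.

    Assumes cost c(x, y) = 0.5 * ||x - y||^2 and regularisation strength eps.
    """
    n, m = len(xs), len(ys)
    # Gibbs kernel K[i][j] = exp(-c(x_i, y_j) / eps)
    K = [[math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / eps)
          for y in ys] for x in xs]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternate scaling so that row sums -> 1/n and column sums -> 1/m
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def barycentric_projection(plan, ys):
    """Send each source point to the plan-weighted average of the target points."""
    dim = len(ys[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(row[j] * ys[j][d] for j in range(len(ys))) / mass
                       for d in range(dim)])
    return mapped
```

For example, `barycentric_projection(sinkhorn_plan(xs, ys), ys)` returns one mapped point per element of `xs`. For very small `eps` the kernel entries underflow to zero, so a practical version would run the same iteration in the log domain.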
# マルコフ線型確率近似の最適およびインスタンス依存保証 Optimal and instance-dependent guarantees for Markovian linear stochastic approximation ( http://arxiv.org/abs/2112.12770v2 ) ライセンス: Link先を確認	Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett,	(参考訳) エルゴード型マルコフ連鎖から長さ$n$の軌跡を観測し,$d$次元線形不動点方程式を近似的に解くための確率近似法について検討した。まず、標準スキームの最後の繰り返しの2乗誤差に対して、$t_{\mathrm{mix}} \tfrac{d}{n}$の非漸近的境界を示す($t_{\mathrm{mix}}$は混合時間である)。次に、適切な平均化されたイテレート列上の非漸近的インスタンス依存境界を証明し、高次項のパラメータ $(d, t_{\mathrm{mix}})$ への鋭い依存を含む局所漸近的ミニマックス極限に一致する先頭項とする。これらの上界を非漸近ミニマックス下界で補い、平均化されたSA推定器のインスタンス最適性を確立する。マルコフノイズを用いた政策評価のためのこれらの結果は、すべての$\lambda \in [0, 1)$に対するTD($\lambda$)アルゴリズムファミリーと線形自己回帰モデルをカバーする。例えば、TD($\lambda$)アルゴリズムを実行するとき、$\lambda$の値を選択する。 We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).	翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
# 確率的ランゲヴィン差分包と機械学習への応用 Stochastic Langevin Differential Inclusions with Applications to Machine Learning ( http://arxiv.org/abs/2206.11533v3 ) ライセンス: Link先を確認	Fabio V. Difonzo, Vyacheslav Kungurtsev, Jakub Marecek,	(参考訳) ランゲヴィン拡散形式の確率微分方程式は、ベイズサンプリングアルゴリズムと機械学習における最適化の両方において基礎的な役割を担っているため、大きな注目を集めている。後者では、過パラメータ化モデルのトレーニングにおいて、確率勾配流の概念モデルとして機能する。しかしながら、文献は通常、勾配がドリフト項であるポテンシャルの滑らかさを仮定する。それでも、ポテンシャル函数が連続的に微分可能でないような問題が多く、したがってドリフトは至る所でリプシッツ連続ではない。これは、リグレッション問題におけるロバストな損失とRectified Linear Unitsによって実証される。本稿では,Langevin型確率微分包摂のフローと漸近特性に関する基礎的な結果を示す。特に、この解の強い存在を示すとともに、標準自由エネルギー関数の漸近最小化を示す。 Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in training over-parameterized models. However, the literature typically assumes smoothness of the potential, whose gradient is the drift term. Nevertheless, there are many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere. This is exemplified by robust losses and Rectified Linear Units in regression problems. In this paper, we show some foundational results regarding the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to the machine-learning settings. In particular, we show strong existence of the solution, as well as an asymptotic minimization of the canonical free-energy functional.	翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
# 帯域フィードバックを用いたオンライン・サBmodular + SuPermodular (BP)最大化 Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback ( http://arxiv.org/abs/2207.03091v3 ) ライセンス: Link先を確認	Adhyyan Narang, Omid Sadeghi, Lillian J Ratliff, Maryam Fazel, Jeff Bilmes,	(参考訳) 組合せ目的を伴うオンラインインタラクティブ機械学習の文脈では、純粋にサブモジュラーな事前作業はより一般的な非サブモジュラーな目的に拡張する。これは、(1)加法的に分解可能な項を2つの項の和(単調部分モジュラー項と単調超モジュラー項、BP分解(英語版)として知られる)、(2)弱部分モジュラー項のみを含む。どちらの場合でも、これはオブジェクト間の競合(サブモジュール)だけでなく、補完的(スーパーモジュール)な関係を表現することを可能にし、この設定をより広い範囲のアプリケーション(例えば、映画レコメンデーション、医療治療など)に拡張する。さらに,従来のモノリシックなフィードバックアプローチだけでなく,それぞれの用語でフィードバックを個別に利用できる新しいフレームワークについても検討する。実世界の実用性とスケーラビリティを念頭に置いて、純粋にモジュラーなケースを含む計算コストを大幅に削減するために、Nystromスケッチ技術を統合する。ガウス過程の文脈的帯域設定では、すべての場合において準線形理論的後悔境界を示す。また,レコメンデーションシステムやデータサブセットの選択に優れた適用性を示す。 In the context of online interactive machine learning with combinatorial objectives, we extend purely submodular prior work to more general non-submodular objectives. This includes: (1) those that are additively decomposable into a sum of two terms (a monotone submodular and monotone supermodular term, known as a BP decomposition); and (2) those that are only weakly submodular. In both cases, this allows representing not only competitive (submodular) but also complementary (supermodular) relationships between objects, enhancing this setting to a broader range of applications (e.g., movie recommendations, medical treatments, etc.) where this is beneficial. In the two-term case, moreover, we study not only the more typical monolithic feedback approach but also a novel framework where feedback is available separately for each term. With real-world practicality and scalability in mind, we integrate Nystrom sketching techniques to significantly reduce the computational cost, including for the purely submodular case. In the Gaussian process contextual bandits setting, we show sub-linear theoretical regret bounds in all cases. We also empirically show good applicability to recommendation systems and data subset selection.	
翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
# HiKonv: 量子畳み込みのスループットを、新しいビット単位の管理と計算で最大化する HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation ( http://arxiv.org/abs/2208.00763v2 ) ライセンス: Link先を確認	Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen,	(参考訳) CNNの量子化は、低ビット幅のデータ表現による計算とストレージのコスト削減を意図して、大きな進歩を見せている。しかし、CPUの ALU やFPGAの DSP など、既存のフルビット幅処理ユニットが、様々な量子化ビット幅での畳み込みにおいて、より高い計算スループットを実現するために、どのように利用できるかという体系的な研究は存在しない。本研究では,新しいビットワイド管理と並列計算により,低ビット幅の量子化データ入力を持つ処理ユニット上での畳み込みのスループットを最大化する統一解であるHiKonvを提案する。我々は,高並列化低ビット幅畳み込みのための全ビット幅乗算器を用いた理論的枠組みと性能モデルを構築し,この臨界領域における高性能コンピューティングの新しいブレークスルーを実証する。例えば、CPU内の単一の32ビット処理ユニットは、128の双項化畳み込み演算(乗算と加算)と13の4ビットの畳み込み演算を1つの乗算命令で行うことができ、FPGA DSP内の1つの27x18乗算器は、1つのクロックサイクルで1,4,8ビット入力で60,8,2の畳み込み演算を配信できる。我々は、CPUとFPGAの両方におけるHiKonvの有効性を示す。 CPUでは、HiKonvは1から8ビットの入力でベースライン実装を上回り、1-D畳み込みでは最大7.6倍と1.4倍の性能向上を実現し、4-D畳み込みでは2.74倍と3.19倍の性能向上を実現している。 FPGAでは、HiKonvソリューションにより、1つのDSPがより短い処理レイテンシで複数の畳み込みを処理することができる。バイナライズされた入力では、HiKonv を持つ各 DSP は 76.6 LUT に等しい。 DAC-SDC 2020のチャンピオンモデルと比較して、HiKonvは2.37倍のスループット向上と2.61倍のDSP効率向上を実現している。 Quantization for CNN has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data representations. There are, however, no systematic studies on how an existing full-bitwidth processing unit, such as ALU in CPUs and DSP in FPGAs, can be better utilized to deliver significantly higher computation throughput for convolution under various quantized bitwidths. In this study, we propose HiKonv, a unified solution that maximizes the throughput of convolution on a given underlying processing unit with low-bitwidth quantized data inputs through novel bit-wise management and parallel computation. We establish theoretical framework and performance models using a full-bitwidth multiplier for highly parallelized low-bitwidth convolution, and demonstrate new breakthroughs for high-performance computing in this critical domain. 
For example, a single 32-bit processing unit in CPU can deliver 128 binarized convolution operations (multiplications and additions) and 13 4-bit convolution operations with a single multiplication instruction, and a single 27x18 multiplier in the FPGA DSP can deliver 60, 8 or 2 convolution operations with 1, 4 or 8-bit inputs in one clock cycle. We demonstrate the effectiveness of HiKonv on both CPU and FPGA. On CPU, HiKonv outperforms the baseline implementation with 1 to 8-bit inputs and provides up to 7.6x and 1.4x performance improvements for 1-D convolution, and performs 2.74x and 3.19x over the baseline implementation for 4-bit signed and unsigned data inputs for 2-D convolution. On FPGA, HiKonv solution enables a single DSP to process multiple convolutions with a shorter processing latency. For binarized input, each DSP with HiKonv is equivalent up to 76.6 LUTs. Compared to the DAC-SDC 2020 champion model, HiKonv achieves a 2.37x throughput improvement and 2.61x DSP efficiency improvement, respectively.	翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
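The core idea behind HiKonv, getting several low-bitwidth multiply-accumulate results out of one wide multiplication, can be illustrated with plain integer packing. The sketch below is a simplified illustration rather than the paper's exact scheme: it handles unsigned inputs only, uses Python's unbounded integers instead of a fixed 32-bit or 27x18 multiplier, and the names `pack`, `packed_conv1d`, and `direct_conv1d` are hypothetical. Each input value is placed in its own bit slot with enough guard bits that partial sums never spill into the neighbouring slot, so the product of the two packed words contains the full 1-D convolution.

```python
def pack(values, slot_bits):
    """Pack unsigned integers into one word, one value per slot of slot_bits bits."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (slot_bits * i)
    return word

def packed_conv1d(f, g, bits):
    """Full 1-D convolution of unsigned `bits`-bit sequences using one multiplication."""
    # Each output sums at most min(len(f), len(g)) products, each below 2**(2*bits),
    # so ceil(log2(min(len(f), len(g)))) guard bits keep the slots separate.
    guard = (min(len(f), len(g)) - 1).bit_length()
    slot_bits = 2 * bits + guard
    product = pack(f, slot_bits) * pack(g, slot_bits)   # the single wide multiply
    mask = (1 << slot_bits) - 1
    n_out = len(f) + len(g) - 1
    return [(product >> (slot_bits * k)) & mask for k in range(n_out)]

def direct_conv1d(f, g):
    """Reference convolution with one multiplication per pair of inputs."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out
```

On real hardware the number of values that fit is bounded by the multiplier width, which is why the throughput figures quoted in the abstract depend on the input bitwidth.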
# 自動推奨コード更新:まだ存在するか? Automatically Recommend Code Updates: Are We There Yet? ( http://arxiv.org/abs/2209.07048v3 ) ライセンス: Link先を確認	Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Patanamon Thongtanunam, Li Li,	(参考訳) 近年、CodeLM(Code-trained Language Models of Code)は、様々なソフトウェアエンジニアリングタスクにおいて有望な結果を示している。そのようなタスクのひとつが自動コード更新レコメンデーションであり、古いコードスニペットを承認および修正されたコードに変換する。多くのCodeLMベースのアプローチが提案されているが、精度が高く、実際のコード更新タスクの有効性と信頼性は疑問視されている。本稿では,コード更新を自動で推奨する,最先端のCodeLMの広範な評価を行う。時間的進化, プロジェクトの特異性, メソッドサイズ, 更新複雑性などの要因を考慮し, ペア更新手法の2つの多種多様なデータセットの性能評価を行った。結果から,CodeLMは時間的情報を無視した環境では良好に機能するが,より現実的な時間的シナリオに苦しむとともに,新しいプロジェクトへの一般化が不十分であることが明らかとなった。さらに、より大規模なメソッドやより複雑な更新では、CodeLMのパフォーマンスが大幅に低下する。さらに、多くのCodeLM生成した"更新"は実際にはnullであり、特に時間的な設定では意味のある編集は難しいままである。本研究は,実世界のコード更新勧告におけるCodeLMの認識と実際の有効性の間に有意なギャップを生じさせ,実用性,堅牢性,一般化性の向上に向けたさらなる研究の必要性を強調した。 In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on real-world code update tasks remain questionable. In this paper, we present the first extensive evaluation of state-of-the-art CodeLMs for automatically recommending code updates. We assess their performance on two diverse datasets of paired updated methods, considering factors such as temporal evolution, project specificity, method size, and update complexity. Our results reveal that while CodeLMs perform well in settings that ignore temporal information, they struggle in more realistic time-wise scenarios and generalize poorly to new projects. Furthermore, CodeLM performance decreases significantly for larger methods and more complex updates. 
We also observe that many CodeLM-generated "updates" are actually null, especially in time-wise settings, and that meaningful edits remain challenging. Our findings highlight the significant gap between the perceived and actual effectiveness of CodeLMs for real-world code update recommendation and emphasize the need for more research on improving their practicality, robustness, and generalizability.	翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
# ISFL:地域重要度サンプリングによる非i.d.データのフェデレーション学習 ISFL: Federated Learning for Non-i.i.d. Data with Local Importance Sampling ( http://arxiv.org/abs/2210.02119v3 ) ライセンス: Link先を確認	Zheqi Zhu, Yuchen Shi, Pingyi Fan, Chenghui Peng, Khaled B. Letaief,	(参考訳) 計算とコミュニケーションを統合する有望な学習パラダイムとして、フェデレートラーニング(FL)は、分散クライアントからのローカルトレーニングと定期的な共有を進める。クライアント上の非IDデータ分布のため、FLモデルは勾配の多様性、性能の低下、収束不良等に悩まされる。本研究は,地域訓練に重要サンプリング(IS)を採用することで,この課題に対処することを目的とする。理論的保証のある明示的な枠組みであるISFLを提案する。まず、ISFLの収束定理を導出し、局所的な重要度サンプリングの効果を包含する。そして、最適なIS重みを選択する問題を定式化し、理論解を得る。また,IS重みを計算し,ISFLアルゴリズムを開発するために水充填法を用いる。 CIFAR-10の実験結果は、提案された定理によく適合し、ISFLがより優れた性能、サンプリング効率、および非i.d.データの説明可能性が得られることを検証した。私たちの知る限りでは、ISFLは、ニューラルネットワークモデルとの理論的互換性を示す局所的なサンプリングの側面から、最初の非一意のFLソリューションである。さらに、局所的なサンプリング手法として、ISFLは他の新しいFLフレームワークに容易に移行できる。 As a promising learning paradigm integrating computation and communication, federated learning (FL) proceeds the local training and the periodic sharing from distributed clients. Due to the non-i.i.d. data distribution on clients, FL model suffers from the gradient diversity, poor performance, bad convergence, etc. In this work, we aim to tackle this key issue by adopting importance sampling (IS) for local training. We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees. Firstly, we derive the convergence theorem of ISFL to involve the effects of local importance sampling. Then, we formulate the problem of selecting optimal IS weights and obtain the theoretical solutions. We also employ a water-filling method to calculate the IS weights and develop the ISFL algorithms. The experimental results on CIFAR-10 fit the proposed theorems well and verify that ISFL reaps better performance, sampling efficiency, as well as explainability on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling aspect which exhibits theoretical compatibility with neural network models. 
Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.	翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
# 進化的一般化ゼロショット学習 Evolutionary Generalized Zero-Shot Learning ( http://arxiv.org/abs/2211.13174v2 ) ライセンス: Link先を確認	Dubing Chen, Chenyi Jiang, Haofeng Zhang,	(参考訳) 属性ベースのゼロショット学習(ZSL)は、トレーニング中に見えない新しいクラスを認識するモデルの能力に革命をもたらした。しかし、大規模モデルの進歩に伴い、期待は高まった。単にゼロショットの一般化を達成するだけでなく、ラベルのないデータを使って専門家の領域で継続的に進化できる普遍モデルへの需要が高まっている。これを解決するために,進化的一般化ゼロショット学習(EGZSL)という,この課題のスケールダウンインスタンス化を導入する。この設定により、低パフォーマンスのゼロショットモデルでテストデータストリームに適応し、オンラインで進化させることができる。本稿では,この特別課題の3つの課題,すなわち,破滅的忘れ,初期予測バイアス,進化的データクラスバイアスについて詳述する。さらに,各課題に対する目標解を提案することにより,与えられた初期IGZSLモデルから連続的に進化可能な汎用的手法を提案する。 3つの人気のあるGZSLベンチマークデータセットの実験では、他のベースラインがフェールしている間に、テストデータストリームからモデルを学習できることが示されています。コードは https://github.com/cdb342/EGZSL で入手できる。 Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, the expectations have risen. Beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge: Evolutionary Generalized Zero-Shot Learning (EGZSL). This setting allows a low-performing zero-shot model to adapt to the test data stream and evolve online. We elaborate on three challenges of this special task, i.e., catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continuous evolution from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. Codes are available at https://github.com/cdb342/EGZSL.	翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
# 量子二コトミーと一階漸近現象を超えたコヒーレント熱力学 Quantum dichotomies and coherent thermodynamics beyond first-order asymptotics ( http://arxiv.org/abs/2303.05524v3 ) ライセンス: Link先を確認	Patryk Lipka-Bartosik, Christopher T. Chubb, Joseph M. Renes, Marco Tomamichel, Kamil Korzekwa,	(参考訳) すなわち、量子チャネル $\mathcal E$ mapping $\rho_1^{\otimes n}$ to $\rho_2^{\otimes R_nn}$ with an error $\epsilon_n$ and $\sigma_1^{\otimes n}$ to $\sigma_2^{\otimes R_nn}$ である。我々は、任意のペア$(\rho_1,\sigma_1) の初期状態と可換ペア$(\rho_2,\sigma_2) 最終状態の$に対して、小、中、大の偏差誤差レジームおよびゼロエラーレジームにおいて、最適変換率$R_n$の2階漸近式を導出する。また、熱ギブス状態によって与えられる$\sigma_1$および$\sigma_2$の場合、第1の3つの状態における最適変換速度は熱演算によって達成できることを示す。これにより、私たちは初めて、異なるエネルギー固有空間間のコヒーレンスを持つような完全に一般的な初期状態と熱力学的状態相互変換の2階漸近を研究することができる。そこで本研究では,コヒーレント入力を用いた熱力学プロトコルの最適性能について論じ,有限サイズ効果による変換誤差を著しく低減できる3つの新しい共振現象について述べる。さらに、量子二コトミーに関する我々の結果は、2階の漸近項、局所的な演算と古典的な通信の下での純二部共役状態間の最適変換率を得るためにも利用できる。 We address the problem of exact and approximate transformation of quantum dichotomies in the asymptotic regime, i.e., the existence of a quantum channel $\mathcal E$ mapping $\rho_1^{\otimes n}$ into $\rho_2^{\otimes R_nn}$ with an error $\epsilon_n$ (measured by trace distance) and $\sigma_1^{\otimes n}$ into $\sigma_2^{\otimes R_n n}$ exactly, for a large number $n$. We derive second-order asymptotic expressions for the optimal transformation rate $R_n$ in the small, moderate, and large deviation error regimes, as well as the zero-error regime, for an arbitrary pair $(\rho_1,\sigma_1)$ of initial states and a commuting pair $(\rho_2,\sigma_2)$ of final states. We also prove that for $\sigma_1$ and $\sigma_2$ given by thermal Gibbs states, the derived optimal transformation rates in the first three regimes can be attained by thermal operations. This allows us, for the first time, to study the second-order asymptotics of thermodynamic state interconversion with fully general initial states that may have coherence between different energy eigenspaces. 
Thus, we discuss the optimal performance of thermodynamic protocols with coherent inputs and describe three novel resonance phenomena allowing one to significantly reduce transformation errors induced by finite-size effects. What is more, our result on quantum dichotomies can also be used to obtain, up to second-order asymptotic terms, optimal conversion rates between pure bipartite entangled states under local operations and classical communication.	翻訳日:2024-05-15 01:51:46 公開日:2024-05-12
# Angular spectrum of quantum fluctuations in causal structure ( http://arxiv.org/abs/2303.06563v2 ) License: see link	Craig Hogan, Ohkyung Kwon, Nathaniel Selub	Scaling arguments are used to constrain the angular spectrum of distortions on boundaries of macroscopic causal diamonds, produced by Planck-scale vacuum fluctuations of causally-coherent quantum gravity. The small-angle spectrum of displacement is derived from a form of scale invariance: the variance and fluctuation rate of distortions normal to the surface of a causal diamond of radius $R$ at transverse physical separation $c\tau\ll R$ should depend only on $\tau$, with a normalization set by the Planck time $t_P$, and should not depend on $R$. For measurements on scale $R$, the principle leads to universal scaling for variance on angular scale $\Theta$, $\langle\delta\tau^2\rangle_\Theta\simeq\tau\:\!t_P\sim\Theta R\:\!t_P/c$, and angular power spectrum $C_\ell\sim (R\:\!l_P)/\ell^3$ at $\ell\gg1$. This spectrum is consistent with a relational model of holographic noise based on causally coherent virtual null gravitational shocks, a general picture conjectured for all $\ell$. The high-$\ell$ scaling is contrasted with that predicted in some other quantum models, which differ by one power of angular wavenumber $\ell$ and are shown to predict excessive blurring of images from distant sources.	Translated: 2024-05-15 01:51:46 Published: 2024-05-12
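As a rough consistency check on the two scalings quoted in this abstract (a heuristic sketch, not the authors' derivation), one can compare them under two assumptions introduced here: the usual small-angle correspondence $\Theta\sim 1/\ell$, and a variance on angular scale $\Theta$ of order $\ell^2 C_\ell$. Using $l_P = c\,t_P$ and $c\tau\sim\Theta R$:

```latex
\begin{align}
c^2\langle\delta\tau^2\rangle_\Theta &\simeq c^2\,\tau\,t_P = (c\tau)(c\,t_P) \sim \Theta R\,l_P, \\
\ell^2 C_\ell &\sim \ell^2\,\frac{R\,l_P}{\ell^3} = \frac{R\,l_P}{\ell} \sim \Theta R\,l_P
\qquad (\Theta\sim 1/\ell).
\end{align}
```

Both expressions give the same $\Theta R\,l_P$ dependence, which is all this check is meant to show; numerical prefactors are not tracked.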
# Label Distribution Learning from Logical Label ( http://arxiv.org/abs/2303.06847v2 ) License: see link	Yuheng Jia, Jiawei Tang, Jiahao Jiang	Label distribution learning (LDL) is an effective method to predict the label description degree (a.k.a. label distribution) of a sample. However, annotating label distribution (LD) for training samples is extremely costly. Recent studies therefore often first use label enhancement (LE) to generate the estimated label distribution from the logical label and then apply external LDL algorithms on the recovered label distribution to predict the label distribution for unseen samples. However, this step-wise approach overlooks the possible connections between LE and LDL. Moreover, the existing LE approaches may assign some description degrees to invalid labels. To solve the above problems, we propose a novel method to learn an LDL model directly from the logical label, which unifies LE and LDL into a joint model and avoids the drawbacks of the previous LE methods. Extensive experiments on various datasets show that the proposed approach can construct a reliable LDL model directly from the logical label and produce more accurate label distributions than the state-of-the-art LE methods.	Translated: 2024-05-15 01:51:46 Published: 2024-05-12
# BanditQ: Fair Bandits with Guaranteed Rewards ( http://arxiv.org/abs/2304.05219v3 ) License: see link	Abhishek Sinha	Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as possible while ignoring the rest. In this paper, we consider a fair prediction problem in the stochastic setting with a guaranteed minimum rate of accrual of rewards for each arm. We study the problem in both full-information and bandit feedback settings. Combining queueing-theoretic techniques with adversarial bandits, we propose a new online policy, called BanditQ, that achieves the target reward rates while conceding a regret and target rate violation penalty of at most $O(T^{\frac{3}{4}})$. The regret bound in the full-information setting can be further improved to $O(\sqrt{T})$ under either a monotonicity assumption or when considering time-averaged regret. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem. The analysis of the BanditQ policy involves a new self-bounding inequality, which might be of independent interest.	Translated: 2024-05-15 01:51:46 Published: 2024-05-12
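The unfairness this abstract attributes to UCB-style policies can be seen in a small sketch. The code below is plain UCB1, not BanditQ; the arm means, the horizon, and the use of noiseless rewards (so the run is deterministic) are all illustrative assumptions and do not come from the paper.

```python
import math


def ucb1_pull_counts(means, horizon):
    """Run UCB1 with noiseless rewards and return how often each arm was pulled."""
    n_arms = len(means)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Pick the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        totals[arm] += means[arm]  # assumption: reward equals the arm's mean (no noise)
    return counts


counts = ucb1_pull_counts([0.9, 0.5, 0.1], horizon=2000)
print(counts)
```

Almost all pulls go to the first arm, so the other arms accrue reward at a vanishing rate. That is the behaviour a guaranteed per-arm minimum reward rate is meant to rule out.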
# A Comprehensive Review of Image Line Segment Detection and Description: Taxonomies, Comparisons, and Challenges ( http://arxiv.org/abs/2305.00264v2 ) License: see link	Xinyu Lin, Yingjie Zhou, Yipeng Liu, Ce Zhu	An image line segment is a fundamental low-level visual feature that delineates straight, slender, and uninterrupted portions of objects and scenarios within images. Detection and description of line segments lay the basis for numerous vision tasks. Although many studies have aimed to detect and describe line segments, a comprehensive review is lacking, obstructing their progress. This study fills the gap by comprehensively reviewing related studies on detecting and describing two-dimensional image line segments to provide researchers with an overall picture and deep understanding. Based on their mechanisms, two taxonomies for line segment detection and description are presented to introduce, analyze, and summarize these studies, helping researchers learn about them quickly and extensively. The key issues, core ideas, advantages and disadvantages of existing methods, and their potential applications for each category are analyzed and summarized, including previously unknown findings. The challenges in existing methods and corresponding insights for potentially solving them are also provided to inspire researchers. In addition, some state-of-the-art line segment detection and description algorithms are evaluated without bias, and the evaluation code will be publicly available. The theoretical analysis, coupled with the experimental results, can guide researchers in selecting the best method for their intended vision applications. Finally, this study provides insights into potentially interesting future research directions to attract more attention from researchers to this field.	Translated: 2024-05-15 01:51:46 Published: 2024-05-12
# MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks ( http://arxiv.org/abs/2305.13271v2 ) License: see link	Charles Arnal, Felix Hensel, Mathieu Carrière, Théo Lacombe, Hiroaki Kurihara, Yuichi Ike, Frédéric Chazal	Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In this article, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical power of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.	Translated: 2024-05-15 01:42:01 Published: 2024-05-12
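The two-sample Kolmogorov-Smirnov test mentioned in this abstract compares two samples through the largest gap between their empirical CDFs. The following is a minimal pure-Python sketch of that statistic only. It does not compute MAGDiff representations or p-values, and the function name `ks_statistic` is a hypothetical helper introduced here.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        # i / len(a) and j / len(b) are the empirical CDF values at x.
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


same = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

In a shift-detection setting, one sample would hold a one-dimensional statistic computed on training data and the other the same statistic on deployment data; a large value suggests a shift.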
# Do We Really Need a Large Number of Visual Prompts? ( http://arxiv.org/abs/2305.17223v2 ) License: see link	Youngeun Kim, Yuhang Li, Abhishek Moitra, Ruokai Yin, Priyadarshini Panda	Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), which prepends learnable prompts to the input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and the self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis, we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ~70% while maintaining accuracy.	Translated: 2024-05-15 01:42:01 Published: 2024-05-12
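To make the overhead argument in this abstract concrete, the sketch below counts query-key pairs in one self-attention layer as prompts are prepended. The token counts (196 patches plus one class token, as for a 224x224 image with 16x16 patches) and the prompt counts 100 and 30 are illustrative assumptions, not figures from the paper. The count also ignores projection and MLP costs.

```python
def attention_pair_count(num_patches, num_prompts, cls_tokens=1):
    """Number of query-key pairs in one self-attention layer (quadratic in tokens)."""
    n_tokens = num_patches + cls_tokens + num_prompts
    return n_tokens * n_tokens


baseline = attention_pair_count(196, 0)        # no prompts
many_prompts = attention_pair_count(196, 100)  # hypothetical prompt budget
few_prompts = attention_pair_count(196, 30)    # roughly 70% fewer prompts
print(many_prompts / baseline, few_prompts / baseline)
```

Because the cost is quadratic in the total token count, trimming prompts reduces attention work by more than the raw reduction in token count would suggest.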
# Influence of higher order electron-phonon interaction on the electron-related lattice thermal properties of 2d Dirac crystal ( http://arxiv.org/abs/2305.18369v3 ) License: see link	Sina Kazemian, Giovanni Fanchini	To understand the essential properties of Dirac crystals, such as their thermal conductivity, we require models that consider the interaction between Dirac electrons and dispersive acoustic phonons. The exceptionally high thermal conductivity in 2D Dirac crystals is attributed to near-ideal phonon quantum gases, while undesired limitations arise from electron-phonon (e-ph) interactions, which have been shown to limit the thermal conductivity up to several microns away. The e-ph thermal conductivity is directly linked to the phonon scattering rate. Conventional calculations overlook phonons with short-dispersive wavelengths, rendering them inadequate for analyzing 2D Dirac crystals. The phonon scattering rate is typically calculated up to the first-order magnitude, considering 3-particle interactions involving the decay of an electron and phonon (EP-E) to create a new electron. However, processes involving the decay of an electron and the creation of a new electron and phonon (E-EP) are neglected. In this study, we present an accurate expression for the phonon scattering rate and e-ph thermal conductivity in 2D Dirac crystals, accounting for short-dispersive wavelength phonons. We demonstrate the significance of the E-EP process even at room temperature in calculating the phonon scattering rate and e-ph thermal conductivity, particularly for first-order e-ph interactions. Furthermore, we emphasize the importance of incorporating second-order e-ph interactions, specifically the EP-EP interaction involving the decay of an electron and phonon and the creation of a new electron-phonon pair, to accurately determine the phonon scattering rate and e-ph thermal conductivity at high temperatures and low Fermi energies. This 4-particle interaction process plays a crucial role in characterizing these properties effectively.	Translated: 2024-05-15 01:42:01 Published: 2024-05-12
# Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic ( http://arxiv.org/abs/2306.02865v5 ) License: see link	Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu	Learning high-quality $Q$-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that $Q$-values are often underestimated in the latter stage of the RL training process, potentially hindering policy learning and reducing sample efficiency. We find that such a long-neglected phenomenon is often related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer. To address this issue, our insight is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates the $Q$-value using both historical best-performing actions and the current policy. Based on BEE, the resulting practical algorithm BAC outperforms state-of-the-art methods in over 50 continuous control tasks and achieves strong performance in failure-prone scenarios and real-world robot tasks. Benchmark results and videos are available at https://jity16.github.io/BEE/.	Translated: 2024-05-15 01:42:01 Published: 2024-05-12
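The idea of blending an exploitation term (best action seen in the replay buffer) with an exploration term (expectation under the current policy) can be sketched for a single tabular transition. This is only an illustration under assumptions made here: the mixing weight `lam`, the dictionary-based inputs, and the omission of any entropy or optimism bonus are not taken from the paper, and the function is not the authors' BAC implementation.

```python
def bee_target(reward, gamma, q_next, buffer_actions, policy_probs, lam=0.5):
    """Blended Bellman target for one transition in a tabular setting.

    q_next:         dict mapping action -> Q(s', action)
    buffer_actions: actions observed in the replay buffer at s' (exploitation)
    policy_probs:   dict mapping action -> pi(action | s') (exploration)
    lam:            hypothetical mixing weight between the two terms
    """
    exploit = max(q_next[a] for a in buffer_actions)
    explore = sum(p * q_next[a] for a, p in policy_probs.items())
    return reward + gamma * (lam * exploit + (1.0 - lam) * explore)


q_next = {"left": 1.0, "right": 3.0}
target = bee_target(
    reward=1.0,
    gamma=0.5,
    q_next=q_next,
    buffer_actions=["right"],
    policy_probs={"left": 0.5, "right": 0.5},
    lam=0.5,
)
```

With `lam=0` the target reduces to the usual on-policy expectation. Raising `lam` pulls the target toward the best buffered action, which is how past successes can counteract underestimation.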
# Transition Role of Entangled Data in Quantum Machine Learning ( http://arxiv.org/abs/2306.03481v2 ) License: see link	Xinbiao Wang, Yuxuan Du, Zhuozhuo Tu, Yong Luo, Xiao Yuan, Dacheng Tao	Entanglement serves as the resource to empower quantum computing. Recent progress has highlighted its positive impact on learning quantum dynamics, wherein the integration of entanglement into quantum operations or measurements of quantum machine learning (QML) models leads to substantial reductions in training data size, surpassing a specified prediction error threshold. However, an analytical understanding of how the entanglement degree in data affects model performance remains elusive. In this study, we address this knowledge gap by establishing a quantum no-free-lunch (NFL) theorem for learning quantum dynamics using entangled data. Contrary to previous findings, we prove that the impact of entangled data on prediction error exhibits a dual effect, depending on the number of permitted measurements. With a sufficient number of measurements, increasing the entanglement of training data consistently reduces the prediction error or decreases the required size of the training data to achieve the same prediction error. Conversely, when few measurements are allowed, employing highly entangled data could lead to an increased prediction error. The achieved results provide critical guidance for designing advanced QML protocols, especially for those tailored for execution on early-stage quantum computers with limited access to quantum resources.	Translated: 2024-05-15 01:42:01 Published: 2024-05-12
# Efficient Dynamics Modeling in Interactive Environments with Koopman Theory ( http://arxiv.org/abs/2306.11941v4 ) License: see link	Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh	The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem through the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.	Translated: 2024-05-15 01:32:16 Published: 2024-05-12
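Why linear latent dynamics let a sequential rollout be rewritten as a convolution can be seen in a scalar toy case. The sketch below assumes a one-dimensional latent state with update $z_{t+1} = a z_t + b u_t$; the symbols `a`, `b`, `z0` and both function names are illustrative choices made here, and the learned encoder and decoder that a full model would need are left out.

```python
def rollout_sequential(a, b, z0, actions):
    """Step z_{t+1} = a * z_t + b * u_t one action at a time."""
    z, states = z0, []
    for u in actions:
        z = a * z + b * u
        states.append(z)
    return states


def rollout_convolution(a, b, z0, actions):
    """Same states from the closed form z_t = a^t z0 + sum_k a^(t-1-k) b u_k.

    Each z_t depends only on z0 and the action sequence, so every time step
    can be computed independently, as a causal convolution with the kernel.
    """
    kernel = [a ** k * b for k in range(len(actions))]
    states = []
    for t in range(1, len(actions) + 1):
        conv = sum(kernel[t - 1 - k] * actions[k] for k in range(t))
        states.append(a ** t * z0 + conv)
    return states


actions = [1.0, 0.0, -1.0, 2.0]
seq = rollout_sequential(0.5, 2.0, 1.0, actions)
conv = rollout_convolution(0.5, 2.0, 1.0, actions)
```

The two rollouts agree step by step. In the convolution form, the loop over `t` has no dependence between iterations, which is what makes parallel evaluation possible.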
# The shadows of quantum gravity on Bell's inequality ( http://arxiv.org/abs/2307.13006v3 ) License: see link	Hooman Moradpour, Shahram Jalalzadeh, Hamid Tebyanian	This study examines the validity of quantum mechanical operators in the context of quantum gravity, recognizing the potential need for their generalization. A primary objective is to investigate the repercussions of these generalizations on the inherent non-locality within quantum mechanics, as exemplified by Bell's inequality. Additionally, the study scrutinizes the consequences of introducing a non-zero minimal length into the established framework of Bell's inequality. The findings contribute significantly to our theoretical comprehension of the intricate interplay between quantum mechanics and gravity. Moreover, this research explores the impact of quantum gravity on Bell's inequality and its practical applications within quantum technologies, notably in the realms of device-independent protocols, quantum key distribution, and quantum randomness generation.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
# Entangling interactions between artificial atoms mediated by a multimode left-handed superconducting ring resonator ( http://arxiv.org/abs/2307.15695v2 ) License: see link	T. McBroom-Carroll, A. Schlabes, X. Xu, J. Ku, B. Cole, S. Indrajeet, M. D. LaHaye, M. H. Ansari, B. L. T. Plourde	Superconducting metamaterial transmission lines implemented with lumped circuit elements can exhibit left-handed dispersion, where the group and phase velocities have opposite signs, in a frequency range relevant for superconducting artificial atoms. Forming such a metamaterial transmission line into a ring and coupling it to qubits at different points around the ring results in a multimode bus resonator with a compact footprint. Using flux-tunable qubits, we characterize and theoretically model the variation in the coupling strength between the two qubits and each of the ring resonator modes. Although the qubits have negligible direct coupling between them, their interactions with the multimode ring resonator result in both a transverse exchange coupling and a higher order $ZZ$ interaction between the qubits. As we vary the detuning between the qubits and their frequency relative to the ring resonator modes, we observe significant variations in both of these inter-qubit interactions, including zero crossings and changes of sign. The ability to modulate interaction terms such as the $ZZ$ scale between zero and large values for small changes in qubit frequency provides a promising pathway for implementing entangling gates in a system capable of hosting many qubits.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
# Note on the Margolus-Levitin quantum speed limit for arbitrary fidelity ( http://arxiv.org/abs/2307.16854v2 ) License: see link	Krzysztof Andrzejewski, Katarzyna Bolonek-Lasoń, Piotr Kosiński	For vanishing fidelity between initial and final states, two important quantum speed limits have been derived: the Mandelstam-Tamm limit (involving energy dispersion) and the Margolus-Levitin limit (involving the excitation energy expectation value). While the generalization of the former limit to the case of arbitrary fidelity is straightforward, the relevant generalization of the latter, given in the seminal paper by Giovannetti et al. (Phys. Rev. A 67 (2003), 052109), was based on the conjectured equality of lower and upper bounds on the right-hand side of the generalized Margolus-Levitin inequality, verified numerically up to seven digits. Only recently have two proofs of the conjecture appeared. We provide below a very elementary new proof, based on the simplest tools from differential calculus. Thus the generalized Margolus-Levitin speed limit can be derived much in the spirit of the original one, which is valid for vanishing fidelity.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
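For orientation, the two limits named in this abstract take the following familiar forms in the vanishing-fidelity case, where the final state is orthogonal to the initial one. The notation is chosen here rather than taken from the paper: $\Delta E$ is the energy standard deviation, $\langle E\rangle$ the mean energy, and $E_0$ the ground-state energy. The generalized arbitrary-fidelity bound that the paper proves is not reproduced.

```latex
\begin{align}
\tau &\ge \frac{\pi\hbar}{2\,\Delta E}
  && \text{(Mandelstam-Tamm)}, \\
\tau &\ge \frac{\pi\hbar}{2\,\bigl(\langle E\rangle - E_0\bigr)}
  && \text{(Margolus-Levitin)}.
\end{align}
```

Both bounds hold simultaneously, so the minimal orthogonalization time is limited by the larger of the two right-hand sides.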
# Accelerating Grover Adaptive Search: Qubit and Gate Count Reduction Strategies with Higher-Order Formulations ( http://arxiv.org/abs/2308.01572v2 ) License: see link	Yuki Sano, Kosuke Mitarai, Naoki Yamamoto, Naoki Ishikawa	Grover adaptive search (GAS) is a quantum exhaustive search algorithm designed to solve binary optimization problems. In this paper, we propose higher-order binary formulations that can simultaneously reduce the numbers of qubits and gates required for GAS. Specifically, we consider two novel strategies: one that reduces the number of gates through polynomial factorization, and the other that halves the order of the objective function, subsequently decreasing circuit runtime and implementation cost. Our analysis demonstrates that the proposed higher-order formulations improve the convergence performance of GAS by reducing both the search space size and the number of quantum gates. Our strategies are also beneficial for general combinatorial optimization problems using one-hot encoding.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
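The outer loop of Grover adaptive search, which repeatedly lowers a threshold until no better solution exists, can be emulated classically. In the sketch below, the quantum amplitude-amplification step is replaced by an explicit enumeration of improving states, which a real device would not have access to. The cubic `toy_objective` and both function names are hypothetical and are not taken from the paper's formulations.

```python
import random


def gas_emulated(objective, n_bits, seed=0):
    """Classically emulate the threshold-update loop of Grover adaptive search."""
    rng = random.Random(seed)
    best_x = rng.randrange(2 ** n_bits)
    best_y = objective(best_x)
    while True:
        # Stand-in for the Grover step: on hardware, amplitude amplification
        # would sample from the states whose objective is below the threshold.
        improving = [x for x in range(2 ** n_bits) if objective(x) < best_y]
        if not improving:
            return best_x, best_y
        best_x = rng.choice(improving)
        best_y = objective(best_x)


def toy_objective(x):
    """Hypothetical cubic (higher-order) objective over three binary variables."""
    b0, b1, b2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return 2 * b0 * b1 * b2 - 3 * b0 * b1 + b2 - b0


x_min, y_min = gas_emulated(toy_objective, n_bits=3)
```

Each threshold update shrinks the set of marked states, so the loop always ends at the global minimum here. A real GAS run cannot enumerate that set and instead stops after a bounded number of Grover iterations.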
# FIRE: Food Image to REcipe generation ( http://arxiv.org/abs/2308.14391v2 ) License: see link	Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, Filip Ilievski	Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision Transformer with a decoder for ingredient extraction, and employs the T5 model to generate recipes incorporating titles and ingredients as inputs. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting: recipe customization to fit recipes to user preferences and recipe-to-code transformation to enable automated cooking processes. Our experimental findings validate the efficacy of our proposed approach, underscoring its potential for future advancements and widespread adoption in food computing.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
# Macroscopic Quantum Response to Gravitational Waves ( http://arxiv.org/abs/2309.02992v2 ) License: see link	Asuka Ito, Ryuichiro Kitano	We study the excitation of a one-electron quantum cyclotron by gravitational waves. The electron, held for example in a Penning trap, is prepared in the lowest Landau level, which has an infinite degeneracy parameterized by the size of the wave function. We find that the excitation rate from the ground state to the first excited state is enhanced by the size of the electron wave function: an electron with a larger wave function responds more strongly to gravitational waves. As a consequence, we derive a good sensitivity to gravitational waves for a macroscopic one-electron quantum cyclotron.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
# C-Pack: Packaged Resources To Advance General Chinese Embedding ( http://arxiv.org/abs/2309.07597v4 ) License: see link	Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, Jian-Yun Nie	We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% at the time of release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.	Translated: 2024-05-15 01:22:32 Published: 2024-05-12
# MoDem-V2:実世界ロボットマニピュレーションのためのVisuo-Motor World Model MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation ( http://arxiv.org/abs/2309.14236v2 ) ライセンス: Link先を確認	Patrick Lancaster, Nicklas Hansen, Aravind Rajeswaran, Vikash Kumar,	(参考訳) 建設されていない現実世界環境での運用を目指すロボットシステムは、オンボードセンシングを通じて世界を直接知覚する必要がある。視覚に基づく学習システムは、生の画素に基づく暗黙的な世界理解を構築することで環境計測の必要性を解消することを目的としているが、単にスパースな視覚報酬信号から接触に富んだ高次元検索空間をナビゲートすることは、探索の課題を大幅に悪化させる。このようなシステムの適用性は通常、明示的な状態推定や厳密な報酬を伴わずに現実世界でのエージェント探索が、破滅的な不安全行動や安全性の欠陥を引き起こす可能性があるため、シミュレーションされた環境や高機能な環境に制限される。本研究では,これらの制約の背後にある根本原因を分離し,非構造化現実世界で直接コンタクトリッチな操作を学習するシステムであるMoDem-V2を開発した。モデルベース強化学習(MBRL)、デモブートストレッピング、効果的な探索のアルゴリズムによる最新の進歩に基づいて、MoDem-V2は、実世界で直接、接触に富むデキスタス操作技術を取得することができる。我々は、現実世界の安全性、探索中心、代理店の引き渡し、アクター批判的なアンサンブルを尊重しながら、モデル学習におけるデモンストレーションを活用するための重要な要素を特定します。シミュレーションと実世界の両方における4つの複雑なビジュオモータ操作問題におけるこれらの成分の寄与を実証的に示す。我々の知る限り、我々の研究は実世界で直接訓練されたデモ強化視覚的MBRLのための最初の成功システムを示す。ビデオや詳細についてはhttps://sites.google.com/view/modem-v2をご覧ください。 Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. 
Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit https://sites.google.com/view/modem-v2 for videos and more details.	翻訳日:2024-05-15 01:22:32 公開日:2024-05-12
# 位相確率ブリッジを用いた生成モデリング Generative Modeling with Phase Stochastic Bridges ( http://arxiv.org/abs/2310.07805v4 ) ライセンス: Link先を確認	Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai,	(参考訳) 拡散モデル(DM)は、連続入力のための最先端の生成モデルを表す。 DMは入力空間(すなわち位置空間)に確率微分方程式(SDE)を構築し、ニューラルネットワークを用いてそれを反転させる。本稿では, 位相空間ダイナミクス(phase space dynamics)に基づく新しい生成モデリングフレームワークを提案する。ここで位相空間とは、位置と速度の両方を包含する拡張空間として定義される。確率的最適制御からの洞察を活用して,効率的なサンプリングを可能にする位相空間における経路測度を構築する。 DMとは対照的に,我々のフレームワークは動的伝播の初期段階において,現実的なデータポイントを生成する能力を示している。この早期予測は、軌道に沿った追加の速度情報を活用することにより、効率的なデータ生成のステージを設定する。標準画像生成ベンチマークでは, 関数評価回数(NFE)が少ない領域において, ベースラインよりも良好な性能が得られた。さらに,本手法は,効率的なサンプリング技術を備えた拡散モデルの性能に匹敵するものであり,生成モデリングの新しいツールとしての可能性を示している。 Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.	翻訳日:2024-05-15 01:12:47 公開日:2024-05-12
# 地学システムの機械学習に基づくモデリングのための質量保存パーセプトロン A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems ( http://arxiv.org/abs/2310.08644v4 ) ライセンス: Link先を確認	Yuan-Heng Wang, Hoshin V. Gupta,	(参考訳) 地学システムの時系列進化を予測するための物理概念(PC)モデルの構築に何十年も取り組んできたが、最近の研究は機械学習(ML)ベースのGated Recurrent Neural Network技術が、はるかに正確なモデルの開発に利用できることを示している。しかし,MLモデルから身体的理解を抽出することの難しさは,システム構造や機能に関する科学的知識を高めるために,その有用性を複雑にしている。そこで本研究では,PCベースとMLベースのモデリングアプローチのギャップを埋める手段として,物理的に解釈可能なMass Conserving Perceptron(MCP)を提案する。 MCPは、PCモデルとGRNNの両方の基盤となる有向グラフ構造間の固有同型を利用して、物理的プロセスの質量保存性を明確に表現し、それらのプロセスの機能的性質を、既製のML技術を用いて利用可能なデータから(解釈可能な方法で)直接学習できるようにする。概念実証として,MCPの機能的表現性(容量)を検証し,リーフ川流域の降雨流出(RR)動態を簡潔に表現する能力を探り,科学的仮説検証に有用であることを示す。結論として,この概念を拡張して,地学システムを通しての質量エネルギー情報流の結合特性をMLベースで物理概念的に表現する方法について論じる。 Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. 
As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.	翻訳日:2024-05-15 01:12:47 公開日:2024-05-12
# Janusインターフェース: 大規模言語モデルにおける微調整がプライバシリスクをいかに増幅するか The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks ( http://arxiv.org/abs/2310.15469v2 ) ライセンス: Link先を確認	Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, Haixu Tang,	(参考訳) 大規模言語モデル(LLM)の急速な進歩は、個人識別可能な情報(PII)のプライバシー漏洩を、広範囲にわたるトレーニングデータセット内で公に懸念している。近年の研究では、敵が慎重に設計されたプロンプトを用いて、LLMのトレーニングデータから高感度なプライバシーデータを抽出できることが示されている。しかし、これらの攻撃は、訓練前の段階での幻覚と破滅的忘れ(CF)の傾向に悩まされ、希釈されたPIIの正確性は無視できない。本研究では,LLMの事前学習データから忘れられたPIIを復元するために,微調整インタフェースを利用した新しい攻撃であるJanusを提案する。 LLMのプライバシリーク問題を形式化し,オープンソース言語モデルの実証分析により,なぜ忘れられたPIIを回収できるのかを説明する。これらの知見に基づき、Janusのオープンソース言語モデルと最新のLLMであるGPT-3.5-TurboとLLaMA-2-7bの性能を評価する。実験の結果,Janusはベースラインと比較して10倍以上のプライバシーリスクを増幅し,プレフィックス攻撃やテキスト内学習(ICL)を含む最先端のプライバシ抽出攻撃を著しく上回っていることがわかった。さらに、我々の分析は、OpenAIとAzure AI Studioが提供する既存の微調整APIがJanus攻撃の影響を受けやすいことを検証し、敵がそのような攻撃を低コストで実施できるようにする。 The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary could extract highly sensitive privacy data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and catastrophic forgetting (CF) in the pre-training stage, rendering the veracity of divulged PIIs negligible. In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in LLMs. We formalize the privacy leakage problem in LLMs and explain why forgotten PIIs can be recovered through empirical analysis on open-source language models. Based upon these insights, we evaluate the performance of Janus on both open-source language models and two latest LLMs, i.e., GPT-3.5-Turbo and LLaMA-2-7b. 
Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline and significantly outperforms the state-of-the-art privacy extraction attacks including prefix attacks and in-context learning (ICL). Furthermore, our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack, allowing an adversary to conduct such an attack at a low cost.	翻訳日:2024-05-15 01:12:47 公開日:2024-05-12
# 長期非凸制約を用いたオンライン非凸最適化 Online Non-convex Optimization with Long-term Non-convex Constraints ( http://arxiv.org/abs/2311.02426v3 ) ライセンス: Link先を確認	Shijie Pan, Wenjie Huang,	(参考訳) 目的と制約が任意に生成され、必ずしも凸ではないオンライン手法で、一般的な長期制約付き最適化問題を解くために、新しいFollow-the-Perturbed-Leader型アルゴリズムを提案し、解析した。各周期において、ランダムな線形摂動と強い凹凸摂動は、それぞれ、オフラインのオラクルに対して原始方向と双対方向に組み込まれ、その解として、大域的なミニマックス点が探索される。提案された期待静的累積的後悔に基づいて、この問題のクラスに対する最初のサブ線形$O(T^{8/9})$後悔の複雑さを導出する。提案アルゴリズムは,河川汚染源の長期的(極端値)の特定問題に対処し,理論的結果の検証を行い,既存手法と比較して優れた性能を示す。 A novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are arbitrarily generated and not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, to the offline oracle, and a global minimax point is searched as the solution. Based on a proposed expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to tackle a long-term (extreme value) constrained river pollutant source identification problem, validating the theoretical results and exhibiting superior performance compared to existing methods.	翻訳日:2024-05-15 01:02:54 公開日:2024-05-12
# 非局所量子状態アンサンブルと量子データ隠れ Nonlocal quantum state ensembles and quantum data hiding ( http://arxiv.org/abs/2311.06029v2 ) ライセンス: Link先を確認	Donghoon Ha, Jeong San Kim,	(参考訳) 両部量子状態の識別を考察し,非局所量子状態アンサンブルと量子データ隠蔽処理の関係を確立する。両部量子状態の最適局所的判別に縛られ、両部量子状態アンサンブルが量子データハイディングスキームを構築するのに十分な条件を提供する。この結果は多次元二部量子系における例によって示される。 We consider the discrimination of bipartite quantum states and establish a relation between nonlocal quantum state ensemble and quantum data hiding processing. Using a bound on optimal local discrimination of bipartite quantum states, we provide a sufficient condition for a bipartite quantum state ensemble to be used to construct a quantum data-hiding scheme. Our results are illustrated by examples in multidimensional bipartite quantum systems.	翻訳日:2024-05-15 01:02:54 公開日:2024-05-12
# バイアスのジャングルを探る:依存性分析による言語モデルにおける政治的バイアス属性 Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis ( http://arxiv.org/abs/2311.08605v2 ) ライセンス: Link先を確認	David F. Jenny, Yann Billeter, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin,	(参考訳) 大規模言語モデル(LLM)の急速な進歩は、これらのモデルにおけるバイアスの出現とその緩和に関する激しい議論を引き起こしている。しかし、文献におけるデバイアス法と、より広いコミュニティからのアライメントに関連する欠陥の報告の両方の結果が示すように、その実践的関連性にもかかわらず、バイアスはよく理解されていないトピックである。偏見の内的原因の理解を深めるために、因果公正分析のレンズを通してLCMバイアスを分析し、偏見の起源を理解し、その下流の帰結と緩和の理由を解明する。このフレームワークを運用するために,LLM決定プロセスに寄与する属性の抽出と仲介を行うプロンプトベースの手法を提案する。アクティビティ依存ネットワーク(ADN)を適用することで、これらの属性がLCMの決定プロセスにどのように影響するかを分析する。政治討論における議論品質のLCM評価に本手法を適用した。観察された異種間処理は,少なくとも一部は,属性とモデルの相違とモデルの相違によるものであり,人間のAIアライメントと偏見の緩和に関する結果が議論されている。私たちのコードとデータはhttps://github.com/david-jenny/LLM-Political-Studyにあります。 The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the internal causes of bias, we analyse LLM bias through the lens of causal fairness analysis, which enables us to both comprehend the origins of bias and reason about its downstream consequences and mitigation. To operationalize this framework, we propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the LLM decision process. By applying Activity Dependency Networks (ADNs), we then analyse how these attributes influence an LLM's decision process. We apply our method to LLM ratings of argument quality in political debates. 
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment, and discuss the consequences of our findings for human-AI alignment and bias mitigation. Our code and data are at https://github.com/david-jenny/LLM-Political-Study.	翻訳日:2024-05-15 01:02:54 公開日:2024-05-12
# 効率的な超解像のためのスウィフトパラメータフリーアテンションネットワーク Swift Parameter-free Attention Network for Efficient Super-Resolution ( http://arxiv.org/abs/2311.12770v3 ) ライセンス: Link先を確認	Cheng Wan, Hongyuan Yu, Zhiqi Li, Yihang Chen, Yajun Zou, Yuqing Liu, Xuanwu Yin, Kunlong Zuo,	(参考訳) SISR(Single Image Super-Resolution)は、低レベルコンピュータビジョンにおいて重要な課題であり、低解像度の画像から高解像度の画像を再構成することを目的としている。従来の注意機構はSISRの性能を大幅に向上させたが、しばしば複雑なネットワーク構造と多数のパラメータが発生し、推論速度が遅くなり、モデルのサイズが大きくなる。この問題に対処するために、パラメータカウント、推論速度、画像品質のバランスをとる高効率なSISRモデルであるSwift Parameter-free Attention Network (SPAN)を提案する。 SPANは、対称的なアクティベーション関数と残差接続を利用して、高寄与度情報を強化し、冗長な情報を抑制する新しいパラメータフリーアテンション機構を採用している。この設計が注意機構の目的を達成する上での有効性を理論的に示す。複数のベンチマークでSPANを評価し、画像品質と推論速度の両面で既存の高効率超解像モデルより優れており、品質と速度のトレードオフが著しく達成されていることを示す。これにより、SPANは現実世界のアプリケーション、特にリソース制約のあるシナリオに非常に適しています。特に、NTIRE 2024の全体的なパフォーマンストラックとランタイムトラックの両方において、私たちは、効率的な超解像度チャレンジで第一位を獲得しました。私たちのコードとモデルは https://github.com/hongyuanyu/SPAN で公開されています。 Single Image Super-Resolution (SISR) is a crucial task in low-level computer vision, aiming to reconstruct high-resolution images from low-resolution counterparts. Conventional attention mechanisms have significantly improved SISR performance but often result in complex network structures and a large number of parameters, leading to slow inference speed and large model size. To address this issue, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality. SPAN employs a novel parameter-free attention mechanism, which leverages symmetric activation functions and residual connections to enhance high-contribution information and suppress redundant information. Our theoretical analysis demonstrates the effectiveness of this design in achieving the attention mechanism's purpose. 
We evaluate SPAN on multiple benchmarks, showing that it outperforms existing efficient super-resolution models in terms of both image quality and inference speed, achieving a significant quality-speed trade-off. This makes SPAN highly suitable for real-world applications, particularly in resource-constrained scenarios. Notably, we won first place in both the overall performance track and the runtime track of the NTIRE 2024 efficient super-resolution challenge. Our code and models are made publicly available at https://github.com/hongyuanyu/SPAN.	翻訳日:2024-05-15 00:53:00 公開日:2024-05-12
# 補足ラベルによる学習:選択された完全一致のランサム設定はより実践的 Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical ( http://arxiv.org/abs/2311.15502v3 ) ライセンス: Link先を確認	Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama,	(参考訳) 補完ラベル学習(complementary-label learning)は、各トレーニング例が1つまたは複数の補完ラベルに関連付けられている弱い教師付き学習問題である。既存の一貫したアプローチは、相補的なラベルの生成をモデル化するための一様分布の仮定や、非一様の場合の遷移行列を推定するための通常のラベルのトレーニングセットに依存している。しかし、どちらの条件も現実のシナリオでは満たされないかもしれない。本稿では,これらの条件に依存しない新しい一貫したアプローチを提案する。本研究は,肯定的未ラベル学習(PU)学習の文献に着想を得て,相補的ラベル学習のための選択完備ランダム仮定に基づく非バイアスリスク推定器を提案する。次に、過度に適合する問題に対処するためのリスク補正アプローチを導入します。さらに, 1-versus-rest戦略を用いることで, 相補的ラベル学習を負のラベル付きバイナリ分類問題の集合として表現できることが判明した。合成および実世界のベンチマークデータセットの大規模な実験結果から,提案手法が最先端手法よりも優れていることを検証した。 Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix in non-uniform cases. However, either condition may not be satisfied in real-world scenarios. In this paper, we propose a novel consistent approach that does not rely on these conditions. Inspired by the positive-unlabeled (PU) learning literature, we propose an unbiased risk estimator based on the Selected-Completely-at-Random assumption for complementary-label learning. We then introduce a risk-correction approach to address overfitting problems. Furthermore, we find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems when using the one-versus-rest strategy. Extensive experimental results on both synthetic and real-world benchmark datasets validate the superiority of our proposed approach over state-of-the-art methods.	翻訳日:2024-05-15 00:53:00 公開日:2024-05-12
# 多周波数部分相関グラフの学習 Learning Multi-Frequency Partial Correlation Graphs ( http://arxiv.org/abs/2311.15756v2 ) ライセンス: Link先を確認	Gabriele D'Acunto, Paolo Di Lorenzo, Francesco Bonchi, Stefania Sardellitti, Sergio Barbarossa,	(参考訳) 時系列間の依存関係を学習するための大規模な研究努力にもかかわらず、最先端技術は依然として大きな限界に直面している。この微分が中心となる多くのアプリケーションによって動機付けられ、ブロックスパース、周波数依存、部分相関グラフを学習することで、この制限を克服する。本研究の目的は,2つの非凸学習問題の定式化と解法である。第1は閉形式解を持ち,部分相関数に関する事前知識がある場合に適したもので,第2は連続凸近似に基づく反復解に基づくヒンジであり,事前知識が得られない一般的な場合に対して有効である。合成データの数値計算結果から,提案手法は現状よりも優れていることがわかった。最後に、ファイナンシャル・タイム・シリーズの分析により、部分的相関が数個の周波数帯域内でのみ存在することが確認され、我々の手法が周波数領域に沿って識別することなく検出されない貴重な洞察の獲得をいかに可能かが示される。 Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain.	翻訳日:2024-05-15 00:53:00 公開日:2024-05-12
# 確率近似の収束率:非有界分散を持つ偏りのある雑音とその応用 Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications ( http://arxiv.org/abs/2312.02828v3 ) ライセンス: Link先を確認	Rajeeva L. Karandikar, M. Vidyasagar,	(参考訳) 本稿では、与えられた目的関数$J(\cdot)$の定常点を求める確率的勾配降下(SGD)法の収束特性について検討する。目的関数は凸である必要はない。むしろ、我々の結果は、すべての定常点が大域的最小点でもあるという性質を持つ「invex」関数のクラスに適用される。まず、$J(\cdot)$ はクルディカ・ロジャシエヴィチ(KL)条件よりもわずかに弱い性質(ここでは (KL') と表す)を満たすと仮定する。このとき、反復 $J({\boldsymbol \theta}_t)$ はほぼ確実に$J(\cdot)$ の大域最小値に収束することを示す。次に、$J(\cdot)$ に関する仮定を (KL') から Polyak-Lojasiewicz (PL) 条件に強める。この強い仮定のもとで、$J({\boldsymbol \theta}_t)$ がその極限に収束する速度の評価を導出する。これらの結果から,PL条件を満たす関数に対して,SGDの収束率は凸関数に対する最良の収束率と一致することを示す。これらの線に沿ったいくつかの結果は過去に発表されているが、我々の貢献には2つの異なる改善が含まれている。第一に、確率勾配に関する仮定は他よりも一般的であり、第二に、我々の収束は期待値の意味ではなく、ほぼ確実な収束である。また,関数評価のみが許される場合のSGDについても検討する。この設定では、「最適」な増分、すなわち摂動の大きさを決定する。同じアイデアの集合を用いて、既存の文献と比較して、測定誤差に関するより一般的な仮定の下で、確率近似(SA)アルゴリズムの大域的収束を確立する。また、適切な仮定の下でのSAアルゴリズムの収束率の上界を導出する。 In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function is not required to be convex. Rather, our results apply to a class of "invex" functions, which have the property that every stationary point is also a global minimizer. First, it is assumed that $J(\cdot)$ satisfies a property that is slightly weaker than the Kurdyka-Lojasiewicz (KL) condition, denoted here as (KL'). It is shown that the iterations $J({\boldsymbol \theta}_t)$ converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak-Lojasiewicz (PL) condition. With this stronger hypothesis, we derive estimates on the rate of convergence of $J({\boldsymbol \theta}_t)$ to its limit. 
Using these results, we show that for functions satisfying the PL property, the convergence rate of SGD is the same as the best-possible rate for convex functions. While some results along these lines have been published in the past, our contributions contain two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere, and second, our convergence is almost sure, and not in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the "optimal" increments or the size of the perturbations. Using the same set of ideas, we establish the global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error, compared to the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.	翻訳日:2024-05-15 00:53:00 公開日:2024-05-12
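As a reading aid for the entry above, the two conditions it names can be written in their standard textbook form. This is a sketch and not the paper's own statement: the constant $\mu > 0$ and the symbol $J^*$ for the global minimum value are notation assumed here, and the (KL') condition defined in the paper is not reproduced.

```latex
% Invexity as described in the abstract:
% every stationary point is a global minimizer.
\nabla J(\theta) = 0 \;\Longrightarrow\; J(\theta) = J^* := \inf_{\theta'} J(\theta')

% Polyak-Lojasiewicz (PL) condition, standard form with an assumed constant \mu > 0:
\tfrac{1}{2}\,\bigl\|\nabla J(\theta)\bigr\|_2^2 \;\ge\; \mu\,\bigl(J(\theta) - J^*\bigr)
\quad \text{for all } \theta
```

In this form the PL condition implies the first property, since a vanishing gradient forces $J(\theta) = J^*$.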
# 低次元後投射による不確かさの可視化 Uncertainty Visualization via Low-Dimensional Posterior Projections ( http://arxiv.org/abs/2312.07804v2 ) ライセンス: Link先を確認	Omer Yair, Elias Nehme, Tomer Michaeli,	(参考訳) 不測の逆問題では、単一の再構成のみを抽出するのではなく、可算解の全スペクトルについての洞察を得ることが一般的である。可算解とその可能性に関する情報は後部分布に符号化される。しかし、高次元データでは、この分布を可視化することは困難である。本研究では,低次元部分空間上のエネルギーベースモデル(EBM)を用いて後部を推定・可視化するための新しいアプローチを提案する。具体的には、入力測定と解の低次元部分空間にまたがる方向の集合を受信する条件付きEMMを訓練し、その空間内の後方の確率密度関数を出力する。提案手法の有効性を多種多様なデータセットおよび画像復元問題に適用し,不確実性定量化と可視化におけるその強みを示す。このように,本手法は拡散型後部サンプリング器からサンプルを投影するベースラインよりも優れ,精度は桁違いに向上する。さらに、ガウス後方を仮定するベースラインよりも正確である。 In ill-posed inverse problems, it is commonly desirable to obtain insight into the full spectrum of plausible solutions, rather than extracting only a single reconstruction. Information about the plausible solutions and their likelihoods is encoded in the posterior distribution. However, for high-dimensional data, this distribution is challenging to visualize. In this work, we introduce a new approach for estimating and visualizing posteriors by employing energy-based models (EBMs) over low-dimensional subspaces. Specifically, we train a conditional EBM that receives an input measurement and a set of directions that span some low-dimensional subspace of solutions, and outputs the probability density function of the posterior within that space. We demonstrate the effectiveness of our method across a diverse range of datasets and image restoration problems, showcasing its strength in uncertainty quantification and visualization. As we show, our method outperforms a baseline that projects samples from a diffusion-based posterior sampler, while being orders of magnitude faster. Furthermore, it is more accurate than a baseline that assumes a Gaussian posterior.	翻訳日:2024-05-15 00:43:11 公開日:2024-05-12
# 高次元ホルシュタインモデルにおけるクエンチダイナミクス:切断ウィグナーアプローチからの考察 Quench dynamics in higher-dimensional Holstein models: Insights from Truncated Wigner Approaches ( http://arxiv.org/abs/2312.12291v2 ) ライセンス: Link先を確認	Eva Paprotzki, Alexander Osterkorn, Vibhu Mishra, Stefan Kehrein,	(参考訳) 量子材料中の電荷密度波相は、電子と格子の自由度の複雑な相互作用に由来する。今日では、様々な時間分解分光技術により、そのような相を積極的に操作し、そのダイナミクスをリアルタイムで監視することができる。このような非平衡力学を理論的にモデル化することは大きな課題であり、正確な方法は通常少数の原子と有限個のフォノンしか扱えない。電子ホッピングの急なスイッチオン後のホルシュタインモデルにおける電荷密度波の融解に2つの観点からアプローチする: 第一に、非相互作用および強結合極限において、高次元超立方格子上のCDW秩序パラメータは長時間において因子分解の関係式に従い、その力学が1次元の場合に還元されることを証明する。第二に, 切断ウィグナー近似(Truncated Wigner Approximation)に基づく半古典的手法による2次元の数値計算結果を示す。ホルシュタイン鎖で得られた正確なデータと比較すると、フォノンのダイナミクスを正確に記述するためには、電子とフォノンの両方を半古典的に扱う必要があることがわかる。これは、電子-フォノン結合強度のクエンチに対しても確認される。 Charge-density wave phases in quantum materials stem from the complex interplay of electronic and lattice degrees of freedom. Nowadays, various time-resolved spectroscopy techniques allow one to actively manipulate such phases and monitor their dynamics in real time. Modeling such nonequilibrium dynamics theoretically is a great challenge and exact methods can usually only treat a small number of atoms and finitely many phonons. We approach the melting of charge-density waves in a Holstein model after a sudden switch-on of the electronic hopping from two perspectives: First, we prove that in the non-interacting and in the strong-coupling limit, the CDW order parameter on high-dimensional hypercubic lattices obeys a factorization relation for long times, such that its dynamics can be reduced to the one-dimensional case. Secondly, we present numerical results from semiclassical techniques based on the Truncated Wigner Approximation for two spatial dimensions. A comparison with exact data obtained for a Holstein chain shows that a semiclassical treatment of both the electrons and phonons is required in order to correctly describe the phononic dynamics. This is confirmed, in addition, for a quench in the electron-phonon coupling strength.	翻訳日:2024-05-15 00:43:11 公開日:2024-05-12
# 量子ドットデバイス自動化におけるデータニーズと課題:ワークショップ報告 Data Needs and Challenges of Quantum Dot Devices Automation: Workshop Report ( http://arxiv.org/abs/2312.14322v2 ) ライセンス: Link先を確認	Justyna P. Zwolak, Jacob M. Taylor, Reed Andrews, Jared Benson, Garnett Bryant, Donovan Buterakos, Anasua Chatterjee, Sankar Das Sarma, Mark A. Eriksson, Eliška Greplová, Michael J. Gullans, Fabian Hader, Tyler J. Kovach, Pranav S. Mundada, Mick Ramsey, Torbjoern Rasmussen, Brandon Severin, Anthony Sigillito, Brennan Undseth, Brian Weber,	(参考訳) ゲート定義量子ドットは、スケーラブルで結合された量子ビットシステムを実現するための有望な候補システムであり、量子コンピュータの基本的な構成要素として機能する。しかし、現在の量子ドットデバイスは、説明しなければならない不完全さに悩まされ、特徴づけ、チューニング、操作プロセスが妨げられる。さらに、量子ドット量子ビットの増加に伴い、関連するパラメータ空間が十分に増大し、ヒューリスティック制御が実現不可能となる。したがって、信頼性が高くスケーラブルな自律チューニング手法が開発されることが不可欠である。本稿では,量子ドットデバイスのチューニングと操作を自動化する上での現在の課題について概説する。また、量子ドットコミュニティが提案する、量子ドットの克服方法に関するアイデアも提示する。 Gate-defined quantum dots are a promising candidate system to realize scalable, coupled qubit systems and serve as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. In this report, we outline current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present ideas put forward by the quantum dot community on how to overcome them.	翻訳日:2024-05-15 00:43:11 公開日:2024-05-12
# イネ表現型データのための新しいサンプリングクラスタリングアルゴリズム A Novel Sampled Clustering Algorithm for Rice Phenotypic Data ( http://arxiv.org/abs/2312.14920v2 ) ライセンス: Link先を確認	Mithun Singh, Kapil Ahuja, Milind B. Ratnaparkhe,	(参考訳) 植物種のフェノタイプ(または物理的)特性は、一般的にクラスタリングに使用される。最近の研究の一つ(Shastri et al. (2021))では、確率的サンプリング(ピボットサンプリング)とスペクトルクラスタリングを組み合わせたアルゴリズムを用いてダイズ種を分類した。これらの手法は、高精度なクラスタリングを低コストで得るために使用された。本研究では,初期のアルゴリズムをイネ種のクラスタリングに拡張する。基本アルゴリズムを3つの方法で改善する。まず、スペクトルクラスタリングにおける類似度行列を構築するための新しい関数を提案する。一般に、この目的のために自然指数関数が用いられる。スペクトルグラフ理論とチーガーの不等式に基づいて、代わりに基底 "a" 指数関数を提案する。これにより、クラスタリングに好適な類似性行列スペクトルが得られ、固有値解析によってそれをサポートする。また、スペクトルクラスタリングで類似性行列を構築するために使われる関数は、以前、固定因子(グローバルスケーリングと呼ばれる)でスケールされた。 Zelnik-Manor と Perona (2004) のアイデアに基づいて、行列要素(局所スケーリングと呼ばれる)によって変化する因子を使い、よりうまく機能する。第2に、ピボットサンプリングアルゴリズムにおける種の包含確率を計算するために、私たちは以前、種の特徴的な値がそれぞれの基本値からどれだけ遠いか(全種で計算される)を捉えた偏差の概念を用いていた。基本値を見つけるために、以前は最大関数が使われていた。私たちは現在、より直感的な中央値関数を使用しています。我々は統計分析を用いてこの選択を支持する。第3に、1865種のイネについての実験を行い、シルエット値の観点から、我々の新しいサンプリングスペクトルクラスタリングは階層クラスタリング(現在広く使われている)よりも61%良いことを実証した。また、新しいアルゴリズムは、関連するサンプリングのため階層的クラスタリングよりもはるかに高速である。 Phenotypic (or Physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways. First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon the spectral graph theory and the involved Cheeger's inequality, we propose the use of a base "a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. 
Also, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better. Second, to compute the inclusion probability of a species in the pivotal sampling algorithm, we had earlier used the notion of deviation that captured how far a species' characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis. Third, with experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.	翻訳日:2024-05-15 00:43:11 公開日:2024-05-12
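The similarity-matrix idea in the entry above can be illustrated with a small sketch. The abstract does not give the paper's exact formula, so the form used here, $a^{-d_{ij}^2/(\sigma_i \sigma_j)}$ with $\sigma_i$ taken as the distance from point $i$ to its $k$-th nearest neighbour, is an assumption that combines the local scaling of Zelnik-Manor and Perona with a base-$a$ exponential. The function name `local_scaling_similarity` and the parameters `k` and `base` are hypothetical.

```python
import math


def local_scaling_similarity(points, k=2, base=math.e):
    """Similarity matrix with local scaling and a configurable exponential base.

    Assumed form (not taken from the paper): S[i][j] = base ** (-d_ij^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from point i to its k-th nearest neighbour.
    Assumes all points are distinct and k < len(points), so every sigma_i is positive.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # After sorting a row, index 0 is the zero self-distance,
    # so index k is the distance to the k-th nearest neighbour.
    sigma = [sorted(row)[k] for row in dist]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal is left at zero
                sim[i][j] = base ** (-(dist[i][j] ** 2) / (sigma[i] * sigma[j]))
    return sim
```

With `base=math.e` this reduces to the usual locally scaled Gaussian-style affinity; passing another value for `base` changes how quickly similarity decays with distance, which is the knob the abstract describes tuning.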
# 分割学習に基づくEMG補綴装置の収束率最大化 Convergence Rate Maximization for Split Learning-based Control of EMG Prosthetic Devices ( http://arxiv.org/abs/2401.03233v3 ) ライセンス: Link先を確認	Matea Marinova, Daniel Denkovski, Hristijan Gjoreski, Zoran Hadzi-Velkov, Valentin Rakovic,	(参考訳) Split Learning (SL) は筋電図に基づく補綴制御における有望な分散学習手法である。ディープラーニングやフェデレートラーニング(FL)といった他の学習手法は、補綴装置の処理能力とバッテリー寿命に極めて制限があるため、準最適ソリューションを提供する。このようなシナリオでSLを実装することは、クライアントがより小さなモデルセグメントを実行するという、その固有のモデルパーティショニングによって引き起こされる。しかし、不適切なカット層を選択することは、SLシステムのトレーニングプロセスを妨げる。本稿では,モデル収束率の最大化の観点から,最適カット層選択のためのアルゴリズムを提案する。性能評価の結果,提案アルゴリズムはEMGパターン認識タスクの収束を著しく加速し,補綴装置制御の改善を図っている。 Split Learning (SL) is a promising Distributed Learning approach in electromyography (EMG) based prosthetic control, due to its applicability within resource-constrained environments. Other learning approaches, such as Deep Learning and Federated Learning (FL), provide suboptimal solutions, since prosthetic devices are extremely limited in terms of processing power and battery life. The viability of implementing SL in such scenarios is caused by its inherent model partitioning, with clients executing the smaller model segment. However, selecting an inadequate cut layer hinders the training process in SL systems. This paper presents an algorithm for optimal cut layer selection in terms of maximizing the convergence rate of the model. The performance evaluation demonstrates that the proposed algorithm substantially accelerates the convergence in an EMG pattern recognition task for improving prosthetic device control.	翻訳日:2024-05-15 00:43:11 公開日:2024-05-12
# 単一イオン量子ビット上での逆Mpemba効果 The inverse Mpemba effect demonstrated on a single trapped ion qubit ( http://arxiv.org/abs/2401.05830v2 ) ライセンス: Link先を確認	Shahaf Aharony Shapira, Yotam Shapira, Jovan Markov, Gianluca Teza, Nitzan Akerman, Oren Raz, Roee Ozeri,	(参考訳) Mpemba効果(Mpemba effect)は、他の条件が同一のとき、高温の系が低温の系よりも速く低い温度に達する反直感的な現象である。ここでは、最も単純な量子系である量子ビット上で、Mpemba効果の量子アナログを提案する。具体的には,冷量子ビットが熱量子ビットよりも早く高温に達する逆効果を示す。さらに,本システムでは冷量子ビットが指数関数的に速く加熱され,その効果の強いバージョンが示される。これは十分なコヒーレントな系に対してのみ起こり、量子力学的効果、すなわち干渉効果によって生じる。我々は, 1 つの $^{88}\text{Sr}^+$ イオン量子ビットについて実験を行った。単純な量子系におけるこの異常緩和効果の存在は、その基本性を明らかにし、量子情報処理デバイスの設計と運用に重要な役割を果たしている可能性がある。 The Mpemba effect is a counter-intuitive phenomenon in which a hot system reaches a cold temperature faster than a colder system, under otherwise identical conditions. Here we propose a quantum analog of the Mpemba effect, on the simplest quantum system, a qubit. Specifically, we show it exhibits an inverse effect, in which a cold qubit reaches a hot temperature faster than a hot qubit. Furthermore, in our system a cold qubit can heat up exponentially faster, manifesting the strong version of the effect. This occurs only for sufficiently coherent systems, making this effect quantum mechanical, i.e. due to interference effects. We experimentally demonstrate our findings on a single $^{88}\text{Sr}^+$ trapped ion qubit. The existence of this anomalous relaxation effect in simple quantum systems reveals its fundamentality, and may have a role in designing and operating quantum information processing devices.	翻訳日:2024-05-15 00:33:27 公開日:2024-05-12
# 幾何学的推定問題に対するサンプソン近似の再検討 Revisiting Sampson Approximations for Geometric Estimation Problems ( http://arxiv.org/abs/2401.07114v2 ) ライセンス: Link先を確認	Felix Rydell, Angélica Torres, Viktor Larsson,	(参考訳) コンピュータビジョンにおける多くの問題は幾何学的推定問題として定式化することができ、例えば、測定値(例えば点対応)の集合が、観測値に一致するモデル(例えば本質的な行列)に適合することを期待する。これは、あるモデルに対する観測の ‘agrees’ の程度を測る必要がある。自然な選択は、観測が制約を完全に満たす最小の摂動を考えることである。しかし、多くの問題に対して、この計量は高価であり、計算には難解である。いわゆるサンプソン誤差は、線形化スキームを通じてこの幾何学的誤差を近似する。エピポーラ幾何学において、サンプソン誤差は一般的な選択であり、実際には対応する幾何学的残差(再射誤差)の非常に厳密な近似が得られることが知られている。本稿では,サンプソン近似を再検討し,この近似がなぜ,いつ動作するのかという新たな理論的知見を与えるとともに,いくつかの軽微な仮定の下でのタイツネスの明確な境界を与える。我々の理論結果は実データに関するいくつかの実験と異なる幾何推定タスクの文脈で検証される。 Many problems in computer vision can be formulated as geometric estimation problems, i.e. given a collection of measurements (e.g. point correspondences) we wish to fit a model (e.g. an essential matrix) that agrees with our observations. This necessitates some measure of how much an observation ``agrees" with a given model. A natural choice is to consider the smallest perturbation that makes the observation exactly satisfy the constraints. However, for many problems, this metric is expensive or otherwise intractable to compute. The so-called Sampson error approximates this geometric error through a linearization scheme. For epipolar geometry, the Sampson error is a popular choice and in practice known to yield very tight approximations of the corresponding geometric residual (the reprojection error). In this paper we revisit the Sampson approximation and provide new theoretical insights as to why and when this approximation works, as well as provide explicit bounds on the tightness under some mild assumptions. Our theoretical results are validated in several experiments on real data and in the context of different geometric estimation tasks.	翻訳日:2024-05-15 00:33:27 公開日:2024-05-12
# 脆弱性のあるクラスに対するバイアスの分析と緩和:データセットにおけるバランスの取れた表現を目指して Analyzing and Mitigating Bias for Vulnerable Classes: Towards Balanced Representation in Dataset ( http://arxiv.org/abs/2401.10397v2 ) ライセンス: Link先を確認	Dewant Katare, David Solans Noguero, Souneil Park, Nicolas Kourtellis, Marijn Janssen, Aaron Yi Ding,	(参考訳) 自動運転における認識システムの正確性と公正性は、特に都市運転環境において重大なリスクに直面している自転車、歩行者、モーターサイクリストのような脆弱な道路利用者にとって不可欠である。主流研究は、主にクラスパフォーマンス指標を強化するが、AIモデルにおけるバイアス継承の隠れた特性、クラス不均衡、データセット内の格差はしばしば見過ごされる。本研究は, 脆弱な道路利用者間のクラス不均衡を調査し, クラス分布の分析, 性能評価, バイアスの影響評価に焦点をあてる。一般的なCNNモデルとヴィジュアルトランスフォーマー(ViT)をnuScenesデータセットと組み合わせることで,表現不足のクラスに対する検出の相違を評価できる。関連する研究と比較して、モデル最適化とバイアス軽減のためのメトリクス特化学習とコスト感学習に焦点を合わせ、データ拡張と再サンプリングを含む。提案手法を用いて、CNNモデルでは、IoU(\%)とNDS(\%)のメトリクスが71.3から75.6、80.6から83.7に改善されている。同様に、ViTでは、IoUとNDSのメトリクスが74.9から79.2、83.8から87.1に改善されているのを観察する。本研究は,データセットにおけるマイノリティクラスに対する包括性を向上しつつ,信頼性の高いモデルの開発に寄与する。 The accuracy and fairness of perception systems in autonomous driving are essential, especially for vulnerable road users such as cyclists, pedestrians, and motorcyclists who face significant risks in urban driving environments. While mainstream research primarily enhances class performance metrics, the hidden traits of bias inheritance in the AI models, class imbalances and disparities within the datasets are often overlooked. Our research addresses these issues by investigating class imbalances among vulnerable road users, with a focus on analyzing class distribution, evaluating performance, and assessing bias impact. Utilizing popular CNN models and Vision Transformers (ViTs) with the nuScenes dataset, our performance evaluation indicates detection disparities for underrepresented classes. Compared to related work, we focus on metric-specific and Cost-Sensitive learning for model optimization and bias mitigation, which includes data augmentation and resampling. 
Using the proposed mitigation approaches, we see improvement in IoU (%) and NDS (%) metrics from 71.3 to 75.6 and 80.6 to 83.7 for the CNN model. Similarly, for ViT, we observe improvement in IoU and NDS metrics from 74.9 to 79.2 and 83.8 to 87.1. This research contributes to developing reliable models while enhancing inclusiveness for minority classes in datasets.	翻訳日:2024-05-15 00:33:27 公開日:2024-05-12
# 厳格なAI監査にはブラックボックスアクセスが不十分 Black-Box Access is Insufficient for Rigorous AI Audits ( http://arxiv.org/abs/2401.14446v2 ) ライセンス: Link先を確認	Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell,	(参考訳) AIシステムの外部監査は、AIガバナンスの重要なメカニズムとして、ますます認識されている。しかし、監査の有効性は監査人に与えられるアクセスの程度に依存する。最近の最先端のAIシステムの監査は、主にブラックボックスアクセスに依存しており、監査官はシステムに問い合わせて出力を観察することしかできない。しかしながら、システムの内部動作(例えば重量、アクティベーション、勾配)へのホワイトボックスアクセスは、監査人がより強力な攻撃を行い、モデルをより徹底的に解釈し、微調整を行うことを可能にする。一方、トレーニングやデプロイメント情報(方法論、コード、ドキュメンテーション、データ、デプロイメントの詳細、内部評価からの発見など)への外部アクセスは、監査人が開発プロセスを精査し、より対象とする評価を設計できるようにします。本稿では,ブラックボックス監査の限界と,ホワイトボックス監査とアウトサイドボックス監査の利点について検討する。また、これらの監査を最小限のセキュリティリスクで実施するための技術的、物理的、法的保護についても論じる。その結果,(1)監査員が使用するアクセスと手法に関する透明性は,監査結果を適切に解釈するには必要であり,(2)ブラックボックスのみよりも,ホワイトボックスとアウト・ザ・ボックスのアクセスの方がかなり精査できることがわかった。 External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. 
We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.	翻訳日:2024-05-15 00:33:27 公開日:2024-05-12
# マルチエージェント会話による診断精度の向上:認知バイアス軽減のための大規模言語モデルを用いて Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias ( http://arxiv.org/abs/2401.14589v2 ) ライセンス: Link先を確認	Yu He Ke, Rui Yang, Sui An Lie, Taylor Xin Yi Lim, Hairil Rizal Abdullah, Daniel Shu Wei Ting, Nan Liu,	(参考訳) 背景: 臨床的意思決定における認知バイアスは, 診断の誤りや患者準最適結果に大きく寄与する。これらのバイアスに対処することは、医療分野における深刻な課題である。目的:本研究では,大規模言語モデル(LLM)が,マルチエージェントフレームワークの利用を通じてバイアスを軽減する役割について検討する。我々は,多エージェント会話による臨床意思決定プロセスのシミュレートを行い,診断精度の向上に有効性を評価する。方法: 認知バイアスが誤診となった16件の症例報告を文献から同定した。マルチエージェントフレームワークでは,GPT-4を利用して4つの模擬エージェント間の相互作用を促進し,臨床チームのダイナミクスを再現した。各エージェントにはそれぞれ異なる役割がある。 1)議論の後に最終診断を行う。 2 悪魔の主張及び正当性確認及び偏見 3 早期閉鎖バイアスを低減するための議論の指導者及びファシリテーター 4) 結果を記録・要約すること。初発診断, 上発鑑別診断, 最終2つの鑑別診断の精度について, 合計80のシミュレーションを行った。結果: 初期診断と最終診断の両方を評価する80の回答において, 初診の精度は0% (0/80) であったが, マルチエージェントによる議論の結果, トップディファレンシャル診断の精度は71.3% (57/80), 最終2つのディファレンシャル診断の精度は80.0% (64/80) に向上した。結論: このフレームワークは、誤解を招く初期調査のシナリオであっても、誤解を再評価し、修正する能力を示した。 LLM駆動型多エージェント会話フレームワークは、診断に難渋する医療シナリオにおける診断精度を高めることを約束している。 Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. 
Each agent has a distinct role: 1) to make the final diagnosis after considering the discussions, 2) to act as the devil's advocate and correct confirmation and anchoring bias, 3) to serve as the tutor and facilitator of the discussion to reduce premature closure bias, and 4) to record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses. Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80). Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.	翻訳日:2024-05-15 00:33:27 公開日:2024-05-12
# AI信頼度測定のための決定理論フレームワーク A Decision Theoretic Framework for Measuring AI Reliance ( http://arxiv.org/abs/2401.15356v4 ) ライセンス: Link先を確認	Ziyang Guo, Yifan Wu, Jason Hartline, Jessica Hullman,	(参考訳) 人間はしばしば人工知能(AI)システムの助けを借りて意思決定をする。一般的なパターンは、最終決定をコントロールしている人間に対して、AIがアクションを推奨することである。研究者は、補完的なパフォーマンスを達成する上で重要な要素として、人間がAIに適切に依存していることを確認する。このような研究で用いられる適切な依存の定義には、正式な統計的根拠が欠如しており、矛盾を招く可能性があると論じる。統計的決定理論に基づく信頼の形式的定義を提案する。これは、意思決定者がAIの推奨に従う確率として信頼の概念を、人間が信号の識別や状況に関する正確な信念を形成する際の課題と区別するものである。私たちの定義は、人間とAIの相補性と信頼に関する研究の設計と解釈を導くのに使用できるフレームワークを生み出します。文献からのAIによる最新の意思決定研究を用いて、我々のフレームワークは、信号の正確な識別ができないために、損失と損失との信頼の相違による損失を分離するためにどのように使用できるかを実証する。これらの損失を,行動意思決定者と同じ意思決定課題に直面した合理的な意思決定者によって達成される期待された報酬によって定義される相補的性能の基準とベンチマークと比較することにより評価する。 Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's recommendation from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. 
We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.	翻訳日:2024-05-15 00:23:41 公開日:2024-05-12
# より深いかより広いか:ソボレフ損失を伴う最適一般化誤差からの展望 Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss ( http://arxiv.org/abs/2402.00152v2 ) ライセンス: Link先を確認	Yahong Yang, Juncai He,	(参考訳) ニューラルネットワークのアーキテクチャを構築することは、機械学習コミュニティにとって困難な追求であり、より深く進むか、より広く進むかというジレンマは、依然として永続的な疑問である。本稿では,よりフレキシブルな層数を持つディープニューラルネットワーク (DeNN) と限られた層を持つワイドニューラルネットワーク (WeNN) を比較し,ソボレフの損失における最適一般化誤差に着目した。分析研究により、ニューラルネットワークのアーキテクチャは、サンプルポイントの数、ニューラルネットワーク内のパラメータ、損失関数の正則性など、様々な要因に大きく影響を受けることが判明した。具体的には、より多くのパラメータがWeNNを好む傾向にあり、一方、サンプルポイントの増加と損失関数の規則性の向上は、DeNNの採用に傾いている。この理論を、ディープ・リッツと物理インフォームド・ニューラルネットワーク(PINN)法を用いた偏微分方程式に応用し、ニューラルネットワークの設計を導く。 Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.	翻訳日:2024-05-15 00:23:41 公開日:2024-05-12
# Dataset Condensation Driven Machine Unlearning Dataset Condensation Driven Machine Unlearning ( http://arxiv.org/abs/2402.00195v2 ) ライセンス: Link先を確認	Junaid Iqbal Khan,	(参考訳) データ規制要件とプライバシ保護機械学習の現在のトレンドは、機械学習の重要性を強調している。余分なサンプルを補足して再訓練することで、未学習のトレーニングデータに対する素直なアプローチは、計算上の課題に影響を受けやすい。これらの課題は、機械学習の傘の下に落ちてくるテクニックの集合を通じて、効果的に対処されてきた。しかし、未学習モデルの実用性とプライバシと調和して、永続的な計算課題を扱うのに十分でないことがまだ残っている。これは、トレーニングデータセットの観点から、近似アンラーニングの計算複雑性を改善する作業が不足しているためである。本稿では,画像分類の文脈において,機械学習の重要な要素としてデータセットの凝縮を導入することで,このギャップを埋めることを目的とする。この目的を達成するために、機械学習のプライバシ、ユーティリティ、効率のバランスをとる新しいデータセット凝縮技術と革新的なアンラーニングスキームを提案する。さらに,機械のアンラーニングを計測するための新しい効果的な手法を提案し,その適用方法として,メンバシップ推論とモデル逆転攻撃を防御する手法を提案する。さらに,本手法の新たな応用として,未学習サンプルの影響を受けずに任意のモデルを迅速に学習できる「凝縮モデル」からデータを抽出する手法を提案する。対応するコードは \href{https://github.com/algebraicdianuj/DC_U}{URL} で公開されている。 The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach to unlearning training data by retraining over the complement of the forget samples is susceptible to computational challenges. These challenges have been effectively addressed through a collection of techniques falling under the umbrella of machine unlearning. However, there still exists a lack of sufficiency in handling persistent computational challenges in harmony with the utility and privacy of unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. 
Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from a 'condensed model', which can be employed to quickly train any arbitrary model without being influenced by unlearning samples. The corresponding code is available at https://github.com/algebraicdianuj/DC_U.	翻訳日:2024-05-15 00:23:41 公開日:2024-05-12
# NeuroCine:人間の脳活動から映像を復号する NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties ( http://arxiv.org/abs/2402.01590v2 ) ライセンス: Link先を確認	Jingyuan Sun, Mingxiao Li, Zijiao Chen, Marie-Francine Moens,	(参考訳) 人間の脳の視覚処理の複雑さを理解するために、脳の活動から動的視覚体験を再構築することは、難しいが魅力的な試みとして現れている。近年の進歩は、非侵襲的な脳記録からの静的な画像の再構成に成功しているが、連続的な脳活動の動画フォーマットへの変換の領域はいまだ未解明のままである。本研究では、ノイズ、空間冗長性、時間ラグなどのfMRIデータを復号化するための新しい二相フレームワークであるNeuroCineを紹介する。本フレームワークは、コントラスト学習fMRI表現のための空間マスキングと時間補間に基づく拡張と、ビデオ生成のための依存先行雑音によって強化された拡散モデルを提案する。公開されているfMRIデータセットで評価したところ,3人の被験者の脳活動の復号化において,SSIMで測定して従来の最先端モデルをそれぞれ${20.97\%}$,${31.00\%}$,${12.30\%}$の顕著なマージンで上回る有望な結果を示した。さらに,本モデルが既存の脳構造や機能と一致していることが示唆され,その生物学的妥当性と解釈可能性が示唆された。 In the pursuit to understand the intricacies of human brain's visual processing, reconstructing dynamic visual experiences from brain activities emerges as a challenging yet fascinating endeavor. While recent advancements have achieved success in reconstructing static images from non-invasive brain recordings, the domain of translating continuous brain activities into video format remains underexplored. In this work, we introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data, such as noise, spatial redundancy and temporal lags. This framework proposes spatial masking and temporal interpolation-based augmentation for contrastive learning fMRI representations and a diffusion model enhanced by dependent prior noise for video generation. Tested on a publicly available fMRI dataset, our method shows promising results, outperforming the previous state-of-the-art models by a notable margin of ${20.97\%}$, ${31.00\%}$ and ${12.30\%}$ respectively on decoding the brain activities of three subjects in the fMRI dataset, as measured by SSIM. Additionally, our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.	
翻訳日:2024-05-15 00:23:41 公開日:2024-05-12
# 微調整強化学習モデルは秘かに緩和問題である Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem ( http://arxiv.org/abs/2402.02868v2 ) ライセンス: Link先を確認	Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś,	(参考訳) ファインチューニング(英: Fine-tuning)は、ファウンデーションモデルの成功例で最近紹介されたように、実践者が事前訓練された能力の伝達を可能にする広範なテクニックである。しかし、微調整強化学習(RL)モデルは依然として課題である。この研究は、行動と観察の間の相互作用によってRL設定でアクセント化され、事前訓練された能力を忘れる、移動不良の1つの特定の原因を概念化する。すなわち、モデルは、微調整の初期段階に訪れない下流タスクの状態部分空間を悪化させ、事前学習によりモデルがうまく振る舞う。このようにして、期待される転送利益を失うのです。この問題が発生した場合の条件を特定し、それが一般的であり、多くの場合破滅的であることを示す。難易度の高いNetHackとMontezuma's Revenge環境の詳細な実証分析を通じて、標準的な知識保持技術が問題を緩和し、事前学習された能力を最大限に活用できることを示す。特にNetHackでは、Human Monkシナリオの前のベストスコアを$5$Kから$10$Kポイント以上に改善した、ニューラルモデルのための新たな最先端技術を実現しています。 Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from $5$K to over $10$K points in the Human Monk scenario.	
翻訳日:2024-05-15 00:23:41 公開日:2024-05-12
# テキストと画像の拡散を優先的に調整するDense Reward View A Dense Reward View on Aligning Text-to-Image Diffusion with Preference ( http://arxiv.org/abs/2402.08265v2 ) ライセンス: Link先を確認	Shentao Yang, Tianqi Chen, Mingyuan Zhou,	(参考訳) 好みのテキスト・画像拡散モデル(T2I)の調整が研究の注目を集めている。優先データによるT2Iを直接最適化する以前の研究は存在するが、これらの手法は、生成過程のシーケンシャルな性質を無視しつつ、拡散逆鎖全体に対する遅延報酬のバンドイット仮定の下で開発されている。これは選好アライメントの有効性と効率を損なう可能性がある。本稿では,T2I逆鎖の初期ステップを強調する,より微細な報酬視点を導出し,トラクタブルアライメントの目的を導出する。特に、時間的対称性を破り、T2I生成階層に適合するように、DPOスタイルの明示的回帰自由目的に時間的割引を導入する。単一および複数プロンプト生成実験において,本手法は定量的および定性的に,強い関連するベースラインと競合する。我々のアプローチの洞察を説明するために、さらなる調査が行われた。 Aligning text-to-image diffusion model (T2I) with preference has been gaining increasing research attention. While prior works exist on directly optimizing T2I by preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, while ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.	翻訳日:2024-05-15 00:13:55 公開日:2024-05-12
# ニューラルネットワークにおけるインテクスト学習による人間のカリキュラム効果 Human Curriculum Effects Emerge with In-Context Learning in Neural Networks ( http://arxiv.org/abs/2402.08674v2 ) ライセンス: Link先を確認	Jacob Russin, Ellie Pavlick, Michael J. Frank,	(参考訳) 人間の学習は規則のような構造と訓練に使用される例のカリキュラムに敏感である。簡潔な規則によって管理されるタスクでは、関連する例が試行錯誤によってブロックされる場合、学習はより堅牢になるが、そのような規則がなければインターリービングはより効果的である。これまでのところ、これらの一見矛盾する効果を同時に捉えた神経モデルはない。ここでは、メタラーニングで訓練されたニューラルネットワークと大規模言語モデル(LLM)の両方において、同じトレードオフが'in-context learning'(ICL)'で自然に現れることを示す。 ICLは、アクティベーションダイナミックスで実装されたインナーループアルゴリズムを通じて、重み変更なしで新しいタスク‘in context'’を学習する機能である。事前訓練されたLLMとメタラーニングトランスフォーマーを用いた実験では、ICLはルールのような構造を含むタスクにおいて人間に示されるブロッキングの利点を示し、逆に、同時に重み付き学習は、そのような構造が欠如しているタスクにおいて人間に観察されるインターリービングの利点を再現することを示した。 Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with ``in-context learning'' (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks ``in context'' -- without weight changes -- via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.	翻訳日:2024-05-15 00:13:55 公開日:2024-05-12
# HyperAgent: 複雑な環境のためのシンプルでスケーラブルで効率的な強化学習フレームワーク HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments ( http://arxiv.org/abs/2402.10228v4 ) ライセンス: Link先を確認	Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo,	(参考訳) 資源制約下での複雑なタスクを解決するためには、強化学習(RL)エージェントは単純で効率的でスケーラブルで、(1)大きな状態空間と(2)相互作用データの連続的な蓄積に対処する必要がある。一般値関数に関連付けられた後続の計算効率の高いインクリメンタル近似を,共役性やデータ効率のよい動作選択を不要に実現した,ハイパーモデルとインデックスサンプリングを特徴とするRLフレームワークHyperAgentを提案する。 HyperAgentの実装は簡単で、Double-DQNに必要なモジュールをひとつ追加するだけでよい。 HyperAgentは、大規模なディープRLベンチマークで堅牢なパフォーマンスを提供する最初の方法であり、証明可能なスケーラブルなステップ毎の計算複雑性を実現し、表の仮定の下でサブ線形後悔を実現する。 HyperAgentは、問題のサイズに合わせて最適にスケールし、Atariベンチマークの下でのデータと計算の両方で大幅な効率向上を示すエピソードでディープシーのハードな探索問題を解決することができる。理論解析の核となるのは、ジョンソン-リンデンシュトラウスの非自明なマーチンゲール拡大であるシーケンシャルランダム射影の最初の解析ツールによって実現された逐次後近似論である。この研究はRLの理論的および実践的な領域を橋渡しし、RLアルゴリズム設計の新しいベンチマークを確立した。 To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. 
The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.	翻訳日:2024-05-15 00:13:55 公開日:2024-05-12
# 逐次ランダム投影のための確率ツール Probability Tools for Sequential Random Projection ( http://arxiv.org/abs/2402.14026v3 ) ライセンス: Link先を確認	Yingru Li,	(参考訳) 本稿では,不確実性の下での逐次的意思決定の課題に根ざした,逐次的ランダムプロジェクションに適した最初の確率的フレームワークを提案する。この分析は、逐次決定過程に固有の適応機構の副産物である確率変数の逐次依存と高次元の性質によって複雑である。本研究は停止過程の新規な構築を特徴とし,逐次的に相互に関連する一連の集中事象の解析を容易にする。停止過程から導かれる自己正規化過程において混合の手法を用いることで、所望の非漸近確率境界を達成する。この境界はジョンソン・リンデンシュトラウス(JL)補題の非自明なマーチンゲール拡大を表し、ランダム射影とシーケンシャル解析に関する文献への先駆的な貢献を示している。 We introduce the first probabilistic framework tailored for sequential random projection, an approach rooted in the challenges of sequential decision-making under uncertainty. The analysis is complicated by the sequential dependence and high-dimensional nature of random variables, a byproduct of the adaptive mechanisms inherent in sequential decision processes. Our work features a novel construction of a stopped process, facilitating the analysis of a sequence of concentration events that are interconnected in a sequential manner. By employing the method of mixtures within a self-normalized process, derived from the stopped process, we achieve a desired non-asymptotic probability bound. This bound represents a non-trivial martingale extension of the Johnson-Lindenstrauss (JL) lemma, marking a pioneering contribution to the literature on random projection and sequential analysis.	翻訳日:2024-05-15 00:13:55 公開日:2024-05-12
# ACE : 因果性を考慮したエントロピー規則化によるオフポリシィアクター批判 ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization ( http://arxiv.org/abs/2402.14528v2 ) ライセンス: Link先を確認	Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu,	(参考訳) 政策学習過程における異なる原始的行動の異なる重要性は、以前のモデルフリーなRLアルゴリズムによって見過ごされてきた。この知見を生かして、異なる行動次元と報酬の間の因果関係を探求し、訓練中の様々な原始的行動の重要性を評価する。因果関係を意識したエントロピーという用語を導入し、効率的に探索するための潜在的影響の高いアクションを効果的に識別し、優先順位付けする。さらに,特定の原始的行動に過度に焦点を合わせることを防ぐために,勾配休眠現象を解析し,休眠誘導リセット機構を導入し,本手法の有効性をさらに高める。提案アルゴリズムであるACE:Off-policy Actor-critic with Causality-aware Entropy regularizationは、7つのドメインにまたがる29の異なる連続制御タスクに対して、モデルのないRLベースラインと比較して大きな性能上の優位性を示す。ベンチマーク結果とビデオはhttps://ace-rl.github.io/で公開されている。 The varying significance of distinct primitive behaviors during the policy learning process has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient dormancy phenomenon and introduce a dormancy-guided reset mechanism to further enhance the efficacy of our method. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/.	翻訳日:2024-05-15 00:04:06 公開日:2024-05-12
# スピン軌道結合二元ボソニック凝縮体における半渦ソリトンとその励起状態 Semi-vortex solitons and their excited states in spin-orbit-coupled binary bosonic condensates ( http://arxiv.org/abs/2403.01458v2 ) ライセンス: Link先を確認	Haiming Deng, Jinqing Li, Zhaopin Chen, Yaohui Liu, Dong Liu, Chunzhi Jiang, Chao Kong, Boris A. Malomed,	(参考訳) 半渦(SV)型の2次元2成分基本ソリトンは、その成分の渦率$(s_{+},s_{-})=(0,1)$で、スピン軌道結合(SOC)二元系における安定基底状態(GS)であり、系の臨界崩壊の可能性にもかかわらず、両成分で作用する接触自己誘引と凝縮することが知られている。しかし、SVソリトンの励起状態(ESs)は、渦度の組$(s_{+},s_{-})=(S_{+},S_{+}+1)$、$S_{+}=1,2,3,...$を持ち、同じ系では不安定である。本研究では,SOC系におけるSVソリトンESを2成分の自己相互作用の反対の符号で構成する。主な発見はES-SVソリトンの安定性であり、追加の渦度は(少なくとも)$S_{+}=6$である。臨界崩壊の開始のノルムのしきい値である$N_{\mathrm{thr}}$は、一般的に知られている臨界値より高く、$N_{c}\approx 5.85$は単成分タウンズソリトンと結びつき、$N_{\mathrm{thr}}$は$S_{+}$の成長とともに増加する。 GS-SVソリトンの安定運動速度間隔も見いだされた。以上の結果から, 安定渦ソリトンの生成は, トポロジカル電荷の高い解法であることが示唆された。 It is known that two-dimensional two-component fundamental solitons of the semi-vortex (SV) type, with vorticities $(s_{+},s_{-})=(0,1)$ in their components, are stable ground states (GSs) in the spin-orbit-coupled (SOC) binary Bose-Einstein condensate with the contact self-attraction acting in both components, in spite of the possibility of the critical collapse in the system. However, excited states (ESs) of the SV solitons, with the vorticity set $(s_{+},s_{-})=( S_{+},S_{+}+1)$ and $S_{+}=1,2,3,...$, are unstable in the same system. We construct ESs of SV solitons in the SOC system with opposite signs of the self-interaction in the two components. The main finding is stability of the ES-SV solitons, with the extra vorticity (at least) up to $S_{+}=6$. The threshold value of the norm for the onset of the critical collapse, $N_{\mathrm{thr}}$, in these excited states is higher than the commonly known critical value, $N_{c}\approx 5.85$, associated with the single-component Townes solitons, $N_{\mathrm{thr}}$ increasing with the growth of $S_{+}$. A velocity interval for stable motion of the GS-SV solitons is found too. 
The results suggest a solution for the challenging problem of the creation of stable vortex solitons with high topological charges.	翻訳日:2024-05-15 00:04:06 公開日:2024-05-12
# 時空重畳における量子アルゴリズム Quantum Algorithms in a Superposition of Spacetimes ( http://arxiv.org/abs/2403.02937v3 ) ライセンス: Link先を確認	Omri Shmueli,	(参考訳) 量子コンピュータは私たちの情報処理能力に革命をもたらすと期待されている。古典から量子コンピューティングへの進歩は、古典物理学から量子物理学への進歩の産物である。ここで自然に生じる疑問は、物理学が将来何を可能にするのかということだ。物理学のより高度な理論は、量子コンピューティングを超えて、我々の計算能力を高めることができるのか? 物理学における活発な研究分野の一つは、量子力学(QM)と一般相対性理論(GR)を量子重力(QG)の統一理論に結合しようとするときに生じる、説明可能な量子力学の範囲外の理論的現象を研究している。QGは因果構造と事象順序の量子重ね合わせの可能性を示すことが知られている。量子情報理論の文献では、これはユニタリ進化順序の重ね合わせに翻訳される。本研究では、QGに基づく自然な計算モデルであって、(標準的な困難性仮定の下で)標準的な量子計算に対して指数的な高速化を提供する最初の例を示す。我々は、ユニタリ進化順序の重ね合わせを生成する能力を持つ量子コンピュータのモデルと複雑性の尺度を定義し、そのようなコンピュータが計算機科学の基本的な問題のうち2つを多項式時間で解けることを示す。すなわち、グラフ同型問題($\mathsf{GI}$)と、ギャップ$O\left( n \sqrt{n} \right)$のギャップ最近ベクトル問題($\mathsf{GapCVP}$)である。これらの問題は、通常の量子コンピュータでは解決が難しいと専門家によって信じられている。興味深いことに、我々のモデルは強力すぎるようには見えず、$\mathbf{NP}$ や $\mathbf{SZK}$ のように、計算機科学において難しいと考えられている複雑性クラス全体を解く明白な方法は見つからなかった。 Quantum computers are expected to revolutionize our ability to process information. The advancement from classical to quantum computing is a product of our advancement from classical to quantum physics -- the more our understanding of the universe grows, so does our ability to use it for computation. A natural question that arises is, what will physics allow in the future? Can more advanced theories of physics increase our computational power, beyond quantum computing? An active field of research in physics studies theoretical phenomena outside the scope of explainable quantum mechanics, that form when attempting to combine Quantum Mechanics (QM) with General Relativity (GR) into a unified theory of Quantum Gravity (QG). QG is known to present the possibility of a quantum superposition of causal structure and event orderings. In the literature of quantum information theory, this translates to a superposition of unitary evolution orders. In this work we show a first example of a natural computational model based on QG, that provides an exponential speedup over standard quantum computation (under standard hardness assumptions). We define a model and complexity measure for a quantum computer that has the ability to generate a superposition of unitary evolution orders, and show that such a computer is able to solve in polynomial time two of the fundamental problems in computer science: The Graph Isomorphism Problem ($\mathsf{GI}$) and the Gap Closest Vector Problem ($\mathsf{GapCVP}$), with gap $O\left( n \sqrt{n} \right)$. These problems are believed by experts to be hard to solve for a regular quantum computer. Interestingly, our model does not seem overpowered, and we found no obvious way to solve entire complexity classes that are considered hard in computer science, like the classes $\mathbf{NP}$ and $\mathbf{SZK}$.	翻訳日:2024-05-15 00:04:06 公開日:2024-05-12
# カメラLiDARフュージョンを用いた自律走行用多物体追跡 Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving ( http://arxiv.org/abs/2403.04112v2 ) ライセンス: Link先を確認	Riccardo Pieroni, Simone Specchia, Matteo Corno, Sergio Matteo Savaresi,	(参考訳) 本稿では、カメラとLiDARデータを組み合わせた自動運転車のための新しいマルチモーダルマルチオブジェクトトラッキング(MOT)アルゴリズムを提案する。カメラフレームは最先端の3Dオブジェクト検出器で処理されるのに対し、古典的なクラスタリング技術はLiDAR観測に使用される。提案したMOTアルゴリズムは、3段階のアソシエーションプロセスと、検出された動的障害物の運動を推定する拡張カルマンフィルタと、トラック管理フェーズとを備える。 EKF運動モデルは、観測対象の電流測定された相対位置と向きと、エゴ車両の縦・角速度を入力として要求する。多くの最先端のマルチモーダルMOTアプローチとは異なり、提案アルゴリズムはエゴのグローバルなポーズの地図や知識に依存しない。さらに、カメラ専用に3D検出器を使用し、使用するLiDARセンサーの種類に依存しない。このアルゴリズムはシミュレーションと実世界のデータの両方で検証され、良好な結果が得られる。 This paper presents a novel multi-modal Multi-Object Tracking (MOT) algorithm for self-driving cars that combines camera and LiDAR data. Camera frames are processed with a state-of-the-art 3D object detector, whereas classical clustering techniques are used to process LiDAR observations. The proposed MOT algorithm comprises a three-step association process, an Extended Kalman filter for estimating the motion of each detected dynamic obstacle, and a track management phase. The EKF motion model requires the current measured relative position and orientation of the observed object and the longitudinal and angular velocities of the ego vehicle as inputs. Unlike most state-of-the-art multi-modal MOT approaches, the proposed algorithm does not rely on maps or knowledge of the ego global pose. Moreover, it uses a 3D detector exclusively for cameras and is agnostic to the type of LiDAR sensor used. The algorithm is validated both in simulation and with real-world data, with satisfactory results.	翻訳日:2024-05-15 00:04:06 公開日:2024-05-12
# 長い管内を移動する量子渦ループのエネルギースペクトル The energy spectrum of a quantum vortex loop moving in a long pipe ( http://arxiv.org/abs/2403.06441v2 ) ライセンス: Link先を確認	S. V. Talalov,	(参考訳) 本研究では, 細長い管内を移動する量子渦ループのエネルギースペクトルの問題を初めて解く。渦フィラメントは局所誘導近似に記述されている。我々は、この力学系を新しい方法を用いて定量化し、循環$\Gamma$とエネルギー値$E$の非自明な結果をもたらす。最終形式では、渦ループのスペクトルを ''Regge trajectory'' の形で示し、$E = E(\Gamma)$ とする。渦量子化問題は2流体流体力学や他の従来の手法の外部にあると考えられる。 In this study, the problem of the energy spectrum of a quantum vortex loop moving in a thin long pipe is solved for the first time. The vortex filament is described in the Local Induction Approximation. We quantize this dynamic system using a new method, which leads to non-trivial results for circulation $\Gamma$ and energy values $E$. In the final form, we present the spectrum of the vortex loop in the form of a ''Regge trajectory'' $E = E(\Gamma)$. The vortex quantization problem is considered outside of two-fluid hydrodynamics and other conventional approaches.	翻訳日:2024-05-15 00:04:06 公開日:2024-05-12
# シュレーディンガー化による偏微分方程式の量子回路 Quantum Circuits for partial differential equations via Schrödingerisation ( http://arxiv.org/abs/2403.10032v2 ) ライセンス: Link先を確認	Junpeng Hu, Shi Jin, Nana Liu, Lei Zhang,	(参考訳) 量子コンピューティングは、特に大規模PDEシミュレーションにおいて、古典コンピューティングと比較して大きなスピードアップを達成するための有望な道として登場した。主要な量子的アプローチの1つは、シュレーディンガー型方程式にのみ直接適用可能なハミルトニアンシミュレーションの利用である。この制限に対処するため、ワープ変換(warped transformation)を用いて一般の線形PDEをシュレーディンガー型方程式に変換するシュレーディンガー化(Schrödingerisation)技術が開発された。しかし、シュレーディンガー化技術の開発にもかかわらず、一般のPDEを解くための対応する量子回路の明示的な実装はまだ設計されていない。本稿では、シュレーディンガー化技術を用いた一般PDEのための量子アルゴリズムの詳細な実装について述べる。提案手法の有効性を実証するために、熱方程式と、風上スキームにより近似された移流方程式の例を示す。また、高次元においてこれらのアルゴリズムが古典的アルゴリズムに対して持つ量子的優位性を示すために、計算量解析も行う。 Quantum computing has emerged as a promising avenue for achieving significant speedup, particularly in large-scale PDE simulations, compared to classical computing. One of the main quantum approaches involves utilizing Hamiltonian simulation, which is directly applicable only to Schrödinger-type equations. To address this limitation, Schrödingerisation techniques have been developed, employing the warped transformation to convert general linear PDEs into Schrödinger-type equations. However, despite the development of Schrödingerisation techniques, the explicit implementation of the corresponding quantum circuit for solving general PDEs remains to be designed. In this paper, we present a detailed implementation of a quantum algorithm for general PDEs using Schrödingerisation techniques. We provide examples of the heat equation, and the advection equation approximated by the upwind scheme, to demonstrate the effectiveness of our approach. Complexity analysis is also carried out to demonstrate the quantum advantages of these algorithms in high dimensions over their classical counterparts.	翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
# DTOR: 異常を説明するための決定木外部回帰器 DTOR: Decision Tree Outlier Regressor to explain anomalies ( http://arxiv.org/abs/2403.10903v4 ) ライセンス: Link先を確認	Riccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio Cosentini,	(参考訳) 外乱の発生と発生のメカニズムを説明することは、様々な領域において非常に重要である。誤動作、詐欺、脅迫は正しく識別されるだけでなく、効果的に行動可能な対策を実行するために有効な説明を必要とすることが多い。異常を識別するための高度な機械学習アプローチを、これまで以上に広く利用することで、このような説明がより困難になる。本稿では,異常検出モデルにより生成された異常スコアを推定することにより,個々のデータポイントに対する規則に基づく説明を生成する手法であるDTORを提案する。これはまず、推定スコアを計算し、データポイントスコアに関連する相対パスを抽出する決定木回帰器を適用する。本結果は,多数の特徴を持つデータセットにおいても,DTORの堅牢性を示すものである。さらに、他の規則に基づくアプローチとは対照的に、生成された規則は説明すべき点によって一貫して満たされる。さらに、我々の評価基準は、実行時間を短縮し、外乱説明タスクにおけるAnchorsに匹敵する性能を示す。 Explaining outliers occurrence and mechanism of their occurrence can be extremely important in a variety of domains. Malfunctions, frauds, threats, in addition to being correctly identified, oftentimes need a valid explanation in order to effectively perform actionable counteracts. The ever more widespread use of sophisticated Machine Learning approach to identify anomalies make such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating anomaly scores generated by an anomaly detection model. This is accomplished by first applying a Decision Tree Regressor, which computes the estimation score, and then extracting the relative path associated with the data point score. Our results demonstrate the robustness of DTOR even in datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate comparable performance to Anchors in outlier explanation tasks, with reduced execution time.	翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
# 顔表情認識のための注意融合型エモティックマスク付きオートエンコーダ Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition ( http://arxiv.org/abs/2403.13039v3 ) ライセンス: Link先を確認	Bach Nguyen-Xuan, Thien Nguyen-Hoang, Thanh-Huy Nguyen, Nhu Tai-Do,	(参考訳) 表情認識(FER)はコンピュータビジョンにおける重要な課題であり、様々な領域にまたがる多様な応用がある。表現認識モデルの一般化能力を損なうような限られたFERデータセットの課題に対処することは、性能向上に不可欠である。本稿では,表現分類のためのMAE-Face self-supervised learning (SSL) 法と多視点統合注意機構を統合し,特に第6回感情行動分析(ABAW)コンペティションで紹介する。人間の表情の変化を強調する高次特徴を学習する前に、外見(外見)からの低次特徴情報を活用することにより、本研究は、検査された視点(主観)を改善するための、単純かつ革新的な方法を提供することを目指している。また、重要な顔の特徴を強調することを目的とした、実装が容易でトレーニングなしのフレームワークを提案し、そのような機能がモデルのガイドとして機能し、重要なローカル要素に焦点を当てるかどうかを判断する。本手法の有効性は,Aff-wild2データセットにおけるモデル性能の向上によって検証される。 Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level feature that emphasizes the shift in the human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model, focusing on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset, as observed in both training and validation contexts.	翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
# サイバー犯罪とオンライン詐欺における良心の武器化 : 新しいシステム理論 Weaponization of Conscience in Cybercrime and Online Fraud: A Novel Systems Theory ( http://arxiv.org/abs/2403.14667v2 ) ライセンス: Link先を確認	Michelle Espinoza,	(参考訳) 本論では, 詐欺師が行為を偽装したり, 他人を強要したり, 被害者を欺いたりするための, 複雑なシステムと戦術としての良心の武器化の概念を紹介する。本研究は、軍事プロパガンダと心理学的操作原理の理論的基盤から導かれる概念的アプローチを採用し、良心の兵器化に対する理解と防御のためのレンズとして機能させる。 This article introduces the concept of weaponization of conscience as a complex system and tactic employed by fraudsters to camouflage their activity, coerce others, or to deceive their victims. This study adopts a conceptual approach, drawing from the theoretical underpinnings of military propaganda and psychological operations doctrines and adapting them to serve as a lens through which to understand and defend against weaponization of conscience.	翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
# 保存光子電流 Conserved photon current ( http://arxiv.org/abs/2403.16919v2 ) ライセンス: Link先を確認	Margaret Hawton,	(参考訳) 保存光子電流は、電磁四電位場テンソル演算子によって満たされる可換関係から導かれる。密度は正および負の周波数項に対する和であり、どちらも正の数密度に寄与し、共通の方向に伝播する。離散正および負の周波数励起はどちらも光子として同定される。光子数は光子密度の空間積分に等しいが、源やシンクが存在しない状態で保存される。 A conserved photon current is derived from the commutation relations satisfied by the electromagnetic four-potential and field tensor operators. The density is found to be a sum over positive and negative frequency terms, both of which contribute a positive number density and propagate in a common direction. Discrete positive and negative frequency excitations are both identified as photons. Photon number, equal to the spatial integral of photon density, is conserved in the absence of sources and sinks.	翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
# ディープチャネル事前制御による非教師なし機能強化モジュールによる実世界の劣化における視覚認識の促進 Boosting Visual Recognition in Real-world Degradations via Unsupervised Feature Enhancement Module with Deep Channel Prior ( http://arxiv.org/abs/2404.01703v2 ) ライセンス: Link先を確認	Zhanwen Liu, Yuhang Li, Yang Wang, Bolin Gao, Yisheng An, Xiangmo Zhao,	(参考訳) 通常の環境下での自動運転車の環境認識は、過去10年間にかなりの成功を収めてきた。しかし、霧、低照度、動きのぼかしなどの様々な不快な条件は、画像の品質を低下させ、自動運転の安全性に重大な脅威をもたらす。すなわち、劣化画像に適用した場合、画像の統計的・構造的特性の破壊による特徴量損失やアーチファクトの干渉により、最先端の視覚モデルが性能低下に悩まされることがしばしばある。そこで本研究では,劣化した視覚認識のための新しいDeep Channel Prior (DCP)を提案する。具体的には、事前学習されたモデルの深部表現空間において、劣化した特徴と同一の劣化型とのチャネル相関が、異なる内容や意味を持つ場合でも一様分布を持ち、高分離性特徴空間における劣化した特徴と明確な表現の間のマッピング関係の学習を容易にすることを観察する。そこで,UFEMの第1段階では,多目的機構を導入して,高分離性特徴空間における遅延コンテンツ復元とアーティファクト除去を実現する,新しいプラグアンドプレイunsupervised Feature Enhancement Module (UFEM)を提案する。次に、DCPの指導の下、大域的相関変調のための第2段階に生成した特徴を移し、高品質で認識しやすい特徴を得る。 3つのタスクと8つのベンチマークデータセットの評価結果から,提案手法は実劣化条件下での事前学習モデルの性能を総合的に向上できることを示した。ソースコードはhttps://github.com/liyuhang166/Deep_Channel_Priorで入手できる。 The environmental perception of autonomous vehicles in normal conditions have achieved considerable success in the past decade. However, various unfavourable conditions such as fog, low-light, and motion blur will degrade image quality and pose tremendous threats to the safety of autonomous driving. That is, when applied to degraded images, state-of-the-art visual models often suffer performance decline due to the feature content loss and artifact interference caused by statistical and structural properties disruption of captured images. To address this problem, this work proposes a novel Deep Channel Prior (DCP) for degraded visual recognition. 
Specifically, we observe that, in the deep representation space of pre-trained models, the channel correlations of degraded features with the same degradation type have uniform distribution even if they have different content and semantics, which can facilitate the mapping relationship learning between degraded and clear representations in high-sparsity feature space. Based on this, a novel plug-and-play Unsupervised Feature Enhancement Module (UFEM) is proposed to achieve unsupervised feature correction, where the multi-adversarial mechanism is introduced in the first stage of UFEM to achieve the latent content restoration and artifact removal in high-sparsity feature space. Then, the generated features are transferred to the second stage for global correlation modulation under the guidance of DCP to obtain high-quality and recognition-friendly features. Evaluations of three tasks and eight benchmark datasets demonstrate that our proposed method can comprehensively improve the performance of pre-trained models in real degradation conditions. The source code is available at https://github.com/liyuhang166/Deep_Channel_Prior	翻訳日:2024-05-14 23:44:37 公開日:2024-05-12
# 2レベルフィードバック制御によるネットワークシステムの侵入耐性 Intrusion Tolerance for Networked Systems through Two-Level Feedback Control ( http://arxiv.org/abs/2404.01741v4 ) ライセンス: Link先を確認	Kim Hammar, Rolf Stadler,	(参考訳) サービスレプリカを2段階最適制御問題とするシステムの侵入耐性を定式化する。ローカルレベルではノードコントローラが侵入回復を行い、グローバルレベルではシステムコントローラが複製係数を管理する。局所的およびグローバルな制御問題は、操作研究における古典的な問題、すなわち機械交換問題と在庫補充問題として定式化することができる。この定式化に基づいて、侵入耐性システムのための新しい制御アーキテクチャであるTOLERANCEを設計する。両レベルにおける最適制御戦略がしきい値構造を持ち、それらの計算に効率的なアルゴリズムを設計することを証明する。 10種類のネットワーク侵入を行うエミュレーション環境でのTOLERANCEの実装と評価を行う。その結果、TOLERANCEは、最先端の侵入耐性システムと比較して、サービスの可用性を向上し、運用コストを低減できることがわかった。 We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.	翻訳日:2024-05-14 23:44:37 公開日:2024-05-12
# 室温動作のための高感度・高効率1550nm光検出器 Highly sensitive and efficient 1550 nm photodetector for room temperature operation ( http://arxiv.org/abs/2404.15218v2 ) ライセンス: Link先を確認	Rituraj, Zhi Gang Yu, R. M. E. B. Kandegedara, Shanhui Fan, Srini Krishnamurthy,	(参考訳) 効果的な量子通信のようなフォトニック量子技術は、1550nmの波長で高い外部量子効率(EQE)を持ち、室温(RT)で動作する単一または少数光子センサーを必要とする。この分野の主要なデバイスは、特にガイガーモードで動作するアバランシェ光検出器である。RT動作と高いEQEの要求は相反することが多く、結果として妥協された解決策となる。我々は、共最適化された誘電体フォトニック結晶基板上に2次元(2D)半導体材料を用いることで、RTでの暗電流を3桁低減し、同時に99%を超えるEQEを維持するデバイスを開発した。このデバイスはアバランシェ動作に適しており、超低暗電流と高い光検出効率を有する単一光子検出の基礎となる。2D材料の高いキャリア移動度を活用することで、ジッタ時間は~psとなり、大型の2Dアレイカメラに統合することができる。 Photonic quantum technologies such as effective quantum communication require room temperature (RT) operating single- or few- photon sensors with high external quantum efficiency (EQE) at 1550 nm wavelength. The leading class of devices in this segment is avalanche photodetectors operating particularly in the Geiger mode. Often the requirements for RT operation and for a high EQE are in conflict, resulting in a compromised solution. We have developed a device which employs a two-dimensional (2D) semiconductor material on a co-optimized dielectric photonic crystal substrate to simultaneously decrease the dark current by three orders of magnitude at RT and maintain an EQE of >99%. The device is amenable to avalanching and forms a basis for single photon detection with ultra-low dark current and high photodetection efficiency. Harnessing the high carrier mobility of 2D materials, the device has ~ps jitter time and can be integrated into a large 2D array camera.	翻訳日:2024-05-14 23:44:37 公開日:2024-05-12
# 光の変調モーメントにおけるスピンハミルトニアン Spin Hamiltonians in the Modulated Momenta of Light ( http://arxiv.org/abs/2405.00484v2 ) ライセンス: Link先を確認	Juan Feng, Zengya Li, Luqi Yuan, Erez Hasman, Bo Wang, Xianfeng Chen,	(参考訳) 異なるスピンハミルトニアンの基底状態を見つけることができるフォトニックソルバは、多くの対話的な物理系や組合せ最適化問題の研究に利用できる。ここでは、空間光輸送によるスピンハミルトニアンの実空間対応を確立する。実空間スピン相互作用は光の運動量-空間の流れを変調することによって決定される。この原理は一般化されたプランシェレルの定理として定式化され、任意の変位依存スピン相互作用の基底状態を見つけるための単純な光学シミュレータを実装できる。特に、この原理を用いて、J1-J2-J3モデルからエキゾチックな磁気位相図を明らかにし、また、XYモデルから渦を介するベレジンスキー-コステリッツ-Thoulessのダイナミクスも観察する。これらの実験は光の運動量空間からスピン相互作用を微妙に制御することで高い計算精度を示し、新しい物理効果を探求する有望なスキームを提供する。 Photonic solvers that are able to find the ground states of different spin Hamiltonians can be used to study many interactive physical systems and combinatorial optimization problems. Here, we establish a real-and-momentum space correspondence of spin Hamiltonians by spatial light transport. The real-space spin interaction is determined by modulating the momentum-space flow of light. This principle is formulated as a generalized Plancherel theorem, allowing us to implement a simple optical simulator that can find the ground states for any displacement-dependent spin interactions. Particularly, we use this principle to reveal the exotic magnetic phase diagram from a J1-J2-J3 model, and we also observe the vortex-mediated Berezinskii-Kosterlitz-Thouless dynamics from the XY model. These experiments exhibit high calculation precision by subtly controlling spin interactions from the momentum space of light, offering a promising scheme to explore novel physical effects.	翻訳日:2024-05-14 23:44:37 公開日:2024-05-12
# インディネイティブラテンアメリカの言語におけるNLPの進歩 NLP Progress in Indigenous Latin American Languages ( http://arxiv.org/abs/2404.05365v2 ) ライセンス: Link先を確認	Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio,	(参考訳) この論文は、急速な技術進歩に直面した先住民コミュニティの限界化に焦点を当てている。我々は、これらの言語の文化的豊かさと、自然言語処理(NLP)の領域で見落とされがちなリスクを強調した。我々はこれらのコミュニティと研究者のギャップを埋めることを目指しており、先住民のコミュニティ観を尊重する包括的技術進歩の必要性を強調している。我々は、ラテンアメリカ先住民言語のNLPの進展と、ラテンアメリカ先住民言語の地位、NLPにおける表現、その保存と発展に必要な課題と革新について調査する。この論文は、ラテンアメリカの先住民コミュニティ、特に低資源・先住民コミュニティにおけるNLPの必要性と進歩を理解する上での現在の文献に貢献する。 The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancements that respect indigenous community perspectives. We show the NLP progress of indigenous Latin American languages and the survey that covers the status of indigenous languages in Latin America, their representation in NLP, and the challenges and innovations required for their preservation and development. The paper contributes to the current literature in understanding the need and progress of NLP for indigenous communities of Latin America, specifically low-resource and indigenous communities in general.	翻訳日:2024-05-14 23:34:50 公開日:2024-05-12
# RAR-b:検索ベンチマークとしての推論 RAR-b: Reasoning as Retrieval Benchmark ( http://arxiv.org/abs/2404.06347v2 ) ライセンス: Link先を確認	Chenghao Xiao, G Thomas Hudson, Noura Al Moubayed,	(参考訳) セマンティックテキスト類似性(STS)タスクと情報検索(IR)タスクは、過去数年間の埋め込みモデルの進展を記録するための2つの主要な手段であった。新たなRAG(Retrieval-augmented Generation)パラダイムの下では、埋め込みモデルのより高次の言語理解能力を評価し、それらに格納される推論能力について意識的に検討する必要がある。そこで我々は次の問いを立てる。レトリバーは推論の問題を解けるだろうか? 推論タスクを検索タスクに変換することで、推論レベルの言語理解のために特別に訓練されていなければ、現在の最先端のレトリバーモデルは、特に推論集約タスクにおいてLLMを補助する役割を果たすにはまだ程遠い可能性があることが分かる。さらに、命令を認識するよう訓練されているにもかかわらず、命令対応IRモデルは推論タスクの推論時に命令を使わない方がしばしば性能が高く、これは研究コミュニティが解消すべき、見過ごされてきたレトリバーとLLMの挙動ギャップを示している。しかし、最近のデコーダベースの埋め込みモデルは、そのギャップを狭める上で大きな可能性を示し、埋め込みモデルが推論レベルの言語理解を達成するための道筋を浮き彫りにしている。また、現行の市販のリランカモデルはこれらのタスクで失敗するが、微調整による推論能力の注入はバイエンコーダよりも容易であるように見え、リランキングモデルを微調整することで全タスクで最先端の性能を達成できることを示す。Reasoning as Retrieval Benchmark (RAR-b) は、レトリバーモデルに格納された推論能力を評価するためのタスクと設定の総合的なスイートである。RAR-bはhttps://github.com/gowitheflow-1998/RAR-bで入手できる。 Semantic textual similarity (STS) tasks and information retrieval (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored in them. Addressing this, we pose the question: Can retrievers solve reasoning problems? By transforming reasoning tasks into retrieval tasks, we find that without being specifically trained for reasoning-level language understanding, current state-of-the-art retriever models may still be far from being competent for playing the role of assisting LLMs, especially in reasoning-intensive tasks. Moreover, albeit trained to be aware of instructions, instruction-aware IR models are often better off without instructions in inference time for reasoning tasks, posing an overlooked retriever-LLM behavioral gap for the research community to align. However, recent decoder-based embedding models show great promise in narrowing the gap, highlighting the pathway for embedding models to achieve reasoning-level language understanding. We also show that, although current off-the-shelf re-ranker models fail on these tasks, injecting reasoning abilities into them through fine-tuning still appears easier than doing so to bi-encoders, and we are able to achieve state-of-the-art performance across all tasks by fine-tuning a reranking model. We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models. RAR-b is available at https://github.com/gowitheflow-1998/RAR-b.	翻訳日:2024-05-14 23:34:50 公開日:2024-05-12
# CNNオートエンコーダに基づく画像デノイジングが画像分類タスクに与える影響の評価 Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks ( http://arxiv.org/abs/2404.10664v2 ) ライセンス: Link先を確認	Mohsen Hami, Mahdi JameBozorg,	(参考訳) 現実世界から撮影された画像は、しばしば異なる種類のノイズに影響され、コンピュータビジョンシステムの性能と視覚データの品質に大きな影響を与える。本研究では、鋳造品のノイズ画像における欠陥検出のための新しい手法を提案し、特に水中ポンプのインペラに焦点を当てる。この手法は、VGG16、InceptionV3などの深層学習モデルを空間領域と周波数領域の両方で利用し、ノイズの種類と欠陥状態を特定する。研究プロセスは、画像の前処理から始まり、続いて特定のノイズカテゴリに合わせたデノイジング技術を適用する。ノイズ検出とデノイジングを分類パイプラインに統合することにより、欠陥検出の精度と堅牢性を高めることが目的である。本研究は周波数領域のノイズタイプ分類にVGG16を用い、99%以上の精度を実現した。ごま塩ノイズの除去では平均SSIMが87.9、ガウスノイズの除去では平均SSIMが64.0、周期ノイズの除去では平均SSIMが81.6であった。この包括的アプローチは、現実世界の産業アプリケーションにおけるデノイジングに対して、深層オートエンコーダモデルとメディアンフィルタが有効であることを示す。最後に、欠陥検出における二値分類精度は、従来法に比べて大幅に向上した。VGG16分類器の精度は94.6%から97.0%に向上し、提案するノイズ検出・デノイジング手法の有効性を示した。同様に、InceptionV3分類器では、精度が84.7%から90.0%に向上し、ノイズ分析を分類パイプラインに統合する利点がさらに裏付けられた。 Images captured from the real world are often affected by different types of noise, which can significantly impact the performance of Computer Vision systems and the quality of visual data. This study presents a novel approach for defect detection in casting product noisy images, specifically focusing on submersible pump impellers. The methodology involves utilizing deep learning models such as VGG16, InceptionV3, and other models in both the spatial and frequency domains to identify noise types and defect status. The research process begins with preprocessing images, followed by applying denoising techniques tailored to specific noise categories. The goal is to enhance the accuracy and robustness of defect detection by integrating noise detection and denoising into the classification pipeline. The study achieved remarkable results using VGG16 for noise type classification in the frequency domain, achieving an accuracy of over 99%. Removal of salt and pepper noise resulted in an average SSIM of 87.9, while Gaussian noise removal had an average SSIM of 64.0, and periodic noise removal yielded an average SSIM of 81.6. This comprehensive approach showcases the effectiveness of the deep AutoEncoder model and median filter for denoising in real-world industrial applications. Finally, our study reports significant improvements in binary classification accuracy for defect detection compared to previous methods. For the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the effectiveness of the proposed noise detection and denoising approach. Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to 90.0%, further validating the benefits of integrating noise analysis into the classification pipeline.	翻訳日:2024-05-14 23:34:50 公開日:2024-05-12
# InfoMatch:半教師あり画像分類のためのエントロピーニューラル推定 InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification ( http://arxiv.org/abs/2404.11003v3 ) ライセンス: Link先を確認	Qi Han, Zhibo Tian, Chengwei Xia, Kun Zhan,	(参考訳) 擬似的監督と整合性正則化を利用した半教師あり画像分類は顕著な成功を収めた。しかし、ラベルなしデータの可能性を完全に活用することは依然として課題である。これを解決するために、情報エントロピーのニューラル推定を用いて、ラベルのないサンプルの可能性を活用する。コントラスト学習に着想を得て、エントロピーは異なる拡張ビュー間の相互情報量の下界を最大化することによって推定される。さらに、画像分類器の事後分布の情報エントロピーが、ソフトマックス予測の尤度関数を最大化することにより近似されることを理論的に分析する。これらの知見に導かれ、予測確率分布が正解(ground-truth)分布と密接に一致することを保証するため、両方の観点からモデルを最適化する。情報エントロピーとの理論的関連性を踏まえ、我々はこの手法をInfoMatchと命名する。広範な実験を通じて、その優れた性能を示す。ソースコードはhttps://github.com/kunzhan/InfoMatchで入手できる。 Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to utilize the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by maximizing a lower bound on mutual information across different augmented views. Moreover, we theoretically analyze that the information entropy of the posterior of an image classifier is approximated by maximizing the likelihood function of the softmax predictions. Guided by these insights, we optimize our model from both perspectives to ensure that the predicted probability distribution closely aligns with the ground-truth distribution. Given the theoretical connection to information entropy, we name our method InfoMatch. Through extensive experiments, we show its superior performance. The source code is available at https://github.com/kunzhan/InfoMatch.	翻訳日:2024-05-14 23:10:20 公開日:2024-05-12
# OPTiML: 自己監督型医用画像表現のための最適輸送を用いた高密度セマンティック不変性 OPTiML: Dense Semantic Invariance Using Optimal Transport for Self-Supervised Medical Image Representation ( http://arxiv.org/abs/2404.11868v3 ) ライセンス: Link先を確認	Azad Singh, Vandan Gorade, Deepak Mishra,	(参考訳) 自己教師付き学習(SSL)は、アノテーションなしで学習できることから、医用画像解析の有望な技術として登場した。しかし、有望な可能性にもかかわらず、従来のSSLメソッドでは、セマンティックアライメントの達成や微妙な詳細の取得など、制限に直面している。これは、解剖学的構造や病理的詳細を正確に把握できない、最適下界表現につながる。これらの制約に対応するため,医用画像表現学習におけるSSLの全体的な効果を高めるために,最適なトランスポート(OT)を用いた新しいSSLフレームワークOPTiMLを導入する。中心となる考え方は、OTとクロスビューポイントセマンティクス・インフュージョン・モジュール(CV-SIM)を統合することである。 CV-SIMモジュールに加えて、OPTiMLはOTフレームワーク内での分散と共分散の規則化を強制し、臨床的に関係のある情報に焦点を絞ると同時に、より少ない情報的特徴を破棄する。提案するフレームワークは,様々な医用画像タスクに適用可能な意味豊かな表現を学習する能力を示す。その有効性を検証するために,胸部X線モダリティから利用可能な3つのデータセットについて実験を行った。実験の結果,OPTiMLはすべての評価課題において,最先端の手法よりも優れていることがわかった。 Self-supervised learning (SSL) has emerged as a promising technique for medical image analysis due to its ability to learn without annotations. However, despite the promising potential, conventional SSL methods encounter limitations, including challenges in achieving semantic alignment and capturing subtle details. This leads to suboptimal representations, which fail to accurately capture the underlying anatomical structures and pathological details. In response to these constraints, we introduce a novel SSL framework OPTiML, employing optimal transport (OT), to capture the dense semantic invariance and fine-grained details, thereby enhancing the overall effectiveness of SSL in medical image representation learning. The core idea is to integrate OT with a cross-viewpoint semantics infusion module (CV-SIM), which effectively captures complex, fine-grained details inherent in medical images across different viewpoints. In addition to the CV-SIM module, OPTiML imposes the variance and covariance regularizations within OT framework to force the model focus on clinically relevant information while discarding less informative features. 
Through these, the proposed framework demonstrates its capacity to learn semantically rich representations that can be applied to various medical imaging tasks. To validate its effectiveness, we conduct experimental studies on three publicly available datasets from chest X-ray modality. Our empirical results reveal OPTiML's superiority over state-of-the-art methods across all evaluated tasks.	翻訳日:2024-05-14 23:10:20 公開日:2024-05-12
# 表構造と文字認識のためのマルチセルデコーダと相互学習 Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition ( http://arxiv.org/abs/2404.13268v2 ) ライセンス: Link先を確認	Takaya Kawakatsu,	(参考訳) 学術論文や財務報告などの文書から表の内容を取り出し,それを大規模言語モデルで処理可能な形式に変換することは,知識情報処理において重要な課題である。テーブル構造だけでなくセル内容も認識するエンドツーエンドアプローチは、外部文字認識システムを用いた最先端モデルに匹敵する性能を達成し、さらなる改善の可能性を秘めている。さらに、これらのモデルでは、数百セルの長いテーブルを局所的な注意を払って認識できるようになった。しかし、モデルでは、ヘッダーからフッタへの1方向のテーブル構造を認識し、各セルごとにセル内容の認識を行うため、近隣セルから有用な情報を検索する機会はない。本稿では,エンド・ツー・エンドアプローチを改善するために,マルチセルコンテンツデコーダと双方向相互学習機構を提案する。この効果は2つの大きなデータセットで実証され、実験結果は、多数のセルを持つ長いテーブルであっても、最先端のモデルに匹敵する性能を示す。 Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that can be processed by large language models is an important task in knowledge information processing. End-to-end approaches, which recognize not only table structure but also cell contents, achieved performance comparable to state-of-the-art models using external character recognition systems, and have potential for further improvements. In addition, these models can now recognize long tables with hundreds of cells by introducing local attention. However, the models recognize table structure in one direction from the header to the footer, and cell content recognition is performed independently for each cell, so there is no opportunity to retrieve useful information from the neighbor cells. In this paper, we propose a multi-cell content decoder and bidirectional mutual learning mechanism to improve the end-to-end approach. The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models, even for long tables with large numbers of cells.	翻訳日:2024-05-14 23:10:20 公開日:2024-05-12
# くさびの先端としてのW:制約付きコライダーによるベル相関 W as the Edge of a Wedge: Bell Correlations via Constrained Colliders ( http://arxiv.org/abs/2404.13928v2 ) ライセンス: Link先を確認	Huw Price,	(参考訳) Ken Wharton との以前の研究では、ベル相関は特殊な選択アーティファクトであり、(i)コライダーバイアスと(ii)コライダー変数に対する境界制約の組み合わせによって説明されると提案した。これは光円錐の外側への直接的な因果的影響を必要としないため、ベル非局所性と相対性理論を両立させる新しい方法を提供する可能性がある。本稿では、この提案を支持する新たな論証の概要を示す。遅延選択エンタングルメントスワッピングを含む特別な(W字型)ベル実験のクラスに対してこの論証がどのように成立するかを説明し、一般的な(V字型)の場合にも拡張できると論じる。 In previous work with Ken Wharton, it was proposed that Bell correlations are a special sort of selection artefact, explained by a combination of (i) collider bias and (ii) a boundary constraint on the collider variable. This requires no direct causal influence outside lightcones, and may hence offer a new way to reconcile Bell nonlocality and relativity. This piece outlines a new argument for the proposal. It explains how it is valid for a special class of ('W-shaped') Bell experiments involving delayed-choice entanglement swapping, and argues that it can be extended to the general ('V-shaped') case.	翻訳日:2024-05-14 23:10:20 公開日:2024-05-12
# EvaNet: 地球画像上の標高誘導による洪水範囲マッピング EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery ( http://arxiv.org/abs/2404.17917v2 ) ライセンス: Link先を確認	Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou,	(参考訳) 高解像度衛星画像からの洪水範囲の正確かつタイムリーなマッピングは、被害評価や救援活動などの災害管理において重要な役割を担っている。しかし、現在の最先端のソリューションはU-Netに基づいており、スペクトル特徴のみからは直接判定できない曖昧なピクセル(例えば、樹冠、雲)のために、浸水ピクセルを正確にセグメント化できない。米国地質調査所 (USGS) などのソースから容易に入手可能なデジタル標高モデル (DEM) データを利用して, 本研究では洪水範囲マッピングを改善するための標高マップの活用を検討する。エンコーダ・デコーダアーキテクチャに基づく標高誘導セグメンテーションモデルであるEvaNetを提案し、2つの新しい技術を導入する。(1) 重力の物理法則を符号化した損失関数。すなわち、ある位置が浸水(乾燥)している場合、それに隣接する標高の低い(高い)位置も浸水(乾燥)していなければならない。(2) 位置依存のゲーティング機構によって標高マップを統合し、隣接する層間を流れるスペクトル特徴の量を調整する新しい(逆)畳み込み演算。大規模な実験により、EvaNetはU-Netベースラインを著しく上回り、洪水範囲マッピングの既存のソリューションにおけるU-Netの完全な代替として機能することが示された。 Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectral features. Thanks to the digital elevation model (DEM) data readily available from sources such as United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping. We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity that if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map by a location sensitive gating mechanism to regulate how much spectral features flow through adjacent layers. 
Extensive experiments show that EvaNet significantly outperforms the U-Net baselines, and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping.	翻訳日:2024-05-14 21:13:39 公開日:2024-05-12
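The elevation constraint described in the EvaNet abstract above can be illustrated with a small sketch. This is not the authors' loss function: the function name `elevation_violation_penalty`, the 4-neighbour pairing, and the hinge-style penalty are all assumptions made for illustration. The sketch only shows the idea that a pixel should not be predicted as more flooded than an adjacent pixel at a lower elevation.

```python
def elevation_violation_penalty(flood_prob, elevation):
    """Hypothetical penalty for predictions that contradict gravity.

    flood_prob: 2-D list of predicted flood probabilities in [0, 1].
    elevation:  2-D list of elevations with the same shape.
    For every pair of adjacent pixels, the higher pixel should not be
    more flooded than the lower one; any excess probability is summed.
    """
    rows, cols = len(flood_prob), len(flood_prob[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            # Visit the right and bottom neighbours so each pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr >= rows or nc >= cols:
                    continue
                p_a, p_b = flood_prob[r][c], flood_prob[nr][nc]
                e_a, e_b = elevation[r][c], elevation[nr][nc]
                if e_a > e_b:
                    # Pixel a is higher: it must not be wetter than b.
                    total += max(p_a - p_b, 0.0)
                elif e_b > e_a:
                    # Pixel b is higher: it must not be wetter than a.
                    total += max(p_b - p_a, 0.0)
    return total
```

On a slope that descends from left to right, predicting water only at the lowest pixel incurs no penalty, while predicting water only at the highest pixel does. A differentiable version of such a term could in principle be added to a segmentation loss, but how the paper actually encodes the constraint is not stated in the abstract.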
# M3oE: マルチドメインマルチタスク混合専門家推薦フレームワーク M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework ( http://arxiv.org/abs/2404.18465v3 ) ライセンス: Link先を確認	Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai,	(参考訳) マルチドメインレコメンデーションとマルチタスクレコメンデーションは、異なるドメインと目的の共通情報を総合的なユーザモデリングに活用する効果を実証している。それでも、実際のレコメンデーションは通常、複数のドメインとタスクに同時に直面し、現在の手法では十分に対処できない。この目的のために,適応型マルチドメイン・マルチタスクMixture-of-ExpertsレコメンデーションフレームワークであるM3oEを紹介する。 M3oEはマルチドメイン情報を統合し、ドメインとタスク間で知識をマッピングし、複数の目的を最適化する。3つのMixture-of-Expertsモジュールを利用して、共通、ドメイン側面、タスク側面のユーザ嗜好をそれぞれ学習し、複数のドメインとタスク間の複雑な依存関係を分離された(disentangled)形で処理する。さらに,多様な領域やタスクをまたいだ特徴抽出と融合を正確に制御するための2段階融合機構を設計する。動的構造最適化を可能にするAutoML技術を適用することにより、フレームワークの適応性はさらに向上する。著者たちの知る限りでは、M3oEはマルチドメインのマルチタスクレコメンデーションを自己適応的に解決する最初の試みである。多様なベースラインに対する2つのベンチマークデータセットの大規模な実験は、M3oEの優れたパフォーマンスを示している。実装コードは再現性を保証するために利用可能である。 Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive Multi-domain Multi-task Mixture-of-Experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively to address the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying AutoML technique, which allows dynamic structure optimization. 
To the best of the authors' knowledge, our M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.	翻訳日:2024-05-14 21:13:39 公開日:2024-05-12
# 3次元ガウススプラッティングによる3次元再構成シーンのブートストラップ Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting ( http://arxiv.org/abs/2404.18669v2 ) ライセンス: Link先を確認	Yifei Gao, Jie Ou, Lei Wang, Jun Cheng,	(参考訳) ニューラルレンダリング技術の最近の進歩は、学術分野と商業分野の両方にわたって、フォトリアリスティックな3Dシーンのレンダリングを大幅に強化している。最新の手法は3D Gaussian Splatting(3D-GS)と呼ばれ、レンダリングの品質とスピードのベンチマークを新たに設定した。それでも、3D-GSの限界は新しい視点の合成において顕著となり、特にトレーニング中に見られるものとは大きく異なる視点で顕著である。また、ズームインやズームアウト時に膨張(dilation)やエイリアシングなどの問題が発生する。これらの課題はすべて、1つの根本的な問題、すなわち不十分なサンプリングに遡ることができる。本稿では,この問題に対処するブートストラップ法を提案する。このアプローチでは,学習済みの3D-GSを用いた新しいビューのレンダリングを拡散モデルによって強化し,トレーニングプロセスを合理化する。以上の結果から,ブートストラッピングはアーティファクトを効果的に削減し,評価指標も明確に向上することが示された。さらに,本手法は汎用性が高く,容易に統合可能であり,様々な3次元再構成プロジェクトが本手法の恩恵を受けられることを示す。 Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts, as well as clear enhancements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.	翻訳日:2024-05-14 21:13:38 公開日:2024-05-12
# 組織学における弱スーパービジョン対象定位モデルのソースフリー領域適応 Source-Free Domain Adaptation of Weakly-Supervised Object Localization Models for Histology ( http://arxiv.org/abs/2404.19113v2 ) ライセンス: Link先を確認	Alexis Guichemerre, Soufiane Belharbi, Tsiry Mayet, Shakeeb Murtaza, Pourya Shamsolmoali, Luke McCaffrey, Eric Granger,	(参考訳) 深層学習の出現に伴い, 組織像に基づく癌診断において, デジタル病理学が注目されている。ディープ弱教師付きオブジェクトローカライゼーション(WSOL)モデルは、安価なグローバルな画像クラスアノテーションを使用して、がんのグレードに応じて組織像を分類し、解釈のための関心領域(ROI)を特定するために訓練することができる。当初、ラベル付きソース画像データに基づいてトレーニングされたWSOLモデルは、染色、スキャナー、癌タイプの変化によって生じる大きなドメインシフトの場合に、ラベルなしのターゲットデータを使用して適応することができる。本稿では、プライバシと効率の理由から、ソースドメインデータを一切使用せずに、事前学習したソースモデルを新しいターゲットドメインに適合させるという難題である、ソースフリー(教師なし)ドメイン適応(SFDA)に焦点を当てる。 WSOLモデルのSFDAは、分類タスクとローカライゼーションタスクの両方に適応することを意図していないため、組織学におけるいくつかの課題を提起している。本報告では, 主要SFDAファミリーの代表者である4つの最先端SFDA法について, 分類と位置推定の精度でWSOLと比較した。 SFDA-Distribution Estimation, Source HypOthesis Transfer, Cross-Domain Contrastive Learning, Adaptively Domain Statistics Alignmentである。 Glas (小, 乳癌) とCamelyon16 (大, 大腸癌) の組織学的データセットの実験結果から, これらのSFDA法は, 分類に最適化された場合, 適応後の局所化にはあまり役に立たないことが示唆された。 Given the emergence of deep learning, digital pathology has gained popularity for cancer diagnosis based on histology images. Deep weakly supervised object localization (WSOL) models can be trained to classify histology images according to cancer grade and identify regions of interest (ROIs) for interpretation, using inexpensive global image-class annotations. A WSOL model initially trained on some labeled source image data can be adapted using unlabeled target data in cases of significant domain shifts caused by variations in staining, scanners, and cancer type. In this paper, we focus on source-free (unsupervised) domain adaptation (SFDA), a challenging problem where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons. 
SFDA of WSOL models raises several challenges in histology, most notably because they are not intended to adapt for both classification and localization tasks. In this paper, 4 state-of-the-art SFDA methods, each one representative of a main SFDA family, are compared for WSOL in terms of classification and localization accuracy. They are the SFDA-Distribution Estimation, Source HypOthesis Transfer, Cross-Domain Contrastive Learning, and Adaptively Domain Statistics Alignment. Experimental results on the challenging Glas (smaller, breast cancer) and Camelyon16 (larger, colon cancer) histology datasets indicate that these SFDA methods typically perform poorly for localization after adaptation when optimized for classification.	翻訳日:2024-05-14 21:13:38 公開日:2024-05-12
# 不変リスク最小化は全変動モデルである Invariant Risk Minimization Is A Total Variation Model ( http://arxiv.org/abs/2405.01389v2 ) ライセンス: Link先を確認	Zhao-Rong Lai, Weiwen Wang,	(参考訳) 不変リスク最小化(英: Invariant risk minimization、IRM)とは、機械学習において、不変の機能を様々な環境に一般化する手法である。関連するほとんどの研究は、新しいIRM設定や新しいアプリケーションシナリオに焦点を当てているが、IRMの数学的本質は、まだ適切に説明されていない。 IRM は本質的に分類器変数に関する学習リスクの $L^2$ norm (TV-$\ell_2$) に基づく総変量であることを示す。さらに,TV-$\ell_1$モデルに基づく新しいIRMフレームワークを提案する。学習リスクとして使用できる関数のクラスを拡大するだけでなく、コアレア式に基づいたデノナイズおよび不変の特徴保存における堅牢な性能も備えている。 IRM-TV-$\ell_1$のアウト・オブ・ディストリビューションの一般化の要求についても述べる。実験結果から,提案フレームワークは,いくつかのベンチマーク機械学習シナリオにおいて,競合性能を実現することが示された。 Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.	翻訳日:2024-05-14 21:13:38 公開日:2024-05-12
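For orientation, the widely used IRMv1 objective from the original IRM literature can be written as below. The notation ($\Phi$ for the feature extractor, $w$ for a scalar dummy classifier, $R^e$ for the risk in training environment $e$, $\lambda$ for the penalty weight) is an assumption here and may differ from the notation of the paper summarized above. The penalty is a squared $\ell_2$ norm of the gradient of the risk with respect to the classifier variable, which appears to be the kind of term the abstract interprets as TV-$\ell_2$; the paper's own TV-$\ell_1$ formulation is not given in the abstract and is not reproduced here.

```latex
\min_{\Phi} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}}
  \Big[ R^{e}(\Phi)
        + \lambda \, \big\| \nabla_{w \mid w = 1.0} \, R^{e}(w \cdot \Phi) \big\|^{2} \Big]
```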
# プログラム自動修復のための大規模言語モデルに関する体系的文献レビュー A Systematic Literature Review on Large Language Models for Automated Program Repair ( http://arxiv.org/abs/2405.01466v2 ) ライセンス: Link先を確認	Quanjun Zhang, Chunrong Fang, Yang Xie, YuXiang Ma, Weisong Sun, Yun Yang, Zhenyu Chen,	(参考訳) 自動プログラム修復(APR)は、ソフトウェアのバグにパッチを当て、手作業によるデバッグ作業を減らすことを目指す。最近、LLM(Large Language Models)の進歩に伴い、ソフトウェア開発とメンテナンスを容易にし、優れたパフォーマンスを示すAPR技術が数多く提案されている。しかし、LLMベースのAPR分野の探索が進行中であるため、研究者が現在の成果、課題、潜在的な機会を理解することは困難である。この研究は、2020年から2024年までのAPRにおけるLLMの応用を要約する最初の体系的な文献レビューを提供する。 LLM,APRおよびそれらの統合の観点から,127件の関連論文を分析した。まず、APRをサポートするために適用されている既存のLLMを分類し、3種類の利用戦略を概説する。さらに、LLMの恩恵を受ける特定の修復シナリオ(例えばセマンティックバグやセキュリティ脆弱性)について詳述する。さらに、LLMをAPR研究に統合する際のいくつかの重要な側面(例えば入力形式やオープンサイエンス)について論じる。最後に,今後検討すべき課題と今後の研究ガイドラインについて紹介する。本稿は,APRコミュニティにおける研究状況の体系的概要を提供し,研究成果の包括的理解と今後の研究の促進を支援する。 Automated Program Repair (APR) attempts to patch software bugs and reduce manual debugging efforts. Very recently, with the advances in Large Language Models (LLMs), an increasing number of APR techniques have been proposed, facilitating software development and maintenance and demonstrating remarkable performance. However, due to ongoing explorations in the LLM-based APR field, it is challenging for researchers to understand the current achievements, challenges, and potential opportunities. This work provides the first systematic literature review to summarize the applications of LLMs in APR between 2020 and 2024. We analyze 127 relevant papers from LLMs, APR and their integration perspectives. First, we categorize existing popular LLMs that are applied to support APR and outline three types of utilization strategies for their deployment. Besides, we detail some specific repair scenarios that benefit from LLMs, e.g., semantic bugs and security vulnerabilities. Furthermore, we discuss several critical aspects of integrating LLMs into APR research, e.g., input forms and open science. 
Finally, we highlight a set of challenges remaining to be investigated and the potential guidelines for future research. Overall, our paper provides a systematic overview of the research landscape to the APR community, helping researchers gain a comprehensive understanding of achievements and promote future research.	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# SSUMamba:ハイパースペクトル画像デノイジングのための空間スペクトル選択状態空間モデル SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising ( http://arxiv.org/abs/2405.01726v3 ) ライセンス: Link先を確認	Guanyiman Fu, Fengchao Xiong, Jianfeng Lu, Jun Zhou, Yuntao Qian,	(参考訳) ハイパースペクトル画像(HSI)のデノイジングは、撮像機構の内部や環境要因から生じるノイズのため、重要な前処理手順である。スペクトル相関,空間自己相似性,空間スペクトル相関といったHSIのドメイン固有知識を活用することは,深層学習に基づくデノイジングに不可欠である。既存の手法はしばしば、実行時間、空間計算量、計算複雑性によって制約され、これらの事前知識を別々に探索する戦略を採用する。これらの戦略は、いくつかの冗長な情報を避けることができるが、画像復元に肯定的な影響を与える、より広く、より根底にある長距離空間スペクトル情報を見落としてしまう。本稿では,空間スペクトル選択状態空間モデルに基づくU字型ネットワークであるSpatial-Spectral U-Mamba(SSUMamba)を提案する。状態空間モデル(SSM)計算における線形空間計算量のおかげで,モジュール内で完全な大域的空間スペクトル相関が得られる。本研究では3次元HSIにおける複数方向の情報フローのモデル化を支援する空間スペクトル交互走査(SSAS)戦略を導入する。実験の結果,本手法は比較手法よりも優れていた。ソースコードは https://github.com/lronkitty/SSUMamba で公開予定である。 Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more underlying long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSIs, which helps model the information flow in multiple directions in 3-D HSIs. 
Experimental results demonstrate that our method outperforms compared methods. The source code will be available at https://github.com/lronkitty/SSUMamba.	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# LLMアプリケーションにおけるタスクユーティリティの評価と検証 Assessing and Verifying Task Utility in LLM-Powered Applications ( http://arxiv.org/abs/2405.02178v2 ) ライセンス: Link先を確認	Negar Arabzadeh, Siqing Huo, Nikhil Mehta, Qinqyun Wu, Chi Wang, Ahmed Awadallah, Charles L. A. Clarke, Julia Kiseleva,	(参考訳) LLM(Large Language Models)の急速な開発は、複数のエージェント間のコラボレーションを促進し、人間の日常的な作業を支援するアプリケーションの増加につながっている。しかし、LLMを利用したアプリケーションが実際のユーザエクスペリエンスとタスク実行効率をどの程度向上させるかを評価する上で、大きなギャップが残っている。このことは、特にアプリケーションの機能とエンドユーザのニーズの整合性を確保することによって、LLMベースのアプリケーションのユーティリティを検証する必要性を強調している。本稿では,アプリケーション固有の目的に合わせた一連の基準を自動提案することで,ユーティリティ検証プロセスを簡素化する新しいフレームワークであるAgentEvalを紹介する。これにより、提案された基準に対してアプリケーションの実用性を定量化する、包括的な評価が可能になる。本稿では,AgentEval の有効性とロバスト性について,数学の問題解決や ALFWorld の家事関連タスクを含む2つのオープンソースデータセットに対して包括的な解析を行った。再現性のために、データ、コード、すべてのログをhttps://bit.ly/3w3yKcSで公開しています。 The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a significant gap remains in assessing to what extent LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the need to verify utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the effectiveness and robustness of AgentEval for two open source datasets including Math Problem solving and ALFWorld House-hold related tasks. For reproducibility purposes, we make the data, code and all the logs publicly available at https://bit.ly/3w3yKcS .	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# 位置:Quo Vadis, Unsupervised Time Series Anomaly Detection? Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? ( http://arxiv.org/abs/2405.02678v2 ) ライセンス: Link先を確認	M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis,	(参考訳) Timeseries Anomaly Detection (TAD)における機械学習奨学金の現在の状況は、欠陥のある評価指標の使用、一貫性のないベンチマークプラクティス、新しいディープラーニングベースのモデル設計における選択に対する適切な正当化の欠如に悩まされている。本稿は,TADにおける現状を批判的に分析し,現在の研究の誤解を招き,問題となる方法や評価の実践を明らかにする。我々の立場は、単に新しいモデル設計を追求することから、ベンチマークプラクティスの改善、非自明なデータセットの作成、より単純なベースラインに対して複雑なメソッドの有用性を批判的に評価することへと焦点を移すことを提唱している。その結果,厳密な評価プロトコルの必要性,単純なベースラインの作成,および最先端の深部異常検出モデルが線形写像を効果的に学習できることが示唆された。これらの結果から, 簡便かつ解釈可能なTAD法のさらなる探索と開発の必要性が示唆された。最先端のディープラーニングベースのモデルにおけるモデルの複雑さの増加は、残念ながら、ほとんど改善しない。この分野を前進させるための洞察と提案を提供する。コード:https://github.com/ssarfraz/QuoVadisTAD The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods, and evaluation practices. Our position advocates for a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increment of model complexity in the state-of-the-art deep-learning based models unfortunately offers very little improvement. 
We offer insights and suggestions for the field to move forward. Code: https://github.com/ssarfraz/QuoVadisTAD	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# NegativePrompt: 負の感情刺激による大規模言語モデル強化のための心理学の活用 NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli ( http://arxiv.org/abs/2405.02814v2 ) ライセンス: Link先を確認	Xu Wang, Cheng Li, Yi Chang, Jindong Wang, Yuan Wu,	(参考訳) 大規模言語モデル(LLM)は、従来の計算タスクから高度な人工知能(AI)アプリケーションまで、幅広いアプリケーションに不可欠なものとなっている。この普及により、社会科学を含む様々な分野でLLMの研究が盛んになった。特に、LLMはポジティブな感情刺激によってさらに発展できる感情知能を持っていることが研究によって明らかにされている。この発見は興味深い疑問を提起する: 否定的な感情も同様にLLMに影響し、パフォーマンスを向上させる可能性があるか? この問いに応えて,心理学的原則を基盤とし、特別に設計された10種類の否定的感情刺激を用いる新たなアプローチであるNegativePromptを紹介する。我々は,Flan-T5-Large,Vicuna,Llama 2,ChatGPT,GPT-4の5つのLLMを,45のタスクで厳密に評価した。 NegativePromptは、Instruction Inductionタスクで12.89%、BIG-Benchタスクで46.25%の相対的な改善を示し、LLMの性能を著しく向上させる。さらに,NegativePromptの影響のメカニズムを解明するための注意可視化実験を行った。本研究は,LLMと感情の相互作用の理解に大きく貢献し,感情駆動型手法としてのNegativePromptの有効性を実証し,現実の応用におけるLLMの強化に向けた新たな洞察を提供する。コードは https://github.com/wangxu0820/NegativePrompt で公開されている。 Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. 
Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# 現実的な極端再スケーリングのための境界認識型デカップルドフローネットワーク Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling ( http://arxiv.org/abs/2405.02941v2 ) ライセンス: Link先を確認	Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, Rizen Guo,	(参考訳) Invertible rescaling Network (IRN) ベースやgenerative adversarial Network (GAN) ベースの手法など,最近開発された生成手法は,画像再スケーリングにおいて例外的な性能を示した。しかし、IRNベースの手法は過度に平滑化された結果を生成する傾向があり、一方、GANベースの手法は偽のディテールを生成しやすく、実際の応用を妨げる。この問題に対処するため,現実的で視覚的に満足な結果を生成するために,境界認識型デカップルドフローネットワーク(BDFlow)を提案する。高周波情報を標準ガウス分布として直接モデル化する従来の手法とは異なり、我々のBDFlowはまず、高周波情報を境界分布に従う semantic high-frequency(意味的高周波)とガウス分布に従う non-semantic high-frequency(非意味的高周波)に分離する。具体的には、意味的な高周波部分を正確に捉えるために、境界認識マスク(BAM)を用いてモデルを制約し、リッチなテクスチャを生成する一方、非意味的な高周波部分はガウス分布からランダムにサンプリングされる。包括的な実験により、BDFlowはより低い複雑さを保ちながら他の最先端手法を大きく上回ることが示された。特に、我々のBDFlowは、GRAINと比較して平均でPSNRを4.4dB、SSIMを0.1改善し、パラメータの74%と計算量の20%しか利用しない。コードは https://github.com/THU-Kingmin/BAFlow で公開予定である。 Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which thus hinders their real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information as standard Gaussian distribution directly, our BDFlow first decouples the high-frequency information into semantic high-frequency that adheres to a Boundary distribution and non-semantic high-frequency counterpart that adheres to a Gaussian distribution. 
Specifically, to capture semantic high-frequency parts accurately, we use Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 on average over GRAIN, utilizing only 74% of the parameters and 20% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# 可逆残差再スケーリングモデル Invertible Residual Rescaling Models ( http://arxiv.org/abs/2405.02945v2 ) ライセンス: Link先を確認	Jinmin Li, Tao Dai, Yaohua Zha, Yilu Luo, Longfei Lu, Bin Chen, Zhi Wang, Shu-Tao Xia, Jingyun Zhang,	(参考訳) Invertible Rescaling Networks (IRNs)とその変種は、画像再スケーリングのような様々な画像処理タスクにおいて顕著な成果をみせた。しかし、より深いネットワークを持つIRNは訓練が難しく、IRNの表現能力が妨げられることが観察される。この問題に対処するために,高解像度画像とそれに対応する低解像度画像との間の全単射を特定の分布とともに学習することにより,画像再スケーリングのための可逆残差再スケーリングモデル(IRRM)を提案する。具体的には、長いスキップ接続を持つ複数のResidual Downscaling Modules (RDM) を含むディープネットワークを構築するIRRMを提案する。それぞれのRDMは、短い接続を持ついくつかのInvertible Residual Blocks (IRB) で構成されている。このようにして、RDMはスキップ接続によってリッチな低周波情報をバイパスさせ、モデルが画像から高周波情報を抽出することに集中するようにする。大規模な実験により、IRRMは、はるかに少ないパラメータと複雑さで、他の最先端の手法よりもはるかに優れた性能を示す。特に, IRRMは, x4再スケーリングにおいてHCFlowとIRNに対してそれぞれ少なくとも0.3dBのPSNRゲインを有し, 60%のパラメータと50%のFLOPしか使用していない。コードは https://github.com/THU-Kingmin/IRRM で公開予定である。 Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, thus hindering the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, we propose IRRM to build a deep network, which contains several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, RDM allows rich low-frequency information to be bypassed by skip connections and forces models to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with much fewer parameters and complexity. 
Particularly, our IRRM has respectively PSNR gains of at least 0.3 dB over HCFlow and IRN in the x4 rescaling while only using 60% parameters and 50% FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.	翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
# Outlier Gradient Analysis: ヘシアンフリーインフルエンス関数によるディープラーニングモデルの性能向上 Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions ( http://arxiv.org/abs/2405.03869v2 ) ライセンス: Link先を確認	Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, Hongfu Liu,	(参考訳) 影響関数は、各トレーニングデータサンプルがモデル予測に与える影響を評価するための堅牢なフレームワークを提供する。様々なタスクで広く使われているにもかかわらず、モデルに対する強い凸性仮定と、ヘッセン行列の逆数を計算することに関連する計算コストは、特に大きな深層モデルを分析する際に制約となる。本稿では、古典的なデータ中心のシナリオ、トリミング・デトリメンタル・サンプルに焦点を当て、統一されたフレームワークにおける両方の課題に対処する。具体的には、影響関数と外乱勾配検出による有害トレーニングサンプルの同定の同値変換を確立する。この変換は単純でヘッセン自由な定式化を提示するだけでなく、試料衝突における勾配の役割について深い洞察を与える。さらに、影響関数の凸性仮定を緩和し、その適用性を非凸深度モデルに拡張する。系統的な実験的な評価を通じて,提案した合成データセットのアウトリー勾配解析の正しさを検証し,その効果を視覚モデルにおける誤ラベルサンプルの検出,自然言語処理におけるトランスフォーマーモデルの性能向上のためのデータサンプルの選択,微調整された大規模言語モデルにおける影響力のあるサンプルの同定などに適用した。 Influence functions offer a robust framework for assessing the impact of each training data sample on model predictions, serving as a prominent tool in data-centric learning. Despite their widespread use in various tasks, the strong convexity assumption on the model and the computational cost associated with calculating the inverse of the Hessian matrix pose constraints, particularly when analyzing large deep models. This paper focuses on a classical data-centric scenario--trimming detrimental samples--and addresses both challenges within a unified framework. Specifically, we establish an equivalence transformation between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides profound insights into the role of the gradient in sample impact. Moreover, it relaxes the convexity assumption of influence functions, extending their applicability to non-convex deep models. 
Through systematic empirical evaluations, we first validate the correctness of our proposed outlier gradient analysis on synthetic datasets and then demonstrate its effectiveness in detecting mislabeled samples in vision models, selecting data samples for improving performance of transformer models for natural language processing, and identifying influential samples for fine-tuned Large Language Models.	翻訳日:2024-05-14 20:52:15 公開日:2024-05-12
# DALK: LLMとKGの動的共拡張による科学文献を用いたアルツハイマー病に関する質問への回答 DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature ( http://arxiv.org/abs/2405.04819v2 ) ライセンス: Link先を確認	Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、様々なアプリケーションで有望なパフォーマンスを実現している。それでも、ロングテール知識を統合するという継続的な課題は、専門分野におけるLLMのシームレスな採用を妨げるものとなっている。本研究は, LLMs と KG の動的共拡張(Dynamic Co-Augmentation of LLMs and KG)である DALK を導入してこの限界に対処し, バイオメディシンの専門的サブフィールドであり世界的な健康上の優先課題でもあるアルツハイマー病(AD)の研究におけるその能力を実証する。 LLMとKGが相互に強化し合う相乗的フレームワークのもとで、まずLLMを利用して、AD関連科学文献から得られる、発展し続けるAD固有の知識グラフ(KG)を構築する。次に、新しい自己認識型知識検索アプローチを伴う粗から細へのサンプリング法を用いて、KGから適切な知識を選択し、LLMの推論能力を拡張する。構築したAD質問応答(ADQA)ベンチマークでの実験結果は,DALKの有効性を裏付けている。さらに我々は,KG と LLM を相互に強化するという新たなトピックについて,貴重な洞察とガイドラインを提供しうる一連の詳細な分析を行う。コードとデータは https://github.com/David-Li0406/DALK で公開予定である。 Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. 
Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.	翻訳日:2024-05-14 20:52:15 公開日:2024-05-12
# エントロピー・エニグマ:エントロピー最小化の成功と失敗 The Entropy Enigma: Success and Failure of Entropy Minimization ( http://arxiv.org/abs/2405.05012v2 ) ライセンス: Link先を確認	Ori Press, Ravid Shwartz-Ziv, Yann LeCun, Matthias Bethge,	(参考訳) エントロピー最小化(EM)は、テスト時に新しいデータに直面した場合に、分類モデルの精度を高めるために頻繁に使用される。 EMは、分類器を最適化し、上位予測クラスにさらに高い確率を割り当てる自己教師型学習手法である。本稿では,数ステップの適応ではEMがうまく機能する理由と,多くのステップで適応した後に最終的に失敗する理由を解析する。 EMはまず,テスト画像をトレーニング画像の近くに埋め込むようにし,それによってモデルの精度を向上させることを示す。多くの最適化ステップの後、EMはモデルにテスト画像をトレーニング画像の埋め込みから遠く離れた位置に埋め込ませ、その結果精度が低下する。この知見に基づき,ラベルにアクセスせずに任意のデータセット上でモデルの精度を推定するという実用的な問題を解く手法を提案する。提案手法は,エントロピーの最小化のためにモデルが最適化されるにつれて,入力画像の埋め込みがどう変化するかを調べることで,精度を推定する。 23の挑戦的なデータセットでの実験により、提案手法は平均絶対誤差 $5.75\%$ でSoTAを達成し、このタスクにおける従来のSoTAを $29.62\%$ 改善することが示された。私たちのコードはhttps://github.com/oripress/EntropyEnigmaで利用可能です。 Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to their top predicted classes. In this paper, we analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for many steps. We show that, at first, EM causes the model to embed test images close to training images, thereby increasing model accuracy. After many steps of optimization, EM makes the model embed test images far away from the embeddings of training images, which results in a degradation of accuracy. Building upon our insights, we present a method for solving a practical problem: estimating a model's accuracy on a given arbitrary dataset without having access to its labels. Our method estimates accuracy by looking at how the embeddings of input images change as the model is optimized to minimize entropy. Experiments on 23 challenging datasets show that our method sets the SoTA with a mean absolute error of $5.75\%$, an improvement of $29.62\%$ over the previous SoTA on this task. 
Our code is available at https://github.com/oripress/EntropyEnigma	翻訳日:2024-05-14 20:41:54 公開日:2024-05-12
# CoViews: コントラスト学習強化のための協調視点を用いた適応的拡張 CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning ( http://arxiv.org/abs/2405.07116v1 ) ライセンス: Link先を確認	Nazim Bendib,	(参考訳) データ拡張は、効果的なコントラスト学習に必要な高品質な正と負のペアを生成する上で重要な役割を果たす。しかしながら、一般的なプラクティスでは、複数のビューを生成するために、単一の拡張ポリシを繰り返し使用することで、ビュー間の協力の欠如による非効率なトレーニングペアにつながる可能性がある。さらに、拡張の最適セットを見つけるために、既存の多くの手法は、トレーニングを通して異なる拡張を必要とするかもしれないモデルの進化的な性質を見越して、広範囲に教師付き評価を必要とする。他のアプローチでは微分可能拡張生成器を訓練し、したがって文献からの微分不可能変換関数の使用を制限する。本稿では、計算オーバーヘッドを最小限に抑えたコントラスト学習のための効率的な適応データ拡張ポリシーを学習するためのフレームワークを提案し、これらの課題に対処する。当社のアプローチでは,トレーニング中に新たなデータ拡張ポリシを継続的に生成し,監督なしに効果的なポジティブ/ネガティブなデータを生成する。このフレームワークでは、すべてのビューで使用される拡張ポリシーを生成する IndepViews と、各ビューに依存する拡張ポリシーを生成する CoViews の2つの方法を提案する。これにより、各ビューに適用された変換間の依存関係を学習し、異なるビューに適用された拡張戦略が相互に補完し合い、より有意義で差別的な表現につながることを保証する。複数のデータセットやコントラスト学習フレームワークの広範な実験を通じて、我々の手法はベースラインソリューションを一貫して上回り、ビューに依存した拡張ポリシーによるトレーニングは、ビュー間で共有される独立したポリシーによるトレーニングよりも優れており、コントラスト学習性能の強化におけるその効果を示す。 Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate multiple views, potentially leading to inefficient training pairs due to a lack of cooperation between views. Furthermore, to find the optimal set of augmentations, many existing methods require extensive supervised evaluation, overlooking the evolving nature of the model that may require different augmentations throughout the training. Other approaches train differentiable augmentation generators, thus limiting the use of non-differentiable transformation functions from the literature. In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead. 
Our approach continuously generates new data augmentation policies during training and produces effective positives/negatives without any supervision. Within this framework, we present two methods: IndepViews, which generates augmentation policies used across all views, and CoViews, which generates dependent augmentation policies for each view. This enables us to learn dependencies between the transformations applied to each view and ensures that the augmentation strategies applied to different views complement each other, leading to more meaningful and discriminative representations. Through extensive experimentation on multiple datasets and contrastive learning frameworks, we demonstrate that our method consistently outperforms baseline solutions and that training with a view-dependent augmentation policy outperforms training with an independent policy shared across views, showcasing its effectiveness in enhancing contrastive learning performance.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# コンテキストニューラルネットワーク:時系列予測のためのスケーラブルな多変量モデル Context Neural Networks: A Scalable Multivariate Model for Time Series Forecasting ( http://arxiv.org/abs/2405.07117v1 ) ライセンス: Link先を確認	Abishek Sriramulu, Christoph Bergmeir, Slawek Smyl,	(参考訳) 実世界の時系列は、しばしば孤立して取得できない複雑な相互依存性を示す。時系列をローカルに生成しながら、複数の時系列から過去のデータを世界規模でモデル化するグローバルモデルは、今や一般的である。しかし、各シリーズの予測は依然として孤立しており、近隣シリーズの現在の状況を説明できない。多変量アテンションやグラフニューラルネットワークのような多変量モデルは、シリーズ間情報を明示的に組み込むことができ、グローバルモデルの欠点に対処することができる。しかし、これらの手法は時間の経過ごとに2次的な複雑さを示し、スケーラビリティを制限している。本稿では,計算オーバーヘッドを伴わずに,近隣の時系列から関連する文脈的洞察を持つ時系列モデルを拡張するための,効率的な線形複雑化手法であるContext Neural Networkを紹介する。提案手法は,大域的モデルの制約に対処しながら,大規模データセットに対して計算的に抽出可能でありながら,近隣からのリアルタイム情報をターゲットシリーズに提供することにより,予測モデルを強化する。 Real-world time series often exhibit complex interdependencies that cannot be captured in isolation. Global models that model past data from multiple related time series globally while producing series-specific forecasts locally are now common. However, their forecasts for each individual series remain isolated, failing to account for the current state of its neighbouring series. Multivariate models like multivariate attention and graph neural networks can explicitly incorporate inter-series information, thus addressing the shortcomings of global models. However, these techniques exhibit quadratic complexity per timestep, limiting scalability. This paper introduces the Context Neural Network, an efficient linear complexity approach for augmenting time series models with relevant contextual insights from neighbouring time series without significant computational overhead. The proposed method enriches predictive models by providing the target series with real-time information from its neighbours, addressing the limitations of global models, yet remaining computationally tractable for large datasets.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# 球状ダイニングプレートとボウルの野生楕円パラメータ推定 In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls ( http://arxiv.org/abs/2405.07121v1 ) ライセンス: Link先を確認	Akil Pathiranage, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong,	(参考訳) 楕円推定は, 皿やボウルのパラメータ化に利用することができるため, 食品画像処理において重要な話題である。プレートとボウルの楕円縁を自動的に検出し、その楕円パラメータを「地中」のデータとして推定することは困難であり、様々なカメラアングルとプレート形状が、撮影、ノイズの多い背景、複数の不均一なプレートとボウルが画像に存在している可能性がある。基礎モデルの最近の進歩は、ゼロショットセマンティック理解とオブジェクトセグメンテーションに有望な機能を提供する。しかし、これらのモデルによって生成されたプレートとボウルの出力マスク境界は、従来の楕円フィッティング法に比べて一貫性と精度が欠けることが多い。本稿では,ゼロショット基礎モデルから抽出した楕円フィッティングと意味情報を組み合わせて,プレートとボウルの楕円リムを検出する手法であるWildEllipseFitを提案する。提案したYummly-ellipseデータセットの評価は、実世界のシナリオにおけるその有効性とゼロショット能力を示す。 Ellipse estimation is an important topic in food image processing because it can be leveraged to parameterize plates and bowls, which in turn can be used to estimate camera view angles and food portion sizes. Automatically detecting the elliptical rim of plates and bowls and estimating their ellipse parameters for data "in-the-wild" is challenging: diverse camera angles and plate shapes could have been used for capture, noisy background, multiple non-uniform plates and bowls in the image could be present. Recent advancements in foundational models offer promising capabilities for zero-shot semantic understanding and object segmentation. However, the output mask boundaries for plates and bowls generated by these models often lack consistency and precision compared to traditional ellipse fitting methods. In this paper, we combine ellipse fitting with semantic information extracted by zero-shot foundational models and propose WildEllipseFit, a method to detect and estimate the elliptical rim for plate and bowl. Evaluation on the proposed Yummly-ellipse dataset demonstrates its efficacy and zero-shot capability in real-world scenarios.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# トラベリングセールスマン問題の解法のための2ステップ量子探索アルゴリズムの回路設計 Circuit Design of Two-Step Quantum Search Algorithm for Solving Traveling Salesman Problems ( http://arxiv.org/abs/2405.07129v1 ) ライセンス: Link先を確認	Rei Sato, Gordon Cui, Kazuhiro Saito, Hideyuki Kawashima, Tetsuro Nikuni, Shohei Watabe,	(参考訳) グロバーのアルゴリズムのような量子探索アルゴリズムは、制約付き組合せ最適化問題を効率的に解くことが期待されている。しかし、サーキット上での走行セールスマン問題(TSP)を解決するための量子探索アルゴリズムの実装は、現在のTSPの量子探索アルゴリズムが、制約を満たす実現可能な解状態の等重畳の初期状態が既に予め用意されていると仮定しているため、潜在的に困難である。ブライト力による初期状態の生成の時間的複雑さは、実現可能な解の因子的成長とともに指数関数的に増加し、大規模TSPのための量子回路の設計においてかなりの障害となる。この問題を解決するために,2つの異なる演算子を持つ2段階の量子探索アルゴリズムを提案し,初期状態を作成してTSPを解く。このアルゴリズムはまず、TSPのすべての実現可能な解の等しい重ね合わせ状態を増幅し、その後、これらの実現可能な解状態の最適解状態を増幅する。我々のアルゴリズムは、高次非制約バイナリ最適化(HOBO)表現に符号化されており、特に要求されるキュービット数を減らし、統一回路設計による初期状態の効率的な作成と、実現可能な解の事前知識がない2次高速化によるTSPの解決を可能にしている。 Quantum search algorithms, such as Grover's algorithm, are expected to efficiently solve constrained combinatorial optimization problems. However, implementing a quantum search algorithm for solving the traveling salesman problem (TSP) on a circuit poses a potential challenge because current quantum search algorithms for TSP assume that an initial state of equal superposition of feasible solution states satisfying the constraint is already prepared a priori. The time complexity of brute-force preparation of the initial state increases exponentially with the factorial growth of feasible solutions, posing a considerable obstacle in designing quantum circuits for large-scale TSP. To overcome this problem, we propose a two-step quantum search algorithm with two distinct operators for preparing the initial state and solving TSP. The algorithm first amplifies an equal superposition state of all feasible solutions of TSP and subsequently amplifies the optimal solution states among these feasible solution states. 
Our algorithm, encoded in the higher-order unconstrained binary optimization (HOBO) representation, notably reduces the required number of qubits, enabling efficient preparation of the initial state with a unified circuit design and solving TSP with a quadratic speedup in the absence of prior knowledge of feasible solutions.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# 振動モードギャップ:開量子多体系における相転移の指標 Oscillating-mode gap: an indicator of phase transition in open quantum many-body systems ( http://arxiv.org/abs/2405.07132v1 ) ライセンス: Link先を確認	Taiki Haga,	(参考訳) これは、開量子多体系の相と、密度行列がどのように進化するかを決定するリウヴィリアンのスペクトル構造との関係を解明する重要な課題である。これまでの研究では、最も緩やかな退化モードの崩壊速度として定義されるリウヴィリアのギャップに焦点が当てられ、放射相転移の鍵となる指標として、対称性の破れた相の閉ざしと乱れた相の開裂に言及されている。本研究では、最も緩やかな発振モードの減衰速度として定義される発振モードギャップと呼ばれる追加のスペクトルギャップを提案する。原型発散ボソン系の解析を通じて, 系の相と相転移の包括的解析を行うために, リウビリアギャップと発振モードギャップの両方の必要性を実証する。 It presents a significant challenge to elucidate the relationship between the phases of open quantum many-body systems and the spectral structure of their governing Liouvillian, which determines how the density matrix evolves. Previous studies have focused on the Liouvillian gap, defined as the decay rate of the most slowly-decaying mode, as a key indicator of dissipative phase transition, noting its closure in symmetry-broken phases and opening in disordered phases. In this work, we propose an additional spectral gap, termed the oscillating-mode gap, defined as the decay rate of the most slowly-decaying oscillating mode. Through the analysis of a prototype dissipative boson system, we demonstrate the necessity of both the Liouvillian gap and the oscillating-mode gap for the comprehensive characterization of the system's phases and the transitions between them.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# 最も効率的な量子化LLMを実現するために複数のポストトレーニング手法を組み合わせる Combining multiple post-training techniques to achieve most efficient quantized LLMs ( http://arxiv.org/abs/2405.07135v1 ) ライセンス: Link先を確認	Sayeh Sharify, Zifei Xu, Wanzin Yazar, Xin Wang,	(参考訳) LLM(Large Language Models)は、複雑な言語モデリングタスクにおいて卓越した性能を持つが、計算と記憶に重大な課題がある。本稿では,これらの課題を緩和する量子化の可能性について検討する。 SmoothQuant と GPTQ の2つのよく知られたポストトレーニング手法の組み合わせを体系的に研究し、それらの相互作用と LLM 量子化の進展に対する影響を包括的に分析する。マイクロスケーリング(MX)フォーマットの量子化を実現し,初期固定点フォーマットのターゲットを超えて適用範囲を広げることで,両手法の汎用性を高める。我々は、GPTQとSmoothQuantを適用し、MXフォーマットを用いてモデルを量子化することにより、パープレキシティの増加を無視できる1-3%に抑えつつ、OPTモデルのサイズを最大4倍、LLaMAモデルのサイズを最大3倍削減できることを示す。 Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of two well-known post-training techniques, SmoothQuant and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of both techniques by enabling quantization to microscaling (MX) formats, expanding their applicability beyond their initial fixed-point format targets. We show that by applying GPTQ and SmoothQuant, and employing MX formats for quantizing models, we can achieve a significant reduction in the size of OPT models by up to 4x and LLaMA models by up to 3x with a negligible perplexity increase of 1-3%.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# ノイズ量子多項時間と多項階層のOracle分離 Oracle Separation between Noisy Quantum Polynomial Time and the Polynomial Hierarchy ( http://arxiv.org/abs/2405.07137v1 ) ライセンス: Link先を確認	Nai-Hui Chia, Min-Hsiu Hsieh, Shih-Han Hung, En-Jui Kuo,	(参考訳) 本研究は、Chen, Cotler, Huang, Li (2022) などの定義に触発された、ノイズ量子回路の物理的に動機付けられた複雑性クラス間のオラクルの分離について研究する。一定の誤差率で、分離はNPの観点で達成できると証明する。誤差レートが$\Omega(\log n/n)$の場合、この結果をPHの分離にまで拡張することができる。これは、誤りの少ない量子コンピュータでさえ、様々なシナリオや仮定の下で古典的な複雑性クラスを超える可能性があることを示している。また,Raz と Tal (2022年) と Bassirian, Bouland, Fefferman, Gunn, Tal (2021年) の研究で見出された様々なノイズ設定や,新しい古典的硬度結果についても検討する。 This work investigates the oracle separation between the physically motivated complexity class of noisy quantum circuits, inspired by definitions such as those presented by Chen, Cotler, Huang, and Li (2022). We establish that with a constant error rate, separation can be achieved in terms of NP. When the error rate is $\Omega(\log n/n)$, we can extend this result to the separation of PH. Notably, our oracles, in all separations, do not necessitate error correction schemes or fault tolerance, as all quantum circuits are of constant depth. This indicates that even quantum computers with minor errors, without error correction, may surpass classical complexity classes under various scenarios and assumptions. We also explore various common noise settings and present new classical hardness results, generalizing those found in studies by Raz and Tal (2022) and Bassirian, Bouland, Fefferman, Gunn, and Tal (2021), which are of independent interest.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# バッチと量子化を用いた大規模言語モデル推論のためのエッジインテリジェンス最適化 Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization ( http://arxiv.org/abs/2405.07140v1 ) ライセンス: Link先を確認	Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang,	(参考訳) Generative Artificial Intelligence(GAI)は、非並列なコンテンツ生成能力で世界を席巻している。大規模言語モデル(LLM)がこの運動の最前線にある。しかし、LLMの重要なリソース要求は、しばしばクラウドホスティングを必要とするため、プライバシ、レイテンシ、利用制限に関する問題が発生する。エッジインテリジェンス(エッジインテリジェンス)は、データソースに近いユビキタスなエッジリソース上でリアルタイムのAI計算を可能にすることで、これらの課題に長年利用されてきたが、ほとんどの研究は、従来のAIモデルに焦点を当てており、モデルサイズや自動回帰プロセス、自己保持機構など、LLM推論のユニークな特徴に対処する際のギャップを残している。本稿では,LLM推論に適したエッジインテリジェンス最適化問題を提案する。具体的には,資源制限エッジデバイス上でのバッチ処理手法の展開とモデル量子化により,トランスフォーマーデコーダを用いたLCMの推論モデルを定式化する。さらに,バッチスケジューリングによる推論スループットの最大化と通信資源と計算資源の同時割り当てを目標とし,エッジリソースの制約とレイテンシと精度の変動を考慮した。このNP-hard問題に対処するため,オンラインツリー探索(DFTSP)を用いたDepth-First Tree-Searchingアルゴリズムを開発した。シミュレーションの結果, DFTSPは, 多様なユーザ設定や量子化技術にまたがるスループットの他のバッチベンチマークを上回り, ブルートフォースサーチ法と比較して, 時間複雑性を45%以上低減することがわかった。 Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. 
Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# デコヒーレンス効果を有する宇宙ベル試験 Cosmological Bell Tests with Decoherence Effects ( http://arxiv.org/abs/2405.07141v1 ) ライセンス: Link先を確認	Chon Man Sou, Junqi Wang, Yi Wang,	(参考訳) インフレーション宇宙は粒子対を作り、運動量保存のためにその運動量において絡み合っている。ゆらぎの運動量を含むオペレータは、Gour-Khanna-Mann-Revzen (GKMR) のような擬似スピン演算子に書き換えることができる。これらの擬スピン作用素を利用することで、宇宙的ベルの不等式を定式化することができる。これらのベルの不等式に違反することは、原始揺らぎの量子的性質を示している。本研究では,原曲率摂動に着目した。曲率摂動は重力から生じるため、その作用はギボンズ・ホーキング・ヨーク境界項を含む。線形摂動の初期条件の選択における境界項の役割を明らかにする。その後、宇宙論的摂動の相互作用(バルクおよび境界相互作用項を含む)を進め、デコヒーレンス効果を導入する。これらのデコヒーレンス効果はベル演算子の期待値を変化させ、ベルの不等式を徐々に復元する。この過程を「Bell test curve」で記述し、宇宙論的摂動の量子起源をテストするための窓を提供する。また,ベル試験曲線からデコヒーレンス率の情報と一次相互作用の構造を抽出する可能性についても検討した。 The inflationary universe creates particle pairs, which are entangled in their momenta due to momentum conservation. Operators involving the momenta of the fluctuations can be rewritten into pseudo-spin operators, such as the Gour-Khanna-Mann-Revzen (GKMR) pseudo-spin. Making use of these pseudo-spin operators, cosmological Bell inequalities can be formulated. The violation of these Bell inequalities indicates the quantum nature of primordial fluctuations. In this work, we focus on primordial curvature perturbations. Since curvature perturbations arise from gravity, their action includes the Gibbons-Hawking-York boundary term. We clarify the role of the boundary term in selecting suitable initial conditions for linear perturbations. After that, we proceed to the interactions of cosmological perturbations, including the bulk and boundary interaction terms, which introduce decoherence effects. These decoherence effects change the expectation value of the Bell operator, and gradually restore the Bell inequality. We describe this process by a "Bell test curve", which offers a window for testing the quantum origin of cosmological perturbations. We also explore the possibility of extracting the information of the decoherence rate and the structure of primordial interactions from the Bell test curve.	翻訳日:2024-05-14 18:18:14 公開日:2024-05-12
# CLAMPによるクロスドメイン連続学習 Cross-Domain Continual Learning via CLAMP ( http://arxiv.org/abs/2405.07142v1 ) ライセンス: Link先を確認	Weiwei Weng, Mahardhika Pratama, Jie Zhang, Chen Chen, Edward Yapp Kien Yee, Ramasamy Savitha,	(参考訳) 人工ニューラルネットワークは、人間のような認知学習能力で有名だが、よく知られた破滅的な忘れ(CF)問題に遭遇する。 CFを緩和するための多くの努力にもかかわらず、特に複雑な変化環境において、これは重要な課題である。この課題は、継続学習(CL)の設定に従って、ドメイン間の適応においてさらに顕著になる。この目的のために、本稿では、追加のラベリングコストを伴わずに、そのような環境で単一モデルをデプロイできるクロスドメインCLアプローチを提案する。提案手法は,多くのプロセス (CLAMP) に対する連続的な学習手法であり,クラスアウェアな敵ドメイン適応戦略を統合して,ソースドメインとターゲットドメインを整合させる。各サンプルの影響や損失関数の相互作用を制御する各サンプルに重みの集合を割り当てるベースモデルの学習過程を、安定性と可塑性ジレンマのバランスを保ち、CF問題を防止すべく、評価者誘導学習プロセスが進められる。第1評価器は、ソースドメインの無関係なサンプルを拒絶する負の転送問題に焦点を当て、第2評価器はターゲットドメインのノイズの多い擬似ラベルを防止する。どちらの評価器も、ランダム変換技術やソースドメインの類似したサンプルを使用して、メタラーニングアプローチで訓練されている。理論解析と広範な数値検証により、CLAMPは、すべての実験で確立されたベースラインアルゴリズムを少なくとも$10\%$のマージンで大幅に上回っていることが示された。 Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, in which the networks lose proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex changing environments. This challenge is even more pronounced in cross-domain adaptation following the continual learning (CL) setting, which is a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach that makes it possible to deploy a single model in such environments without additional labelling costs. Our approach, namely continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. 
An assessor-guided learning process is put forward to steer the training of a base model: it assigns a set of weights to every sample, controlling both the influence of each sample and the interaction of the loss functions, so as to balance the stability-plasticity dilemma and thus prevent the CF problem. The first assessor addresses the negative transfer problem by rejecting irrelevant samples of the source domain, while the second assessor guards against noisy pseudo-labels in the target domain. Both assessors are trained with a meta-learning approach using random transformation techniques and similar samples of the source domain. Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by a margin of at least $10\%$.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# シリコンT中心の光遷移パラメータ Optical transition parameters of the silicon T centre ( http://arxiv.org/abs/2405.07144v1 ) ライセンス: Link先を確認	Chloe Clear, Sara Hosseini, Amirhossein AlizadehKhaledi, Nicholas Brunelle, Austin Woolverton, Joshua Kanaganayagam, Moein Kazemi, Camille Chartrand, Mehdi Keshavarz, Yihuang Xiong, Oney O. Soykal, Geoffroy Hautier, Valentin Karassiouk, Mike Thewalt, Daniel Higginbottom, Stephanie Simmons,	(参考訳) シリコンTセンタの狭く、通信帯域の光学発光、長いスピンコヒーレンス、直接光子統合は、分散量子コンピューティングとネットワークのためのスピン光子インターフェースとしてのこのエミッタへの関心を喚起している。しかし、T中心のスピン選択光学遷移の重要なパラメータは、文学において未決定または曖昧である。本稿では、T中心TX状態のハミルトニアンを示し、T$_0$からTX$_0$への光学遷移の鍵パラメータを、公表された結果、密度汎関数理論、新しい分光法との組み合わせから決定する。文献中の内部欠陥電位の曖昧さを解消し,電気的に調整されたT中心放射の初回測定を行った。その結果、ひずみ、電気、磁場下でのT中心の光学特性とスピン特性のモデルを提供し、量子技術の実現に利用することができる。 The silicon T centre's narrow, telecommunications-band optical emission, long spin coherence, and direct photonic integration have spurred interest in this emitter as a spin-photon interface for distributed quantum computing and networking. However, key parameters of the T centre's spin-selective optical transitions remain undetermined or ambiguous in literature. In this paper we present a Hamiltonian of the T centre TX state and determine key parameters of the optical transition from T$_0$ to TX$_0$ from a combined analysis of published results, density functional theory, and new spectroscopy. We resolve ambiguous values of the internal defect potential in the literature, and we present the first measurements of electrically tuned T centre emission. As a result, we provide a model of the T centre's optical and spin properties under strain, electric, and magnetic fields that can be utilized for realizing quantum technologies.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 安定なシグナチャは不安定:拡散モデルから画像の透かしを取り除く Stable Signature is Unstable: Removing Image Watermark from Diffusion Models ( http://arxiv.org/abs/2405.07145v1 ) ライセンス: Link先を確認	Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong,	(参考訳) Watermarkは、AI生成画像を検出するために、業界によって広くデプロイされている。最近のウォーターマークフレームワークである Stable Signature (Meta が提案) は、ウォーターマークを拡散モデルのデコーダのパラメータに根付け、生成した画像が本質的にウォーターマークされる。安定署名は、オープンソースの拡散モデルによって生成された画像を透かし、除去攻撃に対して堅牢であると主張した。本研究では,拡散モデルから透かしを微調整して除去する新たな攻撃法を提案する。この結果から, 画像の視覚的品質を維持しつつ, 画像が非透かしとなるような拡散モデルから, 効果的に透かしを除去できることが示唆された。我々の結果は、Stable Signatureは以前考えられていたほど安定していないことを強調している。 Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called Stable Signature (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by open-source diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 光誘起非ガウス交絡ボース-アインシュタイン凝縮体における光子損失効果と異なる光子測定結果 Photon loss effects on light-mediated non-Gaussian entangled Bose-Einstein condensates projecting with different photon measurement outcomes ( http://arxiv.org/abs/2405.07153v1 ) ライセンス: Link先を確認	Shuai Gao, Manish Chaudhary, Alexey N. Pyrkov, Ebubechukwu O. Ilo-Okeke, Xin Meng, Jingyan Feng, Muhammad Jamil Khan, Tim Byners, Chaogang Lou,	(参考訳) マクロ的量子ビットに対する量子情報処理の理論は、すべてのマクロ的量子ビットが保存された数の粒子を持つという事実に基づいている。しかしながら、実験的な観点からは、これらの量子ビットは、これらの量子ビット間の絡み合いの発生可能性に影響を及ぼし、量子情報処理に効率的に使用されるようなデコヒーレンス(decoherence)の過程を経験する。遠方の原子BEC間の絡み合いを発生させる最も先進的な方法の1つは、量子非破壊測定である。本稿では,光子損失デコヒーレンスを含む場合の光子測定の影響について検討する。我々は、光子損失チャネルにおける正確な密度行列を得るために、熱絡み合った状態表現(TESR)と順序演算子(IWOP)アプローチの積分を用いる。我々は,光子数測定の結果が異なる絡み合った状態の生成につながり,それぞれが独特の特性を示すことを示した。我々は,ホフマン・タケウチとデュアン・ギドケ・シラク・ゾラー基準を用いることで,ワインランド・スクイージングやEPRのステアリング基準と比較して,絡み検出の優位性が得られることを見出した。 The theory of quantum information processing for macroscopic qubits is based on the fact that every macroscopic qubit has a conserved number of particles. However, from an experimental point of view, every such qubit experiences processes of decoherence that impact the possibilities for entanglement generation between such qubits and use in quantum information processing efficiently. One of the most prospective methods for generating entanglement between distant atomic BECs is quantum nondemolition measurements. Here, we study how the effects of photon measurement impact the entanglement when photon loss decoherence is included. We employ the thermally entangled state representation (TESR) and integral within the ordered operator(IWOP) approach to obtain the accurate density matrix in a photon loss channel. We demonstrate that varying outcomes of photon number measurements lead to the generation of distinct entangled states, each exhibiting unique characteristics. 
We find that using the Hofmann-Takeuchi and Duan-Giedke-Cirac-Zoller criteria provides advantages in entanglement detection compared to the Wineland squeezing and EPR steering criteria in such settings.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# マルチモーダル・ラーニングの強化:メタ学習型クロスモーダル・ナレッジ蒸留 Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities ( http://arxiv.org/abs/2405.07155v1 ) ライセンス: Link先を確認	Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro,	(参考訳) マルチモーダル学習では、いくつかのモダリティは他のモダリティよりも影響を受けており、それらの欠如は分類・分類精度に大きな影響を及ぼす可能性がある。したがって、トレーニングされたマルチモーダルモデルが、入力データから影響力のあるモダリティが欠如している場合でも、高い精度を持つことができるかどうかが重要な研究課題である。本稿では,メタ学習型クロスモーダル知識蒸留(MCKD)と呼ばれる新しい手法を提案する。 MCKDはメタラーニングプロセスを通じて各モードの重要性重みを適応的に推定する。これらの動的に学習されたモダリティの重要性重みは、重みの大きいモダリティから重みの低いモダリティへ知識を移すために、対方向のクロスモーダルな知識蒸留プロセスで使用される。このクロスモーダルな知識蒸留は、影響力のあるモダリティがなくても非常に正確なモデルを生成する。従来の手法と異なり、本手法は最小限の適応で複数のタスク(例えば、セグメンテーションや分類)で機能するように設計されている。 Brain tumor Segmentation Dataset 2018 (BraTS2018)とAudiovision-MNIST分類データセットの実験結果は、現在の最先端モデルよりもMCKDの方が優れていることを示している。特に BraTS2018 では, 腫瘍増強率 3.51 %, 腫瘍コア率 2.19 %, 腫瘍全体に対する 1.14 % が平均セグメンテーションDice スコアで有意に改善した。 In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is if it is possible for trained multi-modal models to have high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer the knowledge from the modalities with higher importance weight to the modalities with lower importance weight. This cross-modal knowledge distillation produces a highly accurate model even with the absence of influential modalities. 
Unlike previous methods in the field, our approach is designed to work across multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. Particularly on BraTS2018, we achieve substantial improvements of 3.51\% for enhancing tumor, 2.19\% for tumor core, and 1.14\% for the whole tumor in terms of average segmentation Dice score.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 半自己監督型ドメイン適応:小麦頭セグメンテーションのための限定アノテートデータを用いたディープラーニングモデルの開発 Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head Segmentation ( http://arxiv.org/abs/2405.07157v1 ) ライセンス: Link先を確認	Alireza Ghanbari, Gholamhassan Shirdel, Farhad Maleki,	(参考訳) 精密農業は、廃棄物や環境への影響を最小限に抑えつつ、農業の生産性、効率、利益性を向上させるための先進技術の適用を含む。ディープラーニングアプローチは、多くの視覚的タスクに対して自動意思決定を可能にする。しかし、農業領域では、成長段階の変動と天候や照明などの環境条件が、異なる条件をまたいで一般化する深層学習技術を開発する上で大きな課題となっている。これらの変数をキャプチャする広範なアノテートデータセットを作成するというリソース集約的な性質は、これらのアプローチの広範な採用を妨げる。これらの課題に対処するために,確率的拡散過程を持つ深層畳み込みニューラルネットワークに基づく半自己教師付きドメイン適応手法を導入し,手動データアノテーションの最小化を求める。 3つの手動アノテート画像とコムギ畑からのビデオクリップの選択を用いて,画像マスク対の大規模アノテートデータセットとビデオフレームから抽出した非アノテート画像の大規模データセットを生成した。合成画像-マスクペアと無注釈画像の両方を用いた2分岐畳み込みエンコーダ・デコーダモデルアーキテクチャを開発し,実画像への効果的な適応を実現した。提案したモデルは、内部テストデータセットのDiceスコア80.7\%、外部テストセットのDiceスコア64.8\%を達成し、5つの国からの画像で構成され、18のドメインにまたがる。 Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. 
Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7\% on an internal test dataset and a Dice score of 64.8\% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 光ポンピングによるヒ素ガリウムのコヒーレント振動発生における縦型光フォノンの役割 Dual role of longitudinal optical phonons for generation of coherent oscillations in gallium arsenide under optical pumping ( http://arxiv.org/abs/2405.07159v1 ) ライセンス: Link先を確認	Itsuki Takagi, Yuma Konno, Yosuke Kayanuma, Kazutaka G. Nakamura,	(参考訳) 低温近似を用いたガリウム(GaAs)の超高速赤外ポンプパルスによるコヒーレント長手光(LO)フォノンとLO-フォノンプラズモンカップリング(LOPC)モードの生成ダイナミクスの新規かつ簡便な図式を示す。 LOフォノンは、GaAsの励起状態にある光励起電子によって形成されるプラズモンと顕著に結合している。この結合は、励起状態におけるLOPCモードのコヒーレント振動をもたらす。ポンプパルスはまた、刺激されたラマン散乱を誘導し、基底状態でコヒーレントなLOフォノン振動を発生させる。この図は単純化されたモデルに組み込まれ、密度演算子の時間発展はリンドブラッド型量子マスター方程式を用いて計算される。理論的な結果は、過渡反射測定により観測されたLOフォノンとLOPCモードのコヒーレント振動に関する報告実験結果をよく説明できる。さらに,我々のモデルは,LOフォノンとLOPCモードの同時出現の自然な理由を提供する。 We present a novel and simple picture of the generation dynamics of coherent longitudinal optical (LO) phonons and LO-phonon-plasmon-coupled (LOPC) modes by the ultrafast infrared pump-pulses in gallium arsenide (GaAs) employing the low-temperature approximation. LO phonons exhibit a pronounced coupling with plasmons formed by the optically excited electrons in the excited states of GaAs. This coupling results in the coherent oscillation of the LOPC modes in the excited states. The pump pulse also induces stimulated Raman scattering, which generates the coherent LO-phonon oscillation in the ground state. This picture is incorporated into a simplified model, and the time evolution of the density operator is calculated using the Lindblad-type quantum master equation. The theoretical results explain well the reported experimental results on the coherent oscillation of LO phonons and LOPC modes observed through transient reflection measurements. Above all, our model provides a natural reason for the simultaneous manifestation of the LO phonons and the LOPC modes.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 自己アライメントによる大規模言語モデルを用いたロボットスキルの報酬学習 Learning Reward for Robot Skills Using Large Language Models via Self-Alignment ( http://arxiv.org/abs/2405.07162v1 ) ライセンス: Link先を確認	Yuwei Zeng, Yao Mu, Lin Shao,	(参考訳) 報酬関数の学習は、幅広いスキルのレパートリーをロボットに持たせる上で、依然としてボトルネックとなっている。大規模言語モデル(LLM)には、報酬関数の学習を支援し得る、貴重なタスク関連の知識が含まれている。しかし,LLMが提案する報酬関数は不正確で効果を欠くことがあり,環境情報に基づいてさらにグラウンディングする必要がある。本研究では,人間が介在しない状況で報酬をより効率的に学習する手法を提案する。本手法は2つの要素からなる。まず、LLMを用いて報酬の特徴とパラメータ化を提案し、次に反復的な自己アライメントプロセスを通じてパラメータを更新する。特に、このプロセスは、実行フィードバックに基づいて、LLMと学習された報酬関数との間のランキングの不整合を最小化する。この手法は2つのシミュレーション環境における9つのタスクで検証された。トレーニングの有効性と効率性において一貫した改善を示し、同時に、代替の突然変異ベースの方法と比較してGPTトークンの消費がはるかに少ない。 Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, so it needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: We first use the LLM to propose features and parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens compared to the alternative mutation-based method.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
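The abstract above says the self-alignment step minimizes the ranking inconsistency between the LLM and the learnt reward function. A minimal sketch of what such an inconsistency measure could look like is given below. The function name, the trajectory ids, and the pairwise-disagreement count are illustrative assumptions; the abstract does not specify the actual objective.

```python
from itertools import combinations


def ranking_inconsistency(llm_ranking, reward_scores):
    """Count trajectory pairs that the LLM and the learned reward order differently.

    llm_ranking: trajectory ids ordered best-first (hypothetical LLM output).
    reward_scores: dict mapping id -> total learned reward (higher is better).
    """
    discordant = 0
    for better, worse in combinations(llm_ranking, 2):
        # The LLM prefers `better`; the reward disagrees unless it scores it strictly higher.
        if reward_scores[better] <= reward_scores[worse]:
            discordant += 1
    return discordant


# Hypothetical example: the LLM ranks traj_a first, but the reward prefers traj_b.
llm_ranking = ["traj_a", "traj_b", "traj_c"]
reward_scores = {"traj_a": 2.0, "traj_b": 3.5, "traj_c": 1.0}
inconsistency = ranking_inconsistency(llm_ranking, reward_scores)
print(inconsistency)
```

Under this reading, a parameter update would be judged by whether it lowers the count; a differentiable surrogate would be needed for gradient-based updates.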
# 教育のための視覚的質問応答の実現:マルチモーダルAIとしてのGPT-4V Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI ( http://arxiv.org/abs/2405.07163v1 ) ライセンス: Link先を確認	Gyeong-Geon Lee, Xiaoming Zhai,	(参考訳) 教育学者は、教室のダイナミクスを示す写真、学習内容に関する学生の図面、教科書のイラストなど、教育や学習の状況から得られた様々な画像データを分析してきた。言うまでもなく、画像データの質的な分析と説明のほとんどは、機械による自動化なしに人間の研究者によって行われてきた。その一因は、ほとんどの画像処理人工知能モデルが、一般の教育学者にはアクセスできなかったり、複雑なディープニューラルネットワークアーキテクチャのために説明可能でなかったりしたことにある。しかし、近年のVQA(Visual Question Answering)技術の発展は、ユーザから与えられた画像に関する質問を受け取り、回答を返す(質問・回答ともに自然言語)、実用的なビジュアル言語モデルを実現しつつある。特にOpenAIがリリースしたGPT-4Vは、VQAを様々な目的で使用できるように、最先端のビジュアル言語モデルサービスを広く開放した。しかしながら、VQAとGPT-4Vは、まだ教育研究にはあまり適用されていない。本ポジションペーパーでは,GPT-4Vが教育のためのVQAの実現に寄与することを提案する。ここでいうVQAの「実現(realizing)」には2つの意味がある。(1)GPT-4Vは、技術的・アクセシビリティ上の障壁なしに、あらゆる教育学者によるVQA技術の利用を実現し、(2)GPT-4Vは、教育研究におけるVQAの有用性を教育学者に認識させる。これらのことから,本論文は教育研究のためのVQAの導入を目標とし,教育研究方法論のマイルストーンを提供する。本稿では,第2章でGPT-4Vのリリースに至るVQA技術の発展について概説する。第3章は、教育研究における画像分析の利用についてレビューする。第4章では、第3章でレビューされた各研究用途において、GPT-4Vをどのように使用できるかを示し、実際に動作するプロンプトを提供する。最後に、第5章は将来の意味について論じている。 Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that show classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestionably, most qualitative analysis and explanation of image data have been conducted by human researchers, without machine-based automation. This was partly because most image processing artificial intelligence models were not accessible to general educational scholars or explainable due to their complex deep neural network architecture. However, the recent development of Visual Question Answering (VQA) techniques is yielding usable visual language models, which receive from the user a question about a given image and return an answer, both in natural language. In particular, GPT-4V, released by OpenAI, has widely opened up the state-of-the-art visual language model service so that VQA can be used for a variety of purposes. 
However, VQA and GPT-4V have not yet been widely applied to educational studies. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the utilization of VQA techniques by any educational scholar without technical or accessibility barriers, and (2) GPT-4V makes educational scholars realize the usefulness of VQA to educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. In this paper, chapter II reviews the development of VQA techniques, culminating in the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with operating prompts provided. Finally, chapter V discusses the future implications.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# エネルギープランのデノイジングによるマルチモーダル確率的軌道予測のための歩行者固有の不確実性のモデル化 Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising ( http://arxiv.org/abs/2405.07164v1 ) ライセンス: Link先を確認	Yao Liu, Quan Z. Sheng, Lina Yao,	(参考訳) 歩行者の軌道予測は、自動運転とスマートシティの領域において重要な役割を果たす。シーケンスモデルと生成モデルを用いた広範な先行研究にもかかわらず、社会的相互作用や個人の嗜好に影響される歩行者の予測不可能な性質は、不確実性と多峰性によって特徴づけられる課題を提示する。そこで本研究では,確率的軌道予測のためのエネルギープランデノイジング(EPD)モデルを提案する。 EPDはまず、ランジュバンエネルギーモデル(Langevin Energy Model)を用いて、プランと呼ばれる将来の軌道の分布を粗く推定する。その後、確率拡散モデルによるデノイジングにより、この推定を洗練する。プランからデノイジングを開始することで、EPDは反復ステップの必要性を効果的に低減し、効率を向上する。さらに、EPDは個々の軌道の代わりに軌道の分布をモデル化する点で従来の手法と異なる。これにより、歩行者固有の不確実性の明示的なモデリングが可能になり、複数回のデノイジング操作の必要性を排除できる。単一のデノイジング操作で、複数のサンプルを描画できる分布が生成され、効率が大幅に向上する。さらに、EPDによるプランの微調整はモデル性能の向上に寄与する。 2つの公開データセットでEPDを検証し、最先端の結果を達成した。さらに、アブレーション実験は個々のモジュールの寄与を裏付け、提案手法の有効性を裏付けるものである。 Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. 
A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# ビジョンシステムのための資源効率のよい認識 Resource Efficient Perception for Vision Systems ( http://arxiv.org/abs/2405.07166v1 ) ライセンス: Link先を確認	A V Subramanyam, Niyati Singal, Vinay K Verma,	(参考訳) 画像認識分野の急速な進歩にもかかわらず、高解像度画像の処理は依然として計算上の課題である。しかし、この処理は、自律走行車ナビゲーションから医療画像解析まで幅広い領域における詳細な物体の洞察を抽出する上で重要である。本研究では,高解像度画像に対するメモリ効率のパッチベース処理を活用することにより,これらの課題を軽減するためのフレームワークを提案する。ローカルなパッチ情報と共にグローバルなコンテキスト表現が組み込まれており、画像の内容の包括的な理解を可能にする。メモリ制約によって制限される従来のトレーニング手法とは対照的に,本手法は超高解像度画像のトレーニングを可能にする。分類,オブジェクト検出,セグメンテーションにまたがる7つのベンチマークにおいて,本手法の有効性を示す。提案手法は,Jetson Nanoのような資源制約のあるデバイスでも高い性能を実現する。私たちのコードはhttps://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systemsで利用可能です。 Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient patch based processing for high resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods which are limited by memory constraints, our method enables training of ultra high resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at https://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systems.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# カメラ空間における単眼RGBからの3Dハンドメッシュの復元 3D Hand Mesh Recovery from Monocular RGB in Camera Space ( http://arxiv.org/abs/2405.07167v1 ) ライセンス: Link先を確認	Haonan Li, Patrick P. K. Chen, Yitong Zhou,	(参考訳) 仮想現実、拡張現実、ジェスチャーコントロールなどの技術の急速な進歩により、ユーザはコンピュータインターフェースとのインタラクションがより自然で直感的なものになることを期待している。既存のビジュアルアルゴリズムは、高度な人間とコンピュータのインタラクションタスクを達成するのに苦労することが多く、高精度で信頼性の高い絶対的な空間予測手法が必要とされている。さらに、単眼画像における複雑なシーンやオクルージョンを扱うことは、全く新しい課題をもたらす。本研究では,ルート相対格子とルート回復タスクの並列処理を行うネットワークモデルを提案する。このモデルにより、単眼RGB画像からカメラ空間における3Dハンドメッシュの復元が可能となる。エンド・ツー・エンドのトレーニングを容易にするために、2Dヒートマップに暗黙的な学習アプローチを用い、異なるサブタスク間の2Dキューの互換性を向上させる。Inceptionの概念をスペクトルグラフ畳み込みネットワークに組み込んでルート相対メッシュを探索し、ルート回復の探索のために設計された、局所的に詳細で大域的に注意を払う手法と統合する。このアプローチは、複雑な環境や自己遮蔽シーンにおけるモデルの予測性能を改善する。大規模ハンドデータセットFreiHANDの評価を通じて,提案モデルが最先端モデルに匹敵することを示した。本研究は,様々な人-コンピュータインタラクションアプリケーションにおいて,高精度かつ信頼性の高い絶対空間予測技術の発展に寄与する。 With the rapid advancement of technologies such as virtual reality, augmented reality, and gesture control, users expect interactions with computer interfaces to be more natural and intuitive. Existing visual algorithms often struggle to accomplish advanced human-computer interaction tasks, necessitating accurate and reliable absolute spatial prediction methods. Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges. This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. The model enables the recovery of 3D hand meshes in camera space from monocular RGB images. To facilitate end-to-end training, we utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. We incorporate the Inception concept into a spectral graph convolutional network to explore the root-relative mesh, and integrate it with the locally detailed and globally attentive method designed for root recovery exploration. This approach improves the model's predictive performance in complex environments and self-occluded scenes. 
Through evaluation on the large-scale hand dataset FreiHAND, we have demonstrated that our proposed model is comparable with state-of-the-art models. This study contributes to the advancement of techniques for accurate and reliable absolute spatial prediction in various human-computer interaction applications.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 特徴・重みコサインアライメントによるオンラインテスト時間適応の強化 Enhanced Online Test-time Adaptation with Feature-Weight Cosine Alignment ( http://arxiv.org/abs/2405.07171v1 ) ライセンス: Link先を確認	WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar,	(参考訳) オンラインテスト時間適応(OTTA)は、ソースデータを必要とせずに、事前学習されたモデルを推論時に新しいターゲットドメインへオンザフライで適応させる、分布シフトを処理する効果的な戦略として登場した。 OTTAで広く研究されているエントロピー最小化(EM)法は,決定境界付近のあいまいさや誤った低エントロピー予測によるノイズの多い勾配に悩まされていることがわかった。このような制約を克服するために,クラス予測の精度と新しい領域への適応性を向上する二目的損失関数を用いたコサインアライメント最適化手法を提案する。具体的には、特徴ベクトルとクラス重みベクトルのコサイン類似度を最適化し、クラス予測の精度を高め、新しい領域へのモデルの適応性を向上する。 本手法は、CIFAR-10-C、CIFAR-100-C、ImageNet-C、Office-Home、DomainNetを含む複数のデータセットで最先端の手法を上回って新たなベンチマークを確立し、多様な破損(corruption)やドメインシフトに対して高い精度と堅牢性を示す。 Online Test-Time Adaptation (OTTA) has emerged as an effective strategy to handle distributional shifts, allowing on-the-fly adaptation of pre-trained models to new target domains during inference, without the need for source data. We uncovered that the widely studied entropy minimization (EM) method for OTTA suffers from noisy gradients due to ambiguity near decision boundaries and incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function that refines the precision of class predictions and adaptability to novel domains. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, enhancing the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark in multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet datasets, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
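The abstract above describes optimizing the cosine similarity between a feature vector and the class weight vectors. The sketch below illustrates that quantity in plain Python. The pseudo-labelling rule and the `1 - cosine` loss are assumptions chosen for illustration; the abstract does not give the paper's dual-objective loss, and this is not a reproduction of it.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_alignment_loss(feature, class_weights):
    """Return (pseudo_label, loss) for one test-time feature vector.

    Assumption: the pseudo-label is the class whose weight vector is most
    aligned with the feature, and the loss is 1 - cosine to that class.
    """
    sims = [cosine(feature, w) for w in class_weights]
    pseudo_label = max(range(len(sims)), key=sims.__getitem__)
    return pseudo_label, 1.0 - sims[pseudo_label]


# Hypothetical 2-class classifier head and one feature vector.
feature = [0.9, 0.1]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
label, loss = cosine_alignment_loss(feature, class_weights)
print(label, loss)
```

Because the cosine ignores vector norms, a loss of this kind depends only on the angle between feature and class weight, not on how confident the softmax output happens to be.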
# オントロジーに基づくログモニタリングを用いたマネージドサーバーレス環境における可観測性とインシデント対応 Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log Monitoring ( http://arxiv.org/abs/2405.07172v1 ) ライセンス: Link先を確認	Lavi Ben-Shimol, Edita Grolman, Aviad Elyashar, Inbar Maimon, Dudu Mimran, Oleg Brodt, Martin Strassmann, Heiko Lehmann, Yuval Elovici, Asaf Shabtai,	(参考訳) フルマネージドなサーバレス環境では、クラウドサービスプロバイダがクラウドインフラストラクチャのセキュリティ確保に責任を持ち、アプリケーション開発者の運用とメンテナンスの労力を削減する。しかし、この環境は既存のサイバーセキュリティフレームワークやツールの使用を制限するため、可観測性や状況認識能力(リスク評価、インシデント対応など)が低下する。加えて、サーバーレスアプリケーションの既存のセキュリティフレームワークは、すべてのアプリケーションアーキテクチャにうまく一般化せず、フルマネージドなサーバーレス環境での使用には、適応や専門知識などが通常必要である。本稿では,フルマネージドなサーバレス環境にデプロイされたアプリケーションに対して,3層セキュリティ方式を提案する。最初の2つのレイヤには、サーバレスログのみに基づく独自のオントロジーが含まれており、ログを統合されたアプリケーションアクティビティ知識グラフに変換するために使用される。第3のレイヤでは、グラフベースの表現を利用する2つの状況認識ツールを実装することにより、可観測性と状況認識能力の必要性に対処する。 1) オントロジーを利用して、サイバーセキュリティアラートのコンテキストにおけるアプリケーションアクティビティログの可視化と調査を行うインシデント対応ダッシュボード。ユーザ調査では、ダッシュボードによって、参加者はベースラインツールよりも、より正確に、迅速に新しいセキュリティアラートに対応できることがわかった。 2) サイバーセキュリティの文脈において専門家による効率的な優先順位付けを可能にする、資産の重要度(CoA)リスク評価フレームワーク。 In a fully managed serverless environment, the cloud service provider is responsible for securing the cloud infrastructure, thereby reducing the operational and maintenance efforts of application developers. However, this environment limits the use of existing cybersecurity frameworks and tools, which reduces observability and situational awareness capabilities (e.g., risk assessment, incident response). In addition, existing security frameworks for serverless applications do not generalize well to all application architectures and usually require adaptation, specialized expertise, etc. for use in fully managed serverless environments. In this paper, we introduce a three-layer security scheme for applications deployed in fully managed serverless environments. 
The first two layers involve a unique ontology based solely on serverless logs, which is used to transform them into a unified application activity knowledge graph. In the third layer, we address the need for observability and situational awareness capabilities by implementing two situational awareness tools that utilize the graph-based representation: 1) An incident response dashboard that leverages the ontology to visualize and examine application activity logs in the context of cybersecurity alerts. Our user study showed that the dashboard enabled participants to respond more accurately and quickly to new security alerts than the baseline tool. 2) A criticality of asset (CoA) risk assessment framework that enables efficient expert-based prioritization in cybersecurity contexts.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# CRSFL: 継続的認証のためのクラスタベースのリソース認識型分割フェデレーション学習 CRSFL: Cluster-based Resource-aware Split Federated Learning for Continuous Authentication ( http://arxiv.org/abs/2405.07174v1 ) ライセンス: Link先を確認	Mohamad Wazzeh, Mohamad Arafeh, Hani Sami, Hakima Ould-Slimane, Chamseddine Talhi, Azzam Mourad, Hadi Otrok,	(参考訳) 絶え間なく変化するテクノロジーの世界では、デバイスとのユーザーインタラクションにおいて、継続的な認証と包括的なアクセス管理が不可欠である。分割学習(SL)とフェデレート学習(FL)は、最近、分散機械学習(ML)モデルをトレーニングするための有望な技術として登場した。スマートフォンとIoT(Internet of Things)デバイスの利用が増加する中、これらの分散技術により、限られたリソースを持つユーザは、サーバーアシストによるニューラルネットワークモデルのトレーニングを完了し、異なるノード間の知識を協調的に組み合わせることができる。本研究では,ユーザプライバシ保護とデバイスリソース使用制限を両立させながら,これらの技術を組み合わせて継続的な認証課題に対処することを提案する。しかし、SLの逐次的なトレーニングと、異なる仕様のIoTデバイス間のリソース差のため、モデルのトレーニングは遅くなる。したがって、クラスタベースのアプローチを用いて、同様の能力を持つデバイスをグループ化し、遅いデバイスの影響を軽減すると同時に、モデルをトレーニングできないデバイスをフィルタリングする。さらに、SLおよびFL技術を用いてクライアントを同時に訓練し、プロセスのオーバーヘッドを解析することで、機械学習モデルの学習効率とロバスト性に対処する。クラスタリングに続いて、慎重に設計された目的のリストに対して最適化された遺伝的アルゴリズム(GA)を用いて、トレーニングに参加するための最良のクライアント群を選択する。提案手法の性能をベースライン法と比較し,実世界のUMDAA-02-FD顔検出データセットを用いてその利点を実証した。その結果,提案手法であるCRSFLは,ユーザのプライバシを保ちながら高い精度を維持し,継続的な認証シナリオのオーバヘッドを低減できることが示唆された。 In the ever-changing world of technology, continuous authentication and comprehensive access management are essential during user interactions with a device. Split Learning (SL) and Federated Learning (FL) have recently emerged as promising technologies for training a decentralized Machine Learning (ML) model. With the increasing use of smartphones and Internet of Things (IoT) devices, these distributed technologies enable users with limited resources to complete neural network model training with server assistance and collaboratively combine knowledge between different nodes. In this study, we propose combining these technologies to address the continuous authentication challenge while protecting user privacy and limiting device resource usage. However, the model's training is slowed due to SL sequential training and resource differences between IoT devices with different specifications. 
Therefore, we use a cluster-based approach to group devices with similar capabilities to mitigate the impact of slow devices while filtering out the devices incapable of training the model. In addition, we address the efficiency and robustness of training ML models by using SL and FL techniques to train the clients simultaneously while analyzing the overhead burden of the process. Following clustering, we select the best set of clients to participate in training through a Genetic Algorithm (GA) optimized on a carefully designed list of objectives. The performance of our proposed framework is compared to baseline methods, and the advantages are demonstrated using a real-life UMDAA-02-FD face detection dataset. The results show that CRSFL, our proposed approach, maintains high accuracy and reduces the overhead burden in continuous authentication scenarios while preserving user privacy.	翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
# 深層強化学習によるフェデレーション学習におけるオンデマンドモデルとクライアント展開 On-Demand Model and Client Deployment in Federated Learning with Deep Reinforcement Learning ( http://arxiv.org/abs/2405.07175v1 ) ライセンス: Link先を確認	Mario Chahoud, Hani Sami, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Mohsen Guizani,	(参考訳) フェデレートラーニング(FL)では、多様な場所やユーザタイプからのデータへのアクセスが制限されているため、ユーザの参加が制限されているため、大きな課題となる。クライアントアクセスの拡大とデータの多様化により、さまざまな視点を取り入れてモデルを強化し、適応性を向上させる。しかし、あるデバイスがFLクライアントとしてアクセス不能になり、データの可用性やクライアントの選択方法に影響を及ぼすような動的およびモバイル環境では、課題が生じる。これを解決するために、Docker Containersをオンザフライで使用する新しいクライアントをデプロイするOn-Demandソリューションを提案します。当社のオンデマンドソリューションは、Deep Reinforcement Learning(DRL)を採用して、データシフトやコンテナデプロイメントの複雑さを考慮して、クライアントの可用性と選択を目標としています。モデルデプロイメントとクライアント選択を処理するために、自律的なエンドツーエンドソリューションを採用している。 DRL戦略はMarkov Decision Process(MDP)フレームワークを使用し、Master LearnerとJoiner Learnerを使用する。設計されたコスト関数は、動的クライアントの配置と選択の複雑さを表している。シミュレーションテストは、アーキテクチャが環境の変化に容易に対応し、オン・デマンド・リクエストに応答できることを示しています。これにより、クライアントの可用性、能力、正確性、学習効率を向上し、ヒューリスティックで表型的な強化学習ソリューションを超えることができる。 In Federated Learning (FL), the limited accessibility of data from diverse locations and user types poses a significant challenge due to restricted user participation. Expanding client access and diversifying data enhance models by incorporating diverse perspectives, thereby enhancing adaptability. However, challenges arise in dynamic and mobile environments where certain devices may become inaccessible as FL clients, impacting data availability and client selection methods. To address this, we propose an On-Demand solution, deploying new clients using Docker Containers on-the-fly. Our On-Demand solution, employing Deep Reinforcement Learning (DRL), targets client availability and selection, while considering data shifts, and container deployment complexities. It employs an autonomous end-to-end solution for handling model deployment and client selection. The DRL strategy uses a Markov Decision Process (MDP) framework, with a Master Learner and a Joiner Learner. 
The designed cost functions represent the complexity of the dynamic client deployment and selection. Simulated tests show that our architecture can easily adjust to changes in the environment and respond to On-Demand requests. This underscores its ability to improve client availability, capability, accuracy, and learning efficiency, surpassing heuristic and tabular reinforcement learning solutions.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# ホログラム:LiDAR拡張再構成によるリアルタイムホログラフィックオーバーレイ Hologram: Realtime Holographic Overlays via LiDAR Augmented Reconstruction ( http://arxiv.org/abs/2405.07178v1 ) ライセンス: Link先を確認	Ekansh Agrawal,	(参考訳) 悪名高いスター・ウォーズシリーズのホログラム技術に着想を得て、LiDARで拡張した3D再構成を用いてリアルタイムのホログラフィックオーバーレイを作成するアプリケーションを提案する。従来の試みはSLAMやNeRFを用いており、高度にキャリブレーションされたシーンを必要とするか、多大な計算コストがかかるか、動的シーンのレンダリングに失敗する。本稿では,iPhone 14 Proなどの携帯端末上で動作可能で、メートル単位で正確な顔の再構成を可能にする3つの高忠実度再構成ツールを提案する。私のシステムはインタラクティブで没入的なホログラフィック体験を可能にし、拡張現実、テレプレゼンス、エンターテイメントなど幅広い用途に利用できる。 Guided by the hologram technology of the infamous Star Wars franchise, I present an application that creates real-time holographic overlays using LiDAR-augmented 3D reconstruction. Prior attempts involve SLAM or NeRFs, which either require highly calibrated scenes, incur steep computation costs, or fail to render dynamic scenes. I propose 3 high-fidelity reconstruction tools that can run on a portable device, such as an iPhone 14 Pro, and that allow for metrically accurate facial reconstructions. My systems enable interactive and immersive holographic experiences that can be used for a wide range of applications, including augmented reality, telepresence, and entertainment.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# エントロピー不確定性関係による量子電池の抽出可能仕事の提示 Showcasing extractable work of quantum battery via entropic uncertainty relations ( http://arxiv.org/abs/2405.07185v1 ) ライセンス: Link先を確認	Meng-Long Song, Xue-Ke Song, Liu Ye, Dong Wang,	(参考訳) 本研究では,ボソニックおよびフェルミオン的なリザーバの存在下で,バッテリ-チャージャー-場としてモデル化された量子電池(QB)のエネルギー変動を識別する上での,エントロピー不確定性関係(EUR)の有効性について検討した。結果から,抽出可能な仕事(エクセルギーとエルゴトロピー)は異なるシナリオで多様な特性を示し,タイトネスと抽出可能な仕事との間には複雑な関係があることが示唆された。エントロピー不確定性の下界のタイトネスが、QBの充電におけるエネルギー変換効率のよい指標となり得ることは注目に値する。さらに,不確定性と下界を含むEURがQBシステムのエネルギー変換効率にどのように寄与するかを明らかにする。これらの知見は、量子電池の性能を評価する上での量子不確定性の役割をよりよく理解するために有用であると考えられる。 In this study, we investigate the effectiveness of entropic uncertainty relations (EURs) in discerning the energy variation in quantum batteries (QBs) modelled by battery-charger-field in the presence of bosonic and fermionic reservoirs. Our results suggest that the extractable works (exergy and ergotropy) have versatile characteristics in different scenarios, resulting in a complex relationship between tightness and extractable work. It is worth noting that the tightness of the lower bound of entropic uncertainty can be a good indicator for energy conversion efficiency in charging QBs. Furthermore, we disclose how the EUR including uncertainty and lower bound contributes to energy conversion efficiency in the QB system. It is believed that these findings will be beneficial for better understanding the role of quantum uncertainty in evaluating quantum battery performance.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 実世界データで拡張したランダム化比較試験に基づく平均処置効果のための適応TMLE Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data ( http://arxiv.org/abs/2405.07186v1 ) ライセンス: Link先を確認	Mark van der Laan, Sky Qiu, Lars van der Laan,	(参考訳) ランダム化比較試験(RCT)データと実世界データ(RWD)の両方が利用可能である場合に,平均処置効果(ATE)を推定する問題を考察する。本研究では,ATEの推定対象(estimand)を,RCTとRWDを統合したプールATE推定対象と,RCTへの登録が結果に与える条件付き効果を捉えるバイアス推定対象との差として分解する。これらを推定するために,適応的ターゲット最小損失ベース推定(A-TMLE)フレームワークを導入する。我々は、A-TMLE推定量がルートn一致性を持ち、漸近的に正規であることを証明する。さらに,有限標本では,RCTへの登録が結果に与える条件付き効果のオラクルモデルが既知であった場合に得られる超効率性を達成する。その結果、RWDによって誘導されるバイアスの作業モデルが小さいほど我々の推定量の効率は向上し、かつ我々の推定量は常に、RCTデータのみを使用する効率的な推定量と少なくとも同程度に効率的である。 A-TMLEは、より小さい平均二乗誤差とより狭い95%信頼区間を示し、シミュレーションにおいて既存の手法よりも優れている。 A-TMLEは、介入効果の推定にバイアスを与えることなく、ランダム化試験結果の効率を向上させるためにRWDを利用するのに役立つ。このアプローチは、より小規模で迅速な治験を可能にし、患者が効果的な治療を受けるまでの時間を短縮することができる。 We consider the problem of estimating the average treatment effect (ATE) when both randomized controlled trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite samples, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of the RCT enrollment on the outcome. Consequently, the smaller the working model of the bias induced by the RWD is, the greater our estimator's efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. A-TMLE outperforms existing methods in simulations by having smaller mean-squared error and narrower 95% confidence intervals. 
A-TMLE could help utilize RWD to improve the efficiency of randomized trial results without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
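The decomposition described in this abstract can be written schematically as follows. The symbols are assumptions introduced here for illustration ($W$ for baseline covariates, $A$ for the treatment indicator, $Y$ for the outcome, $P_0$ for the data distribution); the abstract itself does not fix notation, and the exact form of the bias term is left abstract because the abstract only describes it in words.

```latex
% Target estimand = pooled-ATE estimand minus bias estimand (as stated in the abstract).
\Psi(P_0) \;=\; \tilde{\Psi}(P_0) \;-\; \Psi^{\#}(P_0)

% Assumed covariate-adjusted form of the pooled term, computed on RCT and RWD together:
\tilde{\Psi}(P_0) \;=\; E_0\!\left[\, E_0(Y \mid A = 1, W) \;-\; E_0(Y \mid A = 0, W) \,\right]

% \Psi^{\#}(P_0): the bias estimand, capturing the conditional effect of RCT enrollment
% on the outcome; its explicit expression is not given in the abstract.
```

Read this way, the efficiency claim in the abstract is about how small a working model suffices for $\Psi^{\#}$: if the RWD introduce little bias, that term is cheap to estimate and most of the pooled sample's precision carries over.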
# トラップを伴うフォトニックランダムウォーク Photonic random walks with traps ( http://arxiv.org/abs/2405.07192v1 ) ライセンス: Link先を確認	Stefano Longhi,	(参考訳) ランダムウォークは古典的粒子と量子的粒子とで大きく異なる振る舞いをする。ここでは, 光子を破壊してウォークを終了させ得るトラップが有限個存在する場合の,1次元格子内の光子のランダムウォークに広く見られる特徴的な挙動を明らかにする。古典的なランダムウォークでは光子はトラップによって不可避的に破壊されるのに対し、量子ウォークでは光子は生き残ることができ、ウォークは永遠に続く。このような興味深い振る舞いを、制御可能なデコヒーレンスを持つ合成メッシュ格子におけるフォトニックランダムウォークを考えることで例示する。この系では、量子ランダムウォークから古典ランダムウォークへの切り替えが可能である。 Random walks behave very differently for classical and quantum particles. Here we unveil a ubiquitous distinctive behavior of random walks of a photon in a one-dimensional lattice in the presence of a finite number of traps, at which the photon can be destroyed and the walk terminates. While for a classical random walk the photon is unavoidably destroyed by the traps, for a quantum walk the photon can remain alive and the walk continues forever. Such an intriguing behavior is illustrated by considering photonic random walks in synthetic mesh lattices with controllable decoherence, which makes it possible to switch from quantum to classical random walks.	翻訳日:2024-05-14 17:57:54 publish日:2024-05-12
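The contrast drawn in this abstract can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the paper's setup: it uses a 4-site ring with one trap, an unbiased classical walk, and a continuous-time quantum walk with a lossy site, whereas the abstract refers to photonic walks in synthetic mesh lattices. It shows one way a quantum walker can survive: a state whose amplitudes interfere destructively at the trap is annihilated by the effective Hamiltonian, so it never changes and never decays.

```python
def classical_survival(n_sites, trap, start, steps):
    """Survival probability of an unbiased classical walk on a ring with one absorbing trap."""
    p = [0.0] * n_sites
    p[start] = 1.0
    for _ in range(steps):
        # Hop to the left or right neighbour with probability 1/2 each.
        p = [0.5 * p[(j - 1) % n_sites] + 0.5 * p[(j + 1) % n_sites]
             for j in range(n_sites)]
        p[trap] = 0.0  # probability that reached the trap is destroyed
    return sum(p)


def apply_effective_hamiltonian(psi, trap, gamma):
    """Apply H_eff = (nearest-neighbour hopping on a ring) - i*gamma*|trap><trap| to psi."""
    n = len(psi)
    out = [psi[(j - 1) % n] + psi[(j + 1) % n] for j in range(n)]
    out[trap] -= 1j * gamma * psi[trap]
    return out


# Classical walker: survival decays geometrically towards zero.
classical = classical_survival(n_sites=4, trap=0, start=1, steps=40)

# Quantum walker in an antisymmetric state around the trap (sites 1 and 3).
amp = 2 ** -0.5
dark_state = [0.0, amp, 0.0, -amp]
rate_of_change = apply_effective_hamiltonian(dark_state, trap=0, gamma=1.0)

print(classical, rate_of_change)
```

Since the Schrödinger equation reads i dψ/dt = H_eff ψ and the right-hand side vanishes for this state, its norm stays equal to 1 for all times, while the classical survival probability keeps shrinking.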
# 微分可能Top-kを用いた微分可能モデルスケーリング Differentiable Model Scaling using Differentiable Topk ( http://arxiv.org/abs/2405.07194v1 ) ライセンス: Link先を確認	Kai Liu, Ruohui Wang, Jianfei Gao, Kai Chen,	(参考訳) 過去数年間、大規模言語モデルが知能の創発の時代を切り開くにつれ、ネットワークのスケーリングにますます重点が置かれてきた。現在、多くのネットワークアーキテクチャは手動で設計されており、しばしば最適でない構成をもたらす。ニューラルアーキテクチャサーチ(NAS)手法は,このプロセスを自動化するために提案されているが,探索効率の低さに悩まされている。本研究では,ネットワークの最適な幅と深さを探索する効率を高める微分可能モデルスケーリング(DMS)を提案する。 DMSは、幅と深さの両方を直接的かつ完全に微分可能な方法でモデル化できるため、最適化が容易である。我々は、視覚タスクからNLPタスクまで、またCNNやTransformerを含む様々なネットワークアーキテクチャで、DMSを評価した。結果は,我々のDMSが改良された構造を見つけ,最先端のNAS法より優れていることを一貫して示している。具体的には、ImageNet上の画像分類において、当社のDMSは、EfficientNet-B0とDeit-Tinyのトップ1精度をそれぞれ1.4%、0.6%改善し、検索に0.4GPU日しか必要とせずに、最先端のゼロショットNASであるZiCoを1.3%上回っている。 COCO上の物体検出では、DMSはYolo-v8-nのmAPを2.0%改善する。言語モデリングでは,プルーニングしたLlama-7Bが、従来の手法よりも低いパープレキシティと高いゼロショット分類精度を達成した。将来、コードをリリースする予定である。 Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), increasing the efficiency for searching optimal width and depth in networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated our DMS across diverse tasks, ranging from vision tasks to NLP tasks and various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods. 
Specifically, for image classification on ImageNet, our DMS improves the top-1 accuracy of EfficientNet-B0 and Deit-Tiny by 1.4% and 0.6%, respectively, and outperforms the state-of-the-art zero-shot NAS method, ZiCo, by 1.3% while requiring only 0.4 GPU days for searching. For object detection on COCO, DMS improves the mAP of Yolo-v8-n by 2.0%. For language modeling, our pruned Llama-7B outperforms the prior method with lower perplexity and higher zero-shot classification accuracy. We will release our code in the future.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
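To make the phrase "differentiable top-k" concrete, the sketch below shows a generic soft relaxation of top-k selection: instead of a hard 0/1 mask over channels or layers, each element receives a sigmoid weight that is smooth in its importance score. This is an illustrative assumption, not the operator used in DMS, which the abstract does not describe; the function name, the midpoint threshold, and the temperature value are all hypothetical.

```python
import math


def soft_topk_mask(scores, k, temperature=0.05):
    """Soft mask that is close to 1 for the k largest scores and close to 0 otherwise.

    Assumes 1 <= k < len(scores) and scores of moderate magnitude
    (so that math.exp does not overflow).
    """
    ranked = sorted(scores, reverse=True)
    # Place the threshold halfway between the k-th and (k+1)-th largest score.
    threshold = 0.5 * (ranked[k - 1] + ranked[k])
    # A sigmoid around the threshold replaces the hard "keep / drop" decision.
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


# Hypothetical importance scores for four channels; keep roughly the top two.
scores = [0.9, 0.1, 0.6, 0.3]
mask = soft_topk_mask(scores, k=2)
print(mask)
```

Lowering the temperature pushes the mask towards a hard top-k selection, while a higher temperature gives smoother gradients with respect to the scores.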
# InsightNet: 顧客からのフィードバックから構造化されたインサイトマイニング InsightNet: Structured Insight Mining from Customer Feedback ( http://arxiv.org/abs/2405.07195v1 ) ライセンス: Link先を確認	Sandeep Sricharan Mukku, Manan Soni, Jitenkumar Rana, Chetan Aggarwal, Promod Yenigalla, Rashmi Patange, Shyam Mohan,	(参考訳) 顧客レビューから構造化された洞察を自動的に抽出する新しいアプローチであるInsightNetを提案する。私たちのエンドツーエンドの機械学習フレームワークは、特定されたトピックに構造がないこと、非標準的なアスペクト名、豊富なトレーニングデータの欠如など、現在のソリューションの限界を克服するために設計されている。提案手法は,生のレビューから半教師付きでマルチレベルのタクソノミーを構築し,意味的類似度に基づくヒューリスティックでラベル付きデータを生成し,LLMを微調整したマルチタスクの洞察抽出アーキテクチャを採用する。 InsightNetは、粒度の細かい実行可能なトピックを、各トピックに対する顧客の感情および逐語的な記述(verbatim)とともに特定する。実際の顧客レビューデータによる評価では、InsightNetは構造、階層、完全性の観点から既存のソリューションよりも優れたパフォーマンスを示している。我々は、InsightNetがマルチラベルのトピック分類において現在の最先端手法より優れていることを実証的に示し、F1スコア0.85を達成した。これは従来の最良の結果に対して11%のF1スコアの改善である。さらにInsightNetは、未知のアスペクトによく一般化し、タクソノミーに追加すべき新たなトピックを提案する。 We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, uses a semantic similarity heuristic approach to generate labelled data, and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatim for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, which is an improvement of 11% F1-score over the previous best results. 
Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 合成データジェネレータをランク付けするための許可型ブロックチェーンベースのフレームワーク Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators ( http://arxiv.org/abs/2405.07196v1 ) ライセンス: Link先を確認	Narasimha Raghavan Veeraragavan, Mohammad Hossein Tabatabaei, Severin Elvatun, Vibeke Binz Vallevik, Siri Larønningen, Jan F Nygård,	(参考訳) 合成データ生成は、不足、バイアス、プライバシといったデータ関連の課題に対処するための重要なソリューションとして、ますます認識されている。合成データの普及に伴い、利用可能な選択肢が多様であることを考えると、合成データジェネレータを選択するための堅牢な評価フレームワークの必要性が高まっている。本研究では,2つの主要な問いについて検討する。 1) 特定の目的のために、選択肢の集合から最適な合成データジェネレータをどのように選択できるのか。 2) 選択プロセスをより透明に、説明責任を持ち、監査可能にするにはどうすればよいのか。 これらの問いに対処するために、Sawtoothと呼ばれる許可型(permissioned)ブロックチェーンフレームワーク内で、提案するランキングアルゴリズムをスマートコントラクトとして実装する、新たなアプローチを導入する。本フレームワークは,最先端のベースラインランキングソリューションとの総合的な実験と比較を通じて,望ましい特性と望ましくない特性の両方を考慮したきめ細かいランキングを提供する上で,その有効性を示す。さらに,本フレームワークは,データ保護原則の遵守を確保しつつ,特定のニーズに対して最適な合成データジェネレータを選択するための貴重なツールとして機能する。 Synthetic data generation is increasingly recognized as a crucial solution to address data related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. 
Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# Qsyn: NISQ時代以降のための開発者フレンドリーな量子回路合成フレームワーク Qsyn: A Developer-Friendly Quantum Circuit Synthesis Framework for NISQ Era and Beyond ( http://arxiv.org/abs/2405.07197v1 ) ライセンス: Link先を確認	Mu-Te Lau, Chin-Yi Cheng, Cheng-Hua Lu, Chia-Hsu Chuang, Yi-Hsiang Kuo, Hsiang-Chun Yang, Chien-Tung Kuo, Hsin-Yu Chen, Chen-Ying Tung, Cheng-En Tsai, Guan-Hao Chen, Leng-Kai Lin, Ching-Huan Wang, Tzu-Hsu Wang, Chung-Yang Ric Huang,	(参考訳) 本稿では、新しい量子回路合成(QCS)フレームワークであるQsynを紹介し、開発者がQCSアルゴリズムとツールを研究、開発、試験、実験し、そしてフレームワークに貢献できるようにする。 1) 開発者が様々なテストシナリオを簡単に設計し、アルゴリズムで柔軟に実験できるように、リッチなコマンドラインインターフェースを設計します。 2) 開発者がアルゴリズムを極端に最適化できるように,異なる抽象レベルの量子回路上で多くのデータ表現に詳細なアクセスを提供する。 (3)私たちは,開発者が開発品質を,最新のソフトウェアエンジニアリングのベストプラクティスで確保できるように,厳格な開発フローと環境を定義します。筆者らは,T-Count Optimizationアルゴリズムの開発を実演し,最近のQCSフレームワークと同等に比較して,性能上の優位性を示す。 In this paper, we introduce a new quantum circuit synthesis (QCS) framework, Qsyn, for developers to research, develop, test, experiment, and then contribute their QCS algorithms and tools to the framework. Our framework is more developer-friendly than other modern QCS frameworks in three aspects: (1) We design a rich command-line interface so that developers can easily design various testing scenarios and flexibly conduct experiments on their algorithms. (2) We offer detailed access to many data representations on different abstract levels of quantum circuits so that developers can optimize their algorithms to the extreme. (3) We define a rigid developing flow and environment so that developers can ensure their development qualities with the best modern software engineering practices. We illustrate the friendliness of our framework with a showcase of developing a T-Count Optimization algorithm and demonstrate our performance superiority with fair comparisons to other modern QCS frameworks.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 準結晶におけるデフェージング誘起モビリティエッジ Dephasing-induced mobility edges in quasicrystals ( http://arxiv.org/abs/2405.07198v1 ) ライセンス: Link先を確認	Stefano Longhi,	(参考訳) アンダーソン局在状態と拡張状態とを分離するモビリティエッジ(ME)は、非周期的な秩序を持つある種の1次元格子の単一粒子エネルギースペクトルに現れることが知られている。デフェージングとデコヒーレンスの効果は、アンダーソン局在を損ない輸送を促進することが広く認められており、デフェージングの存在下ではMEと局在は観測されにくいことが示唆される。ここでは、そのような通念に反して、コヒーレントダイナミクスの下では全ての状態が非局在化している準結晶において、純粋なデフェージング効果によってMEが生成されうることを示す。デフェージング効果によって誘起される局在状態の寿命は極めて長くなりうるため、直観に反して、デコヒーレンスが格子内の励起の局在を促進しうる。この結果を、合成メッシュ格子におけるフォトニック量子ウォークを例に示す。 Mobility edges (ME), separating Anderson-localized states from extended states, are known to arise in the single-particle energy spectrum of certain one-dimensional lattices with aperiodic order. Dephasing and decoherence effects are widely acknowledged to spoil Anderson localization and to enhance transport, suggesting that ME and localization are unlikely to be observable in the presence of dephasing. Here it is shown that, contrary to such a wisdom, ME can be created by pure dephasing effects in quasicrystals in which all states are delocalized under coherent dynamics. Since the lifetimes of localized states induced by dephasing effects can be extremely long, rather counter-intuitively decoherence can enhance localization of excitation in the lattice. The results are illustrated by considering photonic quantum walks in synthetic mesh lattices.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# Chebyshev Polynomial-based Kolmogorov-Arnold Networks: 非線形関数近似のための効率的なアーキテクチャ Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation ( http://arxiv.org/abs/2405.07200v1 ) ライセンス: Link先を確認	Sidharth SS,	(参考訳) 複雑な非線形関数の正確な近似は、多くの科学および工学領域における根本的な課題である。従来のニューラルネットワークアーキテクチャは、高次元関数に存在する複雑なパターンや不規則性を捉えるのに苦労することが多い。本稿では、Chebyshev Kolmogorov-Arnoldネットワーク(Chebyshev KAN)を紹介し、Kolmogorov-Arnoldの定理の理論的基礎とChebyshev多項式の強力な近似能力を組み合わせた新しいアプローチを提案する。 Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures often struggle to capture intricate patterns and irregularities present in high-dimensional functions. This paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a novel approach that combines the theoretical foundations of the Kolmogorov-Arnold Theorem with the powerful approximation capabilities of Chebyshev polynomials.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 汎用3次元大規模知覚のための強力な事前学習ベースラインの構築 Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception ( http://arxiv.org/abs/2405.07201v1 ) ライセンス: Link先を確認	Haoming Chen, Zhizhong Zhang, Yanyun Qu, Ruixin Zhang, Xin Tan, Yuan Xie,	(参考訳) 汎用的な3D表現を備えた効果的な事前学習フレームワークは、大規模な動的シーンを知覚するのに非常に望ましい。しかし、タスク汎用的かつラベル効率の高い理想的なフレームワークを確立するには、多様なシーンにわたって同じプリミティブの表現を統一するという課題がある。現在のコントラスト的な3D事前学習法は、典型的にはフレームレベルの一貫性に従っており、個々に切り離された画像における2D-3D関係に焦点を当てている。このような配慮に欠けた一貫性は,(1)クロスシーンのセマンティック自己衝突,すなわち,異なるシーンに由来する同じセマンティクスのプリミティブセグメント間の激しい衝突,(2)クロスシーンのセマンティック一貫性を3D表現学習に組み込むグローバルに統一された結びつきの欠如,という点で,普遍的な事前学習フレームワークに到達するための有望な道を大きく妨げている。上記の課題に対処するために,シーンレベルのセマンティック一貫性を中心に据え,様々なシーンにわたって類似したセマンティックセグメントを結びつけるCSCフレームワークを提案する。この目的を達成するために、視覚基盤モデルによって提供される一貫性のあるセマンティック・キューと、相補的なマルチモーダル情報から導かれる知識に富んだクロスシーンのプロトタイプを組み合わせる。これにより、より少ない微調整で様々な下流タスクを支援するユニバーサルな3D事前学習モデルを訓練することができる。実験では,nuScenes上でタスク固有の3Dネットワークを用いて,セマンティックセグメンテーション(+1.4% mIoU),オブジェクト検出(+1.0% mAP),パノプティックセグメンテーション(+3.0% PQ)において,SOTA事前学習アプローチに対する一貫した改善を達成した。コードはhttps://github.com/chenhaomingbob/CSCでリリースされ、将来の研究に刺激を与えたいと考えている。 An effective pre-training framework with universal 3D representations is extremely desired in perceiving large-scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path of reaching a universal pre-training framework: (1) The cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; (2) Lacking a globally unified bond that pushes the cross-scene semantic consistency into 3D representation learning. 
To address above challenges, we propose a CSC framework that puts a scene-level semantic consistency in the heart, bridging the connection of the similar semantic segments across various scenes. To achieve this goal, we combine the coherent semantic cues provided by the vision foundation model and the knowledge-rich cross-scene prototypes derived from the complementary multi-modality information. These allow us to train a universal 3D pre-training model that facilitates various downstream tasks with less fine-tuning efforts. Empirically, we achieve consistent improvements over SOTA pre-training approaches in semantic segmentation (+1.4% mIoU), object detection (+1.0% mAP), and panoptic segmentation (+3.0% PQ) using their task-specific 3D network on nuScenes. Code is released at https://github.com/chenhaomingbob/CSC, hoping to inspire future research.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 同期オーディオによる一元化ビデオ言語事前学習 Unified Video-Language Pre-training with Synchronized Audio ( http://arxiv.org/abs/2405.07202v1 ) ライセンス: Link先を確認	Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang,	(参考訳) ビデオ言語事前学習は,大規模データから視覚的およびテキスト的表現を自己指導的に学習することを目的とした,典型的で困難な問題である。既存の事前学習アプローチは、画像とテキストのペアの対応を捉えるか、フレームの時間的順序付けを利用するかのいずれかである。しかし、彼らは音声と他の2つのモード間の自然な同期を明示的に調べていない。本稿では,VLSAと呼ばれる同期音声によるビデオ言語事前学習のための拡張フレームワークを提案する。具体的には、VLSAは、ビデオ、テキスト、オーディオのローカルパッチとグローバルトークンの埋め込みを共同で集約します。さらに,ローカル・パッチ・マスクド・モデリングを用いてモダリティを意識した特徴を学習し,グローバル・オーディオ・マッチングを利用して映像やテキストの音声誘導機能をキャプチャする。テキスト,ビデオ,音声の検索について広範な実験を行った。 0.9Mデータのみを事前学習した簡単なモデルでは,最先端のベースラインに対する結果の改善が期待できる。さらに、定性的可視化は、識別的視覚・テクスチャ表現の学習において、VLSAの優位性を鮮明に示している。 Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two modalities. In this work, we propose an enhanced framework for Video-Language pre-training with Synchronized Audio, termed as VLSA, that can learn tri-modal representations in a unified self-supervised transformer. Specifically, our VLSA jointly aggregates embeddings of local patches and global tokens for video, text, and audio. Furthermore, we utilize local-patch masked modeling to learn modality-aware features, and leverage global audio matching to capture audio-guided features for video and text. We conduct extensive experiments on retrieval across text, video, and audio. Our simple model pre-trained on only 0.9M data achieves improving results against state-of-the-art baselines. In addition, qualitative visualizations vividly showcase the superiority of our VLSA in learning discriminative visual-textual representations.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# C++11コードをC++03に変換してレガシーコンパイル環境をサポートする Transforming C++11 Code to C++03 to Support Legacy Compilation Environments ( http://arxiv.org/abs/2405.07204v1 ) ライセンス: Link先を確認	Gábor Antal, Dávid Havas, István Siket, Árpád Beszédes, Rudolf Ferenc, József Mihalicza,	(参考訳) 新しい技術 - プログラミング言語、環境、ライブラリ - は急速に変化します。しかし、様々な内部および外部の制約により、プロジェクトはこれらの変更に迅速に適用できなくなることが多い。例えば、顧客は、ソフトウェアベンダーから特定のプラットフォーム互換性を必要とするかもしれません。本研究では、C++プログラミング言語の文脈におけるそのような問題に対処する。私たちの産業パートナーは、古いC++言語エディションのみをサポートするSDKを使用する必要があります。しかし彼らは、開発者がコードで最新の言語構造を使えるようにしたいと思っている。この問題に対処するため、私たちは、C++11標準に従って書かれたソースコードを、機能的に等価なC++03変種に自動的にバックポートする、ソースコード変換フレームワークを作成しました。私たちのフレームワークでは、開発者は最新の言語機能を自由に利用できます。本稿では,トランスフォーメーションエンジンの技術的詳細と,大規模な2つのコードベースと4つのオープンソースシステムに適用した経験について報告する。私たちのソリューションは無料で、オープンソースです。 Newer technologies - programming languages, environments, libraries - change very rapidly. However, various internal and external constraints often prevent projects from quickly adopting to these changes. Customers may require specific platform compatibility from a software vendor, for example. In this work, we deal with such an issue in the context of the C++ programming language. Our industrial partner is required to use SDKs that support only older C++ language editions. They, however, would like to allow their developers to use the newest language constructs in their code. To address this problem, we created a source code transformation framework to automatically backport source code written according to the C++11 standard to its functionally equivalent C++03 variant. With our framework developers are free to exploit the latest language features, while production code is still built by using a restricted set of available language constructs. This paper reports on the technical details of the transformation engine, and our experiences in applying it on two large industrial code bases and four open-source systems. Our solution is freely available and open-source.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 静的JavaScriptコールグラフの比較研究 Static JavaScript Call Graphs: A Comparative Study ( http://arxiv.org/abs/2405.07206v1 ) ライセンス: Link先を確認	Gábor Antal, Péter Hegedűs, Zoltán Tóth, Rudolf Ferenc, Tibor Gyimóthy,	(参考訳) クライアント側とサーバ側の両方でJavaScriptが普及し広く採用されているため、そのコード解析はこれまで以上に重要になっている。脆弱性分析、コーディング問題検出、型推論のアルゴリズムのほとんどは、基礎となるプログラムのコールグラフ表現に依存している。動的解析のいくつかの明らかな利点にもかかわらず、静的アルゴリズムは、プログラムの広範なテストベッドやコストのかかる実行とトレースを必要としないため、コールグラフの構築にも考慮すべきである。本稿では,npmコールグラフ,IBM WALA,Google Closure Compiler,Approximate Call Graph,Type Analyzer for JavaScriptツールによって実装された,広く採用されている5つの静的アルゴリズムを,26のWebKit SunSpiderベンチマークプログラムと6つの実世界のNode.jsモジュール上でのJavaScriptコールグラフ構築について体系的に比較する。結果の定量的,定性的な評価だけでなく,性能分析も提供する。その結果,アルゴリズム間で検出されたコールエッジの共通部分は比較的大きく,その精度は100%であることがわかった。しかし、ほとんどのツールは、他のすべてのツールが見逃したエッジを検出した。 精度はACGが最も高くTAJSが僅差で続いたが,ACGの方がはるかに多くのコールエッジを検出した。ツールの組み合わせに関しては、ACGとTAJSを合わせると、すべてのアルゴリズムで見つかった真のエッジの99%をカバーし、98%という高い精度を維持した。言語機能のサポートが不完全なため、最新のマルチファイルNode.jsモジュールを解析できたツールは2つだけだった。両者はコールエッジの約60%で一致したが、それぞれ、もう一方が見逃した有効なエッジを検出した。 The popularity and wide adoption of JavaScript both at the client and server side makes its code analysis more important than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. 
We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the found true edges by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules due to incomplete language features support. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.	翻訳日:2024-05-14 17:57:54 公開日:2024-05-12
# 等変QAOAとミキサーの決闘 Equivariant QAOA and the Duel of the Mixers ( http://arxiv.org/abs/2405.07211v1 ) ライセンス: Link先を確認	Boris Tsvelikhovskiy, Ilya Safro, Yuri Alexeev,	(参考訳) 量子近似最適化アルゴリズム(QAOA)の最適なミキサーを構築することは、組合せ最適化問題の解法におけるQAOAの性能向上に不可欠である。本稿では,古典的最適化問題の目的関数が持つ固有の対称性と整合するように,QAOA向けに調整したミキサーハミルトニアンを構築するための体系的方法論を提案する。このアプローチの鍵となるのは、QAOAの基礎となるヒルベルト空間上の対称性群の作用と可換であり、かつ有効なミキサーハミルトニアンとして機能するための必須の技術的基準を満たす作用素を同定することである。様々な組合せ最適化問題でよく見られる対称群 $S_d$ に特化した構成法を提供する。必要な特性を厳密に検証し、具体的な公式とそれに対応する実装用の量子回路を与えることにより、提案したミキサーハミルトニアンの実現可能性を確立する。さらに、古典ミキサー$B$は、群自体よりも位数がはるかに小さい$S_d$の部分群としか可換でないことを示し、提案手法の効率性を高める。提案手法の有効性を評価するため,異なるミキサーハミルトニアンを用いた2つのQAOA変種,すなわち従来の$B=\sum X_i$と新たに提案する$H_M$を,様々なグラフ上の辺彩色問題とグラフ分割問題で比較する。平均値に統計的に有意な差が観測され、新しい変種は複数の独立したシミュレーションにおいて一貫して優れた性能を示す。さらに,代替のウォームスタート型QAOA変種における性能低下現象を分析し,近年の文献の知見に裏付けられた概念的な説明を与える。 Constructing an optimal mixer for Quantum Approximate Optimization Algorithm (QAOA) Hamiltonian is crucial for enhancing the performance of QAOA in solving combinatorial optimization problems. We present a systematic methodology for constructing the QAOA tailored mixer Hamiltonian, ensuring alignment with the inherent symmetries of classical optimization problem objectives. The key to our approach is to identify an operator that commutes with the action of the group of symmetries on the QAOA underlying Hilbert space and meets the essential technical criteria for effective mixer Hamiltonian functionality. We offer a construction method specifically tailored to the symmetric group $S_d$, prevalent in a variety of combinatorial optimization problems. By rigorously validating the required properties, providing a concrete formula and corresponding quantum circuit for implementation, we establish the viability of the proposed mixer Hamiltonian. Furthermore, we demonstrate that the classical mixer $B$ commutes only with a subgroup of $S_d$ of significantly smaller order than the group itself, enhancing the efficiency of the proposed approach. 
To evaluate the effectiveness of our methodology, we compare two QAOA variants utilizing different mixer Hamiltonians: conventional $B=\sum X_i$ and the newly proposed $H_M$ in edge coloring and graph partitioning problems across various graphs. We observe statistically significant differences in mean values, with the new variant consistently demonstrating superior performance across multiple independent simulations. Additionally, we analyze the phenomenon of poor performance in alternative warm-start QAOA variants, providing a conceptual explanation supported by recent literature findings.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# LLM支援推論による最適化における意思決定の強化:ニューラルネットワークの視点から Enhancing Decision-Making in Optimization through LLM-Assisted Inference: A Neural Networks Perspective ( http://arxiv.org/abs/2405.07212v1 ) ライセンス: Link先を確認	Gaurav Singh, Kavitesh Kumar Bali,	(参考訳) 本稿では,大規模多目的最適化分野におけるジェネレーティブAI(GenAI)と進化的アルゴリズム(EA)のシームレスな統合について検討する。大規模言語モデル(LLM)の変革的な役割に着目し,LLM支援推論による意思決定プロセスの自動化と向上の可能性について検討した。具体的には、進化的に最適化されたソリューションにおいて、文脈的トレードオフを明確に表現しながら、重要な決定変数を明らかにする効果を強調した。複雑な多目的最適化ソリューションを大規模に推論することに固有の課題に対処するために,我々のアプローチはLLMの適応性を重視し,きめ細かな説明を提供するとともに,その言葉遣いをさまざまな利害関係者の専門知識レベルやドメインの嗜好に合わせることを可能にする。実証的研究により,実世界の意思決定シナリオにおける LLM-Assisted Inference の実践的適用性と影響を示す。 This paper explores the seamless integration of Generative AI (GenAI) and Evolutionary Algorithms (EAs) within the domain of large-scale multi-objective optimization. Focusing on the transformative role of Large Language Models (LLMs), our study investigates the potential of LLM-Assisted Inference to automate and enhance decision-making processes. Specifically, we highlight its effectiveness in illuminating key decision variables in evolutionarily optimized solutions while articulating contextual trade-offs. Tailored to address the challenges inherent in inferring complex multi-objective optimization solutions at scale, our approach emphasizes the adaptive nature of LLMs, allowing them to provide nuanced explanations and align their language with diverse stakeholder expertise levels and domain preferences. Empirical studies underscore the practical applicability and impact of LLM-Assisted Inference in real-world decision-making scenarios.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# 脆弱性のあるJavaScript関数の予測における機械学習アルゴリズムの検証 Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions ( http://arxiv.org/abs/2405.07213v1 ) ライセンス: Link先を確認	Rudolf Ferenc, Péter Hegedűs, Péter Gyimesi, Gábor Antal, Dénes Bán, Tibor Gyimóthy,	(参考訳) サイバー犯罪活動の急速な増加と、それらによって脅かされるデバイスの増加は、ソフトウェアセキュリティの問題を目立たせている。攻撃の約90%が既知の種類のセキュリティ問題を悪用しているため、脆弱なコンポーネントを見つけて既存の緩和技術を適用することは、サイバー犯罪と戦うための現実的なアプローチである。本稿では,JavaScriptプログラムにおけるセキュリティ脆弱性の可能性のある関数の予測において,一般的なディープラーニングアルゴリズムを含む最先端の機械学習技術がどのように機能するかを検討する。私たちは8つの機械学習アルゴリズムを適用して、Node Security ProjectとSnykプラットフォームの公開データベースの脆弱性情報とGitHubからのコード修正パッチから、この研究のために構築された新しいデータセットを使用して、予測モデルを構築しました。静的なソースコードメトリクスを予測子として使用し、広範なグリッド探索アルゴリズムを使って最高のパフォーマンスモデルを見つけました。また、データセットの不均衡性を扱うために、様々な再サンプリング戦略の効果についても検討した。最高性能のアルゴリズムはKNNであり、F値が0.76(精度0.91、リコール0.66)の脆弱性関数の予測モデルを作った。さらに, 深層学習, 決定木およびフォレスト系の分類器, SVMもF値が0.70を超え, 競争力があった。再サンプリング戦略によってF値は大きく変わらなかったが, 精度とリコールの配分は変化した。再サンプリングを行わない場合は高い精度を優先するモデルが得られる傾向があり、再サンプリング戦略を用いるとIR指標のバランスがとれた。 The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. 
We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# ニューラルネットワークによる連続変数上の局所的独立性の発見について On Discovery of Local Independence over Continuous Variables via Neural Contextual Decomposition ( http://arxiv.org/abs/2405.07220v1 ) ライセンス: Link先を確認	Inwoo Hwang, Yunhyeok Kwak, Yeon-Ji Song, Byoung-Tak Zhang, Sanghack Lee,	(参考訳) 条件付き独立は、興味のある変数間の因果関係を理解する手段を提供する。基礎となるシステムは、特に変数とその親の間のよりきめ細かい因果関係を示す可能性があり、これは局所的な独立関係と呼ばれる。最も広く研究されているローカルな関係の1つは、条件付き変数の特定の割り当てに係わるコンテキスト特化独立(CSI)である。しかし、連続変数を許可しないため、その適用性はしばしば制限される: 連続変数の特定の値に条件付けられたデータには、ほとんどインスタンスが含まれていないが、独立性をテストすることは不可能である。本研究では,親変数の協調代入に係わる局所的独立関係を定義・特徴化し,その関係を文脈集合依存独立(CSSI)と呼ぶ。次に、CSSIの標準表現を提供し、その基本特性を証明します。理論的な結果から,システム内の複数のCSSI関係を,連立結果空間の分割として発見する問題を提起した。最後に,各セットに条件分布をモデル化してCSSIを誘導することにより,その分割を学習するニューラルコンテクスト分解(NCD)を提案する。提案手法は,実世界の物理力学を反映した合成データセットと複雑なシステムの両方において,真相の局所的独立関係の発見に成功していることを実証的に実証した。 Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds in a specific assignment of conditioned variables. However, its applicability is often limited since it does not allow continuous variables: data conditioned to the specific value of a continuous variable contains few instances, if not none, making it infeasible to test independence. In this work, we define and characterize the local independence relationship that holds in a specific set of joint assignments of parental variables, which we call context-set specific independence (CSSI). We then provide a canonical representation of CSSI and prove its fundamental properties. Based on our theoretical findings, we cast the problem of discovering multiple CSSI relationships in a system as finding a partition of the joint outcome space. 
Finally, we propose a novel method, coined neural contextual decomposition (NCD), which learns such partition by imposing each set to induce CSSI via modeling a conditional distribution. We empirically demonstrate that the proposed method successfully discovers the ground truth local independence relationships in both synthetic dataset and complex system reflecting the real-world physical dynamics.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# 量子並列性とは何でしょう? What is Quantum Parallelism, Anyhow? ( http://arxiv.org/abs/2405.07222v1 ) ライセンス: Link先を確認	Stefano Markidis,	(参考訳) 量子コンピューティングのパワーの中心は量子並列性の概念であり、量子システムは複数の計算経路を同時に探索して処理することができる。本稿では,従来の並列計算モデルと並列処理を併用し,その基本特性とアルゴリズム性能への影響を解明する量子並列処理の解明について論じる。まず、量子状態の重ね合わせから生じる量子並列性を定義し、複数の計算経路を並列に探索できるようにする。量子並列性の定量化と可視化のために,量子アルゴリズムとその並列実行経路のグラフィカルな表現を提供する量子データフロー図(quantum Dataflow diagrams)の概念を導入する。本稿では、量子データフロー図を用いて量子フーリエ変換(QFT)や振幅増幅(AA)といった量子アルゴリズムを解析することにより、量子並列性を計測し評価する方法を実証する。さらに、Amdahl法やGustafson法則を含む古典並列法則と量子並列法則の相互作用について検討する。これらの法則は、もともと古典的な並列計算システムのために定式化されたものであるが、量子コンピューティング領域におけるそれらの適用性を再考する。古典的並列化法則は価値ある洞察を与えるが、量子並列化の独特な性質や古典的量子I/Oの本質的な制限などにより、量子コンピューティングへの直接的な応用は限定的であると論じる。我々の分析は、量子並列性に対する理解を深めることの必要性と、アルゴリズムの設計と性能に与える影響を強調している。 Central to the power of quantum computing is the concept of quantum parallelism: quantum systems can explore and process multiple computational paths simultaneously. In this paper, we discuss the elusive nature of quantum parallelism, drawing parallels with classical parallel computing models to elucidate its fundamental characteristics and implications for algorithmic performance. We begin by defining quantum parallelism as arising from the superposition of quantum states, allowing for the exploration of multiple computational paths in parallel. To quantify and visualize quantum parallelism, we introduce the concept of quantum dataflow diagrams, which provide a graphical representation of quantum algorithms and their parallel execution paths. We demonstrate how quantum parallelism can be measured and assessed by analyzing quantum algorithms such as the Quantum Fourier Transform (QFT) and Amplitude Amplification (AA) iterations using quantum dataflow diagrams. Furthermore, we examine the interplay between quantum parallelism and classical parallelism laws, including Amdahl's and Gustafson's laws. 
While these laws were originally formulated for classical parallel computing systems, we reconsider their applicability in the quantum computing domain. We argue that while classical parallelism laws offer valuable insights, their direct application to quantum computing is limited due to the unique characteristics of quantum parallelism, including the role of destructive interference and the inherent limitations of classical-quantum I/O. Our analysis highlights the need for an increased understanding of quantum parallelism and its implications for algorithm design and performance.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# オフライン-オンライン強化学習におけるタスク一般化のためのアンサンブル後継表現 Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning ( http://arxiv.org/abs/2405.07223v1 ) ライセンス: Link先を確認	Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang,	(参考訳) 強化学習(Reinforcement Learning, RL)では、探索が困難であるため、オンライン体験からゼロからポリシーを訓練することは非効率になりうる。最近、オフラインRLは、オンラインインタラクションによって洗練できる初期化済みのオフラインポリシーを提供することで、有望なソリューションを提供している。しかし,既存の手法は主に,オフラインからオンラインへの適応におけるタスク一般化問題を考慮せずに,オフラインとオンラインの学習を同一タスクで行う。実世界のアプリケーションでは、特定のタスクのオフラインデータセットしか持たない一方で、複数のタスクへの高速なオンライン適応を目指すことが一般的である。この問題に対処するため、オンラインRLにおけるタスク一般化のための後継表現の研究を基盤として、その枠組みをオフライン-オンライン学習を組み込むように拡張する。後継特徴を用いる従来のパラダイムでは,オフラインデータを効果的に活用できず,オンラインの微調整によって新たなタスクの性能を向上させることもできないことを示す。これを軽減するために、オフラインデータを利用して後継表現のアンサンブルを取得し、その後にアンサンブルQ関数を構成する新しい手法を提案する。このアプローチは、異なるカバレッジを持つデータセットからの堅牢な表現学習を可能にし、オンラインの微調整フェーズにおいて、Q関数の新たなタスクへの迅速な適応を容易にする。広範な実験的評価により,多様なタスク,さらには未知のタスクへの一般化において本手法が優れた性能を示すことを裏付ける。 In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL provides a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online-adaptation for several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. 
To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaption of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# 有限ゲームの幾何学的分解:ノーリグレット学習における収束対再帰 A geometric decomposition of finite games: Convergence vs. recurrence under no-regret learning ( http://arxiv.org/abs/2405.07224v1 ) ライセンス: Link先を確認	Davide Legacci, Panayotis Mertikopoulos, Bary Pradelski,	(参考訳) ゲームにおけるノーリグレット学習のダイナミクスの複雑さを踏まえ、有限ゲームを、ダイナミクスの日々の振る舞いがよく理解されている単純な構成要素に分解することを目指す。これに対する自然な出発点としてヘルムホルツの定理があり、ベクトル場をポテンシャル成分と非圧縮成分に分解する。しかし、ノーリグレット力学の幾何学、特に指数的/乗法的重み(EW)スキームの力学はヘルムホルツの定理のユークリッド的な基盤と整合しないため、シャーシャハニ計量に基づくリーマン幾何学的な枠組みを考える。この幾何学的構成を用いて非圧縮ゲームのクラスを導入し、以下の結果を証明する。第一に、非圧縮ゲームにおける連続時間EWダイナミクスは、体積保存的であることに加えて運動の定数を持ち、ポアンカレ再帰的である。すなわち、ほぼすべてのプレイの軌道は、その始点の任意の近くに無限回戻ってくる。第二に、ゲームをポテンシャル成分と調和成分(それぞれプレイヤーの目的が整合する成分と相反する成分)に分ける既知の分解との深い関係を確立する。すなわち、ゲームが非圧縮であることと調和であることは同値であり、したがってEWダイナミクスは調和ゲームにおいてポアンカレ再帰を示す。 In view of the complexity of the dynamics of no-regret learning in games, we seek to decompose a finite game into simpler components where the day-to-day behavior of the dynamics is well understood. A natural starting point for this is Helmholtz's theorem, which resolves a vector field into a potential and an incompressible component. However, the geometry of no-regret dynamics - and, in particular, the dynamics of exponential / multiplicative weights (EW) schemes - is not compatible with the Euclidean underpinnings of Helmholtz's theorem, leading us to consider a Riemannian framework based on the Shahshahani metric. Using this geometric construction, we introduce the class of incompressible games, and we prove the following results: First, in addition to being volume-preserving, the continuous-time EW dynamics in incompressible games admit a constant of motion and are Poincaré recurrent - i.e., almost every trajectory of play comes arbitrarily close to its starting point infinitely often. 
Second, we establish a deep connection with a well-known decomposition of games into a potential and harmonic component (where the players' objectives are aligned and anti-aligned respectively): a game is incompressible if and only if it is harmonic, implying in turn that the EW dynamics lead to Poincaré recurrence in harmonic games.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# ノーフリーランチ定理のレンズを通した古典的および量子的学習プロトコルの能力の分離 Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem ( http://arxiv.org/abs/2405.07226v1 ) ライセンス: Link先を確認	Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao,	(参考訳) No-Free-Lunch (NFL)定理は、最適化プロセスにかかわらず問題とデータに依存しない一般化誤差を定量化する定理であり、多様な学習プロトコルの可能性を理解するための基礎的な枠組みを提供する。その重要性にも拘わらず、量子機械学習モデルに対するNFL定理の確立はほとんど未解明のままであり、量子学習プロトコルと古典学習プロトコルの基本的な関係に関するより広範な洞察が見過ごされている。このギャップに対処するため、様々な量子学習アルゴリズムを、指定された観測量の下で量子力学を学習するための3つの学習プロトコルに分類し、それらのNFL定理を確立する。 対象とするプロトコル、すなわち Classical Learning Protocols (CLC-LPs)、Restricted Quantum Learning Protocols (ReQu-LPs)、Quantum Learning Protocols (Qu-LPs)は、それぞれ異なるレベルの量子リソースへのアクセスを提供する。得られたNFL定理は,量子状態の直交性と観測量の対角性に応じて,CLC-LP,ReQu-LP,Qu-LPの間でサンプル複雑性が二次的に削減されることを示す。この性能差は、量子力学に固有の特異な物理的特徴である非直交量子状態のグローバル位相に関する情報を間接的に利用できるという、量子関連学習プロトコルのユニークな能力に起因している。我々の発見は、量子学習プロトコルの能力の理解を深めるだけでなく、高度な量子学習アルゴリズムの開発のための実践的な洞察も提供する。 The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights into the fundamental relationship between quantum and classical learning protocols. To address this gap, we categorize a diverse array of quantum learning algorithms into three learning protocols designed for learning quantum dynamics under a specified observable and establish their NFL theorem. The exploited protocols, namely Classical Learning Protocols (CLC-LPs), Restricted Quantum Learning Protocols (ReQu-LPs), and Quantum Learning Protocols (Qu-LPs), offer varying levels of access to quantum resources. 
Our derived NFL theorems demonstrate quadratic reductions in sample complexity across CLC-LPs, ReQu-LPs, and Qu-LPs, contingent upon the orthogonality of quantum states and the diagonality of observables. We attribute this performance discrepancy to the unique capacity of quantum-related learning protocols to indirectly utilize information concerning the global phases of non-orthogonal quantum states, a distinctive physical feature inherent in quantum mechanics. Our findings not only deepen our understanding of quantum learning protocols' capabilities but also provide practical insights for the development of advanced quantum learning algorithms.	翻訳日:2024-05-14 17:47:28 公開日:2024-05-12
# Information capacity of quantum communication under natural physical assumptions ( http://arxiv.org/abs/2405.07231v1 ) License: see the linked page	Jef Pauwels, Stefano Pironio, Armin Tavakoli	The quantum prepare-and-measure scenario has been studied under various physical assumptions on the emitted states. Here, we first discuss how different assumptions are conceptually and formally related. We then identify one that can serve as a relaxation of all others, corresponding to a limitation on the one-shot accessible information of the state ensemble. This motivates us to study the optimal state discrimination probability of a source subject to these various physical assumptions. We derive general and tight bounds for states restricted by their quantum dimension, their vacuum component, an arbitrary uniform overlap, the magnitude of higher-dimensional signals, and the experimenter's trust in their device. Our results constitute a first step towards a more unified picture of semi-device-independent quantum information processing.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
# A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection ( http://arxiv.org/abs/2405.07232v1 ) License: see the linked page	Raja Giryes, Lior Shafir, Avishai Wool	Distributed Denial of Service (DDoS) attacks are becoming increasingly harmful to the Internet, showing no signs of slowing down. Developing an accurate detection mechanism to thwart DDoS attacks is still a big challenge due to the rich variety of these attacks and the emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than the traditional fixed-size record structure containing aggregated flow statistics. Although aggregated flow records have gained popularity over the past decade, providing an effective means for flow-based intrusion detection by inspecting only a fraction of the total traffic volume, they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads, but also by their structure, which is unable to model fine-grained inter-packet relations, such as packet order and temporal relations. Additionally, inferring aggregated flow statistics must wait for the complete flow to end. Here we show that considering flow inputs as variable-length streams composed of their associated packet headers allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds the accuracy of existing machine learning techniques, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection, e.g., with CICDDoS2019 detection based on the first 2 packets, which corresponds to an average time-saving of 99.79% and uses only 4-6% of the traffic volume.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
# OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning ( http://arxiv.org/abs/2405.07233v1 ) License: see the linked page	Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang	Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep up with the dynamic variation caused by global warming and human activities. Besides, due to the high cost of data collection, the historical observations are severely sparse, which poses a big challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep-learning-based model to reconstruct global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77% and demonstrating promising potential for understanding the "breathless ocean" in a data-driven manner.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
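The abstract above reports its headline result as a reduction in MAPE (mean absolute percentage error). As a minimal sketch of that metric, and not the authors' evaluation code, the functions below compute MAPE for paired observations and predictions. The abstract does not say whether the 38.77% figure is a relative reduction or a difference in percentage points; `relative_reduction` shows the relative reading only as one possible interpretation. Both function names are illustrative assumptions.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # MAPE divides by each observation, so zero observations are undefined.
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined for zero observations")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent (one possible reading)."""
    return 100.0 * (baseline - improved) / baseline
```

For example, `mape([100, 200], [90, 220])` averages two 10% errors, and `relative_reduction(20.0, 15.0)` describes an error that fell by a quarter.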
# Adaptive control of recurrent neural networks using conceptors ( http://arxiv.org/abs/2405.07236v1 ) License: see the linked page	Guillaume Pourcel, Mirko Goldmann, Ingo Fischer, Miguel C. Soriano	Recurrent Neural Networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a Machine Learning setting, the network's parameters are adapted during a training phase to match the requirements of a given task/problem, increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters thereby render the network unadaptive to changing conditions, such as external or internal perturbation. In this manuscript, we demonstrate how keeping parts of the network adaptive even after the training enhances its functionality and robustness. Here, we utilize the conceptor framework and conceptualize an adaptive control loop analyzing the network's behavior continuously and adjusting its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
# Case Study of Novelty, Complexity, and Adaptation in a Multicellular System ( http://arxiv.org/abs/2405.07241v1 ) License: see the linked page	Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria	Continuing generation of novelty, complexity, and adaptation are well-established as core aspects of open-ended evolution. However, it has yet to be firmly established to what extent these phenomena are coupled and by what means they interact. In this work, we track the co-evolution of novelty, complexity, and adaptation in a case study from the DISHTINY simulation system, which is designed to study the evolution of digital multicellularity. In this case study, we describe ten qualitatively distinct multicellular morphologies, several of which exhibit asymmetrical growth and distinct life stages. We contextualize the evolutionary history of these morphologies with measurements of complexity and adaptation. Our case study suggests that a loose, sometimes divergent, relationship can exist among novelty, complexity, and adaptation.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
# Fault-Tolerant Quantum LDPC Encoders ( http://arxiv.org/abs/2405.07242v1 ) License: see the linked page	Abhi Kumar Sharma, Shayan Srinivasa Garan	We propose fault-tolerant encoders for quantum low-density parity check (LDPC) codes. By grouping qubits within a quantum code over contiguous blocks and applying preshared entanglement across these blocks, we show how transversal implementation can be realized. The proposed encoder reduces the error propagation while using multi-qubit gates and is applicable for both entanglement-unassisted and entanglement-assisted quantum LDPC codes.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
# Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics ( http://arxiv.org/abs/2405.07244v1 ) License: see the linked page	Gábor Antal, Zoltán Tóth, Péter Hegedűs, Rudolf Ferenc	Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of hybrid (static and dynamic) code-analysis-based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2-10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.	Translation date: 2024-05-14 17:47:28 Publication date: 2024-05-12
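The abstract above reports gains in precision, recall, and F-measure. As a small reminder of how these three measures relate, here is a sketch that derives them from confusion-matrix counts. The function name and the counts used in the example are hypothetical; they are not taken from the paper or from the BugsJS dataset.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives, false negatives."""
    # Guard each ratio against an empty denominator (no predictions / no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With hypothetical counts such as `precision_recall_f1(80, 20, 20)`, a model that flags 100 functions as buggy, 80 of them correctly, and misses 20 buggy functions scores the same on all three measures.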
# Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic Structure ( http://arxiv.org/abs/2405.07245v1 ) License: see the linked page	Matthew Andres Moreno, Santiago Rodriguez-Papa, Emily Dolson	Evolutionary dynamics are shaped by a variety of fundamental, generic drivers, including spatial structure, ecology, and selection pressure. These drivers impact the trajectory of evolution, and have been hypothesized to influence phylogenetic structure. Here, we set out to assess (1) if spatial structure, ecology, and selection pressure leave detectable signatures in phylogenetic structure, (2) the extent, in particular, to which ecology can be detected and discerned in the presence of spatial structure, and (3) the extent to which these phylogenetic signatures generalize across evolutionary systems. To this end, we analyze phylogenies generated by manipulating spatial structure, ecology, and selection pressure within three computational models of varied scope and sophistication. We find that selection pressure, spatial structure, and ecology have characteristic effects on phylogenetic metrics, although these effects are complex and not always intuitive. Signatures have some consistency across systems when using equivalent taxonomic unit definitions (e.g., individual, genotype, species). Further, we find that sufficiently strong ecology can be detected in the presence of spatial structure. We also find that, while low-resolution phylogenetic reconstructions can bias some phylogenetic metrics, high-resolution reconstructions recapitulate them faithfully. Although our results suggest potential for evolutionary inference of spatial structure, ecology, and selection pressure through phylogenetic analysis, further methods development is needed to distinguish these drivers' phylometric signatures from each other and to appropriately normalize phylogenetic metrics. With such work, phylogenetic analysis could provide a versatile toolkit to study large-scale evolving populations.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# ZX Graphical Calculus for Continuous-Variable Quantum Processes ( http://arxiv.org/abs/2405.07246v1 ) License: see the linked page	Hironari Nagayoshi, Warit Asavanant, Ryuhoh Ide, Kosuke Fukui, Atsushi Sakaguchi, Jun-ichi Yoshikawa, Nicolas C. Menicucci, Akira Furusawa	Continuous-variable (CV) quantum information processing is a promising candidate for large-scale fault-tolerant quantum computation. However, analysis of CV quantum processes relies mostly on direct computation of the evolution of operators in the Heisenberg picture, and the features of CV space have yet to be thoroughly investigated in an intuitive manner. One key ingredient for further exploration of CV quantum computing is the construction of a computational model that brings visual intuition and new tools for analysis. In this paper, we delve into a graphical computational model, inspired by a similar model for qubit-based systems called the ZX calculus, that enables the representation of arbitrary CV quantum processes as simple directed graphs. We demonstrate the utility of our model as a graphical tool to comprehend CV processes intuitively by showing how equivalences between two distinct quantum processes can be proven as a sequence of diagrammatic transformations in certain cases. We also examine possible applications of our model, such as measurement-based quantum computing, characterization of Gaussian and non-Gaussian processes, and circuit optimization.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis ( http://arxiv.org/abs/2405.07248v1 ) License: see the linked page	Nikolay B Petrov, Gregory Serapio-García, Jason Rentfrow	The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs when using specific demographic profiles show poor psychometric properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# Universal Batch Learning Under The Misspecification Setting ( http://arxiv.org/abs/2405.07252v1 ) License: see the linked page	Shlomi Vituri, Meir Feder	In this paper we consider the problem of universal batch learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner's performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information-theoretic tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed-form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypotheses models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity-achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameter multinomial distribution while the hypothesis class $\Theta$ is only a subset of this family of distributions.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
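The abstract above mentions an extension of the Arimoto-Blahut algorithm for evaluating the regret and its capacity-achieving prior. The paper's extension is not described in the abstract, so the sketch below shows only the classical Blahut-Arimoto iteration for the capacity of a discrete memoryless channel, as background. The function name, argument layout, and tolerance are illustrative assumptions.

```python
import math


def blahut_arimoto(channel, tol=1e-12, max_iter=10000):
    """Classical Blahut-Arimoto iteration (not the paper's extension).

    channel[x][y] is P(y | x); every row must sum to 1.
    Returns (capacity_in_bits, capacity_achieving_input_distribution).
    """
    n_in, n_out = len(channel), len(channel[0])
    prior = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current prior.
        q = [sum(prior[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = exp(KL(P(.|x) || q)), with the divergence measured in nats.
        d = []
        for x in range(n_in):
            kl = sum(
                channel[x][y] * math.log(channel[x][y] / q[y])
                for y in range(n_out)
                if channel[x][y] > 0.0
            )
            d.append(math.exp(kl))
        z = sum(prior[x] * d[x] for x in range(n_in))
        # log(z) lower-bounds and log(max d) upper-bounds the capacity (nats).
        lower, upper = math.log(z), math.log(max(d))
        # Multiplicative update of the prior towards the capacity-achieving one.
        prior = [prior[x] * d[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), prior
```

As a sanity check, a binary symmetric channel with crossover probability 0.1, `blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])`, should return the textbook capacity $1 - H_2(0.1)$ bits with a uniform prior.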
# Leveraging Fixed and Dynamic Pseudo-labels for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2405.07256v1 ) License: see the linked page	Suruchi Kumari, Pravendra Singh	Semi-supervised medical image segmentation has gained growing interest due to its ability to utilize unannotated data. The current state-of-the-art methods mostly rely on pseudo-labeling within a co-training framework. These methods depend on a single pseudo-label for training, but these labels are not as accurate as the ground truth of labeled data. Relying solely on one pseudo-label often leads to suboptimal results. To this end, we propose a novel approach where multiple pseudo-labels for the same unannotated image are used to learn from the unlabeled data: the conventional fixed pseudo-label and the newly introduced dynamic pseudo-label. By incorporating multiple pseudo-labels for the same unannotated image into the co-training framework, our approach provides a more robust training approach that improves model performance and generalization capabilities. We validate our novel approach on three semi-supervised medical benchmark segmentation datasets, the Left Atrium dataset, the Pancreas-CT dataset, and the Brats-2019 dataset. Our approach significantly outperforms state-of-the-art methods over multiple medical benchmark segmentation datasets with different labeled data ratios. We also present several ablation experiments to demonstrate the effectiveness of various components used in our approach.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation ( http://arxiv.org/abs/2405.07257v1 ) License: see the linked page	Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu	Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces. We then design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method can generate realistic talking heads with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at the anonymous link: https://anonymous.4open.science/r/SPEAK-F56E	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# Memory-corrected quantum repeaters with adaptive syndrome identification ( http://arxiv.org/abs/2405.07258v1 ) License: see the linked page	Alena Romanova, Peter van Loock	We address the challenge of incorporating encoded quantum memories into an exact secret key rate analysis for small and intermediate-scale quantum repeaters. To this end, we introduce the check matrix model and quantify the resilience of stabilizer codes of up to eleven qubits against Pauli noise, obtaining analytical expressions for effective logical error probabilities. Generally, we find that the five-qubit and Steane codes either outperform more complex, larger codes in the experimentally relevant parameter regimes or have a lower resource overhead. Subsequently, we apply our results to calculate lower bounds on the asymptotic secret key rate in memory-corrected quantum repeaters when using the five-qubit or Steane codes on the memory qubits. The five-qubit code drastically increases the effective memory coherence time, reducing a phase flip probability of $1\%$ to $0.001\%$ when employing an error syndrome identification adapted to the quantum noise channel. Furthermore, it mitigates the impact of faulty Bell state measurements and imperfect state preparation, lowering the minimally required depolarization parameter for non-zero secret key rates in an eight-segment repeater from $98.4\%$ to $96.4\%$. As a result, the memory-corrected quantum repeater can often generate secret keys in experimental parameter regimes where the unencoded repeater fails to produce a secret key. In an eight-segment repeater, one can even achieve non-vanishing secret key rates up to distances of 2000 km for memory coherence times of $t_c = 10$ s or less using multiplexing. Assuming a zero-distance link-coupling efficiency $p_0 = 0.7$, a depolarization parameter $\mu = 0.99$, $t_c = 10$ s, and an 800 km total repeater length, we obtain a secret key rate of 4.85 Hz, beating both the unencoded repeater that provides 1.25 Hz and ideal twin-field quantum key distribution with 0.71 Hz at GHz clock rates.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition ( http://arxiv.org/abs/2405.07260v1 ) License: see the linked page	Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu	This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SI-CLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers a universally applicable approach with potential benefits for diverse EEG classification tasks.	Translation date: 2024-05-14 17:30:59 Publication date: 2024-05-12
# 効果的なフレーズマイニングのためのSpan-Aggregatable, Contextualized Word Embeddings Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining ( http://arxiv.org/abs/2405.07263v1 ) ライセンス: Link先を確認	Eyal Orbach, Lev Haikin, Nelly David, Avi Faizakof,	(参考訳) 近年, 文の複雑なベクトル表現は, 文類似性タスクで見られるように, 顕著な進歩を遂げている。一方、実世界のフレーズ検索アプリケーションは、高密度表現を効果的に活用するための課題に直面している。目的語句が1つの重み付きベクトルで全文を表す雑音のある文脈内に存在する場合,有効な句検索には不十分であることを示す。そこで我々は、複数のサブ文、連続する単語をそれぞれ自作の高密度ベクトルで表すという概念を考察する。本稿では,この手法がフレーズマイニングに有用であるが,有用なスパン表現を得るためには,かなりの計算が必要であることを示す。そこで, 任意の単語スパンに対して, スパンの意味を保ちながら, 任意の単語スパンに対して集約可能な文脈型単語/トークン埋め込みを議論する。文の埋め込みに使用される一般的なコントラスト損失の修正を導入し、単語の埋め込みにこの特性を付与する。本手法の有効性を示すために, STS-Bデータセットに基づくデータセットに生成したテキストを付加し, より広い文脈で最もよく一致するパラフレーズを検索し, 原語句と類似性の度合いを報告する。本稿では,提案手法が計算量を大幅に増加させることなく,より優れた結果が得られることを示す。 Dense vector representations for sentences made significant progress in recent years as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges for effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector, is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple, sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a modification to the common contrastive loss used for sentence embeddings that encourages word embeddings to have this property. 
To demonstrate the effect of this method, we present a dataset based on the STS-B dataset with additional generated text, which requires finding the best-matching paraphrase residing in a larger context and reporting the degree of similarity to the original phrase. We demonstrate on this dataset how our proposed method can achieve better results without a significant increase in compute.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# UAVネットワークにおける分散認証の一手法 An Approach for Decentralized Authentication in Networks of UAVs ( http://arxiv.org/abs/2405.07265v1 ) ライセンス: Link先を確認	Nicholas Jäger, Andreas Aßmuth,	(参考訳) 無人航空機のネットワークに対する分散型認証システムを提案する。ブロックチェーンベースの公開鍵インフラストラクチャは、公開鍵暗号と公開鍵ベースの認証プロトコルの使用を可能にする。ブロックチェーンは公開鍵とその関係の共通ストレージを提供し、認証プロセスに必要な情報を提供する。さらに、無人航空機は、インターネットにアクセスできない可能性のある地域で独立して運用するために、ブロックチェーンの選ばれた部分を格納する。これにより、無人航空機は、他の無人航空機、クラウドサービス、自動車、あらゆるコンピュータのように、ネットワークの実体を認証することができる。 We propose a decentralized authentication system for networks of unmanned aerial vehicles. A blockchain-based public key infrastructure allows the usage of public key cryptography and public key based authentication protocols. The blockchain provides a common storage of the public keys and their relations and can provide the required information for the authentication process. Furthermore, the unmanned aerial vehicles store selected parts of the blockchain in order to operate independently in areas where they might not have access to the Internet. This allows unmanned aerial vehicles to authenticate entities of the network, like other unmanned aerial vehicles, cloud services, cars, and any computer.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# MAML MOT:メタラーニングに基づく複数物体追跡 MAML MOT: Multiple Object Tracking based on Meta-Learning ( http://arxiv.org/abs/2405.07272v1 ) ライセンス: Link先を確認	Jiayi Chen, Chunhua Deng,	(参考訳) 映像解析技術の進歩に伴い、歩行者を含む複雑な場面における多目的追跡(MOT)問題の重要性が高まっている。この課題は主に、歩行者検出と再識別という2つの重要なタスクを含む。近年,歩行者検出タスクにおいて顕著な進歩がみられてきたが,再識別タスクの有効性の向上は引き続き課題である。この困難は、多目的追跡データセットにおける多数の歩行者サンプルと、個々のサンプルの不足から生じる。近年,メタ学習技術の急速な進歩により,メタ学習に基づくマルチオブジェクト追跡のトレーニング手法であるMAML MOTを導入する。このアプローチは,メタラーニングの迅速な学習能力を活用して,歩行者再識別作業におけるサンプル不足問題に対処し,モデルの一般化性能と堅牢性を向上させることを目的とする。実験の結果,提案手法はMOTチャレンジの主流データセットに対して高い精度を実現することが示された。これは、歩行者多目的追跡の分野の研究のための新しい視点と解決策を提供する。 With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# 大規模言語モデルを用いた短文の人間解釈可能なクラスタリング Human-interpretable clustering of short-text using large language models ( http://arxiv.org/abs/2405.07278v1 ) ライセンス: Link先を確認	Justin K. Miller, Tristram J. Alexander,	(参考訳) 大規模な言語モデルは、人間のライクなコンテンツ生成能力によって、非常に人気が高まっている。これらのモデルは人間の生成したコンテンツをクラスタリングするのにも有効であり、その成功は識別性と解釈可能性の尺度によって定義される。この成功は、人間レビュアーとChatGPTによって検証され、短文クラスタリングに挑戦する‘バリデーションギャップ’を閉じるための自動化された手段を提供する。機械と人間のアプローチを比較して、それぞれに固有のバイアスを特定し、人間のコーディングへの依存を「金の標準」として疑問視する。提案手法をTwitterのバイオスに適用し,従来の専門的な研究とよく一致しているが,アイデンティティを表現するために使用される媒体の特色は興味深い。 Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# ユーモアの力学:多段階推論によるユーモア生成の進展 Humor Mechanics: Advancing Humor Generation with Multistep Reasoning ( http://arxiv.org/abs/2405.07280v1 ) ライセンス: Link先を確認	Alexey Tikhonov, Pavel Shtykovskiy,	(参考訳) 本稿では,多段階推論による一行ジョークの生成について検討する。我々の研究は、ユーモラスなワンライナーの作成プロセスの再構築と、ユーモア生成のための動作するプロトタイプの開発からなる。提案手法を評価するために人間の参加者による包括的な実験を行い,人間によるジョーク,ゼロショット GPT-4 生成ユーモア,その他のベースラインと比較した。評価は、人間によるラベル付けをベンチマークとして、生成したユーモアの品質に焦点を当てた。以上の結果から,多段階推論手法は生成したユーモアの品質を一貫して改善することが示された。実験結果を提示するとともに、実験で使用したデータセットを共有し、AIによるユーモア生成の強化に関する洞察を提供する。 In this paper, we explore the generation of one-liner jokes through multi-step reasoning. Our work involved reconstructing the process behind creating humorous one-liners and developing a working prototype for humor generation. We conducted comprehensive experiments with human participants to evaluate our approach, comparing it with human-created jokes, zero-shot GPT-4 generated humor, and other baselines. The evaluation focused on the quality of humor produced, using human labeling as a benchmark. Our findings demonstrate that the multi-step reasoning approach consistently improves the quality of generated humor. We present the results and share the datasets used in our experiments, offering insights into enhancing humor generation with artificial intelligence.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# ブランチナラティブ:文字決定点検出 Branching Narratives: Character Decision Points Detection ( http://arxiv.org/abs/2405.07282v1 ) ライセンス: Link先を確認	Alexey Tikhonov,	(参考訳) 本稿では,登場人物が物語の方向性に大きな影響を及ぼす可能性のある意思決定を行う物語内のポイントを識別するタスクであるキャラクタ決定点検出(CHADPOD)タスクを提案する。本稿では,CYOAライクなゲームグラフをベースとした新しいデータセットを提案する。本稿では,2つのLLMと複数のMLMをベースラインとし,最大89%の精度で異なるモデルの性能の比較分析を行う。このことは物語分析の複雑さを浮き彫りにし、キャラクター駆動型ストーリーダイナミクスの理解に関わる課題を示している。さらに、そのようなモデルを既存のテキストに適用して、潜在的な分岐点によって分割された線形セグメントを生成する方法を示し、物語分析における我々の発見の実践的応用を実証する。 This paper presents the Character Decision Points Detection (CHADPOD) task, a task of identification of points within narratives where characters make decisions that may significantly influence the story's direction. We propose a novel dataset based on CYOA-like games graphs to be used as a benchmark for such a task. We provide a comparative analysis of different models' performance on this task, including a couple of LLMs and several MLMs as baselines, achieving up to 89% accuracy. This underscores the complexity of narrative analysis, showing the challenges associated with understanding character-driven story dynamics. Additionally, we show how such a model can be applied to the existing text to produce linear segments divided by potential branching points, demonstrating the practical application of our findings in narrative analysis.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
# BeautyMap:グローバルマップにおける動的点除去のためのバイナリエンコード適応グラウンドマトリックス BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps ( http://arxiv.org/abs/2405.07283v1 ) ライセンス: Link先を確認	Mingkai Jia, Qingwen Zhang, Bowen Yang, Jin Wu, Ming Liu, Patric Jensfelt,	(参考訳) 静的環境機能を正しく表現するグローバルポイントクラウドは、正確なローカライゼーションと堅牢なパス計画を容易にする。しかし、動的オブジェクトは、静的環境と混ざった望ましくないゴーストトラックを導入します。既存の動的除去法は通常、計算効率と精度のバランスをとるのに失敗する。これに対して,高忠実度グローバルマップの静的な特徴を維持しつつ,動的点を効率的に除去するBeautyMapを提案する。本手法では, 環境特徴を効率的に抽出するために, バイナリ符号化行列を用いる。各フレームの行列と対応するマップ領域をビット単位で比較することにより、ポテンシャル動的領域を抽出できる。次に、粗さを用いて、地形変動を扱うために、$z$-軸の階層的セグメンテーションを微調整する。最終的な静的復元モジュールは、各スキャンのレンジ可視性を考慮し、視界外の静的ポイントを保護する。比較実験は、他の動的点除去法と比較して、精度と効率の両面で、BeautyMapの優れた性能を示している。コードはhttps://github.com/MKJia/BeautyMapで公開されている。 Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse to fine hierarchical segmentation of the $z$-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic points removal methods. The code is open-sourced at https://github.com/MKJia/BeautyMap.	翻訳日:2024-05-14 17:30:59 公開日:2024-05-12
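The bit-wise comparison mentioned in the BeautyMap entry above can be sketched as follows. This is an illustrative reading of the abstract, not the released code at the linked repository: the encoding (one integer per grid cell, one bit per height slice), the helper names `encode_column` and `potential_dynamic`, and the rule "occupied in the map but not in the current scan" are assumptions.

```python
def encode_column(z_values, z_min, resolution, n_bits=16):
    """Encode the heights observed in one (x, y) cell as a bit mask.
    Bit k is set when a point falls into height slice k above z_min."""
    bits = 0
    for z in z_values:
        k = int((z - z_min) // resolution)
        if 0 <= k < n_bits:  # ignore points outside the encoded height range
            bits |= 1 << k
    return bits

def potential_dynamic(map_matrix, scan_matrix):
    """Cell-wise AND-NOT: keep the slices that are occupied in the global map
    but empty in the current scan (candidate dynamic regions)."""
    return [
        [m & ~s for m, s in zip(map_row, scan_row)]
        for map_row, scan_row in zip(map_matrix, scan_matrix)
    ]
```

A cell the scan never observed would be flagged entirely by this rule, which is one reason a visibility-aware step such as the static restoration module described in the abstract is needed before any point is actually removed.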
# SLIP(SAM+CLIP)を用いたゼロショットコンテキストベースオブジェクトセグメンテーション Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP) ( http://arxiv.org/abs/2405.07284v1 ) ライセンス: Link先を確認	Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal,	(参考訳) ゼロショットオブジェクトセグメンテーションのための拡張アーキテクチャであるSLIP(SAM+CLIP)を提案する。 SLIPはSegment Anything Model (SAM) \cite{kirillov2023segment}とContrastive Language- Image Pretraining (CLIP) \cite{radford2021learning}を組み合わせたものである。 CLIPを使ってSAMにテキストプロンプトを組み込むことで、SLIPは特定のクラスやカテゴリの事前トレーニングなしにオブジェクトセグメンテーションを可能にする。 Pokemonデータセット上でCLIPを微調整し、意味のある画像テキスト表現を学習できるようにします。 SLIPは、テキストプロンプトからコンテキスト情報に基づいて画像中のオブジェクトを認識およびセグメント化できることを示し、多目的オブジェクトセグメンテーションのためのSAMの機能を拡張する。本実験は,テキストによる画像のセグメント化におけるSLIPアーキテクチャの有効性を実証するものである。 CLIPのテキストイメージ理解機能をSAMに統合することで、元のアーキテクチャの機能を拡張し、より汎用的でコンテキスト対応のオブジェクトセグメンテーションを可能にする。 We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) \cite{kirillov2023segment} with the Contrastive Language-Image Pretraining (CLIP) \cite{radford2021learning}. By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# わずかな未学習によるテキスト・画像拡散モデルからの概念の消去 Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning ( http://arxiv.org/abs/2405.07288v1 ) ライセンス: Link先を確認	Masane Fuchi, Tomohiro Takagi,	(参考訳) テキストから画像を生成することは、拡散モデルのスケーリングと視覚・言語分野の進歩により容易になっている。これらのモデルは、インターネットから大量のデータを使って訓練されている。したがって、著作権のある資料のような望ましくない内容もしばしば含んでいる。このようなデータを取り除き、モデルを再訓練することは難しいため、事前訓練されたモデルから特定の概念を消去する方法が研究されている。本稿では,テキストエンコーダを数発のアンラーニングで更新するコンセプト・エミッション手法を提案する。概念の消去後の生成画像に関する議論は欠落している。概念の移行先を特定する方法はあるが,その妥当性は明らかではない。提案手法は,モデルや画像に固有の潜在概念に遷移することで,暗黙的にこれを実現する。提案手法は10秒以内に概念を消去し,概念の消去をこれまで以上に容易に行えるようにする。暗黙的に関連する概念に移行することは、より自然な概念の消去につながる。提案手法を様々な概念に適用し, 提案手法の数十倍から数百倍の速度で実現可能であることを確認した。更新すべきパラメータを変化させることで、従来の研究と同様に、知識が主にテキストエンコーダのフィードフォワードネットワークに蓄積されていることを示唆する結果を得た。 Generating images from text has become easier because of the scaling of diffusion models and advancements in the field of vision and language. These models are trained using vast amounts of data from the Internet. Hence, they often contain undesirable content such as copyrighted material. As it is challenging to remove such data and retrain the models, methods for erasing specific concepts from pre-trained models have been investigated. We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning in which a few real images are used. The discussion regarding the generated images after erasing a concept has been lacking. While there are methods for specifying the transition destination for concepts, the validity of the specified concepts is unclear. Our method implicitly achieves this by transitioning to the latent concepts inherent in the model or the images. Our method can erase a concept within 10 s, making concept erasure more accessible than ever before. Implicitly transitioning to related concepts leads to more natural concept erasure. 
We applied the proposed method to various concepts and confirmed that concept erasure can be achieved tens to hundreds of times faster than with current methods. By varying the parameters to be updated, we obtained results suggesting that, like previous research, knowledge is primarily accumulated in the feed-forward networks of the text encoder.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 量子力学の非線形拡張の幾何学的解釈 Geometric Interpretation of a nonlinear extension of Quantum Mechanics ( http://arxiv.org/abs/2405.07289v1 ) ライセンス: Link先を確認	Alan Chodos, Fred Cooper,	(参考訳) 我々は最近、通常の線形量子力学問題のハミルトニアンの固有値と固有関数の観点から正確に解ける性質を持つ特定の非線形量子力学の一般化を導入した。本稿では,波動関数の2つの成分が時空の2つの異なる漸近領域におけるハミルトニアンHによって記述された系を表すことを示唆し,非線型項が重力効果をもたらすと考えられることを示す。 We recently introduced a particular nonlinear generalization of quantum mechanics which has the property that it is exactly solvable in terms of the eigenvalues and eigenfunctions of the Hamiltonian of the usual linear quantum mechanics problem. In this paper we suggest that the two components of the wave function represent the system described by the Hamiltonian H in two different asymptotic regions of spacetime and we show that the non-linear terms can be viewed as giving rise to gravitational effects.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# CCTV映像における高速な逆走自転車検出にはスパースサンプリングで十分 Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV Videos ( http://arxiv.org/abs/2405.07293v1 ) ライセンス: Link先を確認	Jing Xu, Wentao Shi, Sheng Ren, Pan Gao, Peng Zhou, Jie Qin,	(参考訳) 輸送の分野では、自動車と非自動車の両方による違法行為に対処し、緩和することが最重要である。これらの行為の中で、逆走(自転車や電動自転車で指定された交通の流れと反対方向に走ること)は、自転車利用者と他の道路利用者の両方に重大なリスクをもたらす。そこで本論文では,CCTVビデオにおける逆走自転車の割合を検出する問題を定式化する。具体的には,直接的な追跡手法の非効率性に対処するため,WWC予測器(WWC-Predictor)と呼ばれるスパースサンプリング手法を提案する。本手法では,境界ボックスからの情報を利用する検出ベース情報と,画像自体に対する洞察を提供する方位ベース情報の両方を活用して,瞬時情報取得能力を向上させる。提案手法は,35分間のビデオシーケンスと分単位のアノテーションからなるベンチマークデータセットにおいて,1.475%の平均誤差率を実現し,同じ検出モデルの下で,単純な追跡手法の19.12%のGPU時間しか要しなかった。この顕著な性能は、逆走の事例を特定し予測する上で、我々のアプローチの有効性を示している。 In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detecting wrong-way cycling ratios in CCTV videos. Specifically, we propose a sparse sampling method called WWC-Predictor to efficiently solve this problem, addressing the inefficiencies of direct tracking methods. Our approach leverages both detection-based information, which utilizes the information from bounding boxes, and orientation-based information, which provides insights into the image itself, to enhance instantaneous information capture capability. On our proposed benchmark dataset consisting of 35 minutes of video sequences and minute-level annotation, our method achieves an average error rate of a mere 1.475% while taking only 19.12% of the GPU time of straightforward tracking methods under the same detection model. This remarkable performance demonstrates the effectiveness of our approach in identifying and predicting instances of wrong-way cycling.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 環境富化 : 継続学習における前方転移の生物学的モデル Environmental enrichment: a biological model of forward transfer in continual learning ( http://arxiv.org/abs/2405.07295v1 ) ライセンス: Link先を確認	Rajat Saxena, Bruce L. McNaughton,	(参考訳) 継続学習(Continual Learning, CL)とは、エージェントがデータの連続的ストリームから学習し、古い情報を忘れずに知識を転移する能力である。 CLの重要な側面は、前方転移(forward transfer)、すなわち、以前の知識からの情報を活用することで、新しいタスクの学習を改善し高速化することである。この能力は生物の脳には自然に備わっているが、人工知能(AI)にとっては大きな課題である。ここでは,環境富化(EE)が,前方転移を研究するための生物学的モデルとして利用でき、人間のようなAIの開発に示唆を与えることを提案する。 EEは、認知、社会的、運動、感覚の刺激を高める動物研究を指し、人間において「認知的予備力」と呼ばれるもののモデルである。豊かな環境で育った動物は、新しいタスクにおける学習速度と性能を著しく改善し、通常、前方転移を示す。我々は、EE後の解剖学的、分子的、神経学的変化を探求し、人工ニューラルネットワーク(ANN)が、豊かな経験の後の神経計算の変化を予測するためにどのように使用できるかについて議論する。最後に、我々は神経科学とAI研究を組み合わせる相乗的な方法を提示し、迅速かつ効率的な新しいタスク学習が可能なAI開発への道を開く。 Continual learning (CL) refers to an agent's capability to learn from a continuous stream of data and transfer knowledge without forgetting old information. One crucial aspect of CL is forward transfer, i.e., improved and faster learning on a new task by leveraging information from prior knowledge. While this ability comes naturally to biological brains, it poses a significant challenge for artificial intelligence (AI). Here, we suggest that environmental enrichment (EE) can be used as a biological model for studying forward transfer, inspiring human-like AI development. EE refers to animal studies that enhance cognitive, social, motor, and sensory stimulation and is a model for what, in humans, is referred to as 'cognitive reserve'. Enriched animals show significant improvement in learning speed and performance on new tasks, typically exhibiting forward transfer. We explore anatomical, molecular, and neuronal changes post-EE and discuss how artificial neural networks (ANNs) can be used to predict neural computation changes after enriched experiences. Finally, we provide a synergistic way of combining neuroscience and AI research that paves the path toward developing AI capable of rapid and efficient new task learning.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 編集可能なNeRFモデルに対する点再サンプリングと光変換 Point Resampling and Ray Transformation Aid to Editable NeRF Models ( http://arxiv.org/abs/2405.07306v1 ) ライセンス: Link先を確認	Zhenyang Li, Zilong Chen, Feifan Qu, Mingqing Wang, Yizhou Zhao, Kai Zhang, Yifan Peng,	(参考訳) NeRF支援編集作業では,物体位置の可変性の導入により,物体の動きが監視生成の困難を呈する。さらに、特定のシーンオブジェクトの除去操作は、しばしば空の領域につながり、それらを効果的に塗布する際のNeRFモデルの課題を提示する。我々は3次元物体の姿勢を直接操作できる暗黙の光線変換戦略を提案する。潜在的な空き領域を塗布するという課題に対処するため,DNRと呼ばれるプラグ・アンド・プレイの塗布モジュールが提案され,これらの領域を暗黙空間内の元の線源位置の3次元空間に補間することで,物体の除去とシーンの塗布作業が容易になる。重要なことに、DNRを用いることで、地上の真実と暗黙の特徴のギャップを効果的に狭め、光線をまたいだ特徴の相互情報(MI)を増大させる可能性がある。そして、DNRとレイ変換を利用して、点ベースの編集可能なNeRFパイプラインPR^2T-NeRFを構築する。主に3Dオブジェクトの除去および塗装タスクで評価した結果、パイプラインが最先端のパフォーマンスを達成することを示す。さらに、我々のパイプラインは、余分な監督を必要とせず、多様な編集操作のための高品質なレンダリング視覚化をサポートしています。 In NeRF-aided editing tasks, object movement presents difficulties in supervision generation due to the introduction of variability in object positions. Moreover, the removal operations of certain scene objects often lead to empty regions, presenting challenges for NeRF models in inpainting them effectively. We propose an implicit ray transformation strategy, allowing for direct manipulation of the 3D object's pose by operating on the neural-point in NeRF rays. To address the challenge of inpainting potential empty regions, we present a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), which interpolates those regions in 3D space at the original ray locations within the implicit space, thereby facilitating object removal & scene inpainting tasks. Importantly, employing DNR effectively narrows the gap between ground truth and predicted implicit features, potentially increasing the mutual information (MI) of the features across rays. Then, we leverage DNR and ray transformation to construct a point-based editable NeRF pipeline PR^2T-NeRF. Results primarily evaluated on 3D object removal & inpainting tasks indicate that our pipeline achieves state-of-the-art performance. 
In addition, our pipeline supports high-quality rendering visualization for diverse editing operations without necessitating extra supervision.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# DiffGen: 微分物理シミュレーション、微分レンダリング、ビジョンランゲージモデルによるロボットデモ生成 DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model ( http://arxiv.org/abs/2405.07309v1 ) ライセンス: Link先を確認	Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu,	(参考訳) シミュレーションによるロボットのデモンストレーションの生成は、ロボットデータのスケールアップに有効な方法として広く認識されている。従来の作業では、専門家のポリシーを生成するために強化学習エージェントを訓練することが多かったが、このアプローチにはサンプル効率が欠如している。近年,ロボットによる実演を微分可能シミュレーションで実現しようと試みている。これは有望だが,報酬設計に大きく依存している。本稿では,微分可能物理シミュレーションと微分可能レンダリングを統合した新しいフレームワークであるDiffGenと,自動かつ効率的なロボットデモ生成を実現するビジョン言語モデルを提案する。 DiffGenは、シミュレーションロボット操作シナリオと自然言語命令を前提として、言語命令の埋め込みと操作後のシミュレーション観察の埋め込みとの距離を最小化することにより、現実的なロボットデモを生成することができる。組込みは視覚言語モデルから得られ、微分可能シミュレーション、微分可能レンダリング、および視覚言語モデルコンポーネントを用いて勾配を計算・下降させることにより、特定のタスクを達成できる。実験によると、DiffGenを使えば、人間の努力やトレーニング時間を最小限に抑えて、ロボットデータを効率よく、効果的に生成できる。 Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward design, a labor-intensive process. In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations. Given a simulated robot manipulation scenario and a natural language instruction, DiffGen can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation after manipulation. 
The embeddings are obtained from the vision-language model, and the optimization is achieved by calculating and descending gradients through the differentiable simulation, differentiable rendering, and vision-language model components, thereby accomplishing the specified task. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 非パラメトリック制御クープマン作用素学習:予測と制御のための柔軟でスケーラブルなモデル Nonparametric Control-Koopman Operator Learning: Flexible and Scalable Models for Prediction and Control ( http://arxiv.org/abs/2405.07312v1 ) ライセンス: Link先を確認	Petar Bevanda, Bas Driessen, Lucian Cristian Iacob, Roland Toth, Stefan Sosnowski, Sandra Hirche,	(参考訳) クープマン作用素の線形性とそれらの推定器の単純さとモデル縮約能力は、力学系を学習するアプリケーションにおいて大きな人気をもたらした。無限次元再生カーネルヒルベルト空間における非パラメトリッククープマン作用素の学習は、自律系においてよく理解されているが、制御系における類似の手法はほとんど探索されていない。既存の手法では、表現上のヒューリスティックスや、表現力と拡張性に限界のあるパラメトリックモデルに頼ることが多いため、制御入力を持つ系を原理的に扱うことは、制御器の完全なデータ駆動学習にとって不可欠である。本稿では,制御系においても単一作用素の直接推定が可能な制御アフィン再生カーネルによるユニバーサルフレームワークを提案することで,上記の課題に対処する。提案手法は制御クープマン作用素回帰(cKOR)と呼ばれ、自律の場合のクープマン作用素回帰(Koopman operator regression)と完全に類似している。文献において初めて、制御入力次元の呪いに苦しむことのない非線形制御アフィン系のクープマン作用素表現を学習するための非パラメトリックフレームワークを提案する。これにより、他のアプローチのように有限個の関数や入力の張る空間に制限することによる事前の精度損失なしに、無限次元学習問題を、データのみに基づいて有限次元空間で再定式化することができる。大規模制御システムへの応用を実現するため,ランダムプロジェクション(スケッチング)を活用して制御クープマン作用素推定器のスケーラビリティを向上させる。予測タスクと制御タスクの両方において,新しいcKORアプローチの有効性を実証した。 Linearity of Koopman operators and simplicity of their estimators coupled with model-reduction capabilities has led to their great popularity in applications for learning dynamical systems. While nonparametric Koopman operator learning in infinite-dimensional reproducing kernel Hilbert spaces is well understood for autonomous systems, its control system analogues are largely unexplored. Addressing systems with control inputs in a principled manner is crucial for fully data-driven learning of controllers, especially since existing approaches commonly resort to representational heuristics or parametric models of limited expressiveness and scalability. We address the aforementioned challenge by proposing a universal framework via control-affine reproducing kernels that enables direct estimation of a single operator even for control systems. 
The proposed approach, called control-Koopman operator regression (cKOR), is thus completely analogous to Koopman operator regression of the autonomous case. First in the literature, we present a nonparametric framework for learning Koopman operator representations of nonlinear control-affine systems that does not suffer from the curse of control input dimensionality. This allows for reformulating the infinite-dimensional learning problem in a finite-dimensional space based solely on data without a priori loss of precision due to a restriction to a finite span of functions or inputs as in other approaches. To enable applications to large-scale control systems, we also enhance the scalability of control-Koopman operator estimators by leveraging random projections (sketching). The efficacy of our novel cKOR approach is demonstrated on both forecasting and control tasks.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# VALID: 敵対者が存在しうる分散ネットワークにおける学習のための検証済みアルゴリズム VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence ( http://arxiv.org/abs/2405.07316v1 ) ライセンス: Link先を確認	Mayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka, Allison Beemer, Oliver Kosut, Joerg Kliewer,	(参考訳) 異種データと敵対的な侵入の可能性を持つ無向ネットワークに対して、検証された分散学習のパラダイムを導入する。我々は以下を要求する。 (a)敵がいないときに大域的な経験損失最小化器に収束すること、 (b)敵対的構成にかかわらず、敵の存在を検出するか、許容的コンセンサスに収束すること。この目的のために、我々は、我々の知る限り、検証された学習保証を最初に達成するVALIDプロトコルを提案する。さらに、VALIDはO(1/T)収束率(関連する正則性仮定の下で)と、敵のいない分散確率的勾配降下法に匹敵する計算と通信の複雑さを提供する。注目すべきは、VALIDは、敵のいない環境での最適なパフォーマンス指標を保持し、以前のビザンチン・ロバスト法で観察された堅牢性ペナルティを回避することである。本研究の特筆すべき側面は、大域的な経験損失最小化器で計算された個々のエージェントの勾配のノルムに基づく不均一度計量である。これは、重大なビザンチン破壊を検出するための自然な統計量を提供するだけでなく、VALIDの最適性を幅広い一般性で証明することを可能にする。最後に, 数値結果から、敵がいない場合, VALIDは最先端のビザンチン頑健なアルゴリズムよりも高速に収束する一方で, 敵が存在する場合, VALIDは各誠実なエージェントが許容的コンセンサスに収束するか, あるいはネットワーク内の敵の存在を宣言して終了することが示された。 We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence or convergence to an admissible consensus irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. 
This not only provides a natural statistic for detecting significant byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art byzantine robust algorithms, while when adversaries are present, VALID terminates with each honest agent either converging to an admissible consensus or declaring adversarial presence in the network.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# コントラスト学習における機械学習 Machine Unlearning in Contrastive Learning ( http://arxiv.org/abs/2405.07317v1 ) ライセンス: Link先を確認	Zixin Wang, Kongyang Chen,	(参考訳) 機械学習は、モデルの精度を最小限に保ちながら、トレーニングデータの影響を減少させるために必要な複雑なプロセスである。近年の機械学習に関する多くの研究にもかかわらず、その大半は教師付き学習モデルに重点を置いており、対照的な学習モデルの研究は比較的過小評価されている。自己教師型学習は,教師型学習よりも有望な可能性を秘めており,教師型学習に匹敵するものであるという信念から,コントラスト型学習モデルを中心とした機械学習の手法を探究した。本研究では,機械学習を効果的に実現するために,モデルトレーニングのための勾配制約に基づく新しいアプローチを提案する。本手法では,学習対象データの最小限の学習エポック数と,学習対象データの同定しか必要としない。また,本手法は,コントラスト学習モデルだけでなく,教師付き学習モデルにも有能な性能を示し,その汎用性と適応性を様々な学習パラダイムで示している。 Machine unlearning is a complex process that necessitates the model to diminish the influence of the training data while keeping the loss of accuracy to a minimum. Despite the numerous studies on machine unlearning in recent years, the majority of them have primarily focused on supervised learning models, leaving research on contrastive learning models relatively underexplored. With the conviction that self-supervised learning harbors a promising potential, surpassing or rivaling that of supervised learning, we set out to investigate methods for machine unlearning centered around contrastive learning models. In this study, we introduce a novel gradient constraint-based approach for training the model to effectively achieve machine unlearning. Our method only necessitates a minimal number of training epochs and the identification of the data slated for unlearning. Remarkably, our approach demonstrates proficient performance not only on contrastive learning models but also on supervised learning models, showcasing its versatility and adaptability in various learning paradigms.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# LayGA: Animatable Clothing Transferのための層状ガウスアバター LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer ( http://arxiv.org/abs/2405.07319v1 ) ライセンス: Link先を確認	Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu,	(参考訳) キャラクター間の衣料品の着替えやアニメ化をめざすアニマタブルな衣料転送は難しい問題である。ほとんどの人間のアバターは、人間の体と衣服の表現をくっつけることで、仮想的な試行錯誤の難しさを招いている。さらに悪いことに、絡み合った表現は通常、衣服の滑りの動きを正確に追跡することができないのです。この制限を克服するために、我々はLayGA(Layered Gaussian Avatars)という新しい表現を紹介した。我々の表現は、ガウスの地図に基づくアバターの上に構築され、衣服の詳細の表現力に優れています。しかし、ガウス写像は実際の曲面の周りに分布する非構造的な3次元ガウス写像を生成する。スムーズな表面がないことは、正確な衣服追跡と体と衣服の衝突処理の課題を提起する。そこで本研究では,単層再構築と多層フィッティングを含む2段階のトレーニングを提案する。単層リコンストラクション段階において,スムーズな表面を再構築し,同時に体と衣服のセグメンテーションを得るための一連の幾何的制約を提案する。次に,多層フィッティング段階では,体と衣服を表すために2つの異なるモデルを訓練し,再構築された衣服のジオメトリーを3次元監視として利用し,より正確な衣服追跡を行う。さらに,高品質な幾何再構成と高忠実なレンダリングのための幾何層とレンダリング層を提案する。全体として、提案したLayGAは、フォトリアリスティックなアニメーションと仮想トライオンを実現し、他のベースライン手法よりも優れている。私たちのプロジェクトページはhttps://jsnln.github.io/layga/index.htmlです。 Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. 
Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# L(u)PIN:LLMによる政治イデオロギー放送 L(u)PIN: LLM-based Political Ideology Nowcasting ( http://arxiv.org/abs/2405.07320v1 ) ライセンス: Link先を確認	Ken Kato, Annabelle Purnomo, Christopher Cochrane, Raeid Saqur,	(参考訳) 政治的イデオロギー的立場の定量的分析は難しい課題である。過去には、政治家、政党宣言、議会演説の議決データに焦点が当てられ、様々な政治制度における政治的不一致と分極を推定した。しかし、従来の定量的政治的分析手法は、分析に利用可能なデータの量という共通の課題に悩まされていた。以前の手法では、議会全体の分極や政党全体の政治イデオロギー的立場など、より一般的な政治分析に重点を置いていた。本稿では,LLMの潜在知識を活用して,各議員のイデオロギー的立場を分析する手法を提案する。この方法により、選択の軸として政治家のスタンスを評価することができ、選択の話題・論争に関して政治家のスタンスを柔軟に測定することができる。提案手法は,一対の参照シードに対して,各代表に対する平均BERT埋め込みを投影し,各代表者の音声から意見に基づく文章を抽出するために,微調整のBERT分類器を用いて実現する。これらの参照シードは、特定のトピックに対する反対の見解を持つことが知られている手動で選択された代表者か、OpenAIのGPT-4モデルを用いて生成された文である。我々は、GPT-4モデルに特定の立場を擁護する政治家から発せられるスピーチを生成するよう促すことで、文を作成した。 The quantitative analysis of political ideological positions is a difficult task. In the past, various literature focused on parliamentary voting data of politicians, party manifestos and parliamentary speech to estimate political disagreement and polarization in various political systems. However previous methods of quantitative political analysis suffered from a common challenge which was the amount of data available for analysis. Also previous methods frequently focused on a more general analysis of politics such as overall polarization of the parliament or party-wide political ideological positions. In this paper, we present a method to analyze ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs. The method allows us to evaluate the stance of politicians on an axis of our choice allowing us to flexibly measure the stance of politicians in regards to a topic/controversy of our choice. We achieve this by using a fine-tuned BERT classifier to extract the opinion-based sentences from the speeches of representatives and projecting the average BERT embeddings for each representative on a pair of reference seeds. 
These reference seeds are either manually chosen representatives known to have opposing views on a particular topic or they are generated sentences which were created using the GPT-4 model of OpenAI. We created the sentences by prompting the GPT-4 model to generate a speech that would come from a politician defending a particular position.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 米議会演説の計算分析、証拠から直感へのシフトを明らかに Computational analysis of US Congressional speeches reveals a shift from evidence to intuition ( http://arxiv.org/abs/2405.07323v1 ) ライセンス: Link先を確認	Segun Taofeek Aroyehun, Almog Simchon, Fabio Carrella, Jana Lasser, Stephan Lewandowsky, David Garcia,	(参考訳) 誠実で誠実な意思決定を求めることは、民主主義における統治と説明責任にとって不可欠である。しかし、正直であることの意味と、誠実さを追求する方法について、人々は異なる視点を取ることがある。ここでは、確証可能な事実とデータに根ざしたエビデンスに基づく推論から、感情や主観的な解釈によって引き起こされる直感的な決定まで、視点の連続性を探る。我々は1879年から2022年までの議会演説において、対照的な視点の言語的痕跡を分析した。 1970年代中頃からエビデンスベースの言語は、立法生産性の低下とともに減少し続けています。この減少は、議会における党派偏極の増大と、社会における所得格差の増大に伴った。結果は、政治意思決定における証拠に基づく言語の重要性を浮き彫りにする。 Pursuit of honest and truthful decision-making is crucial for governance and accountability in democracies. However, people sometimes take different perspectives of what it means to be honest and how to pursue truthfulness. Here we explore a continuum of perspectives from evidence-based reasoning, rooted in ascertainable facts and data, at one end, to intuitive decisions that are driven by feelings and subjective interpretations, at the other. We analyze the linguistic traces of those contrasting perspectives in Congressional speeches from 1879 to 2022. We find that evidence-based language has continued to decline since the mid-1970s, together with a decline in legislative productivity. The decline was accompanied by increasing partisan polarization in Congress and rising income inequality in society. Results highlight the importance of evidence-based language in political decision-making.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# 連続学習のための液体アンサンブル選択 Liquid Ensemble Selection for Continual Learning ( http://arxiv.org/abs/2405.07327v1 ) ライセンス: Link先を確認	Carter Blair, Ben Armstrong, Kate Larson,	(参考訳) 継続的学習は、機械学習モデルが、すでに学んだことを忘れずに、シフトするデータ分布から継続的に学習できるようにすることを目的としている。異なる部分集合上でアンサンブルの各メンバーを訓練することにより、アンサンブル全体の精度をナイーブモデルよりもはるかに高い精度で達成することができる。アンサンブル内のどのモデルを任意のデータで学習し、どのモデルを予測すべきかという問題に対処する。代表投票から作業を引き出すことにより,どのモデルがアクティブであるかを動的に選択するアルゴリズムを開発した。さまざまなデリゲート手法とパフォーマンス指標について検討し、最終的に分散シフトに直面した上で、デリゲートがナイーブな学習よりも大きなパフォーマンス向上を提供できることを発見した。 Continual learning aims to enable machine learning models to continually learn from a shifting data distribution without forgetting what has already been learned. Such shifting distributions can be broken into disjoint subsets of related examples; by training each member of an ensemble on a different subset it is possible for the ensemble as a whole to achieve much higher accuracy with less forgetting than a naive model. We address the problem of selecting which models within an ensemble should learn on any given data, and which should predict. By drawing on work from delegative voting we develop an algorithm for using delegation to dynamically select which models in an ensemble are active. We explore a variety of delegation methods and performance metrics, ultimately finding that delegation is able to provide a significant performance boost over naive learning in the face of distribution shifts.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# ReLUニューラルネットワークを用いた確率帯域 Stochastic Bandits with ReLU Neural Networks ( http://arxiv.org/abs/2405.07331v1 ) ライセンス: Link先を確認	Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani,	(参考訳) 本稿では,ReLUニューラルネットワーク構造を用いた確率的帯域幅問題について検討する。我々は, 1層ReLUニューラルネットワークの帯域を考慮すれば, $\tilde{O}(\sqrt{T})$ 後悔の保証が達成可能であることを示す。本稿では,この上限を達成できるOFU-RELUアルゴリズムを提案する。このアルゴリズムはまず線形状態に到達するまでランダムに探索し、続いて探索と利用のバランスをとるために UCB 型線形バンドイットアルゴリズムを実装した。我々の重要な洞察は、探索段階でReLUのパラメータを相対的に正確に学習すると、ReLUアクティベーションの断片的線形構造を利用して、変換された特徴空間における問題を線形帯域に変換することができるということである。モデルパラメータへの依存を取り除くため,バッチ化戦略に基づくOFU-ReLU+アルゴリズムを設計し,同じ理論的保証を提供する。 We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.	翻訳日:2024-05-14 15:34:20 公開日:2024-05-12
# PotatoGANs:Potato病の特定と分類のためのジェネレーティブ・ディバイサル・ネットワーク、インスタンス・セグメンテーション、説明可能なAIの利用 PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification ( http://arxiv.org/abs/2405.07332v1 ) ライセンス: Link先を確認	Mohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md Hasib,	(参考訳) 深層学習技術を用いた農業病のセグメンテーションの自動化により、多くの応用がもたらされた。しかし、新しい条件を適用すると、これらのアプリケーションはオーバーフィッティングの困難に直面し、セグメンテーション性能が低下する。ジャガイモ農業では、病気が収量に大きな影響を与えているため、農業経済がこれらの病気を迅速かつ適切に識別することが重要である。回転、フリップ、翻訳といった従来のデータ拡張アプローチには制限があり、しばしば強力な一般化結果の提供に失敗する。これらの課題に対処するため,本研究では,PotatoGANと呼ばれる新しいアプローチを採用している。この新たなデータ拡張アプローチでは,2種類のGANを用いて,健康なジャガイモ画像から合成ジャガイモ病画像を生成する。このアプローチはデータセットを拡大するだけでなく、モデル一般化の強化に役立つバラエティも追加する。インセプションスコアを指標として,本実験では,PotatoGANsが生成した画像の品質と現実性が向上し,実際の疾患画像と密に類似する能力を強調した。 CycleGANモデルは、画像品質の点でPix2Pix GANモデルよりも優れており、より高いISスコアにより、CycleGANはブラックスカーフとコモンスハーブでそれぞれ1.2001と1.0900のインセプションスコア(IS)を達成している。この合成データは、大規模なニューラルネットワークのトレーニングを大幅に改善することができる。また、データの多様性と一般化能力を高めながら、データ収集コストを低減する。我々の研究は、3つのグラデーションベースのExplainable AIアルゴリズム(GradCAM, GradCAM++, ScoreCAM)と3つの異なるCNNアーキテクチャ(DenseNet169, Resnet152 V2, InceptionResNet V2)を組み合わせてジャガイモ病の分類を行うことにより、解釈可能性を向上させる。 Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed as PotatoGANs. 
In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realism of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, as evidenced by its higher Inception scores (IS) of 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, Resnet152 V2, InceptionResNet V2) for potato disease classification.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# Quantum Mini-Apps: 量子HPCアプリケーションの開発とベンチマークのためのフレームワーク Quantum Mini-Apps: A Framework for Developing and Benchmarking Quantum-HPC Applications ( http://arxiv.org/abs/2405.07333v1 ) ライセンス: Link先を確認	Nishant Saurabh, Pradeep Mantha, Florian J. Kiwit, Shantenu Jha, Andre Luckow,	(参考訳) 量子ハードウェアの成熟度と規模の増加とHPCシステムへの統合により、量子HPCアプリケーションやミドルウェアシステムの開発、特徴化、ベンチマークを行うための堅牢な技術を開発する必要がある。これは、量子と古典的なワークロードタスクとコンポーネント間の相互作用、結合、一般的な実行パターンをよりよく理解する必要があります。本稿では,異なる結合モードと相互作用モードを特徴とする6つの量子HPC実行モチーフを同定する。これらのモチーフは、プロダクションシステムの本質的な特性をカプセル化した、一連の量子ミニアプリ - 単純化されたアプリケーションプロトタイプの基礎を提供する。これらの開発を支援するために、異種量子HPCインフラストラクチャをまたいだミニアプリの作成と実行に必要な抽象化を提供するミニアプリケーションフレームワークを導入し、パフォーマンス評価とミドルウェア開発に有用なツールとなる。 With the increasing maturity and scale of quantum hardware and its integration into HPC systems, there is a need to develop robust techniques for developing, characterizing, and benchmarking quantum-HPC applications and middleware systems. This requires a better understanding of interaction, coupling, and common execution patterns between quantum and classical workload tasks and components. This paper identifies six quantum-HPC execution motifs - recurring execution patterns characterized by distinct coupling and interaction modes. These motifs provide the basis for a suite of quantum mini-apps - simplified application prototypes that encapsulate essential characteristics of production systems. To support these developments, we introduce a mini-app framework that offers the necessary abstractions for creating and executing mini-apps across heterogeneous quantum-HPC infrastructure, making it a valuable tool for performance characterizations and middleware development.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# VR応用におけるアクセシブルレイベース相互作用のための震動低減 Tremor Reduction for Accessible Ray Based Interaction in VR Applications ( http://arxiv.org/abs/2405.07335v1 ) ライセンス: Link先を確認	Dr Corrie Green, Dr Yang Jiang, Dr John Isaacs, Dr Michael Heron,	(参考訳) 従来の2Dインタラクション手法と比較して、バーチャルリアリティ(VR)は、ユニークなインターフェースとインタラクション設計決定の機会を示す。現在、既存のインタラクション技術がすべてのユーザにとって使用できない可能性があるため、アクセス可能なVRエクスペリエンスを開発する上で、これは課題となっている。従来の2次元インタフェースのインタラクション手法の多くは、従来のカーソル用に設計されたレーザーポインターの使用など、入力機構にほとんど変更を加えることなく、VR空間で直接動作するように変換されていることが判明した。距離に依存しないミリメートルは、仮想世界でスケールするインタフェースの開発においてデザイナを支援することができると認識されている。関連して、Fittsの法則では、距離が大きくなるにつれて、ユーザーの動きは徐々に遅くなり、正確性が低下する。本稿では,低域通過フィルタを用いてユーザ入力ノイズの正規化を行い,光線による相互作用におけるモータの細かな要求を緩和する手法を提案する。このようなフィルタの実装の可能性を理解し,エンドユーザー体験への影響を探るための開発研究を行った。アルゴリズムが、不随意の手震動をフィルタリングして軽減することで、より正確で結果としてフラストレーションの少ない体験の機会を、どのように提供できるかを実証する。既存のVRデザイン哲学に関するさらなる議論も行われ、多感覚フィードバックと心理モデルを支持する証拠を分析している。完成した研究はGitHubからダウンロードできる。 Compared to conventional 2D interaction methods, virtual reality (VR) demonstrates an opportunity for unique interface and interaction design decisions. Currently, this poses a challenge when developing an accessible VR experience as existing interaction techniques may not be usable by all users. It was discovered that many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism, such as the use of a laser pointer designed to mimic a traditional cursor. It is recognized that distance-independent millimetres can support designers in developing interfaces that scale in virtual worlds. Relevantly, Fitts's law states that as distance increases, user movements become increasingly slower and are performed less accurately. In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction. A development study was conducted to understand the feasibility of implementing such a filter and to explore its effects on end users' experience. 
It demonstrates how an algorithm can provide an opportunity for a more accurate and consequently less frustrating experience by filtering and reducing involuntary hand tremors. Further discussion on existing VR design philosophies is also conducted, analysing evidence that supports multisensory feedback and psychological models. The completed study can be downloaded from GitHub.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
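The low-pass filtering idea in the entry above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple exponential moving average, a hypothetical function name `low_pass`, and a hypothetical smoothing factor `alpha`, applied to a stream of 2D pointer-ray hit positions.

```python
def low_pass(samples, alpha=0.2):
    """Smooth a sequence of (x, y) pointer positions with an exponential moving average.

    alpha is a hypothetical smoothing factor in (0, 1]; smaller values suppress
    more high-frequency jitter (such as tremor) at the cost of more lag.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for x, y in samples:
        if prev is None:
            # The first sample initializes the filter state unchanged.
            prev = (x, y)
        else:
            # Move the state a fraction alpha of the way toward the new sample.
            prev = (prev[0] + alpha * (x - prev[0]),
                    prev[1] + alpha * (y - prev[1]))
        smoothed.append(prev)
    return smoothed
```

A smaller `alpha` trades responsiveness for stability, which is the usual tuning question for pointer smoothing.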
# 指数メカニズムに基づくデータトレーディング複合オークション機構 Data Trading Combination Auction Mechanism based on the Exponential Mechanism ( http://arxiv.org/abs/2405.07336v1 ) ライセンス: Link先を確認	Kongyang Chen, Zeming Xu, Bing Mi,	(参考訳) 近年、機械学習技術の普及に伴い、トレーニングデータの需要が大幅に増加し、データトレーディングなどの研究分野が出現している。この分野での仕事はまだ発展段階にある。異なる購入者は様々な種類のデータに対する需要の程度が異なり、オークションはその真正さと公正さのためにこのようなシナリオで重要な役割を果たしている。近年の研究では、異なるドメインに対する組み合わせオークション機構が提案されている。しかし、こうしたメカニズムは購入者のプライバシー上の懸念に対処していない。本稿では,購入者の入札プライバシの漏洩を防止するために,指数メカニズムに基づくData Trading Combination Auction Mechanism(DCAE)を設計する。本稿では,この指数的メカニズムを適用して,競売の最終決着価格を選択し,価格と収益の関係に基づいて確率分布を生成する。実験的な側面では,2つのシナリオの下で異なるメカニズムを選択することを考慮し,本手法は高いオークション収入を確保し,購入者のプライバシが侵害されるのを防ぐことができることを示した。 With the widespread application of machine learning technology in recent years, the demand for training data has increased significantly, leading to the emergence of research areas such as data trading. The work in this field is still in the developmental stage. Different buyers have varying degrees of demand for various types of data, and auctions play a role in such scenarios due to their authenticity and fairness. Recent related work has proposed combination auction mechanisms for different domains. However, such mechanisms have not addressed the privacy concerns of buyers. In this paper, we design a Data Trading Combination Auction Mechanism based on the Exponential mechanism (DCAE) to protect buyers' bidding privacy from being leaked. We apply the exponential mechanism to select the final settlement price for the auction and generate a probability distribution based on the relationship between the price and the revenue. In the experimental aspect, we consider the selection of different mechanisms under two scenarios, and the experimental results show that this method can ensure high auction revenue and protect buyers' privacy from being violated.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
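As a rough illustration of how an exponential mechanism can pick a settlement price, the sketch below uses the textbook form in which a candidate with utility u is chosen with probability proportional to exp(ε·u / (2Δ)). It is not the DCAE mechanism itself: treating revenue as the utility, and the names `exponential_mechanism_probs`, `sample_price`, `epsilon`, and `sensitivity`, are assumptions made for this example.

```python
import math
import random


def exponential_mechanism_probs(revenues, epsilon, sensitivity):
    """Return selection probabilities proportional to exp(epsilon * revenue / (2 * sensitivity)).

    revenues: assumed utility of each candidate price (hypothetical choice of utility).
    epsilon: privacy budget; sensitivity: assumed bound on how much one bid changes revenue.
    """
    scores = [epsilon * r / (2.0 * sensitivity) for r in revenues]
    top = max(scores)
    # Subtract the maximum before exponentiating to avoid floating-point overflow.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_price(prices, revenues, epsilon, sensitivity, rng=random):
    """Draw one settlement price according to the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(revenues, epsilon, sensitivity)
    return rng.choices(prices, weights=probs, k=1)[0]
```

Higher-revenue prices are favoured but never chosen deterministically; a smaller `epsilon` flattens the distribution, giving stronger privacy at the cost of expected revenue.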
# 眼底画像からの網膜血管の網膜基底分類と切削エッジ分割モデルのための説明可能な畳み込みニューラルネットワーク Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images ( http://arxiv.org/abs/2405.07338v1 ) ライセンス: Link先を確認	Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah,	(参考訳) 本研究は,眼底画像における網膜血管検査による早期診断の重要領域に焦点を当てた。網膜血管の自動セグメンテーションは早期発見を約束するが、既存の方法の限界のために正確な分析は困難であり、しばしば識別能力が欠如しており、病理領域の影響を受けやすい。基礎画像解析の研究は,8つの事前学習CNNモデルを用いたディープラーニングに基づく分類を進歩させる。本研究では,Grad-CAM,Grad-CAM++,Score-CAM,Faster Score-CAM,Layer CAMなどの説明可能なAI技術を利用する。これらのテクニックは、モデルの意思決定プロセスを照らし、透明性を促進し、予測に対する信頼を高める。調査を拡大し、ResNetバックボーンを使用したTransUNet、DenseNetとResNetバックボーンによるAtention U-Net、Swin-UNETを含む10のモデルを調査しました。 ResNet50V2、ResNet101V2、ResNet152V2、DenseNet121などの多様なアーキテクチャを組み込んだ総合的研究により、ファンドス画像解析の強化のための注意機構に関する洞察を深めることができた。基礎画像分類の評価モデルのうち、ResNet101は最高精度で登場し、94.17%を達成した。一方、EfficientNetB0はモデルの中で最も精度が低く、88.33%のスコアを得た。さらに、眼底画像セグメンテーションの分野では、Swin-Unetは86.19%の平均画素精度を示し、眼底画像内の関心領域を正確に記述する効果を示した。逆に、Attention U-Net with DenseNet201 backboneは評価されたモデルの中で最も低い平均画素精度を示し、スコアは75.87%に達した。 Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we utilize Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. 
Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. Incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121 among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the evaluated models for fundus image classification, ResNet101 emerged with the highest accuracy, achieving an impressive 94.17%. On the other end of the spectrum, EfficientNetB0 exhibited the lowest accuracy among the models, achieving a score of 88.33%. Furthermore, in the domain of fundus image segmentation, Swin-Unet demonstrated a Mean Pixel Accuracy of 86.19%, showcasing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with DenseNet201 backbone exhibited the lowest Mean Pixel Accuracy among the evaluated models, achieving a score of 75.87%.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# 疑似科学としての機械意識:意識機械の神話 Machine Consciousness as Pseudoscience: The Myth of Conscious Machines ( http://arxiv.org/abs/2405.07340v1 ) ライセンス: Link先を確認	Eduardo C. Garrido-Merchán,	(参考訳) 意識的機械の仮説は、人工知能の概念が発明されてから議論され、システムによって達成された計算知性は、そのシステムにおいてエピノメノンとして現象的意識が出現する原因、あるいはシステムの行動的・内部的複雑さがしきい値を超えた結果である、という仮定に基づいている。その結果、マシン意識の可能性と、それをコンピュータにどのように実装するかを探求する膨大な文献が公表された。さらに、民間心理学やトランスヒューマニズム文学はこの仮説をSF文学の人気と結び付けており、知的ロボットは通常異形化され、即ち現象意識が与えられる。しかし、本研究では、これらの文献が科学的厳密さに欠けており、反対の仮説を偽造することは不可能であり、機械意識文学が公表した全てのアプローチが科学的方法によって証明できない哲学的仮定に依存していることを示す議論の一覧を示す。具体的には, 現象意識が, アルゴリズムやモデルの複雑さとは独立して, 客観的に測定あるいは定量的に定義できないことを示し, 基本的には観察者にとって主観的で内部的な現象であることを示す。これらすべての議論を踏まえると、なぜ意識的な機械という概念が現在トランスヒューマニズムとサイエンスフィクション文化の神話であるかについて論じる作業は終了する。 The hypothesis of conscious machines has been debated since the invention of the notion of artificial intelligence, powered by the assumption that the computational intelligence achieved by a system is the cause of the emergence of phenomenal consciousness in that system as an epiphenomenon or as a consequence of the behavioral or internal complexity of the system surpassing some threshold. As a consequence, a huge amount of literature exploring the possibility of machine consciousness and how to implement it on a computer has been published. Moreover, common folk psychology and transhumanism literature has fed this hypothesis with the popularity of science fiction literature, where intelligent robots are usually anthropomorphized and hence given phenomenal consciousness. However, in this work, we argue that this literature lacks scientific rigour, since it is impossible to falsify the opposite hypothesis, and present a list of arguments that show how every approach that the machine consciousness literature has published depends on philosophical assumptions that cannot be proven by the scientific method. 
Concretely, we also show how phenomenal consciousness is not computable, independently of the complexity of the algorithm or model, cannot be objectively measured or quantitatively defined, and is fundamentally a phenomenon that is subjective and internal to the observer. Given all those arguments, we end the work by arguing why the idea of conscious machines is nowadays a myth of transhumanism and science fiction culture.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# 進化的グリッドトポロジー下における電力グリッド運用リスク評価のためのグラフニューラルネットワーク Graph neural networks for power grid operational risk assessment under evolving grid topology ( http://arxiv.org/abs/2405.07343v1 ) ライセンス: Link先を確認	Yadong Zhang, Pranav M Karve, Sankaran Mahadevan,	(参考訳) 本稿では,今後の発電機のオン/オフ状態(グリッドトポロジ)や配電決定に関する高精細な情報なしに,電力網内の危険条件を特定するグラフニューラルネットワーク(GNN)の能力について検討する。 GNNは教師付き学習を用いて訓練され、電力グリッドの集積バスレベル(ゾーンレベルまたはシステムレベル)または個々のブランチレベル状態を異なる電力供給および需要条件下で予測する。トレーニングデータに対する入力を生成しながら、確率格子変数(風/ソラ生成と負荷需要)の変動とそれらの統計的相関を厳格に考慮する。多数の混合整数線形計画法(MILP)最適電力フロー問題を解くことで得られたトレーニングデータの出力は、システムレベル、粒子レベル、伝送ラインレベルの関心量(QoIs)に対応する。 GNNが予測するQoIは、時間前、サンプリングベースの信頼性、リスクアセスメント(英語版)において、水平およびシステムレベル(ロードシェディング)および分岐レベル(オーバーロード)障害イベントの実行に使用される。提案手法は, バスサイズが118から2848の3種類の合成格子に対して実証された。以上の結果から,GNNはQoIの高速かつ高精度な予測が可能であり,計算コストのかかるMILPアルゴリズムに優れたプロキシであることを示す。 GNNに基づく信頼性とリスクアセスメントの優れた精度は、厳密な信頼性とリスク推定を迅速に提供することにより、GNNモデルが状況認識を大幅に改善できることを示唆している。 This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-level) or individual branch-level state under different power supply and demand conditions. The variability of the stochastic grid variables (wind/solar generation and load demand), and their statistical correlations, are rigorously considered while generating the inputs for the training data. The outputs in the training data, obtained by solving numerous mixed-integer linear programming (MILP) optimal power flow problems, correspond to system-level, zonal and transmission line-level quantities of interest (QoIs). The QoIs predicted by the GNNs are used to conduct hours-ahead, sampling-based reliability and risk assessment w.r.t. 
zonal and system-level (load shedding) as well as branch-level (overloading) failure events. The proposed methodology is demonstrated for three synthetic grids with sizes ranging from 118 to 2848 buses. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and can be good proxies for computationally expensive MILP algorithms. The excellent accuracy of GNN-based reliability and risk assessment suggests that GNN models can substantially improve situational awareness by quickly providing rigorous reliability and risk estimates.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# TKAN: 一時的コルモゴロフ・アルノルドネットワーク TKAN: Temporal Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2405.07344v1 ) ライセンス: Link先を確認	Remi Genet, Hugo Inzirillo,	(参考訳) リカレントニューラルネットワーク(RNN)は、特に自然言語やデータシーケンス処理において、機械学習の多くの領域に革命をもたらした。 LSTM(Long Short-Term Memory)は、シーケンシャルデータにおける長期的な依存関係をキャプチャする能力を示している。 MLP(Multi-Layer Perceptrons)に代わる有望な代替手段であるKolmogorov-Arnold Networks(KAN)に触発された我々は、KANとLSTM、TKAN(Temporal Kolmogorov-Arnold Networks)に触発された新しいニューラルネットワークアーキテクチャを提案した。 TKANは両方のネットワークの強みを組み合わせたもので、メモリ管理を組み込んだRecurring Kolmogorov-Arnold Networks (RKANs) Layersで構成されている。この革新により、精度と効率を向上したマルチステップ時系列予測が可能となる。複雑なシーケンシャルパターンを扱う場合の従来のモデルの限界に対処することにより、TKANアーキテクチャは予測を1段階以上進める必要がある分野において、大きな可能性をもたらす。 Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture inspired by KAN and the LSTM, the Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks; they are composed of Recurring Kolmogorov-Arnold Networks (RKANs) layers embedding memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# インストラクションチューニングによるAI生成画像の人間の嗜好理解と評価 Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning ( http://arxiv.org/abs/2405.07346v1 ) ライセンス: Link先を確認	Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min,	(参考訳) 人工知能生成コンテンツ(AIGC)は近年急速に成長しており、AIベースの画像生成は、その効率的で想像力のある画像生成能力によって広く注目を集めている。しかし、AIGI(AI- generated Images)は、その独特の歪みのために人間の嗜好を満足させておらず、AIGIに対する人間の嗜好を理解し評価する必要性を強調している。そこで本論文では,AIGIを対象とした画像品質評価(IQA)データベース,AIGCIQA2023+を構築し,人間の視覚的嗜好スコアと,品質,信頼性,対応性といった3つの視点から詳細な嗜好説明を提供する。そして,構築したAIGCIQA2023+データベースをベースとして,インストラクションチューニングを用いたマルチパースペクティブからAIGIに対する人間の嗜好を評価・説明するためのMINT-IQAモデルを提案する。具体的には、MINT-IQAモデルは、まず、マルチパースペクティブからAI生成画像に対する人間の嗜好を学習し、評価し、次に、視覚言語による指示チューニング戦略を通じて、AIGIに対する人間の視覚的嗜好に対する強力な理解と説明能力を得る。 MINT-IQAモデルはAIGIに対する人間の視覚的嗜好の理解と評価において最先端の性能を達成し,提案モデルは最先端IQAモデルと比較して従来のIQAタスクと競合する結果も得ることを示した。 AIGCIQA2023+データベースとMINT-IQAモデルは、将来の研究を促進するためにリリースされる。 Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, among which AI-based image generation has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the necessity to understand and evaluate human preferences for AIGIs. To this end, in this paper, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives including quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. 
Specifically, the MINT-IQA model first learns and evaluates human preferences for AI-generated Images from multiple perspectives; then, via the vision-language instruction tuning strategy, MINT-IQA attains powerful understanding and explanation ability for human visual preference on AIGIs, which can be used for feedback to further improve the assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and the proposed model also achieves competitive results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# MedConceptsQA - オープンソースの医療概念QAベンチマーク MedConceptsQA -- Open Source Medical Concepts QA Benchmark ( http://arxiv.org/abs/2405.07348v1 ) ライセンス: Link先を確認	Ofir Ben Shoham, Nadav Rappoport,	(参考訳) MedConceptsQAは、医療概念質問応答のための専用のオープンソースベンチマークである。このベンチマークは、診断、手順、薬物など、さまざまな語彙にわたる様々な医学概念に関する質問で構成されている。質問は、簡単、中、困難の3つのレベルに分類される。各種大規模言語モデルを用いて評価を行った。以上の結果より, 事前訓練を受けた臨床用大言語モデルでは, 医用データで事前訓練を受けたにもかかわらず, ランダムな推定値に近い精度の精度が得られたことが示唆された。しかし、GPT-4は、臨床大言語モデルと比較して、27%-37%(ゼロショット学習では27%、少数ショット学習では37%)の絶対的な平均改善を実現している。我々のベンチマークは、大規模言語モデルによる医学的概念の理解と推論を評価するための貴重なリソースとして役立ちます。私たちのベンチマークはhttps://huggingface.co/datasets/ofir408/MedConceptsQAで公開されています。 We present MedConceptsQA, a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions about various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We conducted evaluations of the benchmark using various Large Language Models. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at https://huggingface.co/datasets/ofir408/MedConceptsQA	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# WeedScout:専用ハードウェアを用いたリアルタイム自動ブラックグラス分類とマッピング WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardware ( http://arxiv.org/abs/2405.07349v1 ) ライセンス: Link先を確認	Matthew Gazzard, Helen Hicks, Isibor Kennedy Ihianle, Jordan J. Bird, Md Mahmudul Hasan, Pedro Machado,	(参考訳) クログラス(Alopecurus myosuroides)は、作物の収穫量を減らし、栽培コストを増大させることで、食品の安全性に広範囲に影響を及ぼす競争雑草である。農業の財政的負担に加えて、黒草への除草剤としての除草剤の応用は、清潔な水や衛生へのアクセスに悪影響を及ぼす可能性がある。 WeedScoutプロジェクトは、黒草のリアルタイム検出に適した最先端ソリューションであるRT-ABGCM(Real-Time Autonomous Black-Grass Classification and Mapping)を導入し、精密雑草管理を実践している。人工知能(AI)アルゴリズムを活用することで、システムはライブイメージフィードを処理し、ブラックグラス密度を推測し、成熟の2段階をカバーする。この研究は、YOLO(You Only Look Once)モデル、具体的には、NVIDIA Jetson Nano (NJN)でエッジで加速されたYOLOv8とYOLO-NASの合理化について調査している。推論速度とモデルパフォーマンスを最適化することにより、プロジェクトはAIを農業プラクティスに統合し、除草剤耐性や環境影響といった課題に対する潜在的な解決策を提供する。さらに、2つのデータセットとモデルウェイトが研究コミュニティに提供され、雑草の検出と精密農業技術のさらなる進歩を促進する。 Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive measure against blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces Real-Time Autonomous Black-Grass Classification and Mapping (RT-ABGCM), a cutting-edge solution tailored for real-time detection of blackgrass, for precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). 
By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# 光の非ガウス状態の反復生成のための多目的かつスケーラブルなスキームの実験的実証 Experimental demonstration of a versatile and scalable scheme for iterative generation of non-Gaussian states of light ( http://arxiv.org/abs/2405.07350v1 ) ライセンス: Link先を確認	Hector Simon, Lucas Caron, Romaric Journet, Viviane Cotte, Rosa Tualle-Brouri,	(参考訳) GKP状態のような非ガウス状態は、光連続変数量子コンピューティングにとって必須の資源である。これらの状態を効率的に生成する能力は、一般に量子技術、特にフォールトトレラントな量子コンピューティングの膨大な可能性を開くだろう。このレターでは、量子メモリキャビティを用いて、育種プロトコルの確率的性質を克服し、スケーラビリティの観点から高速で非ガウス状態を生成する。実験装置の性能は, 振幅アルファ = 1.63 のシュリンガー猫状態の生成により, kHz 域における生成速度において 60% 以上の忠実度を呈し, それらの状態に対する「最先端猫状態」よりも高い値を示した。 Non-Gaussian states of light, such as GKP states, are essential resources for optical continuous-variable quantum computing. The ability to efficiently produce these states would open up tremendous prospects for quantum technologies in general and fault-tolerant quantum computing in particular. This letter demonstrates a versatile method using a quantum memory cavity to overcome the probabilistic nature of the breeding protocols and generate non-Gaussian states at high rates, with prospects for scalability. The performance of our experimental setup is illustrated with the generation of Schrödinger cat states of amplitude alpha = 1.63 with a fidelity of more than 60% at a generation rate in the kHz range, which is higher than the state of the art for such states.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# SoccerNet-Echoes: サッカーゲームのオーディオ解説データセット SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset ( http://arxiv.org/abs/2405.07354v1 ) ライセンス: Link先を確認	Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah,	(参考訳) サッカーにおける自動音声認識(ASR)技術の応用は、スポーツ分析に多くの機会を提供する。具体的には、ASRでオーディオコメンタリーを抽出することで、ゲームのイベントに関する貴重な洞察を与え、自動ハイライト生成などの下流アプリケーションへの扉を開く。本稿では,サッカーゲーム放送から音声コメントを自動的に書き起こし,ASRを用いてゲーム音声から派生したリッチなテキスト情報を用いて映像コンテンツを拡張した,サッカーネットデータセットの強化について述べる。 Whisperモデルを使用して生成され、Google Translateで翻訳されたこれらのテキストコメンタリーは、アクションスポッティングの強化、自動キャプション生成、ゲーム要約など、さまざまなアプリケーションにおける SoccerNetデータセットの有用性を拡張している。視覚的および聴覚的コンテンツとともにテキストデータを組み込むことで、サッカーゲームのダイナミクスを捉えるアルゴリズムを開発するための総合的なリソースとなることを目的としている。本稿では,このデータセットのキュレーションとASRの統合に関わる手法について詳述する。また,スポーツ分析におけるマルチモーダルなアプローチの意義と,リッチなデータセットが多様なアプリケーションをどのようにサポートするかを強調し,スポーツ分析の分野における研究と開発の範囲を広げる。 The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. 
By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# 交通・インフラにおけるサイバーセキュリティイノベーションのための価値駆動型フレームワーク A Value Driven Framework for Cybersecurity Innovation in Transportation & Infrastructure ( http://arxiv.org/abs/2405.07358v1 ) ライセンス: Link先を確認	Lampis Alevizos, Lalit Bhakuni, Stefan Jaschke,	(参考訳) 本稿では、この分野を支配してきた従来の市場中心のアプローチとは対照的に、輸送・インフラ分野における価値駆動型サイバーセキュリティ革新の枠組みを紹介する。イノベーションカテゴリーを持続的、漸進的、破壊的、変革的へと再定義し、私たちは組織内の自己革新の文化を育み、ビジネス価値と戦略的目標に直接貢献するサイバーセキュリティ対策に戦略的に焦点をあてることを目指しています。このアプローチは、主にサイバー防衛の運用効率と効率を高めると同時に、サイバーセキュリティイニシアチブをミッションクリティカルな目標と整合させる。本稿では,サイバーセキュリティイノベーションのビジネス価値を評価するための実践的手法を詳述する。このフレームワークは、インフラの整合性を維持しながら、進化するサイバー脅威の状況に対するサイバーセキュリティ機能を強化するように設計されている。一般市場へのアピールからセクター特有のニーズへと焦点を移すため、当社のフレームワークは、サイバーセキュリティのリーダーに、影響力のあるイニシアティブの優先順位付けに必要な戦略的サイバー監視を提供する。 This paper introduces a value-driven cybersecurity innovation framework for the transportation and infrastructure sectors, as opposed to the traditional market-centric approaches that have dominated the field. By recontextualizing innovation categories as sustaining, incremental, disruptive, and transformative, we aim to foster a culture of self-innovation within organizations, enabling a strategic focus on cybersecurity measures that directly contribute to business value and strategic goals. This approach primarily enhances the operational effectiveness and efficiency of cyber defences, while also aligning cybersecurity initiatives with mission-critical objectives. We detail a practical method for evaluating the business value of cybersecurity innovations and present a pragmatic approach for organizations to funnel innovative ideas in a structured and repeatable manner. The framework is designed to reinforce cybersecurity capabilities against an evolving cyber threat landscape while maintaining infrastructural integrity. 
Shifting the focus from general market appeal to sector-specific needs, our framework provides cybersecurity leaders with the strategic cyber-foresight necessary for prioritizing impactful initiatives, thereby making cybersecurity a core business enabler rather than a burden.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
# N次元ランゲヴィン方程式とニューラル正規微分方程式による予測 Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential Equation ( http://arxiv.org/abs/2405.07359v1 ) ライセンス: Link先を確認	Antonio Malpica-Morales, Miguel A. Duran-Olivencia, Serafim Kalliadasis,	(参考訳) 競争力のある電気市場では、電気の日頭価格の正確な予測が不可欠である。静電気価格予測技術は注目されているが、電力市場における非定常的特徴の一般的な普及にもかかわらず、非定常的手法の研究は比較的少ない。具体的には、既存の非定常的手法は、単独で個々の非定常的特徴に対処することを目的としており、同時に複数の非定常的効果を探索すること以外にはならない。本研究の目的は,非定常行動の範囲を包含する,非定常電気価格時系列を体系的にモデル化し,予測する枠組みの定式化である。この目的のために、N次元ランゲヴィン方程式(LE)とニューラル正規微分方程式(NODE)を組み合わせたデータ駆動モデルを開発する。 LEは定常状態における電気価格の挙動を詳細に把握するが、非定常状態には不十分である。この制約を克服するために、我々はNODEアプローチを用いて学習し、同時にLEが生み出す実際の電気価格時系列と模擬価格軌跡との差を予測する。この違いを学習することで、NODEはLEがキャプチャできない時系列の非定常成分を再構成する。スペイン電力日頭市場を原型事例研究として用いた枠組みの有効性を実証する。その結果,NODEはLEを良好に補完し,定常的および非定常的電気価格の双方に対処するための包括的戦略を提供することがわかった。フレームワークの信頼性とロバスト性は、様々な非定常シナリオを通じて、様々な基本的な単純法と比較することによって実証される。 Accurate prediction of electricity day-ahead prices is essential in competitive electricity markets. Although stationary electricity-price forecasting techniques have received considerable attention, research on non-stationary methods is comparatively scarce, despite the common prevalence of non-stationary features in electricity markets. Specifically, existing non-stationary techniques will often aim to address individual non-stationary features in isolation, leaving aside the exploration of concurrent multiple non-stationary effects. Our overarching objective here is the formulation of a framework to systematically model and forecast non-stationary electricity-price time series, encompassing the broader scope of non-stationary behavior. For this purpose we develop a data-driven model that combines an N-dimensional Langevin equation (LE) with a neural-ordinary differential equation (NODE). The LE captures fine-grained details of the electricity-price behavior in stationary regimes but is inadequate for non-stationary conditions. 
To overcome this inherent limitation, we adopt a NODE approach to learn, and at the same time predict, the difference between the actual electricity-price time series and the simulated price trajectories generated by the LE. By learning this difference, the NODE reconstructs the non-stationary components of the time series that the LE is not able to capture. We exemplify the effectiveness of our framework using the Spanish electricity day-ahead market as a prototypical case study. Our findings reveal that the NODE nicely complements the LE, providing a comprehensive strategy to tackle both stationary and non-stationary electricity-price behavior. The framework's dependability and robustness are demonstrated across different non-stationary scenarios by comparing it against a range of naive baseline methods.	翻訳日:2024-05-14 15:24:35 公開日:2024-05-12
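The residual idea in the Langevin/NODE entry above can be sketched in a few lines. This is a minimal one-dimensional illustration, not the authors' model: the mean-reverting drift, the parameter names (`theta`, `mu`, `sigma`), and the toy "observed" price series are all assumptions made for the example. It simulates a Langevin equation with the Euler-Maruyama scheme and forms the difference series that a second model (a NODE in the abstract) would be trained to reproduce.

```python
import math
import random


def simulate_langevin(x0, theta, mu, sigma, dt, steps, rng):
    """Euler-Maruyama for dX = theta * (mu - X) dt + sigma dW (assumed drift)."""
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        # Brownian increment over one step: N(0, 1) scaled by sqrt(dt).
        noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        path.append(x + theta * (mu - x) * dt + sigma * noise)
    return path


def residual(observed, simulated):
    """Pointwise difference that a second model (e.g. a NODE) would learn."""
    return [o - s for o, s in zip(observed, simulated)]


rng = random.Random(0)
# Hypothetical observed prices: a constant level plus a slow upward trend.
observed = [50.0 + 0.1 * t for t in range(25)]
simulated = simulate_langevin(x0=50.0, theta=0.5, mu=50.0, sigma=1.0,
                              dt=1.0, steps=24, rng=rng)
target = residual(observed, simulated)
```

In this toy setup the trend in `observed` is exactly the kind of non-stationary component the stationary simulation cannot follow, so it ends up in `target`.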
# 量子連続可変状態における絡み合いダイナミクス Entanglement Dynamics in Quantum Continuous-Variable States ( http://arxiv.org/abs/2405.07362v1 ) ライセンス: Link先を確認	Ankit Kumar,	(参考訳) 重力結合の弱さのため、重力が地球の磁場を利用する全ての量子実験が現在まで行われている。この場は量子粒子から事実上検出不可能なバックアクションを行うため、固定背景ニュートン場あるいは時空として古典的な記述を効果的に認めている。この議論は、重力の量子的特徴を観測できる最も単純なシナリオの1つであるため、2つの量子質量間の重力の実証に向けた理論的および実験的研究を強く動機付けている。いくつかの提案は、2つの巨大な物体間の絡み合いを発生させる可能性について研究した。同じ線に沿って、特に重力に焦点を当て、この論文は相互作用を媒介する絡み合いに対処するための一般的なツールを導入し、連続可変状態の2つの粒子に適用する。 Due to the weakness of gravitational coupling, all quantum experiments up to date in which gravity plays a role utilized the field of the Earth. Since this field undergoes practically undetectable back-action from quantum particles, it effectively admits a classical description as a fixed background Newtonian field or spacetime. This argument strongly motivates theoretical and experimental research towards a demonstration of gravitation between two quantum masses, as this is one of the most straightforward scenarios where quantum features of gravity could be observed. Several proposals studied the possibility of generating entanglement between two massive objects. Along the same lines, with a particular focus on gravity, this thesis introduces general tools to tackle interaction-mediated entanglement and applies them to two particles prepared in continuous-variable states.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 議会における多言語パワーとイデオロギーの同定--参照データセットと簡易ベースライン Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines ( http://arxiv.org/abs/2405.07363v1 ) ライセンス: Link先を確認	Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić, Tomaž Erjavec,	(参考訳) 政治的指向と権力位置の識別に関するデータセットを導入する。このデータセットは29の州議会と地方議会から書き起こされた議会演説に匹敵するコーパスであるParlaMintから派生している。筆者らは、データセットを導入し、その作成中の選択の背景として、データセットに関する統計を提示し、簡単な分類器を用いて、左から右へ軸の政治的指向を予測するための基礎的な結果、すなわち、連立党員と野党員の演説を区別する力的位置同定を行う。 We introduce a dataset on political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We introduce the dataset, provide the reasoning behind some of the choices during its creation, present statistics on the dataset, and, using a simple classifier, some baseline results on predicting political orientation on the left-to-right axis, and on power position identification, i.e., distinguishing between the speeches delivered by governing coalition party members from those of opposition party members.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# BoQ: 学習可能なクエリの袋としての価値 BoQ: A Place is Worth a Bag of Learnable Queries ( http://arxiv.org/abs/2405.07364v1 ) ライセンス: Link先を確認	Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère,	(参考訳) 視覚的位置認識では、環境条件や視点の異なる場所の正確な識別とマッチングが重要な課題である。本稿では,Bag-of-Queries (BoQ)と呼ばれる新しい手法を提案する。自己アテンションを使用し、入力機能から直接クエリを生成する既存の方法とは異なり、BoQは異なる学習可能なグローバルクエリを採用し、クロスアテンションを通じて入力機能を探索し、一貫性のある情報アグリゲーションを保証する。さらに,本手法は,CNNとVision Transformerの両バックボーンを統合し,解釈可能なアテンション機構を提供する。 BoQの性能は14の大規模ベンチマークで広範な実験によって実証されている。 NetVLAD、MixVPR、EigenPlacesといった最先端技術よりも一貫して優れています。さらに、グローバル検索技術(ワンステージ)として、BoQはPatch-NetVLAD、TransVPR、R2Formerといった2段階の検索手法を超越し、桁違いに高速かつ効率的である。コードとモデルの重み付けはhttps://github.com/amaralibey/Bag-of-Queries.comで公開されている。 In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 深層学習によるS状腸炎検出における全身性増強と進行予測のための解剖学的認識の導入 Incorporating Anatomical Awareness for Enhanced Generalizability and Progression Prediction in Deep Learning-Based Radiographic Sacroiliitis Detection ( http://arxiv.org/abs/2405.07369v1 ) ライセンス: Link先を確認	Felix J. Dorfner, Janis L. Vahldiek, Leonhard Donle, Andrei Zhukov, Lina Xu, Hartmut Häntze, Marcus R. Makowski, Hugo J. W. L. Aerts, Fabian Proft, Valeria Rios Rodriguez, Judith Rademacher, Mikhail Protopopov, Hildrun Haibel, Torsten Diekhoff, Murat Torgutalp, Lisa C. Adams, Denis Poddubnyy, Keno K. Bressem,	(参考訳) 目的: 深層学習モデルに解剖学的認識を組み込むことで, 一般化性が向上し, 疾患の進行を予測することができるかを検討すること。方法: 本研究は, 大学および地域病院で収集した軸椎関節症(axSpA)に焦点を当てた4種類の患者コホートの骨盤X線撮影を行った。 1483個のX線写真からなる最初のコホートは、訓練(n=1261)と検証(n=222)に分割された。 436人, 340人, 163人からなる他のコホートは, それぞれ独立したテストデータセットとして使用した。第2コホートでは311人の追跡データを用いて進行予測能力について検討した。 2つのニューラルネットワークが訓練され、1つは仙腸関節の境界ボックス(解剖学的認識)に収穫された画像で、もう1つは完全なX線写真で撮影された。モデルの性能は, 受信機動作特性曲線(AUC)下の領域, 精度, 感度, 特異性を用いて比較した。結果: 3つのテストデータセットにおいて,標準モデルは0.770,0.724,0.850の精度でAUCスコア0.853,0.817,0.947を達成した。解剖学的モデルではAUCスコアは0.899, 0.846, 0.957, 精度は0.821, 0.744, 0.906であった。解剖学的に高いリスクと診断された患者は2年以内に放射線性仙腸炎を発症する確率比が2.16(95% CI: 1.19, 3.86)であった。結論: 解剖学的認識は, 放射線性仙腸炎の検出において, 深層学習モデルの一般化性を向上させることができる。このモデルは、この研究とともに完全にオープンソースとして公開されている。 Purpose: To examine whether incorporating anatomical awareness into a deep learning model can improve generalizability and enable prediction of disease progression. Methods: This retrospective multicenter study included conventional pelvic radiographs of 4 different patient cohorts focusing on axial spondyloarthritis (axSpA) collected at university and community hospitals. The first cohort, which consisted of 1483 radiographs, was split into training (n=1261) and validation (n=222) sets. The other cohorts comprising 436, 340, and 163 patients, respectively, were used as independent test datasets. For the second cohort, follow-up data of 311 patients was used to examine progression prediction capabilities. 
Two neural networks were trained, one on images cropped to the bounding box of the sacroiliac joints (anatomy-aware) and the other on full radiographs. The performance of the models was compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. Results: On the three test datasets, the standard model achieved AUC scores of 0.853, 0.817, and 0.947, with accuracies of 0.770, 0.724, and 0.850, whereas the anatomy-aware model achieved AUC scores of 0.899, 0.846, and 0.957, with accuracies of 0.821, 0.744, and 0.906, respectively. Patients identified as high risk by the anatomy-aware model had an odds ratio of 2.16 (95% CI: 1.19, 3.86) for progression of radiographic sacroiliitis within 2 years. Conclusion: Anatomical awareness can improve the generalizability of a deep learning model in detecting radiographic sacroiliitis. The model is published as fully open source alongside this study.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
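For readers unfamiliar with the metrics named in the entry above, the following sketch shows how AUC, accuracy, sensitivity, and specificity can be computed from model scores. The scores and the 0.5 threshold are hypothetical values chosen for illustration; they are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def threshold_metrics(scores_pos, scores_neg, threshold):
    """Return (accuracy, sensitivity, specificity) at a decision threshold."""
    tp = sum(1 for s in scores_pos if s >= threshold)
    tn = sum(1 for s in scores_neg if s < threshold)
    total = len(scores_pos) + len(scores_neg)
    return (tp + tn) / total, tp / len(scores_pos), tn / len(scores_neg)


# Hypothetical model outputs for diseased and healthy radiographs.
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.3, 0.2, 0.1]

model_auc = auc(diseased, healthy)
accuracy, sensitivity, specificity = threshold_metrics(diseased, healthy, 0.5)
```

AUC is independent of the threshold, which is one reason it is commonly reported next to threshold-dependent accuracy when comparing models across test cohorts.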
# 確率的満足度と因果的満足度:マリナライゼーションの影響 Probabilistic and Causal Satisfiability: the Impact of Marginalization ( http://arxiv.org/abs/2405.07373v1 ) ライセンス: Link先を確認	Julian Dörfler, Benito van der Zander, Markus Bläser, Maciej Liskiewicz,	(参考訳) パールズ・コーサル・ヒエラルキー(PCH)の枠組みは、因果関係に関する人間の思考の進歩的洗練を反映した観察的、介入的、反ファクト的という3つのタイプの推論を定式化した。本稿では,PCH全体にわたる確率的および因果的言語で表される満足度問題を中心に,この枠組みにおける推論の計算複雑性の側面を考察する。つまり、標準確率言語および因果言語における式体系を考えると、式を満たすモデルが存在するだろうか? 結果として生じる複雑性は、階層のレベルや公式で許される演算子(加算、乗算、余剰化)によって変化する。我々は,確率的および因果推論に広く用いられている辺縁化を含む式に着目するが,その複雑性問題はほとんど検討されていない。我々の主な貢献は、線形言語(加算と余剰化が可能である)がPCHのレベルに応じてNP^PP-, PSPACE-, NEXP完全満足度問題をもたらすことを示す正確な計算複雑性の結果である。さらに,クラス succ$\exists$R に対して,全言語(余剰乗算も可能)の問題が最上位の対実レベルで完備であることを証明した。以前の研究は、偽事実のケースを開いている下層の Succ$\exists$R に対して満足度問題は完備であることを示した。最後に、小さな多項式サイズに制限された制約付きモデルを考察する。サイズに対する制約は、介入言語と反ファクト言語の複雑さをNEXP完全に減らす。 The framework of Pearl's Causal Hierarchy (PCH) formalizes three types of reasoning: observational, interventional, and counterfactual, that reflect the progressive sophistication of human thought regarding causation. We investigate the computational complexity aspects of reasoning in this framework, focusing mainly on satisfiability problems expressed in probabilistic and causal languages across the PCH. That is, given a system of formulas in the standard probabilistic and causal languages, does there exist a model satisfying the formulas? The resulting complexity changes depending on the level of the hierarchy as well as the operators allowed in the formulas (addition, multiplication, or marginalization). We focus on formulas involving marginalization that are widely used in probabilistic and causal inference, but whose complexity issues are still little explored. Our main contributions are exact computational complexity results showing that linear languages (allowing addition and marginalization) yield NP^PP-, PSPACE-, and NEXP-complete satisfiability problems, depending on the level of the PCH. 
Moreover, we prove that the problem for the full language (allowing additionally multiplication) is complete for the class succ$\exists$R for languages on the highest, counterfactual level. Previous work has shown that the satisfiability problem is complete for succ$\exists$R on the lower levels leaving the counterfactual case open. Finally, we consider constrained models that are restricted to a small polynomial size. The constraint on the size reduces the complexity of the interventional and counterfactual languages to NEXP-complete.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# コンフォーマル化サバイバル分布:キャリブレーション向上のためのジェネリックポストプロシース Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration ( http://arxiv.org/abs/2405.07374v1 ) ライセンス: Link先を確認	Shi-ang Qi, Yakun Yu, Russell Greiner,	(参考訳) 判別と校正は生存分析の重要な2つの特性を表しており、前者は被験者を正確にランク付けするモデルの能力、後者は予測結果と実際の事象のアライメントを評価する。特に, キャリブレーションの改善により識別性能が低下する傾向にあるため, 生存モデルでは両者を同時に最適化することは困難である。本稿では, モデルキャリブレーションを劣化させることなく, モデルキャリブレーションを改善するためのコンフォメーションレグレッションを利用した新しい手法を提案する。上記の主張に対する理論的保証を提供し、11の現実世界のデータセットにまたがるアプローチの効率を厳格に検証し、その実践的適用性と多様なシナリオにおける堅牢性を示す。 Discrimination and calibration represent two important properties of survival analysis, with the former assessing the model's ability to accurately rank subjects and the latter evaluating the alignment of predicted outcomes with actual events. With their distinct nature, it is hard for survival models to simultaneously optimize both of them especially as many previous results found improving calibration tends to diminish discrimination performance. This paper introduces a novel approach utilizing conformal regression that can improve a model's calibration without degrading discrimination. We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 量子投影フィルタによるフィードバック安定化 Feedback stabilization via a quantum projection filter ( http://arxiv.org/abs/2405.07379v1 ) ライセンス: Link先を確認	Nina H. Amini, Paolo Mason, Ibrahim Ramadan,	(参考訳) 本稿では,プロジェクションフィルタによる不完全な測定を行うオープン量子系の簡易モデルについて考察する。この近似フィルタは、量子非破壊測定(QND)において特にフィードバック安定化問題において用いられる。フィードバック設計は、射影過程に使用される指数族の構造に依存している。提案手法では, 提案したフィードバックが, 推定演算子の固有状態に対応する目標状態に対して, 元のフィルタ方程式の指数収束を保証することを示す。 This paper considers a simplified model of open quantum systems undergoing imperfect measurements obtained via a projection filter approach. We use this approximate filter in the feedback stabilization problem specifically in the case of Quantum Non-Demolition (QND) measurements. The feedback design relies on the structure of the exponential family utilized for the projection process. We demonstrate that the introduced feedback guarantees exponential convergence of the original filter equation toward a predefined target state, corresponding to an eigenstate of the measurement operator.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 古典ゲームにおける許容4ストラテジー量子拡大 Permissible four-strategy quantum extensions of classical games ( http://arxiv.org/abs/2405.07380v1 ) ライセンス: Link先を確認	Piotr Frąckiewicz, Anna Gorczyca-Goraj, Marek Szopa,	(参考訳) この研究は、2つのユニタリ演算によりアイザート・ウィルケンス・リューエンシュタインスキームで拡張された戦略形式ゲームに焦点を当てている。条件は、一対のユニタリ作用素と古典的戦略が入力された古典的ゲームの同型変換の下でゲーム不変量を形成する条件を決定する。これらの条件がこれらの作用素を決定するために適用され、その結果、同型規準を満たすゲームの5つの主要なクラスが成立し、この同型に対する実践的な規準を与える定理が証明される。拡張の異なるクラス間の相互依存性は、あるクラスが別のクラスに変換される極限ケースを含む特定される。 The study focuses on strategic-form games extended in the Eisert-Wilkens-Lewenstein scheme by two unitary operations. Conditions are determined under which the pair of unitary operators, along with classical strategies, form a game invariant under isomorphic transformations of the input classical game. These conditions are then applied to determine these operators, resulting in five main classes of games satisfying the isomorphism criterion, and a theorem is proved providing a practical criterion for this isomorphism. The interdependencies between different classes of extensions are identified, including limit cases in which one class transforms into another.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 神経・筋肉構造予測のための意味的損失関数 Semantic Loss Functions for Neuro-Symbolic Structured Prediction ( http://arxiv.org/abs/2405.07387v1 ) ライセンス: Link先を確認	Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck,	(参考訳) 構造化出力予測問題は機械学習においてユビキタスである。顕著なアプローチは、ニューラルネットワークを強力な特徴抽出器として利用し、そうでなければ出力の独立性を仮定する。しかしながら、これらの出力は、グラフ内のパスを例に、オブジェクトを共同で符号化するので、出力空間の基盤となる構造を通して関連付けられる。このような構造に関する知識を象徴的に定義したセマンティックロスを,ネットワークの依存性の侵害を最小限に抑え,ネットワークを基盤構造を満たす分布の予測に向けて制御することにより,トレーニングに投入する意味損失について議論する。同時に、シンボルの配置に非依存であり、それによって表現されるセマンティクスにのみ依存すると同時に、効率的なエンドツーエンドのトレーニングと推論を可能にしている。また、セマンティックロスの重要な改善と応用についても論じる。セマンティックな損失の1つの制限は、ターゲットクラスのメンバシップを認証する特定の特徴を持つすべてのデータポイントの関連を利用していないことである。したがって, 有効構造よりも最小エントロピー分布を優先すべきであり, ニューロシンボリックエントロピーを最小化することによって得られる。このより洗練された定式化の利点を実証的に実証する。さらに、セマンティックロスはモジュラーとして設計されており、識別と生成の両方のニューラルモデルと組み合わせることができる。これは、基底領域の構造に従う複雑なオブジェクトを効率的に合成できる新しい種類の深層生成モデルである、制約付き対向ネットワークを生成することによって、これを生成的対向ネットワークに統合することによって説明される。 Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. 
One limitation of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain.	翻訳日:2024-05-14 15:14:45 publish_placeholder
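As a rough illustration of the semantic loss described in the entry above, the sketch below computes the negative log of the probability mass that independent output probabilities place on assignments satisfying a constraint. The brute-force enumeration and the `exactly_one` constraint are assumptions made for the example; enumeration is exponential in the number of outputs and is only meant to make the definition concrete, not to reflect how the loss is computed efficiently in practice.

```python
import itertools
import math


def semantic_loss(probs, constraint):
    """-log of the probability that independent Bernoulli outputs satisfy `constraint`."""
    mass = 0.0
    for assignment in itertools.product([0, 1], repeat=len(probs)):
        if constraint(assignment):
            # Probability of this assignment under independent outputs.
            weight = 1.0
            for p, x in zip(probs, assignment):
                weight *= p if x else (1.0 - p)
            mass += weight
    return -math.log(mass)


def exactly_one(assignment):
    """Hypothetical constraint: exactly one output variable is true."""
    return sum(assignment) == 1


confident = semantic_loss([0.9, 0.05, 0.05], exactly_one)
uncertain = semantic_loss([0.5, 0.5, 0.5], exactly_one)
```

A prediction concentrated on a single valid assignment (`confident`) incurs a smaller loss than one that spreads mass over invalid assignments (`uncertain`). Because the loss depends only on which assignments satisfy the constraint, any logically equivalent way of writing the constraint gives the same value.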
# グラノン相互作用を持つ観測量子粒子系 Observed quantum particles system with graphon interaction ( http://arxiv.org/abs/2405.07389v1 ) ライセンス: Link先を確認	Sofiane Chalal, Nina H. Amini, Gaoyue Guo, Hamed Amini,	(参考訳) 本稿では,間接連続測定対象の量子粒子を不均一に相互作用させるシステムについて考察する。相互作用は平均場型であると仮定される。我々は、新しい制限量子グラノン系を導出し、この系の正当性を証明し、安定結果を確立する。 In this paper, we consider a system of heterogeneously interacting quantum particles subject to indirect continuous measurement. The interaction is assumed to be of the mean-field type. We derive a new limiting quantum graphon system, prove the well-posedness of this system, and establish a stability result.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# AnyRotate: Sim-to-Real Touchによる重力不変物体回転 AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch ( http://arxiv.org/abs/2405.07391v1 ) ライセンス: Link先を確認	Max Yang, Chenghua Lu, Alex Church, Yijiong Lin, Chris Ford, Haoran Li, Efi Psomopoulou, David A. W. Barton, Nathan F. Lepora,	(参考訳) 手の操作は人間の器用さの不可欠な要素である。私たちの手は触覚フィードバックに頼って、安定的で反応的な動きをすることで、操作中に物体が意図せずに滑り落ちないようにしています。ロボットハンドの場合、このディクスタリティのレベルは、精密なモーター制御のために、リッチな接触情報を抽出し、活用する必要がある。本稿では,高密度のsim-to-realタッチを用いた重力不変多軸物体回転システムであるAnyRotateを提案する。本研究では,シミュレーションにおけるポリシをトレーニングするための触覚フィードバックを提供するための連続的接触特徴表現を構築し,シミュレート・トゥ・リアルギャップをブリッジする観察モデルをトレーニングすることでゼロショットポリシー転送を行うアプローチを提案する。実験では,様々な特性を持つ物体を扱う際に,詳細な接触情報の利点を強調した。実世界では、密接な触覚ポリシーのシミュレートと現実の伝達を成功させ、様々な回転軸や手方向の様々な物体に一般化し、他の低次元タッチよりも優れた形状の触覚を再現する。興味深いことに、スリップ検出が明示されていないにもかかわらず、リッチな多指触覚は、把持中の物体の動きを暗黙的に検出し、ポリシーの堅牢性を改善するリアクティブな行動を提供し、手動操作における情報豊富な触覚センシングの重要性を強調している。 In-hand manipulation is an integral component of human dexterity. Our hands rely on tactile feedback for stable and reactive motions to ensure objects do not slip away unintentionally during manipulation. For a robot hand, this level of dexterity requires extracting and utilizing rich contact information for precise motor control. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We construct a continuous contact feature representation to provide tactile feedback for training a policy in simulation and introduce an approach to perform zero-shot policy transfer by training an observation model to bridge the sim-to-real gap. Our experiments highlight the benefit of detailed contact information when handling objects with varying properties. In the real world, we demonstrate successful sim-to-real transfer of the dense tactile policy, generalizing to a diverse range of objects for various rotation axes and hand directions and outperforming other forms of low-dimensional touch. 
Interestingly, despite not having explicit slip detection, rich multi-fingered tactile sensing can implicitly detect object movement within grasp and provide a reactive behavior that improves the robustness of the policy, highlighting the importance of information-rich tactile sensing for in-hand manipulation.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# NGD-SLAM:GPUのない動的環境のためのリアルタイムSLAMを目指して NGD-SLAM: Towards Real-Time SLAM for Dynamic Environments without GPU ( http://arxiv.org/abs/2405.07392v1 ) ライセンス: Link先を確認	Yuhao Zhang,	(参考訳) ダイナミック環境における高精度でロバストなカメラトラッキングは、視覚SLAM(Simultaneous Localization and Mapping)にとって大きな課題となる。この分野での最近の進歩は、動的オブジェクトのマスクを生成するためにディープラーニング技術を使用することが多い。そこで本稿では,CPU上でのリアルタイムパフォーマンスを実現する動的環境のための新しい視覚SLAMシステムを提案する。これに基づいて、さらに2段階の光フロー追跡手法を導入し、光学フローとORBのハイブリッド利用を採用し、システムの効率性とロバスト性を大幅に向上させる。最先端の手法と比較して、ハードウェアアクセラレーションなしで1台のラップトップCPU上で56fpsのトラッキングフレーム率を実現しつつ、動的環境における高いローカライズ精度を維持し、GPUサポートなしでもディープラーニング手法が動的SLAMに対してまだ実現可能であることを証明した。利用可能な情報に基づいて、これが最初に実現したSLAMシステムである。 Accurate and robust camera tracking in dynamic environments presents a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often involves the use of deep learning techniques to generate masks for dynamic objects, which usually requires GPUs to operate in real time (30 fps). Therefore, this paper proposes a novel visual SLAM system for dynamic environments that achieves real-time performance on a CPU by incorporating a mask prediction mechanism, which allows the deep learning method and the camera tracking to run entirely in parallel at different frequencies such that neither waits for the result from the other. Based on this, it further introduces a dual-stage optical flow tracking approach and employs a hybrid usage of optical flow and ORB features, which significantly enhance the efficiency and robustness of the system. Compared with state-of-the-art methods, this system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, thus proving that deep learning methods are still feasible for dynamic SLAM even without GPU support. Based on the available information, this is the first SLAM system to achieve this.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 固有フェアネス-等化オッドによる精度トレードオフ Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds ( http://arxiv.org/abs/2405.07393v1 ) ライセンス: Link先を確認	Meiyu Zhong, Ravi Tandon,	(参考訳) 法執行機関、刑事司法、財務、雇用、入場といった分野における機械学習(ML)システムの採用の増加に伴い、MLが支援する意思決定の公正性を保証することがますます重要になっている。本稿では,等化確率の統計的概念の下で,公平性と精度のトレードオフについて検討する。フェアネス予算の関数として、精度(どの分類器にも当てはまる)の新たな上限を提示する。さらに、データやラベル、センシティブなグループ属性の基盤となる統計にも依存しています。実世界の3つのデータセット(CompAS、アダルト、ロースクール)を実証分析し、理論上界を検証した。具体的には、文献における様々な既成の公正分類器によって達成されるトレードオフと比較する。以上の結果から,低バイアス群に対する高い精度の達成は,グループ間の統計的格差に基づいて根本的に制限される可能性が示唆された。 With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier), as a function of the fairness budget. In addition, our bounds also exhibit dependence on the underlying statistics of the data, labels and the sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs that are achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to low bias could be fundamentally limited based on the statistical disparity across the groups.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
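Equalized odds, the fairness notion in the entry above, asks that true-positive and false-positive rates match across sensitive groups. The sketch below measures how far a classifier is from that condition. Treating the largest rate difference as the quantity compared with a "fairness budget" is an assumption made for this example, and the labels, predictions, and group names are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary labels.

    Assumes the sample contains at least one positive and one negative label.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


def equalized_odds_gap(y_true, y_pred, group, a, b):
    """Largest absolute difference in TPR or FPR between groups `a` and `b`."""
    def subset(g):
        idx = [i for i, v in enumerate(group) if v == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    tpr_a, fpr_a = rates(*subset(a))
    tpr_b, fpr_b = rates(*subset(b))
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))


# Hypothetical labels, predictions, and sensitive-group membership.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = equalized_odds_gap(y_true, y_pred, group, "a", "b")
```

Here the classifier is perfect on group `a` but has a true-positive rate and false-positive rate of 0.5 on group `b`, so the gap is 0.5. A gap of zero would correspond to exact equalized odds.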
# CaFA:地球規模の気象予報, 球面上の因子的注意 CaFA: Global Weather Forecasting with Factorized Attention on Sphere ( http://arxiv.org/abs/2405.07395v1 ) ライセンス: Link先を確認	Zijie Li, Anthony Zhou, Saurabh Patil, Amir Barati Farimani,	(参考訳) 正確な天気予報は様々な分野において重要であり、意思決定プロセスや社会イベントに影響を及ぼす。機械学習モデルに基づくデータ駆動型アプローチは、歴史的データから異なるスケールの物理を捉え、予測段階の計算コストを大幅に削減する可能性から、数値天気予報モデルに代わる有望な選択肢として最近登場した。さまざまなドメインにわたる最先端のパフォーマンスで有名だが、Transformerモデルは機械学習の天気予報にも人気がある。しかし、特に世界規模での天気予報にトランスフォーマーアーキテクチャを適用することは、注意の2次複雑さと解像度が増大するにつれて空間点の2次増加のため、計算的に困難である。本研究では, この問題を緩和するために, 球面測地に適した分解アテンションモデルを提案する。より具体的には、カーネルの計算複雑性が全分解能ではなく軸分解能の2倍であるような異なる軸に対向する多次元因子化カーネルを利用する。 1.5^\circ$および0-7dayのリードタイムにおける提案モデルの決定論的予測精度は、純粋にデータ駆動型機械学習天気予報モデルと同等である。また,提案モデルでは,トランスフォーマーモデルよりも計算コストの低い精度で精度を向上し,標準的な注意力を持つトランスフォーマーモデルよりも精度の高いパレートを推し進めることができることを示す。 Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes where the computational complexity of the kernel is only quadratic to the axial resolution instead of overall resolution. 
The deterministic forecasting accuracy of the proposed model on $1.5^\circ$ and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also showcase the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer based models with standard attention.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12
# 高速展開のための半監督雑草検出と高効率化 Semi-Supervised Weed Detection for Rapid Deployment and Enhanced Efficiency ( http://arxiv.org/abs/2405.07399v1 ) ライセンス: Link先を確認	Alzayat Saleh, Alex Olsen, Jake Wood, Bronson Philippa, Mostafa Rahimi Azghadi,	(参考訳) 雑草は農業において重大な課題を示し、収量減少を引き起こし、高価な規制措置を必要とする。コンピュータビジョンとディープラーニングを用いた自動雑草検出は、有望な解決策を提供する。しかし,従来のディープラーニング手法では,大量のラベル付きトレーニングデータを必要とする場合が多い。本稿では,2つの主要成分からなる半教師付き雑草検出手法を提案する。まず,異なる規模の雑草の特徴を捉えるために,マルチスケールの特徴表現手法を用いる。第2に、トレーニング中にラベル付き画像の小さなセットを活用する適応的な擬似ラベル割り当て戦略を提案する。この戦略は、ラベルなしデータから生成された疑似ラベルに信頼スコアを動的に割り当てる。さらに,本手法は,エポック対応と擬似ラベルを融合して学習プロセスをさらに強化する。 COCOデータセットと、CottonWeedDet12、CropAndWeed、Palmer amaranth、RadishWheat、RoboWeedMapの5つの顕著な雑草データセットに関する実験結果から、既存の技術と比較してラベル付きデータが少なくても、雑草検出における最先端のパフォーマンスを実現する方法が示されている。このアプローチは,現実世界の農業シナリオにおける雑草検出において,ラベル付け負担を軽減するとともに,深層学習の実現可能性と展開速度を高める可能性を秘めている。 Weeds present a significant challenge in agriculture, causing yield loss and requiring expensive control measures. Automatic weed detection using computer vision and deep learning offers a promising solution. However, conventional deep learning methods often require large amounts of labelled training data, which can be costly and time-consuming to acquire. This paper introduces a novel method for semi-supervised weed detection, comprising two main components. Firstly, a multi-scale feature representation technique is employed to capture distinctive weed features across different scales. Secondly, we propose an adaptive pseudo-label assignment strategy, leveraging a small set of labelled images during training. This strategy dynamically assigns confidence scores to pseudo-labels generated from unlabeled data. Additionally, our approach integrates epoch-corresponding and mixed pseudo-labels to further enhance the learning process. 
Experimental results on the COCO dataset and five prominent weed datasets -- CottonWeedDet12, CropAndWeed, Palmer amaranth, RadishWheat, and RoboWeedMap -- illustrate that our method achieves state-of-the-art performance in weed detection, even with significantly less labelled data compared to existing techniques. This approach holds the potential to alleviate the labelling burden and enhance the feasibility and deployment speed of deep learning for weed detection in real-world agricultural scenarios.	翻訳日:2024-05-14 15:14:45 公開日:2024-05-12

# Dual-Domain Deep D-bar Method for Solving Electro Impedance Tomography (特集電気インピーダンス・トモグラフィー)

Dual-Domain Deep D-bar Method for Solving Electrical Impedance Tomography ( http://arxiv.org/abs/2407.03335v1 )

ライセンス: Link先を確認

Xiang Cao, Qiaoqiao Ding, Xiaoqun Zhang,

(参考訳) 電気インピーダンストモグラフィー(EIT)の高効率化と簡易化により, 正則化D-bar法は最も顕著な解法の一つである。非線型フーリエ領域の散乱データにローパスフィルタを適用して直接アプローチし、滑らかな導電率近似を与える。しかしDバー画像は、正確な高周波情報の欠如と問題の不備により、コントラストが低く、解像度が低いことが多い。本稿では、低コントラストDバー画像から高コントラストDバー画像列を検索するデュアルドメインニューラルネットワークアーキテクチャを提案する。導電率分布の空間的特徴をより強調するために、広く採用されているU-netは、予測されたD-bar画像列から導電率画像の校正のために調整されている。このようなハイブリッド手法をDual-Domain Deep D-bar法と呼ぶのは,散乱データと画像情報の両方を考慮するためである。単スケール構造と比較して, 提案するマルチスケール構造は, アーティファクトの低減と導電率近似の精細化に優れた性能を示す。さらに、GMRESアルゴリズムを用いて離散Dバーシステムを解くには、CPUベースのデバイスで非常に時間がかかる計算の複雑さが伴う。そこで我々は,Dバーによるデータ拡張処理を高速化するために,GPUをベースとしたリヒャルトソン反復法を設計した。 KIT4 および ACT4 システムのシミュレーション EIT データの数値計算を行い,既存の手法と比較して絶対 EIT 画像品質が顕著に向上したことを示す。

The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems due to its efficiency and simplicity. It provides a direct approach by applying low-pass filtering to the scattering data in the non-linear Fourier domain, thereby yielding a smoothed conductivity approximation. However, D-bar images often present low contrast and low resolution due to the absence of accurate high-frequency information and the ill-posedness of the problem. In this paper, we propose a dual-domain neural network architecture to retrieve high-contrast D-bar image sequences from low-contrast D-bar images. To further accentuate the spatial features of the conductivity distribution, the widely adopted U-net is tailored for conductivity image calibration from the predicted D-bar image sequences. We call this hybrid approach the Dual-Domain Deep D-bar method because it considers both scattering data and image information. Compared to the single-scale structure, our proposed multi-scale structure exhibits superior capabilities in reducing artifacts and refining the conductivity approximation. Additionally, solving discrete D-bar systems using the GMRES algorithm entails significant computational complexity, which is extremely time-consuming on CPU-based devices. To remedy this, we design a surrogate GPU-based Richardson iterative method to accelerate the data enhancement process by D-bar. Numerical results are presented for simulated EIT data from the KIT4 and ACT4 systems to demonstrate notable improvements in absolute EIT imaging quality when compared to existing methodologies.

翻訳日:2024-07-22 22:09:05 公開日:2024-05-12
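
The abstract above mentions replacing GMRES with a Richardson iteration. As a rough illustration only, the following is a minimal CPU sketch of the generic stationary Richardson update x ← x + ω(b − Ax) on a small real-valued system. The matrix, the right-hand side, the relaxation parameter `omega`, and the tolerance are all assumptions chosen for this example; the abstract does not describe the authors' surrogate GPU scheme, and this is not it.

```python
def richardson(A, b, omega, tol=1e-10, max_iter=10_000):
    """Solve A x = b with the stationary Richardson iteration.

    Update rule: x <- x + omega * (b - A x).
    It converges when the spectral radius of (I - omega * A) is below 1.
    Returns the approximate solution and the number of updates performed.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(max_iter):
        # residual r = b - A x
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, k
        x = [x[i] + omega * r[i] for i in range(n)]
    return x, max_iter


# Hypothetical 2x2 symmetric positive definite system (not from the paper)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = richardson(A, b, omega=0.25)
```

Each update needs only one matrix-vector product and no inner products or orthogonalization, which is the usual reason this kind of iteration maps well onto GPUs; whether that is the authors' motivation is not stated in the abstract.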

# 抽象的多文書要約のための特異性の分離

Disentangling Specificity for Abstractive Multi-document Summarization ( http://arxiv.org/abs/2406.00005v1 )

ライセンス: Link先を確認

Congbo Ma, Wei Emma Zhang, Hu Wang, Haojie Zhuang, Mingyu Guo,

(参考訳) 多文書要約(MDS)は文書集合から要約を生成する。セット内の各ドキュメントはトピック関連の概念を記述し、各ドキュメントは独自の内容を持っている。しかし、文書の特異性は既存のMDSアプローチからはほとんど注目されていない。各文書の特定の情報を無視することは、生成された要約の包括性を制限します。この問題を解決するために,本稿では,文書から特定の内容を1つの文書集合に切り離す手法を提案する。文書固有の表現は、提案した直交制約によって互いに距離を置くことを奨励され、特定の表現学習者によって学習される。より広範な分析を行い、特定の情報と文書集合の表現が独特な強みに寄与し、それらの組み合わせがMDSにとってより包括的な解決策をもたらすという興味深い知見を得た。また、共通情報(つまり共有情報)がMDS設定下での全体的なパフォーマンスにはあまり寄与しないことがわかった。 実装コードは https://github.com/congboma/DisentangleSum で公開されている。

Multi-document summarization (MDS) generates a summary from a document set. Each document in a set describes topic-relevant concepts, while each document also has its own unique content. However, document specificity receives little attention from existing MDS approaches. Neglecting the specific information of each document limits the comprehensiveness of the generated summaries. To solve this problem, in this paper, we propose to disentangle the specific content from documents in one document set. The document-specific representations, which are encouraged to be distant from each other via a proposed orthogonal constraint, are learned by the specific representation learner. We provide extensive analysis and find that specific information and document set representations contribute distinctive strengths, and that their combination yields a more comprehensive solution for MDS. Also, we find that the common (i.e. shared) information does not contribute much to the overall performance under the MDS settings. Implementation code is available at https://github.com/congboma/DisentangleSum.

翻訳日:2024-06-09 16:19:21 公開日:2024-05-12
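
The abstract above says document-specific representations are pushed apart by an orthogonal constraint but does not give its formula. As a hedged illustration, the sketch below uses one common form of such a penalty, the sum of squared cosine similarities over all pairs of vectors. The function name `orthogonality_penalty` and the toy vectors are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def orthogonality_penalty(reps):
    """Sum of squared cosine similarities over all distinct pairs of vectors.

    The value is zero exactly when every pair is orthogonal and grows as
    the vectors become more aligned, so minimizing it pushes them apart.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    total = 0.0
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            total += cosine(reps[i], reps[j]) ** 2
    return total


# Toy "document-specific" vectors (hypothetical values)
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
overlapping = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

penalty_orthogonal = orthogonality_penalty(orthogonal)
penalty_overlapping = orthogonality_penalty(overlapping)
```

In a training setup a term like this would typically be added to the summarization loss with a weighting coefficient; the actual loss used by the authors may differ.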

# 数値ライブラリの自動チューニングへのXAIの適用

Adaptation of XAI to Auto-tuning for Numerical Libraries ( http://arxiv.org/abs/2405.10973v1 )

ライセンス: Link先を確認

Shota Aoki, Takahiro Katagiri, Satoshi Ohshima, Masatoshi Kawai, Toru Nagai, Tetsuya Hoshino,

(参考訳) 人工知能(AI)のアウトプットの非規制利用に関する懸念が持ち上がり、様々な社会問題に繋がる可能性がある。人間は定期的に情報を検証するが、膨大な量のAI生成結果を手動で検査することは現実的ではない。したがって、自動化と可視化が不可欠である。この状況において、説明可能なAI(XAI)技術は、AIモデル開発の合理化と、ユーザへのAI出力の説明の負担軽減を目的として、注目を集めている。同時に、数値計算におけるパフォーマンスチューニングに必要な時間を削減することを目的として、ソフトウェア自動チューニング(AT)技術が出現している。 ATはパラメータ最適化と数値計算のための高性能プログラミングにおけるコスト削減のための強力なツールである。 ATのメカニズムとAI技術の相乗効果は注目に値する。しかし、AIをATメカニズムに適用することは、AIモデル説明可能性の課題をもたらす。本研究は、精度保証数値計算の性能パラメータチューニングとスパース反復アルゴリズムという、2つの異なるプロセスに統合されたAIモデルのXAIに焦点を当てる。

Concerns have arisen regarding the unregulated utilization of artificial intelligence (AI) outputs, potentially leading to various societal issues. While humans routinely validate information, manually inspecting the vast volumes of AI-generated results is impractical. Therefore, automation and visualization are imperative. In this context, Explainable AI (XAI) technology is gaining prominence, aiming to streamline AI model development and alleviate the burden of explaining AI outputs to users. Simultaneously, software auto-tuning (AT) technology has emerged, aiming to reduce the man-hours required for performance tuning in numerical calculations. AT is a potent tool for cost reduction during parameter optimization and high-performance programming for numerical computing. The synergy between AT mechanisms and AI technology is noteworthy, with AI finding extensive applications in AT. However, applying AI to AT mechanisms introduces challenges in AI model explainability. This research focuses on XAI for AI models when integrated into two different processes for practical numerical computations: performance parameter tuning of accuracy-guaranteed numerical calculations and sparse iterative algorithm.

翻訳日:2024-05-27 03:08:05 公開日:2024-05-12

# ランダム行列アンサンブルによる無限次元アンダーソン転移の臨界挙動の記述:対数的多フラクタル性と臨界局在

Describing the critical behavior of the Anderson transition in infinite dimension by random-matrix ensembles: logarithmic multifractality and critical localization ( http://arxiv.org/abs/2405.10975v1 )

ライセンス: Link先を確認

Weitao Chen, Olivier Giraud, Jiangbin Gong, Gabriel Lemarié,

(参考訳) 解析的に扱いやすいため、ランダム行列アンサンブルは、計算負荷の高い系におけるエキゾチックな現象を探索するための堅牢なプラットフォームとして機能する。本稿では,姉妹論文であるレター [arXiv:2312.17481] に基づいて,無限次元におけるアンダーソン転移の臨界挙動を捉えるよう設計された2つのランダム行列アンサンブルを,解析的手法と広範囲な数値シミュレーションを用いて調べる。本研究は対数的多フラクタル性と臨界局在の2種類の臨界挙動を明らかにする。従来の多フラクタル性とは対照的に、新しい対数的多フラクタル性は、システムサイズの対数に対して代数的にスケールする固有状態モーメントを特徴とする。臨界局在は、局在を示す有限値に収束する次数$q>1/2$の固有状態モーメントによって特徴づけられ、ランダム正則グラフおよび実効的に無限次元のErdős-Rényiグラフで観測される臨界挙動と整合する特徴的な対数的有限サイズ効果または時間効果を示す。摂動法を用いて,本モデルにおける対数的多フラクタル性と臨界局在の存在を確立する。さらに、時間力学と空間相関関数における新しいスケーリング挙動の出現について検討する。我々のモデルは、無限次元量子乱れ系を研究するための貴重な枠組みを提供し、我々の発見の普遍性は、無限次元におけるアンダーソン転移と似た、議論の多い多体局在転移を含む、顕著な有限サイズ効果と遅いダイナミクスを持つ系への広範な適用を可能にする。

Due to their analytical tractability, random matrix ensembles serve as robust platforms for exploring exotic phenomena in systems that are computationally demanding. Building on a companion letter [arXiv:2312.17481], this paper investigates two random matrix ensembles tailored to capture the critical behavior of the Anderson transition in infinite dimension, employing both analytical techniques and extensive numerical simulations. Our study unveils two types of critical behaviors: logarithmic multifractality and critical localization. In contrast to conventional multifractality, the novel logarithmic multifractality features eigenstate moments scaling algebraically with the logarithm of the system size. Critical localization, characterized by eigenstate moments of order $q>1/2$ converging to a finite value indicating localization, exhibits characteristic logarithmic finite-size or time effects, consistent with the critical behavior observed in random regular and Erdős-Rényi graphs of effective infinite dimensionality. Using perturbative methods, we establish the existence of logarithmic multifractality and critical localization in our models. Furthermore, we explore the emergence of novel scaling behaviors in the time dynamics and spatial correlation functions. Our models provide a valuable framework for studying infinite-dimensional quantum disordered systems, and the universality of our findings enables broad applicability to systems with pronounced finite-size effects and slow dynamics, including the contentious many-body localization transition, akin to the Anderson transition in infinite dimension.

翻訳日:2024-05-27 03:08:05 公開日:2024-05-12

# 光系II反応中心における深層学習による電荷輸送予測

Charge-transport forecasted via deep learning in the photosystem II reaction center ( http://arxiv.org/abs/2405.12232v1 )

ライセンス: Link先を確認

Zi-Ran Zhao, Shun-Cai Zhao, Yi-Meng Huang,

(参考訳) 限られた理論シミュレーションデータを通じて将来の物理的挙動を予測することは、人工知能技術と量子物理学の統合による新たな研究パラダイムである。本研究では,光化学系II反応中心(PSII-RC)における電荷輸送(CT)の挙動を,誤差しきい値学習法を用いた長短期記憶(LSTM)ネットワークにより長時間にわたって予測した。 8 fs以内の理論シミュレーションデータを改良LSTMネットワークに入力して学習させたところ, トレーニングセットの収集時間に比べて長い時間にわたり, $10^{-4}$ 桁の差で明確な予測結果が得られた。その結果、LSTMを用いて、量子物理法に加えて、CTを制御している物理を解明する可能性が示唆された。本研究の意義は、分子スケールでの光合成の理解を深めるために、LSTMのスコープと有効性を完全に解明することである。

Predicting future physical behavior from limited theoretical simulation data is an emerging research paradigm resulting from the integration of artificial intelligence technology and quantum physics. In this work, the charge-transport (CT) behavior in the photosystem II reaction center (PSII-RC) was forecasted over a long time by a deep learning model, the long short-term memory (LSTM) network with an error threshold training method. The theoretical simulation data within 8 fs was fed to the modified LSTM network for training, which yields a distinct prediction with a difference of $10^{-4}$ orders of magnitude over a long time period compared to the collection time for training sets. The results indicate the potential of employing LSTM to reveal the physics governing CT in addition to quantum physical methods. The implications of this work warrant further investigation to fully elucidate the scope and efficacy of LSTM for advancing our understanding of photosynthesis at the molecular scale.

翻訳日:2024-05-27 03:08:05 公開日:2024-05-12

# 教育用大規模言語モデル:調査

Large Language Models for Education: A Survey ( http://arxiv.org/abs/2405.13001v1 )

ライセンス: Link先を確認

Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu,

(参考訳) 人工知能(AI)は伝統的な教育に大きな影響を与えている。近年,自然言語処理,コンピュータビジョン,音声認識,自律運転など,大規模言語モデル (LLM) が多用されている。 LLMは、レコメンデーション、金融、政府、教育、法務、金融など、多くの分野にも適用されている。強力な補助ツールとして、LLMは深層学習、事前学習、微調整、強化学習といった様々な技術を取り入れている。 LLMをスマート教育(LLMEdu)に利用することは、世界中の国々にとって重要な戦略的方向性である。 LLMは、教育の質の向上、教育モデルの変更、教師の役割の変更において大きな期待を示してきたが、これらの技術は依然としていくつかの課題に直面している。本稿では,LLMEduの体系的レビューを行い,現在の技術,課題,今後の発展に焦点をあてる。まず,LLMEduの現状を概説し,LLMと教育の特徴を紹介するとともに,LLMを教育に組み込むことのメリットも紹介する。また,LLMを教育産業に統合するプロセスや,関連技術の導入についても検討する。最後に,LLMEduが直面する課題と課題,および今後のLLMEduの最適化の可能性について議論する。

Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, and legal affairs. As powerful auxiliary tools, LLMs incorporate various technologies such as deep learning, pre-training, fine-tuning, and reinforcement learning. The use of LLMs for smart education (LLMEdu) has been a significant strategic direction for countries worldwide. While LLMs have shown great promise in improving teaching quality, changing education models, and modifying teacher roles, the technologies still face several challenges. In this paper, we conduct a systematic review of LLMEdu, focusing on current technologies, challenges, and future developments. We first summarize the current state of LLMEdu and then introduce the characteristics of LLMs and education, as well as the benefits of integrating LLMs into education. We also review the process of integrating LLMs into the education industry, as well as the introduction of related technologies. Finally, we discuss the challenges and problems faced by LLMEdu, as well as prospects for future optimization of LLMEdu.

翻訳日:2024-05-27 03:08:05 公開日:2024-05-12

# DuetRAG: 共同検索強化世代

DuetRAG: Collaborative Retrieval-Augmented Generation ( http://arxiv.org/abs/2405.13002v1 )

ライセンス: Link先を確認

Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang,

(参考訳) Retrieval-Augmented Generation (RAG) 法は,Large Language Models (LLMs) の入力を関連する検索パッセージで拡張し,知識集約タスクにおける事実誤りを低減する。しかしながら、現代のRAGアプローチは、対応するドメイン知識の欠如により、複雑なドメイン質問(例えば、HotPot QA)において、無関係な知識を検索してしまう問題に悩まされ、低品質な生成に繋がる。この問題に対処するため,我々は新しい協調検索拡張生成フレームワークであるDuetRAGを提案する。我々のブートストラッピングの考え方は、ドメインファインチューニングとRAGモデルを同時に統合して知識検索の品質を向上させ、それによって生成品質を高めることである。最後に、HotPot QAにおいて、DuetRAGが専門家である人間の研究者に匹敵することを示す。

Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval issues in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generations. To address this issue, we propose a novel Collaborative Retrieval-Augmented Generation framework, DuetRAG. Our bootstrapping philosophy is to simultaneously integrate domain fine-tuning and RAG models to improve the knowledge retrieval quality, thereby enhancing generation quality. Finally, we demonstrate that DuetRAG matches expert human researchers on HotPot QA.

翻訳日:2024-05-27 03:08:05 公開日:2024-05-12

# 会話データ生成の最近の進歩に関する調査研究

A Survey on Recent Advances in Conversational Data Generation ( http://arxiv.org/abs/2405.13003v1 )

ライセンス: Link先を確認

Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi,

(参考訳) 近年の会話システムの進歩は、様々な領域における人間と機械の相互作用を著しく向上させてきた。しかし,特殊な対話データが不足しているため,これらのシステムの訓練は困難である。伝統的に、会話データセットはクラウドソーシングによって作成されていたが、この手法はコストがかかり、規模が限られ、労働集約的であることが証明された。ソリューションとして、既存のデータセットを拡張したり、テキストリソースを会話形式に変換する技術を活用して、データセット作成のためのより効率的でスケーラブルなアプローチを提供する合成対話データの開発が登場した。本稿では,オープンドメイン,タスク指向,情報検索の3種類の対話システムに着目し,マルチターン対話データ生成の体系的・包括的レビューを行う。本稿では,シードデータ生成や発話生成,品質フィルタリングといったキーコンポーネントに基づく既存研究を分類し,会話データ生成システムの主な原理を概説する一般的なフレームワークを紹介する。さらに、合成会話データの評価のための評価指標と手法について検討し、現場における課題に対処し、今後の研究に向けた可能性を探る。我々のゴールは、最先端の手法の概要を提示し、この分野のさらなる研究の機会を強調することで、研究者や実践者の進歩を加速することである。

Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components like seed data creation, utterance generation, and quality filtering methods, and introduce a general framework that outlines the main principles of conversation data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities to further research in this area.

翻訳日:2024-05-27 02:58:21 公開日:2024-05-12

# MathDivide: 大規模言語モデルによる数学的推論の改善

MathDivide: Improved mathematical reasoning by large language models ( http://arxiv.org/abs/2405.13004v1 )

ライセンス: Link先を確認

Saksham Sahai Srivastava, Ashutosh Gandhi,

(参考訳) 大規模言語モデルは複雑な言語的および認知的なタスクを扱うことができることが証明されている。そのため、それらの用法は数学のような論理的推論能力を必要とするタスクにまで拡張された。本稿では,数学的問題をより単純なサブプロブレムに分解するMathDivideというプロンプト手法を提案する。各サブプロブレムは、対応する代数式に対してLLMによって生成されたPythonコードによって評価された値の代数式として定式化される。 Pythonコードに供給される値は、問題ステートメントで提供される数値である。サブプロブレムの解は、問題文の最終的な答えを得るために構成される。最後に、最終回答を正解と比較する。最終回答が正しい答えと一致する場合、出力として生成され、その他のものとして、精製プロンプトがLLMに供給される。我々は、GSM8Kデータセットを用いて、このプロンプトをクローズドソースLLMモデルとオープンソースLLMモデルの両方で実験する。その結果、MathDivideはMath-prompterと呼ばれる先進的なプロンプト技術を大幅に上回った。

Large language models have been proven to be capable of handling complex linguistic and cognitive tasks. Therefore, their usage has been extended to tasks requiring logical reasoning ability, such as mathematics. In this paper, we propose a prompting technique called MathDivide that breaks down the mathematical problem into simpler subproblems. Each of the subproblems is formulated as an algebraic expression whose value is evaluated by the Python code generated by the LLM for the corresponding algebraic expression. The values fed to the Python code are the numerical values provided in the problem statement. The solutions for the subproblems are composed together to obtain the final answer for the problem statement. Finally, the final answer is compared to the correct answer. If the final answer matches the correct answer, it is produced as output; otherwise, a refinement prompt is fed to the LLM. We experiment with this prompting technique on both closed-source and open-source LLMs using the GSM8K dataset. The results obtained demonstrate that MathDivide was able to significantly outperform the leading prompting technique called Math-prompter.

翻訳日:2024-05-27 02:58:21 公開日:2024-05-12
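
The control flow described in the abstract above (decompose, evaluate generated Python code, compose, compare, refine) can be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' implementation: `stub_llm` is a hypothetical stand-in that returns canned code instead of calling a model, and the function names, prompt strings, and toy values are all invented for this example.

```python
from typing import Callable, Optional, Tuple


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: returns Python source defining solve(v)."""
    if "REFINE" in prompt:
        # Refined attempt: both subproblems are handled correctly.
        return (
            "def solve(v):\n"
            "    eggs_left = v['laid'] - v['eaten'] - v['baked']  # subproblem 1\n"
            "    return eggs_left * v['price']                    # subproblem 2\n"
        )
    # First attempt: forgets to subtract the baked eggs.
    return (
        "def solve(v):\n"
        "    eggs_left = v['laid'] - v['eaten']\n"
        "    return eggs_left * v['price']\n"
    )


def run_generated(source: str, values: dict) -> float:
    """Execute generated code and evaluate it on the problem's numbers."""
    namespace: dict = {}
    # Acceptable for a toy with canned code; real model output needs sandboxing.
    exec(source, namespace)
    return namespace["solve"](values)


def math_divide(problem: str, values: dict, correct: float,
                llm: Callable[[str], str],
                max_refinements: int = 2) -> Tuple[Optional[float], int]:
    """Return (answer, refinements used), or (None, max_refinements) on failure."""
    prompt = f"Decompose into subproblems and write Python: {problem}"
    for attempt in range(max_refinements + 1):
        answer = run_generated(llm(prompt), values)
        if answer == correct:
            return answer, attempt
        # Mismatch with the correct answer: feed a refinement prompt.
        prompt = f"REFINE: previous answer {answer} was wrong. {problem}"
    return None, max_refinements


values = {"laid": 16, "eaten": 3, "baked": 4, "price": 2}
answer, refinements = math_divide("toy egg-selling word problem", values,
                                  correct=18, llm=stub_llm)
```

Note that the comparison step relies on a known correct answer, as the abstract describes, so the loop as sketched applies to labeled benchmark problems.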

# 大規模言語モデルとソーシャルメディアデータを用いた希少炎症性疾患の理解

Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data ( http://arxiv.org/abs/2405.13005v1 )

ライセンス: Link先を確認

Nan Miles Xi, Hong-Long Ji, Lin Wang,

(参考訳) サルコイドーシスは,各種臓器に肉芽腫が出現する稀な炎症性疾患である。この病気は、その多様な症状と予測不可能な性質により、診断と治療の課題を呈する。本研究では,ソーシャルメディアプラットフォームRedditにおけるサルコイドーシスに関連する議論を分析するために,Large Language Model (LLM)を用いた。サルコイドーシス関連物質を正確に同定するためのLSMの有用性について検討した。症状は, 疲労, 腫大したリンパ節, 呼吸の短さなど多岐にわたる。プレドニゾンが最も処方された薬剤であり, インフリキシマブは予後改善に最も有効であった。特に, 年齢, 性別による予後の相違がみられ, 女性, 若年者の予後は良好であった。さらに、教師なしクラスタリングでは、独自の症状プロファイル、予後結果、人口分布を持つ3つの異なる患者サブグループ(フェノタイプ)が同定された。最後に、感情分析の結果、特に女性や若年者において、患者の精神疾患後の健康に適度なネガティブな影響が認められた。本研究は,ソーシャルメディアデータによるサルコイドーシスの理解にLLMを応用した最初の事例である。これは、その症状、治療、予後、患者の生活への影響に関するデータ駆動的な洞察を提供することによって、疾患を理解するのに寄与する。本研究は,サルコイドーシス患者に対するパーソナライズされた治療戦略の改善と,ケアの質の向上に直接的な意味を持っている。

Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of LLMs in accurately identifying sarcoidosis-related content. We discovered a wide array of symptoms reported by patients, with fatigue, swollen lymph nodes, and shortness of breath as the most prevalent. Prednisone was the most prescribed medication, while infliximab showed the highest effectiveness in improving prognoses. Notably, our analysis revealed disparities in prognosis based on age and gender, with women and younger patients experiencing good and polarized outcomes, respectively. Furthermore, unsupervised clustering identified three distinct patient subgroups (phenotypes) with unique symptom profiles, prognostic outcomes, and demographic distributions. Finally, sentiment analysis revealed a moderate negative impact on patients' mental health post-diagnosis, particularly among women and younger individuals. Our study represents the first application of LLMs to understand sarcoidosis through social media data. It contributes to understanding the disease by providing data-driven insights into its manifestations, treatments, prognoses, and impact on patients' lives. Our findings have direct implications for improving personalized treatment strategies and enhancing the quality of care for individuals living with sarcoidosis.

翻訳日:2024-05-27 02:58:21 公開日:2024-05-12

# Triple-CFN:抽象推論プロセスの強化のための概念空間の再構築

Triple-CFN: Restructuring Conceptual Spaces for Enhancing Abstract Reasoning process ( http://arxiv.org/abs/2403.03190v7 )

ライセンス: Link先を確認

Ruizhuo Song, Beiming Yuan,

(参考訳) 抽象推論は人工知能アルゴリズムに重大な課題をもたらし、知覚タスクに必要なものよりも高いレベルの認知能力を要求する。本研究では,Bongard Logo問題に対処するTriple-CFN法を導入し,競合するインスタンスの概念空間を暗黙的に再編成することで,顕著な推論精度を実現する。さらに、必要な修正を加えることで、トリプルCFNパラダイムはRPM(Raven's Progressive Matrices)問題でも有効であることが証明され、競争結果が得られた。 RPM問題におけるTriple-CFNの性能をさらに向上させるため,提案手法をMeta Triple-CFNネットワークにアップグレードし,RPM問題の概念空間を明示的に構築し,概念解釈性を確保しつつ高い推論精度を確保した。 Meta Triple-CFNの成功は、概念空間をモデル化するパラダイムに起因している。この考え方に基づいて、我々はRe-spaceレイヤを導入し、Meta Triple-CFNとTriple-CFNの両方の性能を高めました。本稿では,機械知能の進歩に寄与し,抽象的推論問題を解くための革新的なネットワーク設計を探求することによって,この分野におけるさらなるブレークスルーの道を開くことを目的とする。

Abstract reasoning poses significant challenges to artificial intelligence algorithms, demanding a higher level of cognitive ability than that required for perceptual tasks. In this study, we introduce the Triple-CFN method to tackle the Bongard Logo problem, achieving remarkable reasoning accuracy by implicitly reorganizing the conflicting concept spaces of instances. Furthermore, with necessary modifications, the Triple-CFN paradigm has also proven effective on the RPM (Raven's Progressive Matrices) problem, yielding competitive results. To further enhance Triple-CFN's performance on the RPM problem, we have upgraded it to the Meta Triple-CFN network, which explicitly constructs the concept space of RPM problems, ensuring high reasoning accuracy while achieving conceptual interpretability. The success of Meta Triple-CFN can be attributed to its paradigm of modeling the concept space, which is tantamount to normalizing reasoning information. Based on this idea, we have introduced the Re-space layer, boosting the performance of both Meta Triple-CFN and Triple-CFN. This paper aims to contribute to the advancement of machine intelligence and pave the way for further breakthroughs in this field by exploring innovative network designs for solving abstract reasoning problems.

翻訳日:2024-05-16 15:45:06 公開日:2024-05-12

# D4C Glove-train: 概念の境界設定と分布構築によるRPMおよびBongard-Logo問題の解法

D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts ( http://arxiv.org/abs/2403.03452v7 )

ライセンス: Link先を確認

Ruizhuo Song, Beiming Yuan,

(参考訳) 本稿では,抽象的推論の領域において,特にRaven's Progressive Matrices (RPM) と Bongard-Logo の課題に対処する上で,注目すべき進歩を実現する。リコネット(Lico-Net)は,RPM問題に顕著な精度で対処する新しいベースラインモデルである。この基礎を生かして、我々はD3Cアプローチを推進し、分布を通して抽象的推論問題の根底にある概念を提唱する。この観点は、Lico-NetとBongard-Logoタスクに優れたベースラインモデルの両方のパフォーマンスを向上させる。 D3Cの計算効率を高めるために,D3C-cosの変種を示す。さらに,これらの領域における概念的境界を再定義するD2C法を提案する。最後に、我々の方法論をD4Cに拡張し、さらに概念境界を洗練させ、RPMとBongard-Logoの課題において実質的な改善を示す。全体として、我々の貢献は抽象的推論の分野における新たな展望と実践的な進歩を示している。

This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstract reasoning problems through distributions. This perspective enhances the performance of both Lico-Net and a baseline model excelling in Bongard-Logo tasks. To bolster the computational efficiency of D3C, we present the D3C-cos variant, offering a streamlined yet precise solution. Furthermore, we propose the D2C method, redefining conceptual boundaries within these domains and bridging the divide between high-level abstractions and their lower-dimensional counterparts. Finally, we extend our methodology to D4C, employing adversarial techniques to refine conceptual boundaries further and demonstrate substantial improvements in both RPM and Bongard-Logo challenges. Overall, our contributions present a fresh outlook and practical advancements in the field of abstract reasoning.

翻訳日:2024-05-16 15:45:06 公開日:2024-05-12

# 周期3の学習における熱力学的極限

Thermodynamic limit in learning period three ( http://arxiv.org/abs/2405.08825v1 )

ライセンス: Link先を確認

Yuichiro Terasaki, Kohei Nakajima,

(参考訳) 周期3を持つ連続な一次元写像はすべての周期を含む。では、3つのデータ点を学習するだけで、あらゆる種類の周期軌道を得ることができるだろうか? ランダムニューラルネットワークによる周期3の学習を考察し,それに伴う普遍的性質を報告する。まず、学習済みネットワークには、ターゲットデータとネットワーク設定の選択に依存する熱力学的極限があることを示す。解析の結果,学習された周期のほとんどすべてが不安定であり,各ネットワークには固有のアトラクタ(学習していないものでさえあり得る)があることがわかった。本稿では,ネットワークに内在する埋め込み可能なアトラクタを表す特性分岐の概念を提案する。ここでは、ターゲットデータ点とネットワーク重みのスケールが分岐パラメータとして機能する。結論として、周期3の学習は、系に潜在的に存在する多数の不安定周期の安定性変化により、特性分岐を通じて様々なアトラクタを生成する。

A continuous one-dimensional map with period three includes all periods. This raises the following question: Can we obtain any types of periodic orbits solely by learning three data points? We consider learning period three with random neural networks and report the universal property associated with it. We first show that the trained networks have a thermodynamic limit that depends on the choice of target data and network settings. Our analysis reveals that almost all learned periods are unstable and each network has its characteristic attractors (which can even be untrained ones). Here, we propose the concept of characteristic bifurcation expressing embeddable attractors intrinsic to the network, in which the target data points and the scale of the network weights function as bifurcation parameters. In conclusion, learning period three generates various attractors through characteristic bifurcation due to the stability change in latently existing numerous unstable periods of the system.

翻訳日:2024-05-16 15:24:45 公開日:2024-05-12
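
The opening statement of the abstract above is the classical "period three implies all periods" result for continuous one-dimensional maps. As a small illustration of what a period-three orbit looks like numerically, the sketch below iterates the logistic map at r = 3.83, a value inside its well-known period-3 window. The logistic map, the parameter value, and the helper names are assumptions chosen for this example; the paper itself studies random neural networks, which this sketch does not model.

```python
def logistic(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def settled_orbit(r, x0=0.5, burn_in=10_000, length=6):
    """Iterate past the transient, then record `length` consecutive points."""
    x = x0
    for _ in range(burn_in):
        x = logistic(x, r)
    points = []
    for _ in range(length):
        points.append(x)
        x = logistic(x, r)
    return points


# r = 3.83 lies inside the period-3 window of the logistic map
orbit = settled_orbit(3.83)
repeats_after_three = abs(orbit[3] - orbit[0]) < 1e-9
is_not_fixed_point = abs(orbit[1] - orbit[0]) > 1e-3
```

Here the three-cycle is attracting, so a long iteration settles onto it. The abstract's point is different: most periods learned by the networks are unstable, and those would not be found by simple forward iteration like this.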

# データマイニングによる異なる言語におけるセキュリティ脆弱性タイプとその軽減に関する研究

A Data-Mining Based Study of Security Vulnerability Types and Their Mitigation in Different Languages ( http://arxiv.org/abs/2405.08025v1 )

ライセンス: Link先を確認

Gábor Antal, Balázs Mosolygó, Norbert Vándor, Péter Hegedüs,

(参考訳) オンラインサービスにアクセスする人の数は日々増えており、新しいユーザーとともに、効果的でレスポンシブなサイバーセキュリティの必要性が高まっている。本研究の目的は,セキュリティ問題や修正点の観点から,最も広く使用されているプログラミング言語に共通するパターンが存在するかどうかを確かめることであった。本稿では,これらの言語から抽出したデータに基づいて,いくつかの統計値を示す。より人気のあるものを分析すると、同じセキュリティ問題が異なる言語で異なるように見え、提供されたソリューションも同じように異なる可能性があることが分かりました。また、同じサイズのプロジェクトでも、非常に異なる結果が得られ、同じタスクに対してソリューションを提供しても、共通の弱点が生まれることもわかりました。これらの統計は、セキュリティに関してプロジェクトの標準を完全に示すものではないかもしれないが、期待すべきことのよい参照ポイントを提供する。サンプルのサイズが大きくなると、さらに正確になり、与えられた言語で書かれたプロジェクト内のセキュリティ関連アクティビティをより深く理解することが可能になる。

The number of people accessing online services is increasing day by day, and with new users, comes a greater need for effective and responsive cyber-security. Our goal in this study was to find out if there are common patterns within the most widely used programming languages in terms of security issues and fixes. In this paper, we showcase some statistics based on the data we extracted for these languages. Analyzing the more popular ones, we found that the same security issues might appear differently in different languages, and as such the provided solutions may vary just as much. We also found that projects with similar sizes can produce extremely different results, and have different common weaknesses, even if they provide a solution to the same task. These statistics may not be entirely indicative of the projects' standards when it comes to security, but they provide a good reference point of what one should expect. Given a larger sample size they could be made even more precise, and as such a better understanding of the security relevant activities within the projects written in given languages could be achieved.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12

# ExplainableDetector:説明可能性分析によるSMSスパム検出のためのトランスフォーマーに基づく言語モデリング手法の探索

ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis ( http://arxiv.org/abs/2405.08026v1 )

ライセンス: Link先を確認

Mohammad Amaz Uddin, Muhammad Nazrul Islam, Leandros Maglaras, Helge Janicke, Iqbal H. Sarker,

(参考訳) SMS(ショートメッセージサービス)は、広く使われ、費用対効果の高い通信媒体であり、悲しいことにSMSスパムとして知られる望ましくないメッセージの避難所となった。スマートフォンやインターネット接続の急速な普及により、SMSスパムは大きな脅威となっている。スパマーは、携帯電話ユーザーにとってSMSの重要性に注目している。その結果、新たなサイバーセキュリティの脅威が出現し、SMSスパムの数は近年大幅に増加している。 SMSデータの非構造化フォーマットは、SMSスパム検出に重大な課題をもたらし、サイバーセキュリティ領域におけるスパム攻撃への対処をより困難にしている。本研究では、スパムメッセージ検出の問題を解決するために、最適化および微調整された変換器ベース大規模言語モデル(LLM)を用いる。このスパム検出にSMSスパムデータセットのベンチマークを使用し、いくつかの前処理技術を用いてクリーンでノイズのないデータを取得し、テキスト拡張手法を用いてクラス不均衡問題を解決する。総合実験の結果、最適化された細調整BERT (Bidirectional Encoder Representations from Transformers) 変種モデルRoBERTaは99.84%の精度を達成した。また、このテキストベースのスパムSMS検出タスクにおいて、微調整されたモデルの透明性を探索し、説明する正と負の係数スコアを計算するために、説明可能な人工知能(XAI)技術を用いて作業する。さらに、従来の機械学習(ML)モデルも、その性能をトランスフォーマーベースモデルと比較するために検討された。この分析は、LLMがサイバーセキュリティ分野における複雑なテキストベースのスパムデータにどのように影響を与えるかを説明する。

SMS, or short messaging service, is a widely used and cost-effective communication medium that has sadly turned into a haven for unwanted messages, commonly known as SMS spam. With the rapid adoption of smartphones and Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have taken notice of the significance of SMS for mobile phone users. Consequently, with the emergence of new cybersecurity threats, the volume of SMS spam has grown significantly in recent years. The unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully fight spam attacks in the cybersecurity domain. In this work, we employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection. We use a benchmark SMS spam dataset, apply several preprocessing techniques to obtain clean, noise-free data, and address the class imbalance problem with text augmentation. Overall, the experiments showed that RoBERTa, our optimized and fine-tuned variant of BERT (Bidirectional Encoder Representations from Transformers), achieved a high accuracy of 99.84%. We also use Explainable Artificial Intelligence (XAI) techniques to calculate positive and negative coefficient scores, which explore and explain the transparency of the fine-tuned model in this text-based spam SMS detection task. In addition, traditional Machine Learning (ML) models were examined to compare their performance with the transformer-based models. This analysis describes how LLMs can have a positive impact on complex text-based spam data in the cybersecurity field.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12

# 戦略的エージェントによるデータアノテーションの自動化:リスクと可能性

Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v1 )

ライセンス: Link先を確認

Tian Xie, Xueru Zhang,

(参考訳) 機械学習(ML)モデルは、人間に関する連続的な決定を行うために、社会的ドメインでますます使われているため、データ分散を再形成する能力を持つことが多い。人間は、戦略的エージェントとして、学習システムに反応して継続的に行動に適応する。人口が動的に変化するにつれて、MLシステムは高いパフォーマンスを保証するために頻繁な更新を必要とする可能性がある。しかし、高品質な人名サンプルの取得は、社会的領域において非常に困難であり、不可能である。この問題に対処する一般的なプラクティスは、モデル自体を使用してラベルのないデータサンプルを注釈付けすることです。本稿では,MLモデルが人的戦略応答を組み込んだモデルアノテート標本で再訓練された場合の長期的影響について検討する。まず,戦略エージェントとモデル間の相互作用を形式化し,それらの動的相互作用の下でどのように進化するかを分析する。モデルが再訓練されるにつれて、エージェントは肯定的な決定を受ける傾向が増し、一方、ポジティブなラベルを持つエージェントの割合は、時間とともに減少する可能性がある。そこで本研究では,力学を安定化させる改良されたリトレーニングプロセスを提案する。最後に、これらの再訓練プロセスによってアルゴリズム的公正性がどのように影響するかを検証し、各ラウンドで共通公正性制約を課すことは、長期的には不利なグループにとって利益にならないことを発見した。半合成および実データの実験は理論的な結果を検証する。

As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12

# PHUDGE: スケーラブルな審査員としてのPhi-3

PHUDGE: Phi-3 as Scalable Judge ( http://arxiv.org/abs/2405.08029v1 )

ライセンス: Link先を確認

Mahesh Deshwal, Apoorva Chawla,

(参考訳) 本稿(技術報告を兼ねる)では、Phi-3を微調整したモデルPHUDGEを提案する。PHUDGEは、Feedback Test、Feedback OOD、MT Human、Preference Testの4つのタスクでSOTAを達成し、レイテンシとスループットにおいて既存のすべてのモデルを上回る。GPT-4だけでなく人間のアノテータとも、未知のデータにおいて、また絶対評価と相対評価の両方の採点タスクにおいて、非常に強い相関を示す。我々は、コスト効率のよい本番運用レベルのシステムにおける小規模LMの利用に取り組んだだけでなく、因果モデリングが本質的に遅いだけでなく、モデルの学習能力を阻害する場合があり、システム全体をより速く、より良くするために、可能な限りより単純なタスクに置き換えるべきであることを示した。体系的なML実験、思慮深いデータ拡張、問題自体の再定式化に従えば、より少ないトレーニングデータでも10倍大きなモデルを上回ることができることを示す。我々の知る限り、ペナルティ付きのミンコフスキー距離を用いて損失の平滑化を制御するアースムーバー距離(別名ワッサースタイン距離)の一般化版を実験・実証したのは我々が初めてであり、これはクロスエントロピーの代わりに損失関数として用いることで、採点タスクにおける安定したトレーニングとより良い結果を得ることができる。

In this paper cum technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieved SOTA results on four tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput. It shows very strong correlation not only with GPT-4 but also with human annotators on unseen data, in both absolute and relative grading tasks. We have not only addressed the use of small LMs for cost-effective production-grade systems but have also shown that causal modelling is not only slow by nature but can sometimes hinder a model's learning capabilities, and should be replaced by simpler tasks whenever possible to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation, and repurposing the problem itself, we can beat models 10x bigger, even with less training data. To the best of our knowledge, we are the first to experiment with and showcase a generalised version of Earth Mover's Distance (also known as Wasserstein distance) that uses Minkowski distance with a penalty to control loss smoothing, and that can be used as a loss function instead of cross-entropy to get stable training and better results for grading tasks.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12
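
The PHUDGE abstract above mentions a generalised Earth Mover's Distance built on Minkowski distance as a replacement for cross-entropy in grading tasks. The abstract does not give the exact formula or the penalty term, so the following is only a minimal sketch under stated assumptions: grades are ordered classes, the distance is taken between cumulative distributions, and the Minkowski order `r` stands in for the smoothing control. The function name `emd_loss` is hypothetical.

```python
from itertools import accumulate


def emd_loss(pred_probs, target_probs, r=2.0):
    """Minkowski-style EMD between two distributions over ordered classes.

    Assumption: classes are ordinal (e.g. grades 1..5), so the distance is
    computed on cumulative distributions. With r=1 this is the classic 1-D
    Wasserstein-1 distance for unit spacing between neighbouring classes.
    """
    if len(pred_probs) != len(target_probs):
        raise ValueError("distributions must have the same number of classes")
    cdf_pred = list(accumulate(pred_probs))
    cdf_target = list(accumulate(target_probs))
    total = sum(abs(p - t) ** r for p, t in zip(cdf_pred, cdf_target))
    return total ** (1.0 / r)


# True grade is 4 (index 3) on a 5-point scale.
target = [0.0, 0.0, 0.0, 1.0, 0.0]
near_miss = [0.0, 0.0, 1.0, 0.0, 0.0]  # predicts grade 3
far_miss = [1.0, 0.0, 0.0, 0.0, 0.0]   # predicts grade 1

print(emd_loss(near_miss, target, r=1.0))  # 1.0
print(emd_loss(far_miss, target, r=1.0))   # 3.0
```

Cross-entropy would score both wrong predictions identically, because each puts zero probability on the true grade. A distance on cumulative distributions grows with how far the predicted grade is from the true one, which is the property that makes this family of losses attractive for grading.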

# HGTDR:不均質グラフ変換器による薬物再精製の促進

HGTDR: Advancing Drug Repurposing with Heterogeneous Graph Transformers ( http://arxiv.org/abs/2405.08031v1 )

ライセンス: Link先を確認

Ali Gharizadeh, Karim Abbasi, Amin Ghareyazi, Mohammad R. K. Mofrad, Hamid R. Rabiee,

(参考訳) モチベーション(Motivation): 薬物再資源化は、薬物開発に関連する時間とコストを削減するための有効な解決策である。しかし、これまでのところ、提案されている薬物再資源化アプローチは依然として期待に応える必要がある。したがって、コスト削減と人命向上のために、医薬品再資源化のための体系的なアプローチを提供することが不可欠である。近年, 生物学的ネットワークを用いた薬物再資源化法は, 有望な結果を生んでいる。しかし、これらの方法には制限がある。主に、これらの手法の範囲は、彼らが効果的に扱えるデータのサイズと多様性に制限される。もう一つの問題は、均質なデータに対処または変換する必要がある異質なデータを扱うことで起こり、情報の喪失につながる。重大な欠点は、これらのアプローチのほとんどはエンドツーエンドの機能がなく、手動による実装と特定の段階でのエキスパートの知識を必要としていることです。結果: 薬物再資源化に伴う課題に対処するため, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing) を提案する。 HGTDRは知識グラフに基づく薬物再資源化のための3段階のアプローチである。 1)異種知識グラフの構築 2ヘテロジニアスグラフトランス網の利用、及び 3) 完全に接続されたネットワークを用いて, 計算関係のスコアを算出した。 HGTDRを利用することで、ユーザは入力グラフを操作し、多様なエンティティから情報を抽出し、所望の出力を得ることができる。評価ステップでは,HGTDRが従来の手法と相容れない性能を示す。さらに,本手法の薬品再資源化提案の上位10点を検証するため,医療研究をレビューし,有望な結果が得られた。また,HGTDRは,薬物タンパク質や疾患タンパク質の相互関係などの数値的および実験的検証を通じて,他の種類の関係を予測する能力も実証した。

Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, thus far, the proposed drug repurposing approaches have yet to meet expectations. Therefore, it is crucial to offer a systematic approach for drug repurposing to achieve cost savings and enhance human lives. In recent years, using biological network-based methods for drug repurposing has generated promising results. Nevertheless, these methods have limitations. Primarily, the scope of these methods is generally limited concerning the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which needs to be addressed or converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge in certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrated HGTDR's capability to predict other types of relations through numerical and experimental validation, such as drug-protein and disease-protein inter-relations.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12

# エージェント型社会シミュレーションモデル設計のための会話型AIサポートの可能性を探る

Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design ( http://arxiv.org/abs/2405.08032v1 )

ライセンス: Link先を確認

Peer-Olaf Siebers,

(参考訳) AIで動くチャットボットChatGPTは、数億のユーザベースを持つが、今やグローバルな現象だ。しかし、社会シミュレーション分野の研究におけるChatGPTのような会話型AIシステム(CAIS)の利用は依然として限られている。具体的には,エージェント・ベース・ソーシャル・シミュレーション (ABSS) モデル設計におけるその使用の証拠はない。新たなものに対する懐疑論は人間の本質に固有のものであるが、我々はABSSモデル設計をサポートするためにこの革新的な技術の使用を開始することが不可欠であると強く信じている。本稿では,CAISが簡潔な時間枠と必要最小限のケースベース知識で,革新的なABSSモデルの開発をいかに促進できるかを実証する概念実証について述べる。先進的なプロンプト技術を採用し,エンジニアリングABSSフレームワークを定着させることで,CAISによるABSSモデルの設計を可能にする包括的なプロンプトスクリプトを構築した。本書の有効性は,美術館における適応型建築の利用に関する実証的な事例研究を通じて実証される。会話における不正確さや相違点が時々あったにもかかわらず、CAISはABSSモデラーにとって貴重なパートナーであることが判明した。

ChatGPT, the AI-powered chatbot with a massive user base of hundreds of millions, has become a global phenomenon. However, the use of Conversational AI Systems (CAISs) like ChatGPT for research in the field of Social Simulation is still limited. Specifically, there is no evidence of its usage in Agent-Based Social Simulation (ABSS) model design. While scepticism towards anything new is inherent to human nature, we firmly believe it is imperative to initiate the use of this innovative technology to support ABSS model design. This paper presents a proof-of-concept that demonstrates how CAISs can facilitate the development of innovative conceptual ABSS models in a concise timeframe and with minimal required upfront case-based knowledge. By employing advanced prompt engineering techniques and adhering to the Engineering ABSS framework, we have constructed a comprehensive prompt script that enables the design of ABSS models with or by the CAIS. The effectiveness of the script is demonstrated through an illustrative case study concerning the use of adaptive architecture in museums. Despite occasional inaccuracies and divergences in conversation, the CAIS proved to be a valuable companion for ABSS modellers.

翻訳日:2024-05-15 18:03:09 公開日:2024-05-12

# どれだけ食べたか? スプーン上の食事量推定

How Much You Ate? Food Portion Estimation on Spoons ( http://arxiv.org/abs/2405.08717v1 )

ライセンス: Link先を確認

Aaryam Sharma, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong,

(参考訳) 食事摂取のモニタリングは健康な生活を促進する重要な側面である。近年、コンピュータビジョン技術の進歩により、画像と深度カメラを用いて食事摂取監視が進められている。しかし、現在の最先端画像に基づく食品部分推定アルゴリズムでは、ユーザーは食事の画像を1、2回撮影するが、これは不便であり、シチューに浸した材料など、トップダウンの視点では見えない食品を捕獲することができないと仮定している。これらの制約に対処するため,我々は,固定式のユーザ向けカメラを用いて,設置後のカメラ視点の変更を必要とせず,機器上での食品の追跡を行う革新的なソリューションを導入した。食材深度の浅い道具は、食品を捕食するのに有利な角度を与え、食材の表面でそれらを追跡することは、食事後のイメージキャプチャを必要とせず、食事の摂取量をはるかに正確に推定する。本システムは,スープやシチューなどの液状固形不均一混合物の栄養含量の推定に信頼性が高い。非侵襲的で、ユーザフレンドリで、高精度な食事摂取モニタリングツールとして、我々は一連の実験を通して、我々の方法の異常な可能性を実証した。

Monitoring dietary intake is a crucial aspect of promoting healthy living. In recent years, advances in computer vision technology have facilitated dietary intake monitoring through the use of images and depth cameras. However, the current state-of-the-art image-based food portion estimation algorithms assume that users take images of their meals one or two times, which can be inconvenient and fail to capture food items that are not visible from a top-down perspective, such as ingredients submerged in a stew. To address these limitations, we introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils, not requiring any change of camera perspective after installation. The shallow depth of utensils provides a more favorable angle for capturing food items, and tracking them on the utensil's surface offers a significantly more accurate estimation of dietary intake without the need for post-meal image capture. The system is reliable for estimation of nutritional content of liquid-solid heterogeneous mixtures such as soups and stews. Through a series of experiments, we demonstrate the exceptional potential of our method as a non-invasive, user-friendly, and highly accurate dietary intake monitoring tool.

翻訳日:2024-05-15 13:28:19 公開日:2024-05-12

# 構造モチーフを用いた分子スキャッホールドの拡張学習

Learning to Extend Molecular Scaffolds with Structural Motifs ( http://arxiv.org/abs/2103.03864v5 )

ライセンス: Link先を確認

Krzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt,

(参考訳) 深層学習に基づく分子モデリングの最近の進歩は、シリコ薬物発見の加速を約束している。多くの生成モデルが利用可能であり、原子・バイ・原子・ボンド・バイ・フラグメント・バイ・フラグメント・バイ・フラッグメントのいずれでも分子を構築することができる。しかし、多くの薬物発見プロジェクトでは、生成した分子に固定された足場が必要であり、その制約を組み込むことは、最近になって研究されたばかりである。本稿では,生成過程の初期シードとして自然に足場をサポートするグラフベースモデルであるMoLeRを提案する。実験の結果,MoLeRは非拘束の分子最適化タスクにおいて最先端の手法と相容れない性能を示し,既存の手法よりも訓練やサンプルの処理が格段に速く,足場ベースのタスクでは性能が向上することがわかった。さらに, 外観が微妙な設計選択が全体的な性能に与える影響も示す。

Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.

翻訳日:2024-05-15 02:11:16 公開日:2024-05-12

# 分類のためのルール生成:スケーラビリティ、解釈可能性、公正性

Rule Generation for Classification: Scalability, Interpretability, and Fairness ( http://arxiv.org/abs/2104.10751v4 )

ライセンス: Link先を確認

Tabea E. Röber, Adia C. Lumadjeng, M. Hakan Akyüz, Ş. İlker Birbil,

(参考訳) 制約付き分類のためのルールベースの新しい最適化手法を提案する。提案手法は,線形プログラミングに列生成を利用するため,大規模データセットに対してスケーラブルである。その結果の価格サブプロブレムはNP-Hardであることが示されている。我々は決定木に基づくヒューリスティックに言及し、アクセラレーションのためのプロキシ価格サブプロブレムを解決する。この方法は、学習における各ルールの重要性を示す最適な重みとともに、一連のルールを返します。ルールにコスト係数を割り当て、追加制約を導入することにより、解釈可能性と公正性に対処する。特に、局所的解釈可能性に着目し、複数の属性やクラスに対して公平に分離基準を一般化する。本稿では,提案手法の性能をデータセットの集合上で検証し,その異なる側面について詳しく述べる。ルールに基づく学習手法は,一方の局所的解釈可能性と一方の公平性,他方の精度との間に良好な妥協関係を示す。

We introduce a new rule-based optimization method for classification with constraints. The proposed method leverages column generation for linear programming, and hence is scalable to large datasets. The resulting pricing subproblem is shown to be NP-Hard. We resort to a decision tree-based heuristic and solve a proxy pricing subproblem for acceleration. The method returns a set of rules along with their optimal weights, indicating the importance of each rule for learning. We address interpretability and fairness by assigning cost coefficients to the rules and introducing additional constraints. In particular, we focus on local interpretability and generalize the separation criterion in fairness to multiple sensitive attributes and classes. We test the performance of the proposed methodology on a collection of datasets and present a case study to elaborate on its different aspects. The proposed rule-based learning method exhibits a good compromise between local interpretability and fairness on the one side, and accuracy on the other side.

翻訳日:2024-05-15 02:11:16 公開日:2024-05-12

# 最適輸送写像のエントロピー推定

Entropic estimation of optimal transport maps ( http://arxiv.org/abs/2109.12004v3 )

ライセンス: Link先を確認

Aram-Alexandre Pooladian, Jonathan Niles-Weed,

(参考訳) 我々は,厳密な有限サンプル保証付きで$\mathbb{R}^d$上の2つの分布間の最適写像を推定する計算可能手法を開発した。ブレニエの定理のエントロピック版を利用することで、最適エントロピック計画の「emph{barycentric projection}」という推定器がシンクホーンのアルゴリズムを用いて容易に計算できることが示される。その結果, サンプルの次元や数が大きい場合, 評価が遅い現在の地図推定手法とは異なり, 大規模データセットにおいても並列化が可能であり, 極めて効率的であることがわかった。最適写像上の滑らかさの仮定の下では、我々の推定器は文学における他の推定器と同等の統計的性能を享受するが、計算コストははるかに低い。提案した推定器の数値例による有効性を示す。 Lepskiの手法により、基礎となる最適輸送写像の滑らかさに適応する推定器の修正版を提案する。我々の証明は、エントロピー最適輸送のための修正双対原理と、Pal (2019) による最適エントロピー計画の近似法に基づいている。

We develop a computationally tractable method for estimating the optimal map between two distributions over $\mathbb{R}^d$ with rigorous finite-sample guarantees. Leveraging an entropic version of Brenier's theorem, we show that our estimator, the barycentric projection of the optimal entropic plan, is easy to compute using Sinkhorn's algorithm. As a result, unlike current approaches for map estimation, which are slow to evaluate when the dimension or number of samples is large, our approach is parallelizable and extremely efficient even for massive data sets. Under smoothness assumptions on the optimal map, we show that our estimator enjoys comparable statistical performance to other estimators in the literature, but with much lower computational cost. We showcase the efficacy of our proposed estimator through numerical examples, even ones not explicitly covered by our assumptions. By virtue of Lepski's method, we propose a modified version of our estimator that is adaptive to the smoothness of the underlying optimal transport map. Our proofs are based on a modified duality principle for entropic optimal transport and on a method for approximating optimal entropic plans due to Pal (2019).

翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
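
The abstract above describes the estimator as the barycentric projection of the optimal entropic plan, computed with Sinkhorn's algorithm. A minimal pure-Python sketch of that idea follows. It assumes uniform weights on both samples, the cost $\tfrac12\|x-y\|^2$, and a fixed regularisation `eps`; the function name and the toy data are illustrative and not taken from the paper.

```python
import math


def sinkhorn_barycentric_map(xs, ys, eps=0.5, n_iters=500):
    """Estimate a transport map from samples xs to samples ys.

    Runs Sinkhorn iterations for the entropic OT problem with cost
    0.5 * ||x - y||^2 and uniform marginals, then returns a function that
    maps any point x to the plan-weighted average of the target samples
    (the barycentric projection).
    """
    n, m = len(xs), len(ys)

    def cost(x, y):
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))

    # Gibbs kernel; a very small eps can underflow, so keep eps moderate here.
    kernel = [[math.exp(-cost(x, y) / eps) for y in ys] for x in xs]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    def transport(x):
        weights = [math.exp(-cost(x, y) / eps) * v[j] for j, y in enumerate(ys)]
        total = sum(weights)
        dim = len(ys[0])
        return [sum(w * y[k] for w, y in zip(weights, ys)) / total for k in range(dim)]

    return transport


# Toy 1-D example: the target sample is the source sample shifted by 1.
xs = [[0.0], [1.0], [2.0]]
ys = [[1.0], [2.0], [3.0]]
transport = sinkhorn_barycentric_map(xs, ys)
mapped = [transport(x)[0] for x in xs]
```

Because `transport` only needs the scaling vector `v`, it can be evaluated at points outside the training sample, and each evaluation is an independent weighted average over the target points. The entropic smoothing pulls the mapped end points slightly towards the centre of the target sample, an effect that shrinks as `eps` decreases.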

# マルコフ線型確率近似の最適およびインスタンス依存保証

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation ( http://arxiv.org/abs/2112.12770v2 )

ライセンス: Link先を確認

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett,

(参考訳) エルゴード型マルコフ連鎖から長さ$n$の軌跡を観測し,$d$次元線形不動点方程式を近似的に解くための確率近似法について検討した。まず、標準スキームの最後の繰り返しの2乗誤差に対して、$t_{\mathrm{mix}} \tfrac{d}{n}$の非漸近的境界を示す($t_{\mathrm{mix}}$は混合時間である)。次に、適切な平均化されたイテレート列上の非漸近的インスタンス依存境界を証明し、高次項のパラメータ $(d, t_{\mathrm{mix}})$ への鋭い依存を含む局所漸近的ミニマックス極限に一致する先頭項とする。これらの上界を非漸近ミニマックス下界で補い、平均化されたSA推定器のインスタンス最適性を確立する。マルコフノイズを用いた政策評価のためのこれらの結果は、すべての$\lambda \in [0, 1)$に対するTD($\lambda$)アルゴリズムファミリーと線形自己回帰モデルをカバーする。例えば、TD($\lambda$)アルゴリズムを実行するとき、$\lambda$の値を選択する。

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).

翻訳日:2024-05-15 02:11:16 公開日:2024-05-12
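
The abstract above contrasts the last iterate of a standard stochastic approximation scheme with a suitably averaged sequence of iterates, and names TD($\lambda$) policy evaluation as an application. The sketch below illustrates the setting with tabular TD(0) on a two-state Markov chain, which is a linear stochastic approximation with one-hot features and Markovian noise. The chain, rewards, discount, step size, and the plain running average are all assumptions chosen for illustration; the paper's precise averaging scheme is not specified in the abstract.

```python
import random


def td0_with_averaging(n_steps=50000, step_size=0.05, gamma=0.5, seed=0):
    """Tabular TD(0) along a single trajectory of a two-state Markov chain.

    Returns the last iterate and the running average of all iterates.
    The chain stays in its current state with probability 0.9.
    """
    rng = random.Random(seed)
    stay_prob = 0.9
    rewards = [1.0, 0.0]
    values = [0.0, 0.0]    # current iterate
    averaged = [0.0, 0.0]  # running mean of the iterates
    state = 0
    for t in range(1, n_steps + 1):
        next_state = state if rng.random() < stay_prob else 1 - state
        td_error = rewards[state] + gamma * values[next_state] - values[state]
        values[state] += step_size * td_error
        for s in range(2):
            averaged[s] += (values[s] - averaged[s]) / t
        state = next_state
    return values, averaged


# Exact solution of V = r + gamma * P V for this chain.
true_values = [11.0 / 6.0, 1.0 / 6.0]
last_iterate, averaged_iterate = td0_with_averaging()
```

With a constant step size the last iterate keeps fluctuating around the fixed point, while the averaged iterate smooths those fluctuations out, which is the qualitative behaviour the instance-dependent bounds in the abstract quantify.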

# 確率的ランゲヴィン差分包と機械学習への応用

Stochastic Langevin Differential Inclusions with Applications to Machine Learning ( http://arxiv.org/abs/2206.11533v3 )

ライセンス: Link先を確認

Fabio V. Difonzo, Vyacheslav Kungurtsev, Jakub Marecek,

(参考訳) ランゲヴィン拡散形式の確率微分方程式は、ベイズサンプリングアルゴリズムと機械学習における最適化の両方において基礎的な役割を担っているため、大きな注目を集めている。後者では、過パラメータ化モデルのトレーニングにおいて、確率勾配流の概念モデルとして機能する。しかしながら、文献は通常、勾配がドリフト項であるポテンシャルの滑らかさを仮定する。それでも、ポテンシャル函数が連続的に微分可能でないような問題が多く、したがってドリフトは至る所でリプシッツ連続ではない。これは、リグレッション問題におけるロバストな損失とRectified Linear Unitsによって実証される。本稿では,Langevin型確率微分包摂のフローと漸近特性に関する基礎的な結果を示す。特に、この解の強い存在を示すとともに、標準自由エネルギー関数の漸近最小化を示す。

Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in training over-parameterized models. However, the literature typically assumes smoothness of the potential, whose gradient is the drift term. Nevertheless, there are many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere. This is exemplified by robust losses and Rectified Linear Units in regression problems. In this paper, we show some foundational results regarding the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to the machine-learning settings. In particular, we show strong existence of the solution, as well as an asymptotic minimization of the canonical free-energy functional.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12

# バンディットフィードバックを用いたオンラインSuBmodular + SuPermodular (BP)最大化

Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback ( http://arxiv.org/abs/2207.03091v3 )

ライセンス: Link先を確認

Adhyyan Narang, Omid Sadeghi, Lillian J Ratliff, Maryam Fazel, Jeff Bilmes,

(参考訳) 組合せ目的を伴うオンラインインタラクティブ機械学習の文脈では、純粋にサブモジュラーな事前作業はより一般的な非サブモジュラーな目的に拡張する。これは、(1)加法的に分解可能な項を2つの項の和(単調部分モジュラー項と単調超モジュラー項、BP分解(英語版)として知られる)、(2)弱部分モジュラー項のみを含む。どちらの場合でも、これはオブジェクト間の競合(サブモジュール)だけでなく、補完的(スーパーモジュール)な関係を表現することを可能にし、この設定をより広い範囲のアプリケーション(例えば、映画レコメンデーション、医療治療など)に拡張する。さらに,従来のモノリシックなフィードバックアプローチだけでなく,それぞれの用語でフィードバックを個別に利用できる新しいフレームワークについても検討する。実世界の実用性とスケーラビリティを念頭に置いて、純粋にモジュラーなケースを含む計算コストを大幅に削減するために、Nystromスケッチ技術を統合する。ガウス過程の文脈的帯域設定では、すべての場合において準線形理論的後悔境界を示す。また,レコメンデーションシステムやデータサブセットの選択に優れた適用性を示す。

In the context of online interactive machine learning with combinatorial objectives, we extend purely submodular prior work to more general non-submodular objectives. This includes: (1) those that are additively decomposable into a sum of two terms (a monotone submodular and monotone supermodular term, known as a BP decomposition); and (2) those that are only weakly submodular. In both cases, this allows representing not only competitive (submodular) but also complementary (supermodular) relationships between objects, enhancing this setting to a broader range of applications (e.g., movie recommendations, medical treatments, etc.) where this is beneficial. In the two-term case, moreover, we study not only the more typical monolithic feedback approach but also a novel framework where feedback is available separately for each term. With real-world practicality and scalability in mind, we integrate Nystrom sketching techniques to significantly reduce the computational cost, including for the purely submodular case. In the Gaussian process contextual bandits setting, we show sub-linear theoretical regret bounds in all cases. We also empirically show good applicability to recommendation systems and data subset selection.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12

# HiKonv: 量子畳み込みのスループットを、新しいビット単位の管理と計算で最大化する

HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation ( http://arxiv.org/abs/2208.00763v2 )

ライセンス: Link先を確認

Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen,

(参考訳) CNNの量子化は、低ビット幅のデータ表現による計算とストレージのコスト削減を意図して、大きな進歩を見せている。しかし、CPUのALUやFPGAのDSPなど、既存のフルビット幅処理ユニットが、様々な量子化ビット幅での畳み込みにおいて、より高い計算スループットを実現するために、どのように利用できるかという体系的な研究は存在しない。本研究では、新しいビット単位の管理と並列計算により、低ビット幅の量子化データ入力を持つ処理ユニット上での畳み込みのスループットを最大化する統一解であるHiKonvを提案する。我々は、高並列化低ビット幅畳み込みのための全ビット幅乗算器を用いた理論的枠組みと性能モデルを構築し、この重要な領域における高性能コンピューティングの新しいブレークスルーを実証する。例えば、CPU内の単一の32ビット処理ユニットは、128の二値化畳み込み演算(乗算と加算)と13の4ビットの畳み込み演算を1つの乗算命令で行うことができ、FPGA DSP内の1つの27x18乗算器は、1つのクロックサイクルで1、4、8ビット入力に対してそれぞれ60、8、2の畳み込み演算を実行できる。我々は、CPUとFPGAの両方におけるHiKonvの有効性を示す。CPUでは、HiKonvは1から8ビットの入力でベースライン実装を上回り、1-D畳み込みでは最大7.6倍と1.4倍の性能向上を実現し、2-D畳み込みでは4ビットの符号付きおよび符号なしデータ入力に対してベースライン実装の2.74倍と3.19倍の性能を実現している。FPGAでは、HiKonvソリューションにより、1つのDSPがより短い処理レイテンシで複数の畳み込みを処理することができる。二値化入力では、HiKonvを用いた各DSPは最大76.6 LUTに相当する。DAC-SDC 2020のチャンピオンモデルと比較して、HiKonvは2.37倍のスループット向上と2.61倍のDSP効率向上を実現している。

Quantization for CNNs has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data representations. There are, however, no systematic studies on how an existing full-bitwidth processing unit, such as an ALU in CPUs or a DSP in FPGAs, can be better utilized to deliver significantly higher computation throughput for convolution under various quantized bitwidths. In this study, we propose HiKonv, a unified solution that maximizes the throughput of convolution on a given underlying processing unit with low-bitwidth quantized data inputs through novel bit-wise management and parallel computation. We establish a theoretical framework and performance models using a full-bitwidth multiplier for highly parallelized low-bitwidth convolution, and demonstrate new breakthroughs for high-performance computing in this critical domain. For example, a single 32-bit processing unit in a CPU can deliver 128 binarized convolution operations (multiplications and additions) and 13 4-bit convolution operations with a single multiplication instruction, and a single 27x18 multiplier in the FPGA DSP can deliver 60, 8 or 2 convolution operations with 1, 4 or 8-bit inputs in one clock cycle. We demonstrate the effectiveness of HiKonv on both CPU and FPGA. On CPU, HiKonv outperforms the baseline implementation with 1 to 8-bit inputs and provides up to 7.6x and 1.4x performance improvements for 1-D convolution, and performs 2.74x and 3.19x over the baseline implementation for 4-bit signed and unsigned data inputs for 2-D convolution. On FPGA, the HiKonv solution enables a single DSP to process multiple convolutions with a shorter processing latency. For binarized input, each DSP with HiKonv is equivalent to up to 76.6 LUTs. Compared to the DAC-SDC 2020 champion model, HiKonv achieves a 2.37x throughput improvement and a 2.61x DSP efficiency improvement, respectively.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
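
The HiKonv abstract above rests on the observation that one wide multiplication can carry out several low-bitwidth multiply-accumulate operations at once. The sketch below shows the underlying arithmetic for unsigned inputs only: packing each operand sequence into one integer with enough guard bits per slot turns a single integer product into a full 1-D convolution. The function names and the slot-width rule are assumptions made for illustration. Python integers are unbounded, whereas a real 32-bit ALU or 27x18 DSP multiplier limits how many values fit, and the paper's handling of signed data is not reproduced here.

```python
def pack(values, slot_bits):
    """Pack small unsigned integers into one word, one value per slot."""
    word = 0
    for index, value in enumerate(values):
        word |= value << (index * slot_bits)
    return word


def conv1d_single_multiply(a, b, bits=4):
    """Full 1-D convolution of two unsigned sequences via one multiplication.

    Each slot is wide enough to hold the largest possible output value,
    so partial sums never carry into the neighbouring slot.
    """
    limit = (1 << bits) - 1
    if any(not 0 <= v <= limit for v in list(a) + list(b)):
        raise ValueError("inputs must be unsigned values that fit in `bits` bits")
    max_output = limit * limit * min(len(a), len(b))
    slot_bits = max_output.bit_length()
    product = pack(a, slot_bits) * pack(b, slot_bits)  # the single multiply
    mask = (1 << slot_bits) - 1
    n_outputs = len(a) + len(b) - 1
    return [(product >> (k * slot_bits)) & mask for k in range(n_outputs)]


result = conv1d_single_multiply([1, 2, 3], [4, 5])
print(result)  # [4, 13, 22, 15]
```

The product of the two packed words is a polynomial product in the variable `2**slot_bits`, so slot `k` of the result holds the sum of all `a[i] * b[j]` with `i + j == k`, which is exactly the convolution output.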

# 自動推奨コード更新:まだ存在するか?

Automatically Recommend Code Updates: Are We There Yet? ( http://arxiv.org/abs/2209.07048v3 )

ライセンス: Link先を確認

Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Patanamon Thongtanunam, Li Li,

(参考訳) 近年、CodeLM(Code-trained Language Models of Code)は、様々なソフトウェアエンジニアリングタスクにおいて有望な結果を示している。そのようなタスクのひとつが自動コード更新レコメンデーションであり、古いコードスニペットを承認および修正されたコードに変換する。多くのCodeLMベースのアプローチが提案されているが、精度が高く、実際のコード更新タスクの有効性と信頼性は疑問視されている。本稿では,コード更新を自動で推奨する,最先端のCodeLMの広範な評価を行う。時間的進化, プロジェクトの特異性, メソッドサイズ, 更新複雑性などの要因を考慮し, ペア更新手法の2つの多種多様なデータセットの性能評価を行った。結果から,CodeLMは時間的情報を無視した環境では良好に機能するが,より現実的な時間的シナリオに苦しむとともに,新しいプロジェクトへの一般化が不十分であることが明らかとなった。さらに、より大規模なメソッドやより複雑な更新では、CodeLMのパフォーマンスが大幅に低下する。さらに、多くのCodeLM生成した"更新"は実際にはnullであり、特に時間的な設定では意味のある編集は難しいままである。本研究は,実世界のコード更新勧告におけるCodeLMの認識と実際の有効性の間に有意なギャップを生じさせ,実用性,堅牢性,一般化性の向上に向けたさらなる研究の必要性を強調した。

In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on real-world code update tasks remain questionable. In this paper, we present the first extensive evaluation of state-of-the-art CodeLMs for automatically recommending code updates. We assess their performance on two diverse datasets of paired updated methods, considering factors such as temporal evolution, project specificity, method size, and update complexity. Our results reveal that while CodeLMs perform well in settings that ignore temporal information, they struggle in more realistic time-wise scenarios and generalize poorly to new projects. Furthermore, CodeLM performance decreases significantly for larger methods and more complex updates. We also observe that many CodeLM-generated "updates" are actually null, especially in time-wise settings, and meaningful edits remain challenging. Our findings highlight the significant gap between the perceived and actual effectiveness of CodeLMs for real-world code update recommendation and emphasize the need for more research on improving their practicality, robustness, and generalizability.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12

# ISFL:地域重要度サンプリングによる非i.d.データのフェデレーション学習

ISFL: Federated Learning for Non-i.i.d. Data with Local Importance Sampling ( http://arxiv.org/abs/2210.02119v3 )

ライセンス: Link先を確認

Zheqi Zhu, Yuchen Shi, Pingyi Fan, Chenghui Peng, Khaled B. Letaief,

(参考訳) 計算とコミュニケーションを統合する有望な学習パラダイムとして、フェデレートラーニング(FL)は、分散クライアントからのローカルトレーニングと定期的な共有を進める。クライアント上の非IDデータ分布のため、FLモデルは勾配の多様性、性能の低下、収束不良等に悩まされる。本研究は,地域訓練に重要サンプリング(IS)を採用することで,この課題に対処することを目的とする。理論的保証のある明示的な枠組みであるISFLを提案する。まず、ISFLの収束定理を導出し、局所的な重要度サンプリングの効果を包含する。そして、最適なIS重みを選択する問題を定式化し、理論解を得る。また,IS重みを計算し,ISFLアルゴリズムを開発するために水充填法を用いる。 CIFAR-10の実験結果は、提案された定理によく適合し、ISFLがより優れた性能、サンプリング効率、および非i.d.データの説明可能性が得られることを検証した。私たちの知る限りでは、ISFLは、ニューラルネットワークモデルとの理論的互換性を示す局所的なサンプリングの側面から、最初の非一意のFLソリューションである。さらに、局所的なサンプリング手法として、ISFLは他の新しいFLフレームワークに容易に移行できる。

As a promising learning paradigm integrating computation and communication, federated learning (FL) proceeds through local training on distributed clients and periodic sharing. Due to the non-i.i.d. data distribution on clients, FL models suffer from gradient diversity, poor performance, bad convergence, etc. In this work, we aim to tackle this key issue by adopting importance sampling (IS) for local training. We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees. First, we derive the convergence theorem of ISFL to capture the effects of local importance sampling. Then, we formulate the problem of selecting optimal IS weights and obtain the theoretical solutions. We also employ a water-filling method to calculate the IS weights and develop the ISFL algorithms. The experimental results on CIFAR-10 fit the proposed theorems well and verify that ISFL achieves better performance, sampling efficiency, and explainability on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling aspect that exhibits theoretical compatibility with neural network models. Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12
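
The ISFL abstract above applies importance sampling to the local training step of each client. The abstract does not state the optimal weights or the water-filling procedure, so the sketch below shows only the general mechanism under a simple assumption: each local sample is drawn with probability proportional to the ratio between a target class probability and the client's own class frequency. The function names and the uniform target distribution are illustrative.

```python
import random
from collections import Counter


def importance_weights(labels, target_dist):
    """Per-sample weight: target class probability over local class frequency."""
    counts = Counter(labels)
    n = len(labels)
    return [target_dist[y] / (counts[y] / n) for y in labels]


def sample_batch(data, labels, target_dist, batch_size, rng):
    """Draw a mini-batch from skewed local data using the importance weights."""
    weights = importance_weights(labels, target_dist)
    indices = rng.choices(range(len(data)), weights=weights, k=batch_size)
    return [(data[i], labels[i]) for i in indices]


# A client whose local data is 90% class 0 and 10% class 1.
local_labels = [0] * 90 + [1] * 10
local_data = list(range(100))
uniform_target = {0: 0.5, 1: 0.5}

batch = sample_batch(local_data, local_labels, uniform_target, 4000, random.Random(0))
minority_fraction = sum(1 for _, y in batch if y == 1) / len(batch)
```

In this toy setting the resampled batches are roughly class-balanced even though the client's raw data is heavily skewed, which is the kind of correction local importance sampling is meant to provide for non-i.i.d. clients.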

# 進化的一般化ゼロショット学習

Evolutionary Generalized Zero-Shot Learning ( http://arxiv.org/abs/2211.13174v2 )

ライセンス: Link先を確認

Dubing Chen, Chenyi Jiang, Haofeng Zhang,

(参考訳) 属性ベースのゼロショット学習(ZSL)は、トレーニング中に見えない新しいクラスを認識するモデルの能力に革命をもたらした。しかし、大規模モデルの進歩に伴い、期待は高まった。単にゼロショットの一般化を達成するだけでなく、ラベルのないデータを使って専門家の領域で継続的に進化できる普遍モデルへの需要が高まっている。これを解決するために,進化的一般化ゼロショット学習(EGZSL)という,この課題のスケールダウンインスタンス化を導入する。この設定により、低パフォーマンスのゼロショットモデルでテストデータストリームに適応し、オンラインで進化させることができる。本稿では,この特別課題の3つの課題,すなわち,破滅的忘れ,初期予測バイアス,進化的データクラスバイアスについて詳述する。さらに,各課題に対する目標解を提案することにより,与えられた初期IGZSLモデルから連続的に進化可能な汎用的手法を提案する。 3つの人気のあるGZSLベンチマークデータセットの実験では、他のベースラインがフェールしている間に、テストデータストリームからモデルを学習できることが示されています。コードは https://github.com/cdb342/EGZSL で入手できる。

Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, the expectations have risen. Beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge: Evolutionary Generalized Zero-Shot Learning (EGZSL). This setting allows a low-performing zero-shot model to adapt to the test data stream and evolve online. We elaborate on three challenges of this special task, i.e., catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continuous evolution from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. Code is available at https://github.com/cdb342/EGZSL.

翻訳日:2024-05-15 02:01:31 公開日:2024-05-12

# 量子二コトミーと一階漸近現象を超えたコヒーレント熱力学

Quantum dichotomies and coherent thermodynamics beyond first-order asymptotics ( http://arxiv.org/abs/2303.05524v3 )

ライセンス: Link先を確認

Patryk Lipka-Bartosik, Christopher T. Chubb, Joseph M. Renes, Marco Tomamichel, Kamil Korzekwa,

(参考訳) 漸近的状況における量子二コトミーの厳密および近似変換の問題、すなわち、大きな$n$に対して、$\rho_1^{\otimes n}$を誤差$\epsilon_n$(トレース距離で測定)で$\rho_2^{\otimes R_n n}$に写し、$\sigma_1^{\otimes n}$を$\sigma_2^{\otimes R_n n}$に厳密に写す量子チャネル$\mathcal E$が存在するかという問題を扱う。我々は、任意の初期状態のペア$(\rho_1,\sigma_1)$と可換な最終状態のペア$(\rho_2,\sigma_2)$に対して、小、中、大の偏差誤差レジームおよびゼロエラーレジームにおいて、最適変換率$R_n$の2階漸近式を導出する。また、$\sigma_1$および$\sigma_2$が熱ギブス状態で与えられる場合、最初の3つのレジームにおける最適変換率は熱操作によって達成できることを示す。これにより、私たちは初めて、異なるエネルギー固有空間間のコヒーレンスを持つような完全に一般的な初期状態と熱力学的状態相互変換の2階漸近を研究することができる。そこで本研究では,コヒーレント入力を用いた熱力学プロトコルの最適性能について論じ,有限サイズ効果による変換誤差を著しく低減できる3つの新しい共振現象について述べる。さらに、量子二コトミーに関する我々の結果は、局所操作と古典通信の下での純粋二部エンタングル状態間の最適変換率を、2階の漸近項まで得るためにも利用できる。

We address the problem of exact and approximate transformation of quantum dichotomies in the asymptotic regime, i.e., the existence of a quantum channel $\mathcal E$ mapping $\rho_1^{\otimes n}$ into $\rho_2^{\otimes R_nn}$ with an error $\epsilon_n$ (measured by trace distance) and $\sigma_1^{\otimes n}$ into $\sigma_2^{\otimes R_n n}$ exactly, for a large number $n$. We derive second-order asymptotic expressions for the optimal transformation rate $R_n$ in the small, moderate, and large deviation error regimes, as well as the zero-error regime, for an arbitrary pair $(\rho_1,\sigma_1)$ of initial states and a commuting pair $(\rho_2,\sigma_2)$ of final states. We also prove that for $\sigma_1$ and $\sigma_2$ given by thermal Gibbs states, the derived optimal transformation rates in the first three regimes can be attained by thermal operations. This allows us, for the first time, to study the second-order asymptotics of thermodynamic state interconversion with fully general initial states that may have coherence between different energy eigenspaces. Thus, we discuss the optimal performance of thermodynamic protocols with coherent inputs and describe three novel resonance phenomena allowing one to significantly reduce transformation errors induced by finite-size effects. What is more, our result on quantum dichotomies can also be used to obtain, up to second-order asymptotic terms, optimal conversion rates between pure bipartite entangled states under local operations and classical communication.

翻訳日:2024-05-15 01:51:46 公開日:2024-05-12

# 因果構造における量子ゆらぎの角スペクトル

Angular spectrum of quantum fluctuations in causal structure ( http://arxiv.org/abs/2303.06563v2 )

ライセンス: Link先を確認

Craig Hogan, Ohkyung Kwon, Nathaniel Selub,

(参考訳) スケーリング引数は、因果コヒーレントな量子重力のプランクスケール真空ゆらぎによって生じるマクロな因果ダイヤモンドの境界に歪みの角度スペクトルを制約するために用いられる。逆物理的分離における半径$R$の因果ダイヤモンドの表面への歪みの分散とゆらぎの速度は、プランク時間$t_P$によって設定された正規化により$\tau$にのみ依存し、$R$に依存してはならない。スケール$R$の場合、この原理は角スケール$\Theta$, $\langle\delta\tau^2\rangle_\Theta\simeq\tau\:\! t_p\sim\Theta R\:\! t_P/c$と角パワースペクトル$C_\ell\sim (R\:\! l_P)/\ell^3$ at $\ell\gg1$ このスペクトルは、すべての$\ell$で予想される因果コヒーレントな仮想ヌル重力ショックに基づくホログラムノイズのリレーショナルモデルと一致している。高い$\ell$スケーリングは、他のいくつかの量子モデルで予測されるものと対照的であり、これは角波数$\ell$の1つのパワーで異なり、遠方からの画像の過度なぼやけを予測することが示されている。

Scaling arguments are used to constrain the angular spectrum of distortions on boundaries of macroscopic causal diamonds, produced by Planck-scale vacuum fluctuations of causally-coherent quantum gravity. The small-angle spectrum of displacement is derived from a form of scale invariance: the variance and fluctuation rate of distortions normal to the surface of a causal diamond of radius $R$ at transverse physical separation $c\tau\ll R$ should depend only on $\tau$, with a normalization set by the Planck time $t_P$, and should not depend on $R$. For measurements on scale $R$, the principle leads to universal scaling for variance on angular scale $\Theta$, $\langle\delta\tau^2\rangle_\Theta\simeq\tau\:\!t_P\sim\Theta R\:\!t_P/c$, and angular power spectrum $C_\ell\sim (R\:\!l_P)/\ell^3$ at $\ell\gg1$. This spectrum is consistent with a relational model of holographic noise based on causally coherent virtual null gravitational shocks, a general picture conjectured for all $\ell$. The high $\ell$ scaling is contrasted with that predicted in some other quantum models, which differ by one power of angular wavenumber $\ell$ and are shown to predict excessive blurring of images from distant sources.

翻訳日:2024-05-15 01:51:46 公開日:2024-05-12

# 論理ラベルを用いたラベル分布学習

Label Distribution Learning from Logical Label ( http://arxiv.org/abs/2303.06847v2 )

ライセンス: Link先を確認

Yuheng Jia, Jiawei Tang, Jiahao Jiang,

(参考訳) ラベル分布学習(LDL)は、サンプルのラベル記述度(ラベル分布)を予測する効果的な方法である。しかし、トレーニングサンプルのラベル分布(LD)のアノテーションは非常にコストがかかる。そのため、最近の研究ではまずラベル拡張(LE)を用いて論理ラベルから推定ラベル分布を生成し、その後、回収されたラベル分布に外部のLDLアルゴリズムを適用して、未知のサンプルのラベル分布を予測することが多い。しかし、この段階的なやり方は、LEとLDLの間にありうる関連性を見落としている。さらに、既存のLEアプローチは、無効なラベルにいくらかの記述度を割り当ててしまう可能性がある。上記の問題を解決するために,論理ラベルから直接LDLモデルを学習する新しい手法を提案し,LEとLDLを結合モデルに統合し,従来のLE手法の欠点を回避する。様々なデータセットに対する大規模な実験により、提案手法は論理ラベルから直接信頼性の高いLDLモデルを構築し、最先端のLE法よりも正確なラベル分布を生成できることが証明された。

Label distribution learning (LDL) is an effective method to predict the label description degree (a.k.a. label distribution) of a sample. However, annotating label distribution (LD) for training samples is extremely costly. So recent studies often first use label enhancement (LE) to generate the estimated label distribution from the logical label and then apply external LDL algorithms on the recovered label distribution to predict the label distribution for unseen samples. But this step-wise manner overlooks the possible connections between LE and LDL. Moreover, the existing LE approaches may assign some description degrees to invalid labels. To solve the above problems, we propose a novel method to learn an LDL model directly from the logical label, which unifies LE and LDL into a joint model, and avoids the drawbacks of the previous LE methods. Extensive experiments on various datasets prove that the proposed approach can construct a reliable LDL model directly from the logical label, and produce more accurate label distribution than the state-of-the-art LE methods.

翻訳日:2024-05-15 01:51:46 公開日:2024-05-12

# BanditQ: Fair Bandits with Guaranteed Rewards

BanditQ: Fair Bandits with Guaranteed Rewards ( http://arxiv.org/abs/2304.05219v3 )

ライセンス: Link先を確認

Abhishek Sinha,

(参考訳) アッパー信頼境界(UCB)、ヘッジ(Hedge)、EXP3など、古典的なノーリグレットの多腕バンディットアルゴリズムは本質的に不公平である。その不公平さは、最も報酬の高い腕をできるだけ頻繁に引き、残りを無視するという目的に起因している。本稿では,各腕に対して報酬獲得の最小レートを保証する、確率的設定における公平な予測問題を考察する。本研究は,全情報フィードバックとバンディットフィードバックの両方の設定でこの問題を調査する。本稿では,待ち行列理論の手法と敵対的バンディットを組み合わせることで,目標報酬レートを達成しつつ、リグレットと目標レート違反のペナルティを最大$O(T^{\frac{3}{4}})$に抑えるBanditQという新たなオンラインポリシーを提案する。全情報設定におけるリグレット境界は、単調性仮定の下で、あるいは時間平均リグレットを考える場合に、$O(\sqrt{T})$にさらに改善することができる。提案するポリシーは効率的で,公平な予測問題から標準的な敵対的MAB問題へのブラックボックス帰着を許容する。 BanditQポリシーの解析には、独立した興味の対象となりうる新たな自己有界不等式が含まれている。

Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as possible while ignoring the rest. In this paper, we consider a fair prediction problem in the stochastic setting with a guaranteed minimum rate of accrual of rewards for each arm. We study the problem in both full-information and bandit feedback settings. Combining queueing-theoretic techniques with adversarial bandits, we propose a new online policy, called BanditQ, that achieves the target reward rates while conceding a regret and target rate violation penalty of at most $O(T^{\frac{3}{4}})$. The regret bound in the full-information setting can be further improved to $O(\sqrt{T})$ under either a monotonicity assumption or when considering time-averaged regret. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem. The analysis of the BanditQ policy involves a new self-bounding inequality, which might be of independent interest.
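
The unfairness claim for classic algorithms can be seen in a small simulation. The sketch below runs textbook UCB1 on three Bernoulli arms and records how often each arm is pulled; the arm means, horizon, and seed are arbitrary illustrative choices, and the code is plain UCB1, not the BanditQ policy.

```python
import math
import random

def run_ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with the pull count.
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Hypothetical arm means; the first arm is clearly the best.
pull_counts = run_ucb1(means=[0.9, 0.2, 0.1], horizon=2000)
```

With a large gap between arms, the weaker arms receive only a logarithmic share of the pulls, which is the behaviour a guaranteed minimum reward rate per arm is meant to rule out.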

翻訳日:2024-05-15 01:51:46 公開日:2024-05-12

# 画像ラインセグメンテーションの検出と記述に関する総合的レビュー:分類学,比較,課題

A Comprehensive Review of Image Line Segment Detection and Description: Taxonomies, Comparisons, and Challenges ( http://arxiv.org/abs/2305.00264v2 )

ライセンス: Link先を確認

Xinyu Lin, Yingjie Zhou, Yipeng Liu, Ce Zhu,

(参考訳) イメージラインセグメントは、画像内のオブジェクトやシナリオのストレート、スレンダー、未断の部分を明確にする、基本的な低レベルの視覚的特徴である。ラインセグメントの検出と記述は多くの視覚タスクの基礎となった。多くの研究は線分の検出と記述を目的としているが、包括的なレビューは欠如しており、その進捗を妨げている。本研究は,2次元イメージラインセグメントの検出と記述に関する関連研究を網羅的にレビューし,全体像と深い理解を研究者に提供することにより,ギャップを埋めるものである。それらのメカニズムに基づき、線分検出と記述のための2つの分類法を提示し、これらの研究を導入、分析、要約し、研究者がそれらを迅速かつ広範囲に学べるようにした。主要な問題、中核的な考え、既存手法の利点とデメリット、そして各カテゴリの潜在的な応用について分析・要約し、これまで未知の発見を含む。既存の手法の課題や、それを解決するための知見も、研究者に刺激を与えるために提供される。さらに、いくつかの最先端の線分検出および記述アルゴリズムをバイアスなく評価し、評価コードを公開する。理論的解析は、実験結果と組み合わせて、研究者が目的とする視覚応用のための最良の方法を選択するのを導くことができる。最後に、この研究は、この分野の研究者からより多くの注目を集めるために、潜在的に興味深い将来の研究方向についての洞察を提供する。

An image line segment is a fundamental low-level visual feature that delineates straight, slender, and uninterrupted portions of objects and scenarios within images. Detection and description of line segments lay the basis for numerous vision tasks. Although many studies have aimed to detect and describe line segments, a comprehensive review is lacking, obstructing their progress. This study fills the gap by comprehensively reviewing related studies on detecting and describing two-dimensional image line segments to provide researchers with an overall picture and deep understanding. Based on their mechanisms, two taxonomies for line segment detection and description are presented to introduce, analyze, and summarize these studies, facilitating researchers to learn about them quickly and extensively. The key issues, core ideas, advantages and disadvantages of existing methods, and their potential applications for each category are analyzed and summarized, including previously unknown findings. The challenges in existing methods and corresponding insights for potentially solving them are also provided to inspire researchers. In addition, some state-of-the-art line segment detection and description algorithms are evaluated without bias, and the evaluation code will be publicly available. The theoretical analysis, coupled with the experimental results, can guide researchers in selecting the best method for their intended vision applications. Finally, this study provides insights for potentially interesting future research directions to attract more attention from researchers to this field.

翻訳日:2024-05-15 01:51:46 公開日:2024-05-12

# MAGDiff:ディープニューラルネットワークの活性化グラフによる共変量データセットシフト検出

MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks ( http://arxiv.org/abs/2305.13271v2 )

ライセンス: Link先を確認

Charles Arnal, Felix Hensel, Mathieu Carrière, Théo Lacombe, Hiroaki Kurihara, Yuichi Ike, Frédéric Chazal,

(参考訳) さまざまなタスクへの適用が成功したにもかかわらず、ニューラルネットワークは、他の機械学習方法と同様に、データのシフトに対する感受性によって制限されている。本稿では、任意のニューラルネットワーク分類器から抽出し、このタスク専用の新しいモデルをトレーニングすることなく、効率的な共変量データシフト検出を可能にするMAGDiffと呼ばれる新しい表現群を提案する。これらの表現は、トレーニング分布と対象分布に属するサンプルのニューラルネットワークのアクティベーショングラフを比較して計算され、データセットシフト検出に一般的に使用される2サンプルテストの強力なデータおよびタスク適応統計値が得られる。本研究では,2サンプルのコルモゴロフ・スミルノフ検定(KS)の複数の異なるデータセットとシフトタイプに対する統計的パワーを実験的に測定し,ネットワーク出力に依存する最先端のベースラインに対して,新しい表現が顕著な改善をもたらすことを示す。

Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In this article, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical powers of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.
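
The shift detection described above feeds a representation into two-sample Kolmogorov-Smirnov tests. As a minimal sketch, the function below computes the two-sample KS statistic for one scalar feature; it assumes the feature has already been extracted and does not implement the MAGDiff representation itself.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value equal to x so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap
```

A statistic near 0 means the two samples look alike along this feature, while a value near 1 means they barely overlap; a full test would convert the statistic into a p-value before deciding whether a shift occurred.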

翻訳日:2024-05-15 01:42:01 公開日:2024-05-12

# 大量のビジュアルプロンプトは本当に必要か?

Do We Really Need a Large Number of Visual Prompts? ( http://arxiv.org/abs/2305.17223v2 )

ライセンス: Link先を確認

Youngeun Kim, Yuhang Li, Abhishek Moitra, Ruokai Yin, Priyadarshini Panda,

(参考訳) 資源制約のあるエッジにモデルを適用することへの関心が高まっているため、パラメータ効率の高い転送学習が広く研究されている。 Visual Prompt Tuning (VPT)は、入力空間への学習可能なプロンプトを予測し、完全なネットワークパラメータのトレーニングと比較して、競争力のある微調整性能を示す。しかし、VPTは入力トークンの数を増やし、計算オーバーヘッドを増大させる。本稿では,視覚トランスアーキテクチャの微調整性能と自己注意操作に及ぼすプロンプト数の影響を解析する。理論的および経験的分析を通して、より多くのプロンプトを追加すると線形性能が向上しないことを示す。さらに,少数のプロンプトの使用による性能劣化を防止することを目的とした,PC(Prompt Condensation)技術を提案する。提案手法はFGVCとVTAB-1kのタスクに対して検証し,精度を維持しながらプロンプト数を約70%削減することを示す。

Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ~70% while maintaining accuracy.
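
The computational overhead mentioned above comes from the longer token sequence. The following back-of-the-envelope sketch counts pairwise token interactions in self-attention with and without prepended prompts. The 224x224 image, 16x16 patches, and prompt counts are assumed values for illustration, and the quadratic count ignores the MLP layers and other costs of a real vision transformer.

```python
def attention_cost_ratio(num_patches, num_prompts, has_cls_token=True):
    """Relative self-attention cost (pairwise token interactions) versus using no prompts."""
    base_tokens = num_patches + (1 if has_cls_token else 0)
    total_tokens = base_tokens + num_prompts
    # Self-attention compares every token with every token, so cost grows with tokens squared.
    return (total_tokens ** 2) / (base_tokens ** 2)

# Assumed setup: 224x224 image with 16x16 patches gives 14 * 14 patch tokens.
patches = (224 // 16) ** 2
ratio_100 = attention_cost_ratio(patches, num_prompts=100)
ratio_30 = attention_cost_ratio(patches, num_prompts=30)
```

Comparing `ratio_100` with `ratio_30` gives a feel for how much attention cost a roughly 70% cut in prompts could save under these assumptions.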

翻訳日:2024-05-15 01:42:01 公開日:2024-05-12

# 2dディラック結晶の電子関連格子熱特性に及ぼす高次電子-フォノン相互作用の影響

Influence of higher order electron-phonon interaction on the electron-related lattice thermal properties of 2d Dirac crystal ( http://arxiv.org/abs/2305.18369v3 )

ライセンス: Link先を確認

Sina Kazemian, Giovanni Fanchini,

(参考訳) 熱伝導率などのディラック結晶の本質的性質を理解するためには、ディラック電子と分散音響フォノンとの相互作用を考えるモデルが必要である。 2次元ディラック結晶の非常に高い熱伝導度は、準理想のフォノン量子ガスによるものであるが、望ましくない制限は電子-フォノン相互作用(e-ph)によって生じる。 e-ph熱伝導率はフォノン散乱率に直接関連している。従来の計算では短波長のフォノンを見落とし、2次元ディラック結晶を解析するには不十分である。フォノン散乱速度は、電子とフォノン(EP-E*)の崩壊を含む3つの粒子相互作用を考慮すると、通常1階の大きさまで計算される。しかし、電子の崩壊と新しい電子とフォノン(E-E*P*)の生成を含む過程は無視される。本研究では,2次元ディラック結晶におけるフォノン散乱速度とe-ph熱伝導率について,短波長フォノンを考慮した正確な式を示す。フォノン散乱率とe-ph熱伝導率を計算する場合, 室温でもE-E*P*過程の意義を示す。さらに,電子とフォノンの崩壊に伴うEP-E*P*相互作用と新しい電子-フォノン対の生成を二階のe-ph相互作用,特にEP-E*P*相互作用を取り入れることの重要性を強調し,高温・低フェルミエネルギーにおけるフォノン散乱速度とe-ph熱伝導率を正確に決定する。この4粒子相互作用プロセスは、これらの特性を効果的に特徴づける上で重要な役割を果たす。

To understand the essential properties of Dirac crystals, such as their thermal conductivity, we require models that consider the interaction between Dirac electrons and dispersive acoustic phonons. The exceptionally high thermal conductivity in 2D Dirac crystals is attributed to near-ideal phonon quantum gases, while undesired limitations arise from electron-phonon (e-ph) interactions which have been shown to limit the thermal conductivity up to several microns away. The e-ph thermal conductivity is directly linked to the phonon scattering rate. Conventional calculations overlook phonons with short-dispersive wavelengths, rendering them inadequate for analyzing 2D Dirac crystals. The phonon scattering rate is typically calculated up to the first-order magnitude, considering 3-particle interactions involving the decay of an electron and phonon (EP-E*) to create a new electron. However, processes involving the decay of an electron and the creation of a new electron and phonon (E-E*P*) are neglected. In this study, we present an accurate expression for the phonon scattering rate and e-ph thermal conductivity in 2D Dirac crystals, accounting for short-dispersive wavelength phonons. We demonstrate the significance of the E-E*P* process even at room temperature in calculating the phonon scattering rate and e-ph thermal conductivity, particularly for first-order e-ph interactions. Furthermore, we emphasize the importance of incorporating second-order e-ph interactions, specifically the EP-E*P* interaction involving the decay of an electron and phonon and the creation of a new electron-phonon pair, to accurately determine the phonon scattering rate and e-ph thermal conductivity at high temperatures and low Fermi energies. This 4-particle interaction process plays a crucial role in characterizing these properties effectively.

翻訳日:2024-05-15 01:42:01 公開日:2024-05-12

# セレンディピティーの獲得:オフポリティアクター批判における過去の成功価値の爆発

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic ( http://arxiv.org/abs/2306.02865v5 )

ライセンス: Link先を確認

Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu,

(参考訳) 高品質な$Q$値関数の学習は、多くの現代のオフポリシーディープ強化学習(RL)アルゴリズムの成功に重要な役割を果たしている。これまでの研究は主に、価値過大評価問題、関数近似器の採用結果、および非政治学習に対処することに焦点を当てていた。共通視点から考えると、RLトレーニングプロセスの後半段階では、$Q$-valueが過小評価されることがしばしばあり、政策学習の妨げとなり、サンプル効率が低下する可能性がある。このような長期予測現象は、リプレイバッファのより最適なアクションサンプルと比較して、ベルマン更新における現在のポリシーからの劣ったアクションの使用とよく関係している。この問題に対処するために、我々の洞察は、探索的楽観主義を維持しながら、過去の成功を十分に活用することである。本稿では,Blended Exploitation and Exploration (BEE)演算子を提案する。 BEEに基づいて、実際のBACは50以上の連続制御タスクにおいて最先端の手法よりも優れており、失敗を招きやすいシナリオや現実のロボットタスクにおいて高いパフォーマンスを達成する。ベンチマーク結果とビデオはhttps://jity16.github.io/BEE/で公開されている。

Learning high-quality $Q$-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that $Q$-values are often underestimated in the latter stage of the RL training process, potentially hindering policy learning and reducing sample efficiency. We find that such a long-neglected phenomenon is often related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer. To address this issue, our insight is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates $Q$-value using both historical best-performing actions and the current policy. Based on BEE, the resulting practical algorithm BAC outperforms state-of-the-art methods in over 50 continuous control tasks and achieves strong performance in failure-prone scenarios and real-world robot tasks. Benchmark results and videos are available at https://jity16.github.io/BEE/.
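
The idea of updating the $Q$-value with both historical best-performing actions and the current policy can be sketched for a single transition with discrete actions. Everything here is an assumption made for illustration: the function name, the blending weight `lam`, the discrete action set, and the omission of any extra terms the actual BEE operator may include.

```python
def bee_target(reward, next_q, buffer_actions, policy_probs, lam=0.5, gamma=0.99):
    """Blend an exploitation backup (best action seen in the buffer) with an
    exploration backup (expectation under the current policy)."""
    # Exploitation: best Q-value among next-state actions stored in the replay buffer.
    exploit = max(next_q[a] for a in buffer_actions)
    # Exploration: expected Q-value under the current policy's action distribution.
    explore = sum(p * q for p, q in zip(policy_probs, next_q))
    return reward + gamma * (lam * exploit + (1.0 - lam) * explore)

# Hypothetical next-state Q-values for three actions.
next_q_values = [1.0, 4.0, 2.0]
target = bee_target(0.0, next_q_values, buffer_actions=[1, 2],
                    policy_probs=[0.5, 0.0, 0.5], lam=0.5, gamma=1.0)
```

Setting `lam` to 0 recovers a purely on-policy backup, and setting it to 1 backs up only from the best buffered action, so the weight controls how strongly past successes pull the estimate upward.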

翻訳日:2024-05-15 01:42:01 公開日:2024-05-12

# 量子機械学習における絡み合ったデータの遷移の役割

Transition Role of Entangled Data in Quantum Machine Learning ( http://arxiv.org/abs/2306.03481v2 )

ライセンス: Link先を確認

Xinbiao Wang, Yuxuan Du, Zhuozhuo Tu, Yong Luo, Xiao Yuan, Dacheng Tao,

(参考訳) エンタングルメントは量子コンピューティングを強化するリソースとして機能する。最近の進歩は量子力学の学習に対する肯定的な影響を強調しており、量子演算への絡み合いの統合や量子機械学習(QML)モデルの測定により、特定の予測エラーしきい値を超えた、トレーニングデータサイズが大幅に削減される。しかし、データにおける絡み合い度がモデル性能にどのように影響するかの分析的理解はいまだに解明されていない。本研究では,この知識ギャップを,絡み合ったデータを用いて量子力学を学習する量子ノーランチ(NFL)定理を確立することによって解決する。従来の知見とは対照的に, 絡み合ったデータが予測誤差に与える影響は, 許容された測定値の数に応じて二重効果を示すことを示す。十分な数の測定で、トレーニングデータの絡み合いを増大させることで、予測誤差を一貫して減らしたり、トレーニングデータの必要なサイズを減らして、同じ予測誤差を達成することができる。逆に、少ない測定が許される場合、高度に絡み合ったデータを使用することで、予測エラーが増大する可能性がある。得られた結果は、特に量子リソースへのアクセスが制限されたアーリーステージ量子コンピュータ上での実行に適した、高度なQMLプロトコルを設計するための重要なガイダンスを提供する。

Entanglement serves as the resource to empower quantum computing. Recent progress has highlighted its positive impact on learning quantum dynamics, wherein the integration of entanglement into quantum operations or measurements of quantum machine learning (QML) models leads to substantial reductions in training data size, surpassing a specified prediction error threshold. However, an analytical understanding of how the entanglement degree in data affects model performance remains elusive. In this study, we address this knowledge gap by establishing a quantum no-free-lunch (NFL) theorem for learning quantum dynamics using entangled data. Contrary to previous findings, we prove that the impact of entangled data on prediction error exhibits a dual effect, depending on the number of permitted measurements. With a sufficient number of measurements, increasing the entanglement of training data consistently reduces the prediction error or decreases the required size of the training data to achieve the same prediction error. Conversely, when few measurements are allowed, employing highly entangled data could lead to an increased prediction error. The achieved results provide critical guidance for designing advanced QML protocols, especially for those tailored for execution on early-stage quantum computers with limited access to quantum resources.

翻訳日:2024-05-15 01:42:01 公開日:2024-05-12

# クープマン理論を用いた対話環境における効率的なダイナミクスモデリング

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory ( http://arxiv.org/abs/2306.11941v4 )

ライセンス: Link先を確認

Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh,

(参考訳) 対話環境におけるダイナミクスの正確なモデリングは、長距離予測の成功に不可欠である。このような能力は強化学習(RL)と計画アルゴリズムを前進させるが、達成は困難である。モデル推定の不正確さは複雑になり、長い地平線上でエラーが増加する。我々は、環境の非線形ダイナミクスを高次元潜在空間で線形化することができるクープマン理論のレンズからこの問題にアプローチする。これにより、エージェントのアクションを毎回考慮しながら畳み込みを用いて長距離予測のシーケンシャルな問題を効率的に並列化することができる。提案手法は安定性解析と時間経過による勾配の制御も可能とした。これらの利点は、拡張された地平線上でのモデリング力学の効率と精度の両方において、既存のアプローチよりも大幅に改善される。また、モデルベース計画とモデルフリーRLの動的モデリングにこのモデルを容易に組み込むことができ、有望な実験結果を報告する。

The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.
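
Once the latent dynamics are linear, a multi-step rollout can be written as a convolution over the action sequence, so every time step can be computed independently. The sketch below checks this for a toy system; the diagonal latent operator, scalar actions, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
def rollout_sequential(a_diag, b_diag, z0, actions):
    """Step-by-step latent rollout z_{t+1} = a * z_t + b * u_t (elementwise, diagonal A)."""
    z = list(z0)
    states = []
    for u in actions:
        z = [a * zi + b * u for a, b, zi in zip(a_diag, b_diag, z)]
        states.append(z)
    return states

def rollout_convolution(a_diag, b_diag, z0, actions):
    """Same states from the closed form z_t = a^t z_0 + sum_k a^(t-1-k) b u_k.
    Each t is computed on its own, so the loop over t could run in parallel."""
    states = []
    for t in range(1, len(actions) + 1):
        z_t = [
            (a ** t) * z0i + sum((a ** (t - 1 - k)) * b * actions[k] for k in range(t))
            for a, b, z0i in zip(a_diag, b_diag, z0)
        ]
        states.append(z_t)
    return states

# Hypothetical two-dimensional latent state and a short action sequence.
seq = rollout_sequential([0.9, 0.5], [1.0, -2.0], [1.0, 0.0], [1.0, 0.0, -1.0, 2.0])
conv = rollout_convolution([0.9, 0.5], [1.0, -2.0], [1.0, 0.0], [1.0, 0.0, -1.0, 2.0])
```

The powers of `a` in the closed form also show why the spectrum of the latent operator matters for stability: entries with magnitude above 1 grow with the horizon, and entries below 1 decay.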

翻訳日:2024-05-15 01:32:16 公開日:2024-05-12

# ベルの不等式における量子重力の影

The shadows of quantum gravity on Bell's inequality ( http://arxiv.org/abs/2307.13006v3 )

ライセンス: Link先を確認

Hooman Moradpour, Shahram Jalalzadeh, Hamid Tebyanian,

(参考訳) 本研究は、量子重力の文脈における量子力学演算子の妥当性を考察し、それらの一般化の必要性を認識した。第一の目的は、ベルの不等式で示されるように、量子力学における固有の非局所性に対するこれらの一般化の反響を調査することである。さらに、この研究はベルの不平等の確立された枠組みにゼロでない最小長を導入する結果について精査している。この結果は、量子力学と重力の間の複雑な相互作用の理論的理解に大きく貢献する。さらに、ベルの不等式に対する量子重力の影響と、特にデバイス非依存プロトコル、量子鍵分布、量子ランダムネス生成の領域における量子技術におけるその実践的応用について検討する。

This study delves into the validity of quantum mechanical operators in the context of quantum gravity, recognizing the potential need for their generalization. A primary objective is to investigate the repercussions of these generalizations on the inherent non-locality within quantum mechanics, as exemplified by Bell's inequality. Additionally, the study scrutinizes the consequences of introducing a non-zero minimal length into the established framework of Bell's inequality. The findings contribute significantly to our theoretical comprehension of the intricate interplay between quantum mechanics and gravity. Moreover, this research explores the impact of quantum gravity on Bell's inequality and its practical applications within quantum technologies, notably in the realms of device-independent protocols, quantum key distribution, and quantum randomness generation.

翻訳日:2024-05-15 01:22:32 公開日:2024-05-12

# 多モード左手超伝導リング共振器による人工原子間のエンタングリング相互作用

Entangling interactions between artificial atoms mediated by a multimode left-handed superconducting ring resonator ( http://arxiv.org/abs/2307.15695v2 )

ライセンス: Link先を確認

T. McBroom-Carroll, A. Schlabes, X. Xu, J. Ku, B. Cole, S. Indrajeet, M. D. LaHaye, M. H. Ansari, B. L. T. Plourde,

(参考訳) 積層回路素子で実装された超伝導メタマテリアル伝送線は、グループの速度と位相速度が反対の符号を持つ左利きの分散を、超伝導人工原子に関連する周波数範囲で示すことができる。このようなメタマテリアル伝送線路をリングに形成し、リングの周りの異なる点の量子ビットに結合すると、コンパクトなフットプリントを持つマルチモードバス共振器が得られる。フラックス可変量子ビットを用いて、2つの量子ビットとリング共振器モードの結合強度の変動を特徴づけ、理論的にモデル化する。量子ビット間の直接結合は無視できるが、多モードリング共振器との相互作用は、逆交換結合と、量子ビット間のより高次の$ZZ$相互作用の両方をもたらす。リング共振器モードに対する量子ビットとそれらの周波数のゆらぎが変化するにつれて、零交叉や符号の変化を含むこれらの2つの量子ビット間相互作用の有意な変動が観測される。量子ビット周波数の小さな変化に対して、ゼロ値と大値の間の$ZZ$スケールのような相互作用項を変調する能力は、多くの量子ビットをホストできるシステムでエンタングリングゲートを実装するための有望な経路を提供する。

Superconducting metamaterial transmission lines implemented with lumped circuit elements can exhibit left-handed dispersion, where the group and phase velocity have opposite sign, in a frequency range relevant for superconducting artificial atoms. Forming such a metamaterial transmission line into a ring and coupling it to qubits at different points around the ring results in a multimode bus resonator with a compact footprint. Using flux-tunable qubits, we characterize and theoretically model the variation in the coupling strength between the two qubits and each of the ring resonator modes. Although the qubits have negligible direct coupling between them, their interactions with the multimode ring resonator result in both a transverse exchange coupling and a higher order $ZZ$ interaction between the qubits. As we vary the detuning between the qubits and their frequency relative to the ring resonator modes, we observe significant variations in both of these inter-qubit interactions, including zero crossings and changes of sign. The ability to modulate interaction terms such as the $ZZ$ scale between zero and large values for small changes in qubit frequency provides a promising pathway for implementing entangling gates in a system capable of hosting many qubits.

翻訳日:2024-05-15 01:22:32 公開日:2024-05-12

# 任意の忠実度に対するMargolus-Levitin量子速度限界について

Note on the Margolus-Levitin quantum speed limit for arbitrary fidelity ( http://arxiv.org/abs/2307.16854v2 )

ライセンス: Link先を確認

Krzysztof Andrzejewski, Katarzyna Bolonek-Lasoń, Piotr Kosiński,

(参考訳) 初期状態と最終状態の間の忠実度がゼロとなる場合について、マンデルスタム・タム限界(エネルギー分散を含む)とマルゴラス・レヴィチン限界(励起エネルギーの期待値を含む)という2つの重要な量子速度限界が導出されている。任意の忠実度の場合への前者の一般化は単純であるが、後者の一般化は、Giovanettiらによる独創的な論文(Phys. Rev. A67 (2003), 052109)で与えられたもので、一般化されたマルゴラス・レヴィチン不等式の右辺における下界と上界が等しいという予想に基づいており、この予想は数値的に7桁まで検証されていた。つい最近になって、この予想の証明が2つ現れた。本稿では、微分計算の最も単純な道具に基づく、非常に初等的な新しい証明を与える。したがって、一般化されたマルゴラス・レヴィチン速度限界は、忠実度がゼロの場合に有効な元の限界とほぼ同じ考え方で導くことができる。

Two important quantum speed limits have been derived for vanishing fidelity between the initial and final states: the Mandelstam-Tamm limit (involving the energy dispersion) and the Margolus-Levitin limit (involving the expectation value of the excitation energy). While the generalization of the former limit to the case of arbitrary fidelity is straightforward, the relevant generalization of the latter, given in the seminal paper by Giovanetti et al. (Phys. Rev. A67 (2003), 052109), was based on the conjectured equality of the lower and upper bounds on the right-hand side of the generalized Margolus-Levitin inequality, verified numerically up to seven digits. Only recently have two proofs of the conjecture appeared. We provide below a very elementary new proof, based on the simplest tools from differential calculus. Thus the generalized Margolus-Levitin speed limit can be derived much in the spirit of the original one, which is valid for vanishing fidelity.
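
For reference, the two limits named above take the following standard textbook form when the final state is orthogonal to the initial one, together with the straightforward arbitrary-fidelity version of the Mandelstam-Tamm bound. The symbols are notation chosen here rather than taken from the abstract: $\Delta E$ is the energy dispersion, $\langle E\rangle - E_0$ is the mean energy above the ground-state energy, and $F$ is the squared overlap between the initial and final states.

```latex
% Vanishing fidelity (orthogonal final state)
\tau \;\ge\; \frac{\pi\hbar}{2\,\Delta E}
\quad\text{(Mandelstam-Tamm)},
\qquad
\tau \;\ge\; \frac{\pi\hbar}{2\,\bigl(\langle E\rangle - E_0\bigr)}
\quad\text{(Margolus-Levitin)}.

% Mandelstam-Tamm bound for arbitrary fidelity
\tau \;\ge\; \frac{\hbar}{\Delta E}\,\arccos\sqrt{F},
\qquad
F = \bigl|\langle\psi(0)\,|\,\psi(\tau)\rangle\bigr|^{2}.
```

Setting $F = 0$ in the last line recovers the first Mandelstam-Tamm bound; the analogous arbitrary-fidelity form of the Margolus-Levitin bound is the subject of the paper and is not reproduced here.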

翻訳日:2024-05-15 01:22:32 公開日:2024-05-12

# グローバー適応探索の高速化:高次定式化によるビット数とゲート数削減戦略

Accelerating Grover Adaptive Search: Qubit and Gate Count Reduction Strategies with Higher-Order Formulations ( http://arxiv.org/abs/2308.01572v2 )

ライセンス: Link先を確認

Yuki Sano, Kosuke Mitarai, Naoki Yamamoto, Naoki Ishikawa,

(参考訳) グロバー適応探索(Grover Adaptive Search、GAS)は、二項最適化問題の解法として設計された量子抜粋探索アルゴリズムである。本稿では,GASに必要なキュービット数とゲート数を同時に削減できる高次二項式を提案する。具体的には、多項式分解によるゲート数を減らし、目的関数の順序を保ち、回路ランタイムと実装コストを減少させる2つの新しい戦略を考える。解析により,提案した高次定式化により,探索空間サイズと量子ゲート数の両方を削減し,GASの収束性能が向上することが示された。また,本手法はワンホット符号化を用いた一般的な組合せ最適化問題にも有用である。

Grover adaptive search (GAS) is a quantum exhaustive search algorithm designed to solve binary optimization problems. In this paper, we propose higher-order binary formulations that can simultaneously reduce the numbers of qubits and gates required for GAS. Specifically, we consider two novel strategies: one that reduces the number of gates through polynomial factorization, and the other that halves the order of the objective function, subsequently decreasing circuit runtime and implementation cost. Our analysis demonstrates that the proposed higher-order formulations improve the convergence performance of GAS by both reducing the search space size and the number of quantum gates. Our strategies are also beneficial for general combinatorial optimization problems using one-hot encoding.

翻訳日:2024-05-15 01:22:32 公開日:2024-05-12

# FIRE:食品画像から世代を再現する

FIRE: Food Image to REcipe generation ( http://arxiv.org/abs/2308.14391v2 )

ライセンス: Link先を確認

Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, Filip Ilievski,

(参考訳) 近年,食品科学は多分野の研究分野として注目されている。フードコンピューティングの野心的な目標は、食品画像のレシピ情報を自律的に生成できるエンドツーエンドのインテリジェントシステムを開発することである。現在のイメージ・ツー・レシピ法は検索ベースであり、その成功はデータセットのサイズと多様性、学習された埋め込みの品質に大きく依存する。一方、強力な注意力に基づく視覚と言語モデルの出現は、正確で一般化可能なレシピ生成のための有望な道のりを示し、まだ広く研究されていない。本稿では,食品処理領域におけるレシピ生成に適した新しいマルチモーダル手法であるFIREを提案する。 FIREはBLIPモデルを利用してタイトルを生成し、Vision Transformerとデコーダを使って材料抽出を行い、T5モデルを使用してタイトルと材料を入力として組み込んだレシピを生成する。本稿では,FIREを大規模言語モデルに統合することで,レシピをユーザの好みに適合させるレシピカスタマイズと,自動調理プロセスを実現するレシピ・ツー・コード変換という2つの実践的応用を紹介した。提案手法の有効性を実験的に検証し,今後の進歩と食品コンピューティングへの普及の可能性を明らかにした。

Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision Transformer with a decoder for ingredient extraction, and employs the T5 model to generate recipes incorporating titles and ingredients as inputs. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting: recipe customization to fit recipes to user preferences and recipe-to-code transformation to enable automated cooking processes. Our experimental findings validate the efficacy of our proposed approach, underscoring its potential for future advancements and widespread adoption in food computing.

Translated: 2024-05-15 01:22:32 Published: 2024-05-12

# Macroscopic Quantum Response to Gravitational Waves

Macroscopic Quantum Response to Gravitational Waves ( http://arxiv.org/abs/2309.02992v2 )

License: see the linked page

Asuka Ito, Ryuichiro Kitano,

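(Reference translation) We study the excitation of a one-electron quantum cyclotron by gravitational waves. An electron in a device such as a Penning trap is prepared in the lowest Landau level, which has an infinite degeneracy parameterized by the size of the wave function. The excitation rate from the ground state to the first excited state increases with the size of the electron wave function, so an electron with a larger wave function feels gravitational waves more strongly. As a result, we derive a good sensitivity to gravitational waves for a macroscopic one-electron quantum cyclotron.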

We study the excitation of a one-electron quantum cyclotron by gravitational waves. The electron, held in a device such as a Penning trap, is prepared in the lowest Landau level, which has an infinite degeneracy parameterized by the size of the wave function. We find that the excitation rate from the ground state to the first excited state is enhanced by the size of the electron wave function: an electron with a larger wave function feels gravitational waves more. As a consequence, we derive a good sensitivity to gravitational waves at a macroscopic one-electron quantum cyclotron.

Translated: 2024-05-15 01:22:32 Published: 2024-05-12

# C-Pack: Packaged Resources To Advance General Chinese Embedding

C-Pack: Packaged Resources To Advance General Chinese Embedding ( http://arxiv.org/abs/2309.07597v4 )

License: see the linked page

Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, Jian-Yun Nie,

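(Reference translation) C-Pack is a package of resources that significantly advances the field of general Chinese embeddings. C-Pack contains three key resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset, curated from labeled and unlabeled Chinese corpora, for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. At the time of release, our models outperform all prior Chinese text embeddings on C-MTEB by up to 10%. We also integrate and optimize a full suite of training methods for C-TEM. Alongside the resources for general Chinese embedding, we release data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark, and the released English data is 2 times larger than the Chinese data. All of these resources are publicly available at https://github.com/FlagOpen/FlagEmbedding.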

We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% at the time of release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.

Translated: 2024-05-15 01:22:32 Published: 2024-05-12

# MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation

MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation ( http://arxiv.org/abs/2309.14236v2 )

License: see the linked page

Patrick Lancaster, Nicklas Hansen, Aravind Rajeswaran, Vikash Kumar,

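(Reference translation) Robotic systems that aim to operate in uninstrumented real-world environments must perceive the world directly through onboard sensing. Vision-based learning systems aim to remove the need for environment instrumentation by building an implicit understanding of the world from raw pixels, but navigating a contact-rich, high-dimensional search space from sparse visual reward signals alone makes exploration considerably harder. Such systems are therefore usually limited to simulated or heavily engineered environments, because agent exploration in the real world without explicit state estimation or dense rewards can lead to unsafe behavior and catastrophic safety faults. In this study, we isolate the root causes of these limitations and develop MoDem-V2, a system that learns contact-rich manipulation directly in the uninstrumented real world. Building on recent algorithmic advances in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify the key ingredients for using demonstrations in model learning while respecting real-world safety: exploration centering, agency handover, and actor-critic ensembles. We show empirically how these ingredients contribute on four complex visuo-motor manipulation problems, in both simulation and the real world. To the best of our knowledge, this is the first successful system for demonstration-augmented visual MBRL trained directly in the real world. See https://sites.google.com/view/modem-v2 for videos and further details.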

Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit https://sites.google.com/view/modem-v2 for videos and more details.

Translated: 2024-05-15 01:22:32 Published: 2024-05-12

# Generative Modeling with Phase Stochastic Bridges

Generative Modeling with Phase Stochastic Bridges ( http://arxiv.org/abs/2310.07805v4 )

License: see the linked page

Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai,

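(Reference translation) Diffusion models (DMs) are state-of-the-art generative models for continuous inputs. DMs construct a stochastic differential equation (SDE) in the input space (i.e., position space) and use a neural network to reverse it. In this paper, we propose a new generative modeling framework based on phase space dynamics, where a phase space is defined as an augmented space that includes both position and velocity. Using insights from stochastic optimal control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework can generate realistic data points at an early stage of the dynamics propagation. This early prediction sets the stage for efficient data generation by exploiting the additional velocity information along the trajectory. On standard image generation benchmarks, our model performs favorably against baselines when the number of function evaluations (NFEs) is small. Our approach also rivals diffusion models equipped with efficient sampling techniques, which indicates its potential as a new tool for generative modeling.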

Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.

Translated: 2024-05-15 01:12:47 Published: 2024-05-12

# A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems

A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems ( http://arxiv.org/abs/2310.08644v4 )

License: see the linked page

Yuan-Heng Wang, Hoshin V. Gupta,

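(Reference translation) Decades of effort have gone into building Physical-Conceptual (PC) models that predict the time-series evolution of geoscientific systems, yet recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop far more accurate models. However, the difficulty of extracting physical understanding from ML-based models limits their usefulness for improving scientific knowledge of system structure and function. We therefore propose a physically interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to represent the mass-conserving nature of physical processes explicitly, while allowing the functional nature of those processes to be learned directly (and interpretably) from available data using off-the-shelf ML technology. As a proof of concept, we examine the functional expressivity (capacity) of the MCP, explore its ability to represent the rainfall-runoff (RR) dynamics of the Leaf River Basin parsimoniously, and show its usefulness for scientific hypothesis testing. In conclusion, we discuss how the concept can be extended to give an ML-based physical-conceptual representation of the coupled mass-energy-information flows through geoscientific systems.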

Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.

Translated: 2024-05-15 01:12:47 Published: 2024-05-12

# The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks ( http://arxiv.org/abs/2310.15469v2 )

License: see the linked page

Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, Haixu Tang,

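(Reference translation) The rapid progress of large language models (LLMs) has raised public concern about the leakage of personally identifiable information (PII) contained in their extensive training datasets. Recent studies have shown that an adversary can extract highly sensitive private data from LLM training data with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and from catastrophic forgetting (CF) in the pre-training stage, so the veracity of the divulged PII is negligible. In this work, we propose Janus, a new attack that exploits the fine-tuning interface to recover forgotten PII from the pre-training data of LLMs. We formalize the privacy leakage problem in LLMs and explain, through empirical analysis on open-source language models, why forgotten PII can be recovered. Based on these insights, we evaluate Janus on open-source language models and on two recent LLMs, GPT-3.5-Turbo and LLaMA-2-7b. Our experiments show that Janus amplifies privacy risks by more than 10 times compared with the baseline and significantly outperforms state-of-the-art privacy extraction attacks, including prefix attacks and in-context learning (ICL). Our analysis further confirms that the existing fine-tuning APIs offered by OpenAI and Azure AI Studio are susceptible to the Janus attack, which lets an adversary carry out such an attack at low cost.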

The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary could extract highly sensitive privacy data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and catastrophic forgetting (CF) in the pre-training stage, rendering the veracity of divulged PIIs negligible. In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in LLMs. We formalize the privacy leakage problem in LLMs and explain why forgotten PIIs can be recovered through empirical analysis on open-source language models. Based upon these insights, we evaluate the performance of Janus on both open-source language models and two latest LLMs, i.e., GPT-3.5-Turbo and LLaMA-2-7b. Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline and significantly outperforms the state-of-the-art privacy extraction attacks including prefix attacks and in-context learning (ICL). Furthermore, our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack, allowing an adversary to conduct such an attack at a low cost.

Translated: 2024-05-15 01:12:47 Published: 2024-05-12

# Online Non-convex Optimization with Long-term Non-convex Constraints

Online Non-convex Optimization with Long-term Non-convex Constraints ( http://arxiv.org/abs/2311.02426v3 )

License: see the linked page

Shijie Pan, Wenjie Huang,

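(Reference translation) We propose and analyze a new Follow-the-Perturbed-Leader type algorithm for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are generated arbitrarily and are not necessarily convex. In each period, a random linear perturbation and a strongly concave perturbation are added in the primal and dual directions, respectively, of the offline oracle, and a global minimax point is sought as the solution. Based on a proposed expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to a long-term (extreme value) constrained river pollutant source identification problem, where it validates the theoretical results and performs better than existing methods.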

A novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are arbitrarily generated and not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, to the offline oracle, and a global minimax point is searched as the solution. Based on a proposed expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to a long-term (extreme value) constrained river pollutant source identification problem, where it validates the theoretical results and exhibits superior performance compared to existing methods.

Translated: 2024-05-15 01:02:54 Published: 2024-05-12

# Nonlocal quantum state ensembles and quantum data hiding

Nonlocal quantum state ensembles and quantum data hiding ( http://arxiv.org/abs/2311.06029v2 )

License: see the linked page

Donghoon Ha, Jeong San Kim,

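(Reference translation) We consider the discrimination of bipartite quantum states and establish a relation between nonlocal quantum state ensembles and quantum data hiding. Using a bound on the optimal local discrimination of bipartite quantum states, we give a sufficient condition under which a bipartite quantum state ensemble can be used to construct a quantum data-hiding scheme. The results are illustrated with examples in multidimensional bipartite quantum systems.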

We consider the discrimination of bipartite quantum states and establish a relation between nonlocal quantum state ensemble and quantum data hiding processing. Using a bound on optimal local discrimination of bipartite quantum states, we provide a sufficient condition for a bipartite quantum state ensemble to be used to construct a quantum data-hiding scheme. Our results are illustrated by examples in multidimensional bipartite quantum systems.

Translated: 2024-05-15 01:02:54 Published: 2024-05-12

# Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis

Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis ( http://arxiv.org/abs/2311.08605v2 )

License: see the linked page

David F. Jenny, Yann Billeter, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin,

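(Reference translation) The rapid progress of Large Language Models (LLMs) has sparked intense debate about how prevalent bias is in these models and how to mitigate it. Yet, as shown both by results on debiasing methods in the literature and by reports of alignment-related defects from the wider community, bias remains poorly understood despite its practical relevance. To better understand the internal causes of bias, we analyze LLM bias through the lens of causal fairness analysis, which lets us both understand the origins of bias and reason about its downstream consequences and mitigation. To operationalize this framework, we propose a prompt-based method for extracting the confounding and mediating attributes that contribute to the LLM decision process. By applying Activity Dependency Networks (ADNs), we then analyze how these attributes influence an LLM's decision process. We apply the method to LLM ratings of argument quality in political debates. We find that the observed disparate treatment can be attributed, at least in part, to confounding and mitigating attributes and to model misalignment, and we discuss what these findings imply for human-AI alignment and bias mitigation. Our code and data are at https://github.com/david-jenny/LLM-Political-Study.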

The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the internal causes of bias, we analyse LLM bias through the lens of causal fairness analysis, which enables us to both comprehend the origins of bias and reason about its downstream consequences and mitigation. To operationalize this framework, we propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the LLM decision process. By applying Activity Dependency Networks (ADNs), we then analyse how these attributes influence an LLM's decision process. We apply our method to LLM ratings of argument quality in political debates. We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment, and discuss the consequences of our findings for human-AI alignment and bias mitigation. Our code and data are at https://github.com/david-jenny/LLM-Political-Study.

Translated: 2024-05-15 01:02:54 Published: 2024-05-12

# Swift Parameter-free Attention Network for Efficient Super-Resolution

Swift Parameter-free Attention Network for Efficient Super-Resolution ( http://arxiv.org/abs/2311.12770v3 )

License: see the linked page

Cheng Wan, Hongyuan Yu, Zhiqi Li, Yihang Chen, Yajun Zou, Yuqing Liu, Xuanwu Yin, Kunlong Zuo,

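(Reference translation) Single Image Super-Resolution (SISR) is an important task in low-level computer vision that aims to reconstruct high-resolution images from low-resolution ones. Conventional attention mechanisms have greatly improved SISR performance, but they often bring complex network structures and many parameters, which slows inference and increases model size. To address this, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality. SPAN uses a new parameter-free attention mechanism that relies on symmetric activation functions and residual connections to strengthen high-contribution information and suppress redundant information. We show theoretically that this design achieves the purpose of an attention mechanism. We evaluate SPAN on multiple benchmarks and show that it outperforms existing efficient super-resolution models in both image quality and inference speed, achieving a significant quality-speed trade-off. This makes SPAN well suited to real-world applications, especially in resource-constrained scenarios. Notably, we won first place in both the overall performance track and the runtime track of the NTIRE 2024 efficient super-resolution challenge. Our code and models are publicly available at https://github.com/hongyuanyu/SPAN.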

Single Image Super-Resolution (SISR) is a crucial task in low-level computer vision, aiming to reconstruct high-resolution images from low-resolution counterparts. Conventional attention mechanisms have significantly improved SISR performance but often result in complex network structures and large number of parameters, leading to slow inference speed and large model size. To address this issue, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality. SPAN employs a novel parameter-free attention mechanism, which leverages symmetric activation functions and residual connections to enhance high-contribution information and suppress redundant information. Our theoretical analysis demonstrates the effectiveness of this design in achieving the attention mechanism's purpose. We evaluate SPAN on multiple benchmarks, showing that it outperforms existing efficient super-resolution models in terms of both image quality and inference speed, achieving a significant quality-speed trade-off. This makes SPAN highly suitable for real-world applications, particularly in resource-constrained scenarios. Notably, we won the first place both in the overall performance track and runtime track of the NTIRE 2024 efficient super-resolution challenge. Our code and models are made publicly available at https://github.com/hongyuanyu/SPAN.
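
The idea of a parameter-free attention map can be illustrated with a small sketch. The abstract only says that the mechanism uses symmetric activation functions and residual connections; the specific gate below (a sigmoid shifted to be odd-symmetric about the origin) and the names `symmetric_gate` and `parameter_free_attention` are assumptions made for illustration, not the authors' published code.

```python
import math
from typing import List


def symmetric_gate(x: float) -> float:
    """Sigmoid shifted by 0.5 so that gate(-x) == -gate(x) (odd symmetry).

    Hypothetical choice of 'symmetric activation'; it has no learnable weights.
    """
    return 1.0 / (1.0 + math.exp(-x)) - 0.5


def parameter_free_attention(features: List[float], residual: List[float]) -> List[float]:
    """Weight (residual + features) element-wise by a gate computed from the
    features themselves, so no extra parameters are introduced."""
    return [(r + f) * symmetric_gate(f) for f, r in zip(features, residual)]


# Large-magnitude activations receive large gate magnitudes; near-zero ones are suppressed.
out = parameter_free_attention([0.0, 2.0, -2.0], [1.0, 1.0, 1.0])
```

Because the gate is computed directly from the activations, the attention step adds no parameters, which is the property the abstract emphasizes.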

Translated: 2024-05-15 00:53:00 Published: 2024-05-12

# Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical

Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical ( http://arxiv.org/abs/2311.15502v3 )

License: see the linked page

Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama,

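(Reference translation) Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or more complementary labels that indicate classes it does not belong to. Existing consistent approaches rely either on a uniform distribution assumption to model how complementary labels are generated, or on an ordinary-label training set to estimate the transition matrix in the non-uniform case. However, neither condition may hold in real-world scenarios. In this paper, we propose a new consistent approach that does not depend on these conditions. Inspired by the positive-unlabeled (PU) learning literature, we propose an unbiased risk estimator for complementary-label learning based on the Selected-Completely-at-Random assumption. We then introduce a risk-correction approach to address overfitting. We also find that, with the one-versus-rest strategy, complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems. Extensive experiments on synthetic and real-world benchmark datasets confirm that the proposed approach outperforms state-of-the-art methods.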

Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix in non-uniform cases. However, either condition may not be satisfied in real-world scenarios. In this paper, we propose a novel consistent approach that does not rely on these conditions. Inspired by the positive-unlabeled (PU) learning literature, we propose an unbiased risk estimator based on the Selected-Completely-at-Random assumption for complementary-label learning. We then introduce a risk-correction approach to address overfitting problems. Furthermore, we find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems when using the one-versus-rest strategy. Extensive experimental results on both synthetic and real-world benchmark datasets validate the superiority of our proposed approach over state-of-the-art methods.
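
The one-versus-rest reading mentioned in the abstract can be sketched as a data transformation: for each class, examples that carry that class as a complementary label are known negatives, and all other examples are unlabeled for that class. The sketch below only shows this split; the function name `to_negative_unlabeled` and the toy data are hypothetical, and the paper's risk estimator and risk correction are not reproduced here.

```python
from typing import Dict, List, Sequence, Set, Tuple


def to_negative_unlabeled(
    examples: Sequence[Tuple[str, Set[int]]], num_classes: int
) -> Dict[int, Dict[str, List[str]]]:
    """Build one negative-unlabeled binary problem per class.

    Each example is (identifier, set of complementary labels), where a
    complementary label is a class the example does NOT belong to.
    """
    problems: Dict[int, Dict[str, List[str]]] = {}
    for k in range(num_classes):
        # Known not to be class k: k appears among the complementary labels.
        negative = [x for x, comp in examples if k in comp]
        # Nothing is known about membership in class k.
        unlabeled = [x for x, comp in examples if k not in comp]
        problems[k] = {"negative": negative, "unlabeled": unlabeled}
    return problems


data = [("x0", {1}), ("x1", {0, 2}), ("x2", {2})]
problems = to_negative_unlabeled(data, num_classes=3)
```

Each of the resulting binary problems could then be handed to a negative-unlabeled learner, which is where the analogy to PU learning comes in.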

Translated: 2024-05-15 00:53:00 Published: 2024-05-12

# Learning Multi-Frequency Partial Correlation Graphs

Learning Multi-Frequency Partial Correlation Graphs ( http://arxiv.org/abs/2311.15756v2 )

License: see the linked page

Gabriele D'Acunto, Paolo Di Lorenzo, Francesco Bonchi, Stefania Sardellitti, Sergio Barbarossa,

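(Reference translation) Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but cannot distinguish between frequency bands. Motivated by the many applications in which this distinction is central, we overcome the limitation by learning a block-sparse, frequency-dependent partial correlation graph, in which layers correspond to different frequency bands and partial correlations may occur in only a few layers. To this end, we formulate and solve two nonconvex learning problems. The first has a closed-form solution and is suitable when there is prior knowledge of the number of partial correlations. The second relies on an iterative solution based on successive convex approximation and is effective in the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, an analysis of financial time series confirms that partial correlations exist only within a few frequency bands, showing how our methods yield valuable insights that would go undetected without discriminating along the frequency domain.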

Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain.

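Translated: 2024-05-15 00:53:00 Published: 2024-05-12

# Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications ( http://arxiv.org/abs/2312.02828v3 )

License: see the linked page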

Rajeeva L. Karandikar, M. Vidyasagar,

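(Reference translation) In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function need not be convex. Instead, our results apply to the class of ``invex'' functions, for which every stationary point is also a global minimizer. First, $J(\cdot)$ is assumed to satisfy a property slightly weaker than the Kurdyka-Lojasiewicz (KL) condition, denoted here by (KL'). It is shown that the iterations $J({\boldsymbol \theta}_t)$ converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak-Lojasiewicz (PL) condition. Under this stronger hypothesis, we derive estimates of the rate at which $J({\boldsymbol \theta}_t)$ converges to its limit. From these results, we show that for functions satisfying the PL property, the convergence rate of SGD matches the best possible rate for convex functions. Some results along these lines have been published before, but our contribution includes two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere. Second, our convergence is almost sure rather than in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the ``optimal'' increments, that is, the size of the perturbations. Using the same set of ideas, we establish global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error than in the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.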

In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function is not required to be convex. Rather, our results apply to a class of ``invex'' functions, which have the property that every stationary point is also a global minimizer. First, it is assumed that $J(\cdot)$ satisfies a property that is slightly weaker than the Kurdyka-Lojasiewicz (KL) condition, denoted here as (KL'). It is shown that the iterations $J({\boldsymbol \theta}_t)$ converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak-Lojasiewicz (PL) condition. With this stronger hypothesis, we derive estimates on the rate of convergence of $J({\boldsymbol \theta}_t)$ to its limit. Using these results, we show that for functions satisfying the PL property, the convergence rate of SGD is the same as the best-possible rate for convex functions. While some results along these lines have been published in the past, our contributions contain two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere, and second, our convergence is almost sure, and not in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the ``optimal'' increments or the size of the perturbations. Using the same set of ideas, we establish the global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error, compared to the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.
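
For reference, the Polyak-Lojasiewicz condition named in the abstract is commonly stated as follows. This is the standard textbook form; the constant $\mu$ and the symbol $J^*$ are notation assumed here, and the paper's own statement of (PL) and of the weaker (KL') property may differ in detail.

```latex
\[
  \tfrac{1}{2}\,\bigl\| \nabla J(\boldsymbol{\theta}) \bigr\|_2^2
  \;\ge\; \mu \,\bigl( J(\boldsymbol{\theta}) - J^* \bigr)
  \quad \text{for all } \boldsymbol{\theta},
  \qquad
  J^* := \inf_{\boldsymbol{\theta}} J(\boldsymbol{\theta}), \quad \mu > 0 .
\]
```

Under this inequality, $\nabla J(\boldsymbol{\theta}) = 0$ forces $J(\boldsymbol{\theta}) = J^*$, so every stationary point is a global minimizer, which is the invexity property the abstract refers to.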

Translated: 2024-05-15 00:53:00 Published: 2024-05-12

# Uncertainty Visualization via Low-Dimensional Posterior Projections

Uncertainty Visualization via Low-Dimensional Posterior Projections ( http://arxiv.org/abs/2312.07804v2 )

License: see the linked page

Omer Yair, Elias Nehme, Tomer Michaeli,

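(Reference translation) In ill-posed inverse problems, it is often desirable to gain insight into the full spectrum of plausible solutions rather than to extract a single reconstruction. Information about the plausible solutions and their likelihoods is encoded in the posterior distribution. For high-dimensional data, however, this distribution is hard to visualize. In this work, we propose a new approach for estimating and visualizing posteriors using energy-based models (EBMs) over low-dimensional subspaces. Specifically, we train a conditional EBM that receives an input measurement and a set of directions spanning a low-dimensional subspace of solutions, and outputs the probability density function of the posterior within that subspace. We demonstrate the effectiveness of the method on a diverse range of datasets and image restoration problems, showing its strength in uncertainty quantification and visualization. As we show, the method outperforms a baseline that projects samples from a diffusion-based posterior sampler while being orders of magnitude faster. It is also more accurate than a baseline that assumes a Gaussian posterior.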

In ill-posed inverse problems, it is commonly desirable to obtain insight into the full spectrum of plausible solutions, rather than extracting only a single reconstruction. Information about the plausible solutions and their likelihoods is encoded in the posterior distribution. However, for high-dimensional data, this distribution is challenging to visualize. In this work, we introduce a new approach for estimating and visualizing posteriors by employing energy-based models (EBMs) over low-dimensional subspaces. Specifically, we train a conditional EBM that receives an input measurement and a set of directions that span some low-dimensional subspace of solutions, and outputs the probability density function of the posterior within that space. We demonstrate the effectiveness of our method across a diverse range of datasets and image restoration problems, showcasing its strength in uncertainty quantification and visualization. As we show, our method outperforms a baseline that projects samples from a diffusion-based posterior sampler, while being orders of magnitude faster. Furthermore, it is more accurate than a baseline that assumes a Gaussian posterior.

Translated: 2024-05-15 00:43:11 Published: 2024-05-12

# Quench dynamics in higher-dimensional Holstein models: Insights from Truncated Wigner Approaches

Quench dynamics in higher-dimensional Holstein models: Insights from Truncated Wigner Approaches ( http://arxiv.org/abs/2312.12291v2 )

License: see the linked page

Eva Paprotzki, Alexander Osterkorn, Vibhu Mishra, Stefan Kehrein,

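(Reference translation) Charge-density-wave (CDW) phases in quantum materials arise from the complex interplay of electronic and lattice degrees of freedom. Today, various time-resolved spectroscopy techniques make it possible to manipulate such phases actively and to monitor their dynamics in real time. Modeling such nonequilibrium dynamics theoretically is a major challenge, and exact methods can usually handle only a small number of atoms and finitely many phonons. We approach the melting of charge-density waves in a Holstein model after a sudden switch-on of the electronic hopping from two perspectives. First, we prove that in the non-interacting limit and in the strong-coupling limit, the CDW order parameter on high-dimensional hypercubic lattices obeys a factorization relation at long times, so that its dynamics reduces to the one-dimensional case. Second, we present numerical results for two spatial dimensions from semiclassical techniques based on the Truncated Wigner Approximation. A comparison with exact data for a Holstein chain shows that a semiclassical treatment of both the electrons and the phonons is needed to describe the phononic dynamics correctly. This is additionally confirmed for a quench in the electron-phonon coupling strength.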

Charge-density wave phases in quantum materials stem from the complex interplay of electronic and lattice degrees of freedom. Nowadays, various time-resolved spectroscopy techniques make it possible to actively manipulate such phases and monitor their dynamics in real time. Modeling such nonequilibrium dynamics theoretically is a great challenge and exact methods can usually only treat a small number of atoms and finitely many phonons. We approach the melting of charge-density waves in a Holstein model after a sudden switch-on of the electronic hopping from two perspectives. First, we prove that in the non-interacting and in the strong-coupling limit, the CDW order parameter on high-dimensional hypercubic lattices obeys a factorization relation for long times, such that its dynamics can be reduced to the one-dimensional case. Second, we present numerical results from semiclassical techniques based on the Truncated Wigner Approximation for two spatial dimensions. A comparison with exact data obtained for a Holstein chain shows that a semiclassical treatment of both the electrons and phonons is required in order to correctly describe the phononic dynamics. This is confirmed, in addition, for a quench in the electron-phonon coupling strength.

Translated: 2024-05-15 00:43:11 Published: 2024-05-12

# Data Needs and Challenges of Quantum Dot Devices Automation: Workshop Report

Data Needs and Challenges of Quantum Dot Devices Automation: Workshop Report ( http://arxiv.org/abs/2312.14322v2 )

License: see the linked page

Justyna P. Zwolak, Jacob M. Taylor, Reed Andrews, Jared Benson, Garnett Bryant, Donovan Buterakos, Anasua Chatterjee, Sankar Das Sarma, Mark A. Eriksson, Eliška Greplová, Michael J. Gullans, Fabian Hader, Tyler J. Kovach, Pranav S. Mundada, Mick Ramsey, Torbjoern Rasmussen, Brandon Severin, Anthony Sigillito, Brennan Undseth, Brian Weber,

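(Reference translation) Gate-defined quantum dots are a promising candidate system for realizing scalable, coupled qubit systems and can serve as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, as the number of quantum dot qubits increases, the relevant parameter space grows enough to make heuristic control infeasible. It is therefore essential to develop reliable and scalable autonomous tuning approaches. In this report, we outline the current challenges in automating the tuning and operation of quantum dot devices, with a particular focus on datasets, benchmarking, and standardization. We also present ideas put forward by the quantum dot community on how to overcome these challenges.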

Gate-defined quantum dots are a promising candidate system to realize scalable, coupled qubit systems and serve as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. In this report, we outline current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present ideas put forward by the quantum dot community on how to overcome them.

Translated: 2024-05-15 00:43:11 Published: 2024-05-12

# A Novel Sampled Clustering Algorithm for Rice Phenotypic Data

A Novel Sampled Clustering Algorithm for Rice Phenotypic Data ( http://arxiv.org/abs/2312.14920v2 )

License: see the linked page

Mithun Singh, Kapil Ahuja, Milind B. Ratnaparkhe,

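(Reference translation) Phenotypic (physical) characteristics of plant species are commonly used for clustering. In one of our recent works (Shastri et al. (2021)), we grouped soybean species with an algorithm that combines probabilistic sampling (pivotal sampling) with spectral clustering. These techniques gave highly accurate clusterings at reduced cost. In this work, we extend that algorithm to cluster rice species and improve the base algorithm in three ways. First, we propose a new function for building the similarity matrix in spectral clustering. A natural exponential function is commonly used for this purpose. Based on spectral graph theory and Cheeger's inequality, we propose a base-"a" exponential function instead. This gives a similarity matrix spectrum that is favorable for clustering, which we support with an eigenvalue analysis. In addition, the function used to build the similarity matrix was previously scaled by a fixed factor (global scaling). Following the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with the matrix elements (local scaling), which works better. Second, to compute the inclusion probability of a species in the pivotal sampling algorithm, we previously used a notion of deviation that captured how far the species' characteristic values were from their respective base values (computed over all species). A maximum function was previously used to find the base values. We now use a median function, which is more intuitive, and we support this choice with a statistical analysis. Third, in experiments on 1865 rice species, we show that, in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). The new algorithm is also significantly faster than Hierarchical Clustering because of the sampling involved.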

Phenotypic (or physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways. First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon the spectral graph theory and the involved Cheeger's inequality, we propose the use of a base "a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Also, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better. Second, to compute the inclusion probability of a species in the pivotal sampling algorithm, we had earlier used the notion of deviation that captured how far a species' characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis. Third, with experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.
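
The similarity construction described above can be sketched as follows. This is only an illustration under stated assumptions: the local scale of a point is taken to be the distance to its k-th nearest neighbour (the Zelnik-Manor and Perona idea), and the natural exponential is replaced by a power of a base `a`. The function names, the default `k`, and the exact way the base enters the formula are hypothetical and not taken from the paper.

```python
import math


def local_scales(points, k=2):
    """Local scale sigma_i: distance from point i to its k-th nearest neighbour."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(dists[k - 1])
    return scales


def similarity_matrix(points, a=math.e, k=2):
    """S_ij = a ** (-d_ij^2 / (sigma_i * sigma_j)), with zeros on the diagonal.

    Assumes no point has k exact duplicates, so every sigma_i is positive.
    """
    sigma = local_scales(points, k)
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(points[i], points[j]) ** 2
            S[i][j] = S[j][i] = a ** (-d2 / (sigma[i] * sigma[j]))
    return S
```

With `a = math.e` this reduces to the usual locally scaled Gaussian similarity; a different base only rescales the exponent, which changes how quickly similarity decays with distance.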

翻訳日:2024-05-15 00:43:11 公開日:2024-05-12

# 分割学習に基づくEMG補綴装置の収束率最大化

Convergence Rate Maximization for Split Learning-based Control of EMG Prosthetic Devices ( http://arxiv.org/abs/2401.03233v3 )

ライセンス: Link先を確認

Matea Marinova, Daniel Denkovski, Hristijan Gjoreski, Zoran Hadzi-Velkov, Valentin Rakovic,

(参考訳) Split Learning (SL) は、資源制約のある環境に適用できることから、筋電図(EMG)に基づく補綴制御における有望な分散学習手法である。補綴装置は処理能力とバッテリー寿命が極めて限られているため、ディープラーニングやフェデレートラーニング(FL)といった他の学習手法では準最適な解しか得られない。このようなシナリオでSLを実装できるのは、クライアントが小さい方のモデルセグメントを実行するという、SL固有のモデル分割のおかげである。しかし、不適切なカット層を選択すると、SLシステムのトレーニングプロセスが妨げられる。本稿では、モデルの収束率を最大化するという観点から、最適なカット層を選択するアルゴリズムを提案する。性能評価の結果、提案アルゴリズムは、補綴装置制御の改善を目的としたEMGパターン認識タスクにおいて、収束を大幅に加速することが示された。

Split Learning (SL) is a promising Distributed Learning approach in electromyography (EMG) based prosthetic control, due to its applicability within resource-constrained environments. Other learning approaches, such as Deep Learning and Federated Learning (FL), provide suboptimal solutions, since prosthetic devices are extremely limited in terms of processing power and battery life. The viability of implementing SL in such scenarios stems from its inherent model partitioning, with clients executing the smaller model segment. However, selecting an inadequate cut layer hinders the training process in SL systems. This paper presents an algorithm for optimal cut layer selection in terms of maximizing the convergence rate of the model. The performance evaluation demonstrates that the proposed algorithm substantially accelerates the convergence in an EMG pattern recognition task for improving prosthetic device control.

翻訳日:2024-05-15 00:43:11 公開日:2024-05-12

# 単一イオン量子ビット上での逆Mpemba効果

The inverse Mpemba effect demonstrated on a single trapped ion qubit ( http://arxiv.org/abs/2401.05830v2 )

ライセンス: Link先を確認

Shahaf Aharony Shapira, Yotam Shapira, Jovan Markov, Gianluca Teza, Nitzan Akerman, Oren Raz, Roee Ozeri,

(参考訳) Mpemba効果(Mpemba effect)は、他の条件が同一であるにもかかわらず、高温の系が低温の系よりも早く低い温度に達するという直感に反する現象である。ここでは、最も単純な量子系である量子ビット上で、Mpemba効果の量子アナログを提案する。具体的には、冷たい量子ビットが熱い量子ビットよりも早く高温に達する逆効果を示すことを明らかにする。さらに、本システムでは冷たい量子ビットが指数関数的に速く加熱されうるため、この効果の強いバージョンが現れる。これは十分にコヒーレントな系でのみ起こるため、この効果は量子力学的なもの、すなわち干渉効果によるものである。我々は、単一の $^{88}\text{Sr}^+$ トラップイオン量子ビットを用いて、これらの結果を実験的に実証した。単純な量子系にこの異常緩和効果が存在することは、その基本性を示しており、量子情報処理デバイスの設計と運用において役割を果たす可能性がある。

The Mpemba effect is a counter-intuitive phenomenon in which a hot system reaches a cold temperature faster than a colder system, under otherwise identical conditions. Here we propose a quantum analog of the Mpemba effect, on the simplest quantum system, a qubit. Specifically, we show it exhibits an inverse effect, in which a cold qubit reaches a hot temperature faster than a hot qubit. Furthermore, in our system a cold qubit can heat up exponentially faster, manifesting the strong version of the effect. This occurs only for sufficiently coherent systems, making this effect quantum mechanical, i.e. due to interference effects. We experimentally demonstrate our findings on a single $^{88}\text{Sr}^+$ trapped ion qubit. The existence of this anomalous relaxation effect in simple quantum systems reveals its fundamentality, and may have a role in designing and operating quantum information processing devices.

翻訳日:2024-05-15 00:33:27 公開日:2024-05-12

# 幾何学的推定問題に対するサンプソン近似の再検討

Revisiting Sampson Approximations for Geometric Estimation Problems ( http://arxiv.org/abs/2401.07114v2 )

ライセンス: Link先を確認

Felix Rydell, Angélica Torres, Viktor Larsson,

(参考訳) コンピュータビジョンにおける多くの問題は幾何学的推定問題として定式化することができ、例えば、測定値(例えば点対応)の集合が、観測値に一致するモデル(例えば本質的な行列)に適合することを期待する。これは、あるモデルに対する観測の ‘agrees’ の程度を測る必要がある。自然な選択は、観測が制約を完全に満たす最小の摂動を考えることである。しかし、多くの問題に対して、この計量は高価であり、計算には難解である。いわゆるサンプソン誤差は、線形化スキームを通じてこの幾何学的誤差を近似する。エピポーラ幾何学において、サンプソン誤差は一般的な選択であり、実際には対応する幾何学的残差(再射誤差)の非常に厳密な近似が得られることが知られている。本稿では,サンプソン近似を再検討し,この近似がなぜ,いつ動作するのかという新たな理論的知見を与えるとともに,いくつかの軽微な仮定の下でのタイツネスの明確な境界を与える。我々の理論結果は実データに関するいくつかの実験と異なる幾何推定タスクの文脈で検証される。

Many problems in computer vision can be formulated as geometric estimation problems, i.e. given a collection of measurements (e.g. point correspondences) we wish to fit a model (e.g. an essential matrix) that agrees with our observations. This necessitates some measure of how much an observation "agrees" with a given model. A natural choice is to consider the smallest perturbation that makes the observation exactly satisfy the constraints. However, for many problems, this metric is expensive or otherwise intractable to compute. The so-called Sampson error approximates this geometric error through a linearization scheme. For epipolar geometry, the Sampson error is a popular choice and in practice known to yield very tight approximations of the corresponding geometric residual (the reprojection error). In this paper we revisit the Sampson approximation and provide new theoretical insights as to why and when this approximation works, as well as provide explicit bounds on the tightness under some mild assumptions. Our theoretical results are validated in several experiments on real data and in the context of different geometric estimation tasks.
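
For the epipolar case mentioned above, the Sampson error has a well-known closed form: the squared algebraic residual divided by the squared norm of its gradient with respect to the four image coordinates. The sketch below computes it for one correspondence. The 3x3 matrix `F`, the helper names, and the convention that points are given in inhomogeneous pixel or normalized coordinates are assumptions made for the illustration.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(M):
    """Transpose a 3x3 matrix."""
    return [[M[c][r] for c in range(3)] for r in range(3)]


def sampson_error(F, p1, p2):
    """Squared Sampson error of the correspondence p1 <-> p2 under x2^T F x1 = 0."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = mat_vec(F, x1)
    Ftx2 = mat_vec(transpose(F), x2)
    # Algebraic residual of the epipolar constraint.
    residual = sum(x2[i] * Fx1[i] for i in range(3))
    # Squared gradient norm of the residual w.r.t. the four image coordinates.
    grad_sq = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / grad_sq
```

For a pure sideways translation the constraint is linear in the coordinates (both points must share the same y), so the linearization is exact there and the Sampson error coincides with the smallest squared perturbation that satisfies the constraint.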

翻訳日:2024-05-15 00:33:27 公開日:2024-05-12

# 脆弱性のあるクラスに対するバイアスの分析と緩和:データセットにおけるバランスの取れた表現を目指して

Analyzing and Mitigating Bias for Vulnerable Classes: Towards Balanced Representation in Dataset ( http://arxiv.org/abs/2401.10397v2 )

ライセンス: Link先を確認

Dewant Katare, David Solans Noguero, Souneil Park, Nicolas Kourtellis, Marijn Janssen, Aaron Yi Ding,

(参考訳) 自動運転における認識システムの正確性と公正性は、特に都市の運転環境で重大なリスクに直面する自転車利用者、歩行者、オートバイ運転者といった脆弱な道路利用者にとって不可欠である。主流の研究は主にクラスごとの性能指標の向上に取り組んでいるが、AIモデルにおけるバイアス継承の隠れた特性や、データセット内のクラス不均衡と格差はしばしば見過ごされている。本研究は、脆弱な道路利用者間のクラス不均衡を調査することでこれらの問題に取り組み、クラス分布の分析、性能評価、バイアスの影響評価に焦点を当てる。一般的なCNNモデルとVision Transformer(ViT)をnuScenesデータセットに適用した性能評価から、表現不足のクラスに対する検出性能の格差が示された。関連研究と比較して、我々はモデル最適化とバイアス軽減のための指標特化型学習とコスト考慮型学習に焦点を当てており、これにはデータ拡張と再サンプリングが含まれる。提案する軽減手法により、CNNモデルではIoU (%) とNDS (%) が71.3から75.6、80.6から83.7に改善した。同様に、ViTではIoUとNDSが74.9から79.2、83.8から87.1に改善した。本研究は、データセットにおける少数クラスへの包括性を高めつつ、信頼性の高いモデルの開発に寄与する。

The accuracy and fairness of perception systems in autonomous driving are essential, especially for vulnerable road users such as cyclists, pedestrians, and motorcyclists who face significant risks in urban driving environments. While mainstream research primarily enhances class performance metrics, the hidden traits of bias inheritance in the AI models, class imbalances and disparities within the datasets are often overlooked. Our research addresses these issues by investigating class imbalances among vulnerable road users, with a focus on analyzing class distribution, evaluating performance, and assessing bias impact. Utilizing popular CNN models and Vision Transformers (ViTs) with the nuScenes dataset, our performance evaluation indicates detection disparities for underrepresented classes. Compared to related work, we focus on metric-specific and Cost-Sensitive learning for model optimization and bias mitigation, which includes data augmentation and resampling. Using the proposed mitigation approaches, we see improvement in IoU (%) and NDS (%) metrics from 71.3 to 75.6 and 80.6 to 83.7 for the CNN model. Similarly, for ViT, we observe improvement in IoU and NDS metrics from 74.9 to 79.2 and 83.8 to 87.1. This research contributes to developing reliable models while enhancing inclusiveness for minority classes in datasets.
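
One common way to make training cost-sensitive under class imbalance is to weight each class inversely to its frequency. The sketch below shows that generic recipe only; it is not claimed to be the weighting used in the paper, and the function name and class labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes get larger weights.

    n is the number of samples and k the number of distinct classes; a perfectly
    balanced dataset yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Usage: weights that could scale a per-class loss term.
weights = inverse_frequency_weights(["car"] * 8 + ["pedestrian"] * 2)
```

The same counts could instead drive resampling, for example by drawing samples with probability proportional to their class weight.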

翻訳日:2024-05-15 00:33:27 公開日:2024-05-12

# 厳格なAI監査にはブラックボックスアクセスが不十分

Black-Box Access is Insufficient for Rigorous AI Audits ( http://arxiv.org/abs/2401.14446v2 )

ライセンス: Link先を確認

Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell,

(参考訳) AIシステムの外部監査は、AIガバナンスの重要なメカニズムとして、ますます認識されている。しかし、監査の有効性は監査人に与えられるアクセスの程度に依存する。最近の最先端のAIシステムの監査は、主にブラックボックスアクセスに依存しており、監査官はシステムに問い合わせて出力を観察することしかできない。しかしながら、システムの内部動作(例えば重量、アクティベーション、勾配)へのホワイトボックスアクセスは、監査人がより強力な攻撃を行い、モデルをより徹底的に解釈し、微調整を行うことを可能にする。一方、トレーニングやデプロイメント情報(方法論、コード、ドキュメンテーション、データ、デプロイメントの詳細、内部評価からの発見など)への外部アクセスは、監査人が開発プロセスを精査し、より対象とする評価を設計できるようにします。本稿では,ブラックボックス監査の限界と,ホワイトボックス監査とアウトサイドボックス監査の利点について検討する。また、これらの監査を最小限のセキュリティリスクで実施するための技術的、物理的、法的保護についても論じる。その結果,(1)監査員が使用するアクセスと手法に関する透明性は,監査結果を適切に解釈するには必要であり,(2)ブラックボックスのみよりも,ホワイトボックスとアウト・ザ・ボックスのアクセスの方がかなり精査できることがわかった。

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

翻訳日:2024-05-15 00:33:27 公開日:2024-05-12

# マルチエージェント会話による診断精度の向上:認知バイアス軽減のための大規模言語モデルを用いて

Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias ( http://arxiv.org/abs/2401.14589v2 )

ライセンス: Link先を確認

Yu He Ke, Rui Yang, Sui An Lie, Taylor Xin Yi Lim, Hairil Rizal Abdullah, Daniel Shu Wei Ting, Nan Liu,

(参考訳) 背景: 臨床的意思決定における認知バイアスは、診断の誤りや最適でない患者転帰に大きく寄与する。これらのバイアスに対処することは、医療分野における大きな課題である。目的: 本研究では、マルチエージェントフレームワークの利用を通じて、大規模言語モデル(LLM)がこれらのバイアスを軽減する役割について検討する。マルチエージェントの会話によって臨床的意思決定プロセスをシミュレートし、診断精度の向上における有効性を評価する。方法: 認知バイアスが誤診につながった、公表済みおよび未公表の症例報告を合計16件、文献から特定した。マルチエージェントフレームワークでは、GPT-4を利用して4つの模擬エージェント間の相互作用を促進し、臨床チームのダイナミクスを再現した。各エージェントにはそれぞれ異なる役割がある。 1) 議論を踏まえて最終診断を行う。 2) 悪魔の代弁者として、確証バイアスとアンカリングバイアスを修正する。 3) 議論のチューター兼ファシリテーターとして、早期閉鎖バイアスを軽減する。 4) 所見を記録し要約する。初期診断、最上位の鑑別診断、最終的な2つの鑑別診断の精度について、合計80回のシミュレーションを評価した。結果: 初期診断と最終診断の両方を評価した合計80の回答において、初期診断の精度は0% (0/80) であったが、マルチエージェントによる議論の後、最上位の鑑別診断の精度は71.3% (57/80) に、最終的な2つの鑑別診断の精度は80.0% (64/80) に向上した。結論: このフレームワークは、初期検査が誤解を招くシナリオであっても、誤った認識を再評価し修正する能力を示した。 LLM駆動型マルチエージェント会話フレームワークは、診断が困難な医療シナリオにおいて診断精度を高める可能性を示している。

Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) To make the final diagnosis after considering the discussions, 2) The devil's advocate and correct confirmation and anchoring bias, 3) The tutor and facilitator of the discussion to reduce premature closure bias, and 4) To record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses. Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80). Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.

翻訳日:2024-05-15 00:33:27 公開日:2024-05-12

# AI信頼度測定のための決定理論フレームワーク

A Decision Theoretic Framework for Measuring AI Reliance ( http://arxiv.org/abs/2401.15356v4 )

ライセンス: Link先を確認

Ziyang Guo, Yifan Wu, Jason Hartline, Jessica Hullman,

(参考訳) 人間はしばしば人工知能(AI)システムの助けを借りて意思決定をする。一般的なパターンは、最終決定をコントロールしている人間に対して、AIがアクションを推奨することである。研究者は、補完的なパフォーマンスを達成する上で重要な要素として、人間がAIに適切に依存していることを確認する。このような研究で用いられる適切な依存の定義には、正式な統計的根拠が欠如しており、矛盾を招く可能性があると論じる。統計的決定理論に基づく信頼の形式的定義を提案する。これは、意思決定者がAIの推奨に従う確率として信頼の概念を、人間が信号の識別や状況に関する正確な信念を形成する際の課題と区別するものである。私たちの定義は、人間とAIの相補性と信頼に関する研究の設計と解釈を導くのに使用できるフレームワークを生み出します。文献からのAIによる最新の意思決定研究を用いて、我々のフレームワークは、信号の正確な識別ができないために、損失と損失との信頼の相違による損失を分離するためにどのように使用できるかを実証する。これらの損失を,行動意思決定者と同じ意思決定課題に直面した合理的な意思決定者によって達成される期待された報酬によって定義される相補的性能の基準とベンチマークと比較することにより評価する。

Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's recommendation from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.

翻訳日:2024-05-15 00:23:41 公開日:2024-05-12

# より深いかより広いか:ソボレフ損失を伴う最適一般化誤差からの展望

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss ( http://arxiv.org/abs/2402.00152v2 )

ライセンス: Link先を確認

Yahong Yang, Juncai He,

(参考訳) ニューラルネットワークのアーキテクチャを構築することは、機械学習コミュニティにとって困難な追求であり、より深く進むか、より広く進むかというジレンマは、依然として永続的な疑問である。本稿では,よりフレキシブルな層数を持つディープニューラルネットワーク (DeNN) と限られた層を持つワイドニューラルネットワーク (WeNN) を比較し,ソボレフの損失における最適一般化誤差に着目した。分析研究により、ニューラルネットワークのアーキテクチャは、サンプルポイントの数、ニューラルネットワーク内のパラメータ、損失関数の正則性など、様々な要因に大きく影響を受けることが判明した。具体的には、より多くのパラメータがWeNNを好む傾向にあり、一方、サンプルポイントの増加と損失関数の規則性の向上は、DeNNの採用に傾いている。この理論を、ディープ・リッツと物理インフォームド・ニューラルネットワーク(PINN)法を用いた偏微分方程式に応用し、ニューラルネットワークの設計を導く。

Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

翻訳日:2024-05-15 00:23:41 公開日:2024-05-12

# Dataset Condensation Driven Machine Unlearning

Dataset Condensation Driven Machine Unlearning ( http://arxiv.org/abs/2402.00195v2 )

ライセンス: Link先を確認

Junaid Iqbal Khan,

(参考訳) データ規制要件とプライバシ保護機械学習における現在の動向は、機械アンラーニングの重要性を強調している。忘却対象サンプルの補集合で再訓練するという素朴なアンラーニング手法は、計算上の課題を抱えやすい。これらの課題は、機械アンラーニングと総称される一連の手法によって効果的に対処されてきた。しかし、アンラーニング後のモデルの有用性とプライバシを両立させながら、根強い計算上の課題を扱うことは、依然として十分にできていない。これは、訓練データセットの観点から近似アンラーニングの計算複雑性を改善する研究が不足しているためだと考える。本稿では、画像分類の文脈において、機械アンラーニングの重要な構成要素としてデータセット凝縮を導入することで、このギャップを埋めることを目指す。この目的のために、機械アンラーニングのプライバシ、有用性、効率のバランスをとる新しいデータセット凝縮技術と革新的なアンラーニング手法を提案する。さらに、機械アンラーニングを計測するための新しく効果的な手法を示し、メンバシップ推論攻撃とモデル反転攻撃に対する防御への応用を提案する。加えて、本手法の新たな応用として、アンラーニング対象サンプルの影響を受けずに任意のモデルを迅速に訓練するために利用できる「凝縮モデル」からのデータ削除についても検討する。対応するコードは https://github.com/algebraicdianuj/DC_U で公開されている。

The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach to unlearning training data by retraining over the complement of the forget samples is susceptible to computational challenges. These challenges have been effectively addressed through a collection of techniques falling under the umbrella of machine unlearning. However, existing techniques still fall short in handling persistent computational challenges while preserving the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from a "condensed model", which can be employed to quickly train any arbitrary model without being influenced by unlearning samples. The corresponding code is available at https://github.com/algebraicdianuj/DC_U.

翻訳日:2024-05-15 00:23:41 公開日:2024-05-12

# NeuroCine:人間の脳活動から映像を復号する

NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties ( http://arxiv.org/abs/2402.01590v2 )

ライセンス: Link先を確認

Jingyuan Sun, Mingxiao Li, Zijiao Chen, Marie-Francine Moens,

(参考訳) 人間の脳の視覚処理の複雑さを理解する試みの中で、脳活動から動的な視覚体験を再構成することは、困難だが魅力的な取り組みとして現れている。近年の進歩により、非侵襲的な脳記録から静止画像を再構成することには成功しているが、連続的な脳活動を動画形式に変換する領域は依然として十分に探究されていない。本研究では、ノイズ、空間的冗長性、時間遅れといった、fMRIデータの復号に固有の課題を対象とする新しい二相フレームワークであるNeuroCineを紹介する。本フレームワークは、fMRI表現の対照学習のための空間マスキングと時間補間に基づくデータ拡張と、動画生成のための依存事前ノイズによって強化された拡散モデルを提案する。公開されているfMRIデータセットで検証した結果、本手法は有望な結果を示し、データセット中の3人の被験者の脳活動の復号において、SSIMで測定して、従来の最先端モデルをそれぞれ20.97%、31.00%、12.30%という顕著な差で上回った。さらに、アテンション解析から、本モデルが既存の脳の構造や機能と整合していることが示唆され、その生物学的妥当性と解釈可能性が示された。

In the pursuit of understanding the intricacies of the human brain's visual processing, reconstructing dynamic visual experiences from brain activities emerges as a challenging yet fascinating endeavor. While recent advancements have achieved success in reconstructing static images from non-invasive brain recordings, the domain of translating continuous brain activities into video format remains underexplored. In this work, we introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data, such as noise, spatial redundancy, and temporal lags. This framework proposes spatial masking and temporal interpolation-based augmentation for contrastive learning of fMRI representations and a diffusion model enhanced by dependent prior noise for video generation. Tested on a publicly available fMRI dataset, our method shows promising results, outperforming the previous state-of-the-art models by a notable margin of 20.97%, 31.00%, and 12.30%, respectively, on decoding the brain activities of three subjects in the fMRI dataset, as measured by SSIM. Additionally, our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.

翻訳日:2024-05-15 00:23:41 公開日:2024-05-12

# 強化学習モデルのファインチューニングは実は忘却緩和問題である

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem ( http://arxiv.org/abs/2402.02868v2 )

ライセンス: Link先を確認

Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś,

(参考訳) ファインチューニングは、基盤モデルの成功例で最近示されたように、事前学習された能力を転用することを可能にする広く普及した手法である。しかし、強化学習(RL)モデルのファインチューニングは依然として課題である。本研究は、転移不良の具体的な原因の一つとして、事前学習された能力の忘却を概念化する。この問題は、行動と観測の相互作用によってRLの設定で特に顕著になる。すなわち、ファインチューニングの初期段階で訪問されない下流タスクの状態部分空間において、事前学習のおかげでうまく振る舞えていたモデルの性能が劣化する。その結果、期待される転移の利点が失われる。この問題が発生する条件を特定し、それが一般的であり、多くの場合は破滅的であることを示す。難易度の高いNetHackとMontezuma's Revenge環境の詳細な実証分析を通じて、標準的な知識保持技術がこの問題を緩和し、事前学習された能力を最大限に活用できるようにすることを示す。特にNetHackでは、Human Monkシナリオにおける従来の最高スコアを5Kから10K超に改善し、ニューラルモデルの新たな最先端を達成した。

Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from 5K to over 10K points in the Human Monk scenario.
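
The abstract does not say which knowledge retention techniques were used. As an illustration of what such a technique looks like, the sketch below implements the penalty term of elastic weight consolidation (EWC), a standard choice: it pulls each parameter back toward its pre-trained value, more strongly where the parameter mattered for the pre-trained behaviour. The function name, the flat-list parameter layout, and the `strength` argument are assumptions for this example.

```python
def ewc_penalty(params, pretrained, fisher, strength=1.0):
    """EWC regularizer: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2.

    params      current parameter values during fine-tuning
    pretrained  parameter values after pre-training (theta*)
    fisher      per-parameter importance estimates (diagonal Fisher information)
    """
    return 0.5 * strength * sum(
        f * (p - p0) ** 2 for p, p0, f in zip(params, pretrained, fisher)
    )


# Usage: add the penalty to the fine-tuning loss of the downstream task.
total_loss = 1.25 + ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 1.0], strength=2.0)
```

The penalty is zero when the parameters have not moved, so it constrains fine-tuning without changing the pre-trained starting point.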

翻訳日:2024-05-15 00:23:41 公開日:2024-05-12

# テキストと画像の拡散を優先的に調整するDense Reward View

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference ( http://arxiv.org/abs/2402.08265v2 )

ライセンス: Link先を確認

Shentao Yang, Tianqi Chen, Mingyuan Zhou,

(参考訳) 好みのテキスト・画像拡散モデル(T2I)の調整が研究の注目を集めている。優先データによるT2Iを直接最適化する以前の研究は存在するが、これらの手法は、生成過程のシーケンシャルな性質を無視しつつ、拡散逆鎖全体に対する遅延報酬のバンドイット仮定の下で開発されている。これは選好アライメントの有効性と効率を損なう可能性がある。本稿では,T2I逆鎖の初期ステップを強調する,より微細な報酬視点を導出し,トラクタブルアライメントの目的を導出する。特に、時間的対称性を破り、T2I生成階層に適合するように、DPOスタイルの明示的回帰自由目的に時間的割引を導入する。単一および複数プロンプト生成実験において,本手法は定量的および定性的に,強い関連するベースラインと競合する。我々のアプローチの洞察を説明するために、さらなる調査が行われた。

Aligning text-to-image (T2I) diffusion models with preferences has been gaining increasing research attention. While prior works exist on directly optimizing T2I by preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, while ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take a finer-grained, dense-reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.
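
A rough sketch of what temporal discounting inside a DPO-style objective could look like is given below. It assumes per-step log-probabilities of the preferred and rejected reverse chains under the trained model and under a frozen reference model, with index 0 being the first step of the reverse chain. The exact objective in the paper may differ; the function name, `beta`, and `gamma` are hypothetical.

```python
import math


def discounted_dpo_loss(logp_w, ref_w, logp_l, ref_l, beta=1.0, gamma=0.9):
    """Negative log-sigmoid of a discounted sum of per-step log-ratio margins.

    logp_w, ref_w  per-step log-probs of the preferred chain (model, reference)
    logp_l, ref_l  per-step log-probs of the rejected chain (model, reference)
    gamma < 1      gives earlier reverse-chain steps a larger weight
    """
    margin = 0.0
    for t, (pw, rw, pl, rl) in enumerate(zip(logp_w, ref_w, logp_l, ref_l)):
        margin += gamma ** t * ((pw - rw) - (pl - rl))
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))
```

Setting `gamma = 1.0` recovers an undiscounted sum in which every step counts equally, which is the temporal symmetry the abstract says the discount is meant to break.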

翻訳日:2024-05-15 00:13:55 公開日:2024-05-12

# ニューラルネットワークにおけるインテクスト学習による人間のカリキュラム効果

Human Curriculum Effects Emerge with In-Context Learning in Neural Networks ( http://arxiv.org/abs/2402.08674v2 )

ライセンス: Link先を確認

Jacob Russin, Ellie Pavlick, Michael J. Frank,

(参考訳) 人間の学習は、規則のような構造と、訓練に使われる例のカリキュラムに敏感である。簡潔な規則に支配されるタスクでは、関連する例が試行間でブロック化されている方が学習は頑健になるが、そのような規則がない場合はインターリーブの方が効果的である。これまでのところ、これらの一見矛盾する効果を同時に捉えた神経モデルはない。ここでは、メタ学習で訓練されたニューラルネットワークと大規模言語モデル(LLM)の両方において、「インコンテキスト学習」(ICL)とともに同じトレードオフが自然に現れることを示す。 ICLは、活性化ダイナミクスに実装された内側ループのアルゴリズムを通じて、重みを変更せずに新しいタスクを「文脈内で」学習する能力である。事前学習済みLLMとメタ学習トランスフォーマーを用いた実験では、ICLが規則のような構造を含むタスクにおいて人間で示されたブロック化の利点を示すこと、逆に、並行して行われる重み内学習が、そのような構造を欠くタスクにおいて人間で観察されるインターリーブの利点を再現することを示す。

Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with "in-context learning" (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks "in context" -- without weight changes -- via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.
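
The two curricula contrasted above differ only in the order in which the same examples are presented. A minimal sketch, assuming the examples are already grouped by the category or rule they belong to (the function names and grouping are illustrative, not the paper's experimental code):

```python
from itertools import chain, zip_longest


def blocked(groups):
    """Blocked curriculum: all examples of one group, then all of the next."""
    return list(chain.from_iterable(groups))


def interleaved(groups):
    """Interleaved curriculum: cycle through the groups, one example at a time."""
    skip = object()  # sentinel for groups that run out of examples early
    rounds = zip_longest(*groups, fillvalue=skip)
    return [x for row in rounds for x in row if x is not skip]


# Usage: two groups of related examples presented in either order.
groups = [["a1", "a2", "a3"], ["b1", "b2"]]
blocked_order = blocked(groups)
interleaved_order = interleaved(groups)
```

Both orderings contain exactly the same examples, so any difference in learning comes from the presentation order alone.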

翻訳日:2024-05-15 00:13:55 公開日:2024-05-12

# HyperAgent: 複雑な環境のためのシンプルでスケーラブルで効率的な強化学習フレームワーク

HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments ( http://arxiv.org/abs/2402.10228v4 )

ライセンス: Link先を確認

Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo,

(参考訳) 資源制約下での複雑なタスクを解決するためには、強化学習(RL)エージェントは単純で効率的でスケーラブルで、(1)大きな状態空間と(2)相互作用データの連続的な蓄積に対処する必要がある。一般値関数に関連付けられた後続の計算効率の高いインクリメンタル近似を,共役性やデータ効率のよい動作選択を不要に実現した,ハイパーモデルとインデックスサンプリングを特徴とするRLフレームワークHyperAgentを提案する。 HyperAgentの実装は簡単で、Double-DQNに必要なモジュールをひとつ追加するだけでよい。 HyperAgentは、大規模なディープRLベンチマークで堅牢なパフォーマンスを提供する最初の方法であり、証明可能なスケーラブルなステップ毎の計算複雑性を実現し、表の仮定の下でサブ線形後悔を実現する。 HyperAgentは、問題のサイズに合わせて最適にスケールし、Atariベンチマークの下でのデータと計算の両方で大幅な効率向上を示すエピソードでディープシーのハードな探索問題を解決することができる。理論解析の核となるのは、ジョンソン-リンデンシュトラウスの非自明なマーチンゲール拡大であるシーケンシャルランダム射影の最初の解析ツールによって実現された逐次後近似論である。この研究はRLの理論的および実践的な領域を橋渡しし、RLアルゴリズム設計の新しいベンチマークを確立した。

To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.

翻訳日:2024-05-15 00:13:55 公開日:2024-05-12

# 逐次ランダム投影のための確率ツール

Probability Tools for Sequential Random Projection ( http://arxiv.org/abs/2402.14026v3 )

ライセンス: Link先を確認

Yingru Li,

(参考訳) 本稿では,不確実性の下での逐次的意思決定の課題に根ざした,逐次的ランダムプロジェクションに適した最初の確率的フレームワークを提案する。この分析は、逐次決定過程に固有の適応機構の副産物である確率変数の逐次依存と高次元の性質によって複雑である。本研究は停止過程の新規な構築を特徴とし,連続的に相互に相互に相互に相互に相互に相互に連携する一連の集中事象の解析を容易にする。停止過程から導かれる自己正規化過程において混合の手法を用いることで、所望の非漸近確率境界を達成する。この境界はジョンソン・リンデンシュトラウス(JL)補題の非自明なマーチンゲール拡大を表し、ランダム射影とシーケンシャル解析に関する文献への先駆的な貢献を示している。

We introduce the first probabilistic framework tailored for sequential random projection, an approach rooted in the challenges of sequential decision-making under uncertainty. The analysis is complicated by the sequential dependence and high-dimensional nature of random variables, a byproduct of the adaptive mechanisms inherent in sequential decision processes. Our work features a novel construction of a stopped process, facilitating the analysis of a sequence of concentration events that are interconnected in a sequential manner. By employing the method of mixtures within a self-normalized process, derived from the stopped process, we achieve a desired non-asymptotic probability bound. This bound represents a non-trivial martingale extension of the Johnson-Lindenstrauss (JL) lemma, marking a pioneering contribution to the literature on random projection and sequential analysis.
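
For orientation, the classical (non-sequential) Johnson-Lindenstrauss setting that this work extends can be sketched as below: a fixed vector is multiplied by a random Gaussian matrix scaled by `1 / sqrt(m)`, and its squared norm is approximately preserved. This illustrates only the classical lemma for a vector chosen independently of the matrix, not the sequential martingale extension of the paper, where the vectors may depend on earlier projections. All names and sizes are assumptions for the example.

```python
import math
import random


def random_projection_matrix(m, d, rng):
    """m x d matrix with i.i.d. N(0, 1/m) entries."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(P, x):
    """Apply the projection matrix P to the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(c * c for c in v)


# Usage: the ratio of squared norms concentrates around 1 as m grows.
rng = random.Random(0)
P = random_projection_matrix(2000, 20, rng)
x = [1.0] * 20
ratio = sq_norm(project(P, x)) / sq_norm(x)
```

For a fixed `x`, the ratio is a chi-square variable with `m` degrees of freedom divided by `m`, so its standard deviation is about `sqrt(2 / m)`.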

翻訳日:2024-05-15 00:13:55 公開日:2024-05-12
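
For orientation, the classical (non-sequential) Johnson-Lindenstrauss lemma that the abstract above extends can be illustrated with a small experiment. The following is only a sketch under stated assumptions, not the paper's construction: it projects a few fixed vectors with an i.i.d. Gaussian matrix and records how well squared norms are preserved. The names `sq_norm` and `random_projection`, the dimensions, and the seeds are chosen here for illustration; the adaptive, sequentially dependent setting analysed in the paper is not reproduced.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)


def random_projection(vectors, m, seed=0):
    """Project d-dimensional vectors to m dimensions with a Gaussian matrix.

    Entries are N(0, 1/m), so squared norms are preserved in expectation.
    This is the classical fixed-data setting (an illustrative assumption).
    """
    rng = random.Random(seed)
    d = len(vectors[0])
    scale = m ** -0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]
    return [[sum(row[j] * v[j] for j in range(d)) for row in matrix] for v in vectors]


# Hypothetical toy data: 5 points in 200 dimensions, projected to 400 dimensions.
data_rng = random.Random(1)
d, m = 200, 400
points = [[data_rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(5)]
projected = random_projection(points, m)

# Ratio of squared norms after/before projection; values near 1 mean low distortion.
ratios = [sq_norm(p) / sq_norm(x) for x, p in zip(points, projected)]
```

The ratios concentrate around 1 with fluctuations on the order of `sqrt(2 / m)`. In the sequential setting of the abstract, the vectors may depend on earlier projections, which is why a martingale argument is needed there.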

# ACE : 因果性を考慮したエントロピー正則化によるオフポリシーActor-Critic

ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization ( http://arxiv.org/abs/2402.14528v2 )

ライセンス: Link先を確認

Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu,

(参考訳) 方策学習の過程において、個々の原始的行動(primitive behavior)の重要度が異なることは、従来のモデルフリーRLアルゴリズムでは見過ごされてきた。この知見を生かして、異なる行動次元と報酬の間の因果関係を探求し、訓練中の様々な原始的行動の重要性を評価する。効率的な探索のために、潜在的影響の大きい行動を効果的に識別して優先する、因果性を考慮したエントロピー項を導入する。さらに,特定の原始的行動への過度な集中を防ぐために,勾配休眠(gradient dormancy)現象を解析し,休眠誘導リセット機構を導入して本手法の有効性をさらに高める。提案アルゴリズムであるACE: Off-policy Actor-critic with Causality-aware Entropy regularizationは、7つのドメインにまたがる29の多様な連続制御タスクにおいて、モデルフリーRLのベースラインと比較して大きな性能上の優位性を示し、本手法の有効性、汎用性、サンプル効率の高さを裏付けている。ベンチマーク結果とビデオはhttps://ace-rl.github.io/で公開されている。

The varying significance of distinct primitive behaviors during the policy learning process has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient dormancy phenomenon and introduce a dormancy-guided reset mechanism to further enhance the efficacy of our method. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/.

翻訳日:2024-05-15 00:04:06 公開日:2024-05-12

# スピン軌道結合二元ボソニック凝縮体における半渦ソリトンとその励起状態

Semi-vortex solitons and their excited states in spin-orbit-coupled binary bosonic condensates ( http://arxiv.org/abs/2403.01458v2 )

ライセンス: Link先を確認

Haiming Deng, Jinqing Li, Zhaopin Chen, Yaohui Liu, Dong Liu, Chunzhi Jiang, Chao Kong, Boris A. Malomed,

(参考訳) 成分の渦度が$(s_{+},s_{-})=(0,1)$である半渦(SV)型の2次元2成分基本ソリトンは、両成分に接触型の自己引力が働くスピン軌道結合(SOC)二成分ボース・アインシュタイン凝縮体において、系に臨界崩壊の可能性があるにもかかわらず、安定な基底状態(GS)であることが知られている。しかし、渦度の組が$(s_{+},s_{-})=(S_{+},S_{+}+1)$、$S_{+}=1,2,3,...$であるSVソリトンの励起状態(ES)は、同じ系では不安定である。本研究では,2成分の自己相互作用が逆符号をもつSOC系においてSVソリトンのESを構成する。主な発見はES-SVソリトンの安定性であり、追加の渦度は(少なくとも)$S_{+}=6$まで安定である。これらの励起状態における臨界崩壊の開始に対するノルムのしきい値$N_{\mathrm{thr}}$は、単成分タウンズソリトンに対応する既知の臨界値$N_{c}\approx 5.85$より高く、$N_{\mathrm{thr}}$は$S_{+}$の増加とともに大きくなる。GS-SVソリトンが安定に運動できる速度区間も見いだされた。これらの結果は、高いトポロジカル電荷をもつ安定な渦ソリトンの生成という難題に対する解決策を示唆している。

It is known that two-dimensional two-component fundamental solitons of the semi-vortex (SV) type, with vorticities $(s_{+},s_{-})=(0,1)$ in their components, are stable ground states (GSs) in the spin-orbit-coupled (SOC) binary Bose-Einstein condensate with the contact self-attraction acting in both components, in spite of the possibility of the critical collapse in the system. However, excited states (ESs) of the SV solitons, with the vorticity set $(s_{+},s_{-})=(S_{+},S_{+}+1)$ and $S_{+}=1,2,3,...$, are unstable in the same system. We construct ESs of SV solitons in the SOC system with opposite signs of the self-interaction in the two components. The main finding is the stability of the ES-SV solitons, with the extra vorticity (at least) up to $S_{+}=6$. The threshold value of the norm for the onset of the critical collapse, $N_{\mathrm{thr}}$, in these excited states is higher than the commonly known critical value, $N_{c}\approx 5.85$, associated with the single-component Townes solitons, $N_{\mathrm{thr}}$ increasing with the growth of $S_{+}$. A velocity interval for stable motion of the GS-SV solitons is found too. The results suggest a solution for the challenging problem of the creation of stable vortex solitons with high topological charges.

翻訳日:2024-05-15 00:04:06 公開日:2024-05-12

# 時空重畳における量子アルゴリズム

Quantum Algorithms in a Superposition of Spacetimes ( http://arxiv.org/abs/2403.02937v3 )

ライセンス: Link先を確認

Omri Shmueli,

(参考訳) 量子コンピュータは私たちの情報処理能力に革命をもたらすと期待されている。古典計算から量子計算への進歩は、古典物理学から量子物理学への進歩の産物である。宇宙についての理解が深まるほど、それを計算に利用する能力も高まる。そこで自然に生じる疑問は、物理学が将来何を可能にするのかということだ。より進んだ物理理論は、量子コンピューティングを超えて我々の計算能力を高めうるのか? 物理学における活発な研究分野の一つは、量子力学(QM)と一般相対性理論(GR)を量子重力(QG)の統一理論に結合しようとする際に現れる、説明可能な量子力学の範囲外にある理論的現象を研究している。QGは因果構造と事象順序の量子重ね合わせの可能性を示すことが知られている。量子情報理論の文献では、これはユニタリ発展の順序の重ね合わせに対応する。本研究では、QGに基づく自然な計算モデルとして、(標準的な困難性仮定の下で)標準的な量子計算に対して指数的な高速化を与える最初の例を示す。ユニタリ発展の順序の重ね合わせを生成できる量子コンピュータのモデルと計算量の尺度を定義し、そのようなコンピュータが計算機科学の基本問題のうち2つ、すなわちグラフ同型問題($\mathsf{GI}$)と、ギャップ$O\left( n \sqrt{n} \right)$のギャップ最近ベクトル問題($\mathsf{GapCVP}$)を多項式時間で解けることを示す。これらの問題は、通常の量子コンピュータでは解くことが難しいと専門家に信じられている。興味深いことに、我々のモデルは過度に強力であるようには見えず、$\mathbf{NP}$や$\mathbf{SZK}$のように計算機科学で困難とされる複雑性クラス全体を解く明白な方法は見つからなかった。

Quantum computers are expected to revolutionize our ability to process information. The advancement from classical to quantum computing is a product of our advancement from classical to quantum physics -- the more our understanding of the universe grows, so does our ability to use it for computation. A natural question that arises is, what will physics allow in the future? Can more advanced theories of physics increase our computational power, beyond quantum computing? An active field of research in physics studies theoretical phenomena outside the scope of explainable quantum mechanics, that form when attempting to combine Quantum Mechanics (QM) with General Relativity (GR) into a unified theory of Quantum Gravity (QG). QG is known to present the possibility of a quantum superposition of causal structure and event orderings. In the literature of quantum information theory, this translates to a superposition of unitary evolution orders. In this work we show a first example of a natural computational model based on QG, that provides an exponential speedup over standard quantum computation (under standard hardness assumptions). We define a model and complexity measure for a quantum computer that has the ability to generate a superposition of unitary evolution orders, and show that such computer is able to solve in polynomial time two of the fundamental problems in computer science: The Graph Isomorphism Problem ($\mathsf{GI}$) and the Gap Closest Vector Problem ($\mathsf{GapCVP}$), with gap $O\left( n \sqrt{n} \right)$. These problems are believed by experts to be hard to solve for a regular quantum computer. Interestingly, our model does not seem overpowered, and we found no obvious way to solve entire complexity classes that are considered hard in computer science, like the classes $\mathbf{NP}$ and $\mathbf{SZK}$.

翻訳日:2024-05-15 00:04:06 公開日:2024-05-12

# カメラLiDARフュージョンを用いた自律走行用多物体追跡

Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving ( http://arxiv.org/abs/2403.04112v2 )

ライセンス: Link先を確認

Riccardo Pieroni, Simone Specchia, Matteo Corno, Sergio Matteo Savaresi,

(参考訳) 本稿では、カメラとLiDARデータを組み合わせた自動運転車のための新しいマルチモーダルマルチオブジェクトトラッキング(MOT)アルゴリズムを提案する。カメラフレームは最先端の3Dオブジェクト検出器で処理されるのに対し、古典的なクラスタリング技術はLiDAR観測に使用される。提案したMOTアルゴリズムは、3段階のアソシエーションプロセスと、検出された動的障害物の運動を推定する拡張カルマンフィルタと、トラック管理フェーズとを備える。 EKF運動モデルは、観測対象の電流測定された相対位置と向きと、エゴ車両の縦・角速度を入力として要求する。多くの最先端のマルチモーダルMOTアプローチとは異なり、提案アルゴリズムはエゴのグローバルなポーズの地図や知識に依存しない。さらに、カメラ専用に3D検出器を使用し、使用するLiDARセンサーの種類に依存しない。このアルゴリズムはシミュレーションと実世界のデータの両方で検証され、良好な結果が得られる。

This paper presents a novel multi-modal Multi-Object Tracking (MOT) algorithm for self-driving cars that combines camera and LiDAR data. Camera frames are processed with a state-of-the-art 3D object detector, whereas classical clustering techniques are used to process LiDAR observations. The proposed MOT algorithm comprises a three-step association process, an Extended Kalman filter for estimating the motion of each detected dynamic obstacle, and a track management phase. The EKF motion model requires the current measured relative position and orientation of the observed object and the longitudinal and angular velocities of the ego vehicle as inputs. Unlike most state-of-the-art multi-modal MOT approaches, the proposed algorithm does not rely on maps or knowledge of the ego global pose. Moreover, it uses a 3D detector exclusively for cameras and is agnostic to the type of LiDAR sensor used. The algorithm is validated both in simulation and with real-world data, with satisfactory results.

翻訳日:2024-05-15 00:04:06 公開日:2024-05-12

# 長い管内を移動する量子渦ループのエネルギースペクトル

The energy spectrum of a quantum vortex loop moving in a long pipe ( http://arxiv.org/abs/2403.06441v2 )

ライセンス: Link先を確認

S. V. Talalov,

(参考訳) 本研究では, 細長い管内を移動する量子渦ループのエネルギースペクトルの問題を初めて解く。渦糸は局所誘導近似で記述する。この力学系を新しい方法で量子化し、循環$\Gamma$とエネルギー値$E$について非自明な結果を得る。最終的に、渦ループのスペクトルを「レッジェ軌道」$E = E(\Gamma)$の形で示す。渦の量子化問題は、2流体流体力学やその他の従来の手法の枠外で扱う。

In this study, the problem of the energy spectrum of a quantum vortex loop moving in a thin long pipe is solved for the first time. The vortex filament is described in the Local Induction Approximation. We quantize this dynamic system using a new method, which leads to non-trivial results for circulation $\Gamma$ and energy values $E$. In the final form, we present the spectrum of the vortex loop in the form of a ''Regge trajectory'' $E = E(\Gamma)$. The vortex quantization problem is considered outside of two-fluid hydrodynamics and other conventional approaches.

翻訳日:2024-05-15 00:04:06 公開日:2024-05-12

# シュレーディンガー化による偏微分方程式の量子回路

Quantum Circuits for partial differential equations via Schrödingerisation ( http://arxiv.org/abs/2403.10032v2 )

ライセンス: Link先を確認

Junpeng Hu, Shi Jin, Nana Liu, Lei Zhang,

(参考訳) 量子コンピューティングは、特に大規模なPDEシミュレーションにおいて、古典コンピューティングと比較して大幅な高速化を達成するための有望な道として登場した。主要な量子的アプローチの1つはハミルトニアンシミュレーションの利用であるが、これはシュレーディンガー型方程式にのみ直接適用できる。この制限に対処するため、ワープ変換(warped transformation)を用いて一般の線形PDEをシュレーディンガー型方程式に変換するシュレーディンガー化(Schrödingerisation)の手法が開発されてきた。しかし、シュレーディンガー化の手法が開発されたにもかかわらず、一般のPDEを解くための対応する量子回路の明示的な実装はまだ設計されていない。本稿では、シュレーディンガー化の手法を用いた一般PDEのための量子アルゴリズムの詳細な実装を示す。提案手法の有効性を実証するために, 熱方程式と、風上スキームで近似した移流方程式の例を示す。また、高次元においてこれらのアルゴリズムが古典的アルゴリズムに対してもつ量子優位性を示すために、計算量解析も行う。

Quantum computing has emerged as a promising avenue for achieving significant speedup, particularly in large-scale PDE simulations, compared to classical computing. One of the main quantum approaches involves utilizing Hamiltonian simulation, which is directly applicable only to Schrödinger-type equations. To address this limitation, Schrödingerisation techniques have been developed, employing the warped transformation to convert general linear PDEs into Schrödinger-type equations. However, despite the development of Schrödingerisation techniques, the explicit implementation of the corresponding quantum circuit for solving general PDEs remains to be designed. In this paper, we present a detailed implementation of a quantum algorithm for general PDEs using Schrödingerisation techniques. We provide examples of the heat equation, and the advection equation approximated by the upwind scheme, to demonstrate the effectiveness of our approach. Complexity analysis is also carried out to demonstrate the quantum advantages of these algorithms in high dimensions over their classical counterparts.

翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
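
The abstract above names the warped transformation without stating it. The derivation below is a sketch of how such a transformation is commonly written, not a quotation from the paper. The symbols $A$, $H_1$, $H_2$, $w$, $p$ and $\xi$ are notation assumed here, the starting point is a spatially discretised linear system, and the sign condition on $H_1$ needed to recover $u$ is not examined.

```latex
% Assumed setting: a semi-discretised linear PDE  du/dt = A u.
\begin{aligned}
  A &= H_1 + i H_2, \qquad
  H_1 = \tfrac{1}{2}\,(A + A^{\dagger}), \qquad
  H_2 = \tfrac{1}{2i}\,(A - A^{\dagger})
  \quad (\text{both Hermitian}), \\[4pt]
  w(t,p) &= e^{-p}\,u(t), \quad p > 0
  \;\;\Longrightarrow\;\;
  \partial_p w = -w, \\[4pt]
  \partial_t w &= H_1 w + i H_2 w
               = -H_1\,\partial_p w + i H_2\, w, \\[4pt]
  % Fourier transform in p, with \partial_p \to i\xi:
  i\,\partial_t \hat{w}(t,\xi) &= \bigl(\xi H_1 - H_2\bigr)\,\hat{w}(t,\xi).
\end{aligned}
```

For each real $\xi$ the operator $\xi H_1 - H_2$ is Hermitian, so the last line has the form of a Schrödinger-type equation to which Hamiltonian simulation applies.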

# DTOR: 異常を説明するための決定木外れ値回帰器

DTOR: Decision Tree Outlier Regressor to explain anomalies ( http://arxiv.org/abs/2403.10903v4 )

ライセンス: Link先を確認

Riccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio Cosentini,

(参考訳) 外れ値の発生とその発生メカニズムを説明することは、様々な領域において極めて重要となりうる。誤動作、不正、脅威は、正しく識別されるだけでなく、実行可能な対策を効果的に講じるために妥当な説明を必要とすることが多い。異常を識別するために高度な機械学習手法がますます広く使われるようになり、このような説明はより困難になっている。本稿では,異常検出モデルが生成した異常スコアを推定することで,個々のデータ点に対するルールベースの説明を生成する手法であるDecision Tree Outlier Regressor (DTOR)を提案する。これは、まず決定木回帰器を適用して推定スコアを計算し、次にデータ点のスコアに対応するパスを抽出することで実現される。実験結果は,多数の特徴を持つデータセットにおいてもDTORが頑健であることを示す。さらに、他のルールベースのアプローチとは対照的に、生成されたルールは説明対象の点によって常に満たされる。加えて、評価指標は、外れ値の説明タスクにおいてAnchorsに匹敵する性能を、より短い実行時間で達成することを示している。

Explaining the occurrence of outliers and the mechanism behind them can be extremely important in a variety of domains. Malfunctions, frauds, and threats, in addition to being correctly identified, oftentimes need a valid explanation in order to take actionable countermeasures effectively. The ever more widespread use of sophisticated Machine Learning approaches to identify anomalies makes such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating anomaly scores generated by an anomaly detection model. This is accomplished by first applying a Decision Tree Regressor, which computes the estimation score, and then extracting the relative path associated with the data point score. Our results demonstrate the robustness of DTOR even in datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate comparable performance to Anchors in outlier explanation tasks, with reduced execution time.

翻訳日:2024-05-14 23:54:21 公開日:2024-05-12

# 顔表情認識のための注意融合型エモティックマスク付きオートエンコーダ

Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition ( http://arxiv.org/abs/2403.13039v3 )

ライセンス: Link先を確認

Bach Nguyen-Xuan, Thien Nguyen-Hoang, Thanh-Huy Nguyen, Nhu Tai-Do,

(参考訳) 表情認識(FER)はコンピュータビジョンにおける重要な課題であり、様々な領域にまたがる多様な応用がある。表情認識モデルの汎化能力を損なう、FERデータセットの不足という課題に対処することは、性能向上に不可欠である。本稿では,表情分類のために、MAE-Face自己教師あり学習(SSL)法とマルチビューのFusion Attention機構を統合した革新的なアプローチを提示し,特に第6回Affective Behavior Analysis in-the-wild (ABAW)コンペティションで披露する。人間の表情の変化を強調する高レベル特徴を学習する前に、同側ビュー(補助ビュー)からの低レベル特徴情報を利用することで、検査対象のビュー(メインビュー)を改善する、単純かつ革新的な方法を提供することを目指す。また、重要な顔の特徴を強調することを目的とした、実装が容易で訓練不要のフレームワークを提案し、そのような特徴が、重要な局所要素に焦点を当てるモデルのガイドとして機能しうるかを調べる。本手法の有効性は,訓練と検証の両方で観察された、Aff-wild2データセットにおけるモデル性能の向上によって検証される。

Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level feature that emphasizes the shift in the human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model, focusing on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset, as observed in both training and validation contexts.

翻訳日:2024-05-14 23:54:21 公開日:2024-05-12

# サイバー犯罪とオンライン詐欺における良心の武器化 : 新しいシステム理論

Weaponization of Conscience in Cybercrime and Online Fraud: A Novel Systems Theory ( http://arxiv.org/abs/2403.14667v2 )

ライセンス: Link先を確認

Michelle Espinoza,

(参考訳) 本稿では、詐欺師が自らの活動を偽装し、他者を強要し、あるいは被害者を欺くために用いる複雑なシステムおよび戦術として、良心の武器化という概念を導入する。本研究は概念的アプローチを採り、軍事プロパガンダと心理作戦のドクトリンの理論的基盤を援用して、良心の武器化を理解し、それに対する防御を考えるためのレンズとして適用する。

This article introduces the concept of weaponization of conscience as a complex system and tactic employed by fraudsters to camouflage their activity, coerce others, or to deceive their victims. This study adopts a conceptual approach, drawing from the theoretical underpinnings of military propaganda and psychological operations doctrines and adapting them to serve as a lens through which to understand and defend against weaponization of conscience.

翻訳日:2024-05-14 23:54:21 公開日:2024-05-12

# 保存光子電流

Conserved photon current ( http://arxiv.org/abs/2403.16919v2 )

ライセンス: Link先を確認

Margaret Hawton,

(参考訳) 保存する光子カレントを、電磁四元ポテンシャル演算子と場のテンソル演算子が満たす交換関係から導出する。密度は正周波数項と負周波数項の和であり、どちらも正の数密度に寄与し、共通の方向に伝播する。離散的な正周波数および負周波数の励起は、どちらも光子と同定される。光子数は光子密度の空間積分に等しく、源や吸収源が存在しないときに保存される。

A conserved photon current is derived from the commutation relations satisfied by the electromagnetic four-potential and field tensor operators. The density is found to be a sum over positive and negative frequency terms, both of which contribute a positive number density and propagate in a common direction. Discrete positive and negative frequency excitations are both identified as photons. Photon number, equal to the spatial integral of photon density, is conserved in the absence of sources and sinks.

翻訳日:2024-05-14 23:54:21 公開日:2024-05-12
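
The last sentence of the abstract above rests on the generic link between a conserved current and a conserved total number. The short derivation below spells out that textbook step. The symbols $\rho$, $\mathbf{j}$, $N$ and $V$ are notation assumed here, and the paper's specific operator expressions for the photon density and current are not reproduced.

```latex
% Continuity equation for a density \rho and current \mathbf{j}
% in the absence of sources and sinks:
\begin{aligned}
  \partial_t \rho + \nabla \cdot \mathbf{j} &= 0, \\[4pt]
  N(t) &= \int_V \rho(\mathbf{x},t)\, d^3x, \\[4pt]
  \frac{dN}{dt} &= -\int_V \nabla \cdot \mathbf{j}\; d^3x
                 = -\oint_{\partial V} \mathbf{j} \cdot d\mathbf{S}
                 = 0
  \quad \text{if } \mathbf{j} \text{ vanishes on the boundary } \partial V.
\end{aligned}
```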

# Deep Channel Priorを用いた教師なし特徴強調モジュールによる実世界の劣化条件下での視覚認識の向上

Boosting Visual Recognition in Real-world Degradations via Unsupervised Feature Enhancement Module with Deep Channel Prior ( http://arxiv.org/abs/2404.01703v2 )

ライセンス: Link先を確認

Zhanwen Liu, Yuhang Li, Yang Wang, Bolin Gao, Yisheng An, Xiangmo Zhao,

(参考訳) 通常の環境下での自動運転車の環境認識は、過去10年間にかなりの成功を収めてきた。しかし、霧、低照度、モーションブラーなどの様々な悪条件は、画像の品質を低下させ、自動運転の安全性に重大な脅威をもたらす。すなわち、劣化画像に適用した場合、撮影画像の統計的・構造的特性の破壊に起因する特徴内容の損失やアーティファクトの干渉により、最先端の視覚モデルはしばしば性能が低下する。この問題に対処するため、本研究では劣化画像の視覚認識のための新しいDeep Channel Prior (DCP)を提案する。具体的には、事前学習済みモデルの深層表現空間において、同じ劣化タイプをもつ劣化特徴のチャネル相関は、内容や意味が異なっていても一様な分布をもつことを観察した。これにより、高スパース性の特徴空間における劣化表現と鮮明な表現の間の写像関係の学習が容易になる。これに基づき、教師なしの特徴補正を実現する、新しいプラグアンドプレイのUnsupervised Feature Enhancement Module (UFEM)を提案する。UFEMの第1段階ではマルチ敵対的(multi-adversarial)機構を導入し、高スパース性の特徴空間における潜在的な内容の復元とアーティファクトの除去を実現する。次に、生成された特徴を第2段階に渡し、DCPの指導の下で大域的な相関変調を行い、高品質で認識しやすい特徴を得る。3つのタスクと8つのベンチマークデータセットでの評価により,提案手法が実際の劣化条件下で事前学習済みモデルの性能を総合的に向上できることを示した。ソースコードはhttps://github.com/liyuhang166/Deep_Channel_Priorで入手できる。

The environmental perception of autonomous vehicles in normal conditions has achieved considerable success in the past decade. However, various unfavourable conditions such as fog, low-light, and motion blur will degrade image quality and pose tremendous threats to the safety of autonomous driving. That is, when applied to degraded images, state-of-the-art visual models often suffer performance decline due to the feature content loss and artifact interference caused by the disruption of the statistical and structural properties of captured images. To address this problem, this work proposes a novel Deep Channel Prior (DCP) for degraded visual recognition. Specifically, we observe that, in the deep representation space of pre-trained models, the channel correlations of degraded features with the same degradation type have a uniform distribution even if they have different content and semantics, which can facilitate the mapping relationship learning between degraded and clear representations in high-sparsity feature space. Based on this, a novel plug-and-play Unsupervised Feature Enhancement Module (UFEM) is proposed to achieve unsupervised feature correction, where the multi-adversarial mechanism is introduced in the first stage of UFEM to achieve the latent content restoration and artifact removal in high-sparsity feature space. Then, the generated features are transferred to the second stage for global correlation modulation under the guidance of DCP to obtain high-quality and recognition-friendly features. Evaluations of three tasks and eight benchmark datasets demonstrate that our proposed method can comprehensively improve the performance of pre-trained models in real degradation conditions. The source code is available at https://github.com/liyuhang166/Deep_Channel_Prior

翻訳日:2024-05-14 23:44:37 公開日:2024-05-12

# 2レベルフィードバック制御によるネットワークシステムの侵入耐性

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control ( http://arxiv.org/abs/2404.01741v4 )

ライセンス: Link先を確認

Kim Hammar, Rolf Stadler,

(参考訳) サービスレプリカを2段階最適制御問題とするシステムの侵入耐性を定式化する。ローカルレベルではノードコントローラが侵入回復を行い、グローバルレベルではシステムコントローラが複製係数を管理する。局所的およびグローバルな制御問題は、操作研究における古典的な問題、すなわち機械交換問題と在庫補充問題として定式化することができる。この定式化に基づいて、侵入耐性システムのための新しい制御アーキテクチャであるTOLERANCEを設計する。両レベルにおける最適制御戦略がしきい値構造を持ち、それらの計算に効率的なアルゴリズムを設計することを証明する。 10種類のネットワーク侵入を行うエミュレーション環境でのTOLERANCEの実装と評価を行う。その結果、TOLERANCEは、最先端の侵入耐性システムと比較して、サービスの可用性を向上し、運用コストを低減できることがわかった。

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.

翻訳日:2024-05-14 23:44:37 公開日:2024-05-12
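
The abstract above states that the optimal strategies on both levels have threshold structure. The sketch below only illustrates what a threshold strategy looks like in code. Everything in it is hypothetical: the function names, the use of a compromise belief and a healthy-replica count as the decision statistics, and the numeric thresholds are assumptions made here, not details taken from the paper.

```python
def local_recovery_action(compromise_belief, recovery_threshold):
    """Local level (hypothetical): recover the node once the belief that it
    is compromised reaches the threshold, otherwise keep waiting."""
    return "recover" if compromise_belief >= recovery_threshold else "wait"


def global_replication_action(healthy_replicas, replenish_threshold):
    """Global level (hypothetical): add a replica when the number of healthy
    replicas has dropped to the threshold or below, otherwise hold."""
    return "add_replica" if healthy_replicas <= replenish_threshold else "hold"


# Illustrative values only.
beliefs = [0.05, 0.40, 0.81]
local_actions = [local_recovery_action(b, recovery_threshold=0.75) for b in beliefs]
global_action = global_replication_action(healthy_replicas=3, replenish_threshold=3)
```

The practical appeal of a threshold structure is that each strategy is described by a single number per level, so computing a strategy reduces to searching over thresholds rather than over arbitrary mappings from states to actions.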

# 室温動作のための高感度、1550nm光検出器

Highly sensitive and efficient 1550 nm photodetector for room temperature operation ( http://arxiv.org/abs/2404.15218v2 )

ライセンス: Link先を確認

Rituraj, Zhi Gang Yu, R. M. E. B. Kandegedara, Shanhui Fan, Srini Krishnamurthy,

(参考訳) 効果的な量子通信などのフォトニック量子技術には、波長1550nmで高い外部量子効率(EQE)をもち、室温(RT)で動作する単一光子または少数光子のセンサーが必要である。この分野の主要なデバイスは、特にガイガーモードで動作するアバランシェ光検出器である。RT動作の要求と高いEQEの要求はしばしば相反し、結果として妥協した解決策となる。我々は,共最適化した誘電体フォトニック結晶基板上の2次元(2D)半導体材料を用いて,RTでの暗電流を3桁低減すると同時に99%を超えるEQEを維持するデバイスを開発した。このデバイスはアバランシェ動作に適しており、超低暗電流と高い光検出効率を備えた単一光子検出の基礎となる。2D材料の高いキャリア移動度を活かすことで、このデバイスのジッタ時間は~psであり、大型の2Dアレイカメラに統合できる。

Photonic quantum technologies such as effective quantum communication require room temperature (RT) operating single- or few-photon sensors with high external quantum efficiency (EQE) at 1550 nm wavelength. The leading class of devices in this segment is avalanche photodetectors operating particularly in the Geiger mode. Often the requirements for RT operation and for a high EQE are in conflict, resulting in a compromised solution. We have developed a device which employs a two-dimensional (2D) semiconductor material on a co-optimized dielectric photonic crystal substrate to simultaneously decrease the dark current by three orders of magnitude at RT and maintain an EQE of >99%. The device is amenable to avalanching and forms a basis for single photon detection with ultra-low dark current and high photodetection efficiency. Harnessing the high carrier mobility of 2D materials, the device has ~ps jitter time and can be integrated into a large 2D array camera.

翻訳日:2024-05-14 23:44:37 公開日:2024-05-12

# 光の変調モーメントにおけるスピンハミルトニアン

Spin Hamiltonians in the Modulated Momenta of Light ( http://arxiv.org/abs/2405.00484v2 )

ライセンス: Link先を確認

Juan Feng, Zengya Li, Luqi Yuan, Erez Hasman, Bo Wang, Xianfeng Chen,

(参考訳) 様々なスピンハミルトニアンの基底状態を見つけられるフォトニックソルバは、多くの相互作用する物理系や組合せ最適化問題の研究に利用できる。ここでは、空間的な光輸送によって、スピンハミルトニアンの実空間と運動量空間の対応を確立する。実空間のスピン相互作用は、光の運動量空間での流れを変調することで決定される。この原理は一般化されたプランシュレルの定理として定式化され、これにより、変位に依存する任意のスピン相互作用の基底状態を見つけられる単純な光学シミュレータを実装できる。特に、この原理を用いて、J1-J2-J3モデルのエキゾチックな磁気相図を明らかにし、また、XYモデルにおける渦を介したベレジンスキー-コステリッツ-サウレスのダイナミクスも観察する。これらの実験は、光の運動量空間からスピン相互作用を繊細に制御することで高い計算精度を示し、新しい物理効果を探求する有望な方式を提供する。

Photonic solvers that are able to find the ground states of different spin Hamiltonians can be used to study many interactive physical systems and combinatorial optimization problems. Here, we establish a real-and-momentum space correspondence of spin Hamiltonians by spatial light transport. The real-space spin interaction is determined by modulating the momentum-space flow of light. This principle is formulated as a generalized Plancherel theorem, allowing us to implement a simple optical simulator that can find the ground states for any displacement-dependent spin interactions. Particularly, we use this principle to reveal the exotic magnetic phase diagram from a J1-J2-J3 model, and we also observe the vortex-mediated Berezinskii-Kosterlitz-Thouless dynamics from the XY model. These experiments exhibit high calculation precision by subtly controlling spin interactions from the momentum space of light, offering a promising scheme to explore novel physical effects.

翻訳日:2024-05-14 23:44:37 公開日:2024-05-12

# ラテンアメリカ先住民言語におけるNLPの進展

NLP Progress in Indigenous Latin American Languages ( http://arxiv.org/abs/2404.05365v2 )

ライセンス: Link先を確認

Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio,

(参考訳) 本論文は、急速な技術進歩に直面する先住民言語コミュニティの周縁化に焦点を当てる。これらの言語の文化的豊かさと、自然言語処理(NLP)の領域で見過ごされるリスクを強調する。これらのコミュニティと研究者の間のギャップを埋めることを目指し、先住民コミュニティの視点を尊重する包摂的な技術進歩の必要性を強調する。ラテンアメリカ先住民言語におけるNLPの進展を示すとともに、ラテンアメリカにおける先住民言語の現状、NLPにおけるその扱われ方、保存と発展に必要な課題と革新を扱うサーベイを示す。本論文は、ラテンアメリカの先住民コミュニティ、特に低資源のコミュニティや先住民コミュニティ全般にとってのNLPの必要性と進展を理解するうえで、現在の文献に貢献する。

The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancements that respect indigenous community perspectives. We present the progress of NLP for indigenous Latin American languages, together with a survey that covers the status of indigenous languages in Latin America, their representation in NLP, and the challenges and innovations required for their preservation and development. The paper contributes to the current literature in understanding the need and progress of NLP for indigenous communities of Latin America, specifically low-resource and indigenous communities in general.

翻訳日:2024-05-14 23:34:50 公開日:2024-05-12

# RAR-b:検索ベンチマークとしての推論

RAR-b: Reasoning as Retrieval Benchmark ( http://arxiv.org/abs/2404.06347v2 )

ライセンス: Link先を確認

Chenghao Xiao, G Thomas Hudson, Noura Al Moubayed,

(参考訳) 意味的テキスト類似度(STS)タスクと情報検索(IR)タスクは、過去数年間、埋め込みモデルの進歩を記録する2つの主要な手段であった。新たに登場したRetrieval-augmented Generation (RAG)パラダイムの下で、埋め込みモデルの次のレベルの言語理解能力を評価し、モデルに蓄えられた推論能力を意識的に検討する必要があると考える。そこで、検索器は推論問題を解けるか、という問いを立てる。推論タスクを検索タスクに変換することで、推論レベルの言語理解のために特別に訓練されていない現在の最先端の検索モデルは、特に推論集約的なタスクにおいて、LLMを補助する役割を十分に果たすにはまだ程遠い可能性があることが分かる。さらに、指示を認識するように訓練されているにもかかわらず、指示対応のIRモデルは、推論タスクの推論時には指示がない方がよい結果になることが多く、研究コミュニティが解消すべき、見過ごされてきた検索器とLLMの間の挙動のギャップを提起している。しかし、最近のデコーダベースの埋め込みモデルは、このギャップを狭めるうえで大いに有望であり、埋め込みモデルが推論レベルの言語理解を達成するための道筋を示している。また、現行の既製のリランカーモデルはこれらのタスクに失敗するものの、ファインチューニングによって推論能力を注入することはバイエンコーダの場合よりも容易であるように見え、リランキングモデルをファインチューニングすることで、すべてのタスクで最先端の性能を達成できることを示す。検索モデルに蓄えられた推論能力を評価するためのタスクと設定の包括的なスイートであるReasoning as Retrieval Benchmark (RAR-b)を公開する。RAR-bはhttps://github.com/gowitheflow-1998/RAR-bで入手できる。

Semantic textual similarity (STS) and information retrieval (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored in them. Addressing this, we pose the question: Can retrievers solve reasoning problems? By transforming reasoning tasks into retrieval tasks, we find that without being specifically trained for reasoning-level language understanding, current state-of-the-art retriever models may still be far from being competent for playing the role of assisting LLMs, especially in reasoning-intensive tasks. Moreover, albeit trained to be aware of instructions, instruction-aware IR models are often better off without instructions at inference time for reasoning tasks, posing an overlooked retriever-LLM behavioral gap for the research community to align. However, recent decoder-based embedding models show great promise in narrowing the gap, highlighting the pathway for embedding models to achieve reasoning-level language understanding. We also show that, although current off-the-shelf re-ranker models fail on these tasks, injecting reasoning abilities into them through fine-tuning still appears easier than doing so to bi-encoders, and we are able to achieve state-of-the-art performance across all tasks by fine-tuning a reranking model. We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models. RAR-b is available at https://github.com/gowitheflow-1998/RAR-b.

翻訳日:2024-05-14 23:34:50 公開日:2024-05-12

# CNNオートエンコーダに基づく画像デノイジングが画像分類タスクに与える影響の評価

Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks ( http://arxiv.org/abs/2404.10664v2 )

ライセンス: Link先を確認

Mohsen Hami, Mahdi JameBozorg,

(参考訳) 実世界で撮影された画像は、しばしば様々な種類のノイズの影響を受け、コンピュータビジョンシステムの性能と視覚データの品質に大きな影響を与えうる。本研究では, 鋳造製品、特に水中ポンプのインペラのノイズを含む画像における欠陥検出のための新しい手法を提案する。この手法は、VGG16、InceptionV3などの深層学習モデルを空間領域と周波数領域の両方で利用し、ノイズの種類と欠陥の状態を識別する。研究プロセスは画像の前処理から始まり、続いて特定のノイズカテゴリに合わせたデノイジング手法を適用する。ノイズ検出とデノイジングを分類パイプラインに統合することで、欠陥検出の精度と頑健性を高めることが目的である。本研究は周波数領域でのノイズ種類の分類にVGG16を用い,99%を超える精度を達成した。ごま塩ノイズの除去では平均SSIMが87.9、ガウスノイズの除去では平均SSIMが64.0、周期ノイズの除去では平均SSIMが81.6であった。この包括的なアプローチは、実世界の産業応用におけるデノイジング戦略として、深層オートエンコーダモデルとメディアンフィルタが有効であることを示している。最後に、欠陥検出の二値分類精度が従来手法に比べて大幅に向上したことを報告する。VGG16分類器では精度が94.6%から97.0%に向上し、提案するノイズ検出・デノイジング手法の有効性を示した。同様に、InceptionV3分類器では精度が84.7%から90.0%に向上し、ノイズ解析を分類パイプラインに統合する利点がさらに裏付けられた。

Images captured from the real world are often affected by different types of noise, which can significantly impact the performance of Computer Vision systems and the quality of visual data. This study presents a novel approach for defect detection in casting product noisy images, specifically focusing on submersible pump impellers. The methodology involves utilizing deep learning models such as VGG16, InceptionV3, and other models in both the spatial and frequency domains to identify noise types and defect status. The research process begins with preprocessing images, followed by applying denoising techniques tailored to specific noise categories. The goal is to enhance the accuracy and robustness of defect detection by integrating noise detection and denoising into the classification pipeline. The study achieved remarkable results using VGG16 for noise type classification in the frequency domain, achieving an accuracy of over 99%. Removal of salt and pepper noise resulted in an average SSIM of 87.9, while Gaussian noise removal had an average SSIM of 64.0, and periodic noise removal yielded an average SSIM of 81.6. This comprehensive approach showcases the effectiveness of the deep AutoEncoder model and median filter, for denoising strategies in real-world industrial applications. Finally, our study reports significant improvements in binary classification accuracy for defect detection compared to previous methods. For the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the effectiveness of the proposed noise detection and denoising approach. Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to 90.0%, further validating the benefits of integrating noise analysis into the classification pipeline.

翻訳日:2024-05-14 23:34:50 公開日:2024-05-12
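
The abstract above mentions a median filter among its denoising tools. A minimal sketch of why a median filter suits salt-and-pepper noise is given below. The abstract specifies neither the window size nor the border handling, so the 3x3 window, the edge replication, the function name `median_filter_3x3` and the toy image are all assumptions made for illustration.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of grey values.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption; other padding schemes are equally common).
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the 9 neighbours, clamping indices at the image border.
            window = [
                image[min(max(r + dr, 0), height - 1)][min(max(c + dc, 0), width - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            output[r][c] = median(window)
    return output


# Toy image: uniform grey level 10 with one "salt" (255) and one "pepper" (0) pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
cleaned = median_filter_3x3(noisy)
```

Because each window holds at most two impulse pixels out of nine, the median ignores them and every pixel returns to the background level. A mean filter would instead smear the impulses into their neighbours.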

# InfoMatch: 半教師あり画像分類のためのエントロピーニューラル推定

InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification ( http://arxiv.org/abs/2404.11003v3 )

ライセンス: Link先を確認

Qi Han, Zhibo Tian, Chengwei Xia, Kun Zhan,

(参考訳) 擬似教師と一貫性正則化を活用した半教師あり画像分類は、顕著な成功を収めてきた。しかし、ラベルなしデータの潜在力を十分に引き出すことが、依然として課題である。これに対処するため,情報エントロピーのニューラル推定を用いて,ラベルなしサンプルの潜在力を活用する。対照学習に着想を得て、異なる拡張ビュー間の相互情報量の下界を最大化することでエントロピーを推定する。さらに,画像分類器の事後分布の情報エントロピーが,ソフトマックス予測の尤度関数を最大化することで近似されることを理論的に解析する。これらの知見に基づき、予測確率分布が正解(ground-truth)分布に近づくように、両方の観点からモデルを最適化する。情報エントロピーとの理論的な関連から、本手法をInfoMatchと名付ける。広範な実験を通じて,その優れた性能を示す。ソースコードはhttps://github.com/kunzhan/InfoMatchで入手できる。

Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to utilize the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by maximizing a lower bound on mutual information across different augmented views. Moreover, we theoretically analyze that the information entropy of the posterior of an image classifier is approximated by maximizing the likelihood function of the softmax predictions. Guided by these insights, we optimize our model from both perspectives to ensure that the predicted probability distribution closely aligns with the ground-truth distribution. Given the theoretical connection to information entropy, we name our method InfoMatch. Through extensive experiments, we show its superior performance. The source code is available at https://github.com/kunzhan/InfoMatch.

翻訳日:2024-05-14 23:10:20 公開日:2024-05-12

# OPTiML: 自己監督型医用画像表現のための最適輸送を用いた高密度セマンティック不変性

OPTiML: Dense Semantic Invariance Using Optimal Transport for Self-Supervised Medical Image Representation ( http://arxiv.org/abs/2404.11868v3 )

ライセンス: Link先を確認

Azad Singh, Vandan Gorade, Deepak Mishra,

(参考訳) 自己教師付き学習(SSL)は、アノテーションなしで学習できることから、医用画像解析の有望な技術として登場した。しかし、有望な可能性にもかかわらず、従来のSSLメソッドでは、セマンティックアライメントの達成や微妙な詳細の取得など、制限に直面している。これは、解剖学的構造や病理的詳細を正確に把握できない、最適下界表現につながる。これらの制約に対応するため,医用画像表現学習におけるSSLの全体的な効果を高めるために,最適なトランスポート(OT)を用いた新しいSSLフレームワークOPTiMLを導入する。中心となる考え方は、OTとクロスビューポイントセマンティクス・インフュージョン・モジュール(CV-SIM)を統合することである。 CV-SIMモジュールに加えて、OPTiMLはOTフレームワーク内での分散と共分散の規則化を強制し、臨床的に関係のある情報に焦点を絞ると同時に、より少ない情報的特徴を破棄する。提案するフレームワークは,様々な医用画像タスクに適用可能な意味豊かな表現を学習する能力を示す。その有効性を検証するために,胸部X線モダリティから利用可能な3つのデータセットについて実験を行った。実験の結果,OPTiMLはすべての評価課題において,最先端の手法よりも優れていることがわかった。

Self-supervised learning (SSL) has emerged as a promising technique for medical image analysis due to its ability to learn without annotations. However, despite the promising potential, conventional SSL methods encounter limitations, including challenges in achieving semantic alignment and capturing subtle details. This leads to suboptimal representations, which fail to accurately capture the underlying anatomical structures and pathological details. In response to these constraints, we introduce a novel SSL framework OPTiML, employing optimal transport (OT), to capture the dense semantic invariance and fine-grained details, thereby enhancing the overall effectiveness of SSL in medical image representation learning. The core idea is to integrate OT with a cross-viewpoint semantics infusion module (CV-SIM), which effectively captures complex, fine-grained details inherent in medical images across different viewpoints. In addition to the CV-SIM module, OPTiML imposes the variance and covariance regularizations within OT framework to force the model focus on clinically relevant information while discarding less informative features. Through these, the proposed framework demonstrates its capacity to learn semantically rich representations that can be applied to various medical imaging tasks. To validate its effectiveness, we conduct experimental studies on three publicly available datasets from chest X-ray modality. Our empirical results reveal OPTiML's superiority over state-of-the-art methods across all evaluated tasks.

翻訳日:2024-05-14 23:10:20 公開日:2024-05-12

# 表構造と文字認識のためのマルチセルデコーダと相互学習

Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition ( http://arxiv.org/abs/2404.13268v2 )

ライセンス: Link先を確認

Takaya Kawakatsu,

(参考訳) 学術論文や財務報告などの文書から表の内容を取り出し,それを大規模言語モデルで処理可能な形式に変換することは,知識情報処理において重要な課題である。テーブル構造だけでなくセル内容も認識するエンドツーエンドアプローチは、外部文字認識システムを用いた最先端モデルに匹敵する性能を達成し、さらなる改善の可能性を秘めている。さらに、これらのモデルでは、数百セルの長いテーブルを局所的な注意を払って認識できるようになった。しかし、モデルでは、ヘッダーからフッタへの1方向のテーブル構造を認識し、各セルごとにセル内容の認識を行うため、近隣セルから有用な情報を検索する機会はない。本稿では,エンド・ツー・エンドアプローチを改善するために,マルチセルコンテンツデコーダと双方向相互学習機構を提案する。この効果は2つの大きなデータセットで実証され、実験結果は、多数のセルを持つ長いテーブルであっても、最先端のモデルに匹敵する性能を示す。

Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that can be processed by large language models is an important task in knowledge information processing. End-to-end approaches, which recognize not only table structure but also cell contents, achieved performance comparable to state-of-the-art models using external character recognition systems, and have potential for further improvements. In addition, these models can now recognize long tables with hundreds of cells by introducing local attention. However, the models recognize table structure in one direction from the header to the footer, and cell content recognition is performed independently for each cell, so there is no opportunity to retrieve useful information from the neighbor cells. In this paper, we propose a multi-cell content decoder and bidirectional mutual learning mechanism to improve the end-to-end approach. The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models, even for long tables with large numbers of cells.

翻訳日:2024-05-14 23:10:20 公開日:2024-05-12

# くさびの縁としてのW:制約付きコライダーによるベル相関

W as the Edge of a Wedge: Bell Correlations via Constrained Colliders ( http://arxiv.org/abs/2404.13928v2 )

ライセンス: Link先を確認

Huw Price,

(参考訳) Ken Wharton との以前の研究において、ベル相関は特別な選択アーチファクトであり、組み合わせによって説明されている。 (i)コライダーバイアスと (ii)コライダー変数上の境界制約。これは光円錐の外側に直接的な因果的影響を必要としないため、ベル非局所性と相対性理論を和解する新しい方法を提供する可能性がある。この記事は提案に対する新たな議論の概要である。これは、遅延チョイスエンタングルメントスワップを含む特別な(W字型)ベルの実験に対してどのように有効かを説明し、一般的な(V字型)ケースに拡張できると主張している。

In previous work with Ken Wharton, it was proposed that Bell correlations are a special sort of selection artefact, explained by a combination of (i) collider bias and (ii) a boundary constraint on the collider variable. This requires no direct causal influence outside lightcones, and may hence offer a new way to reconcile Bell nonlocality and relativity. This piece outlines a new argument for the proposal. It explains how it is valid for a special class of ('W-shaped') Bell experiments involving delayed-choice entanglement swapping, and argues that it can be extended to the general ('V-shaped') case.

翻訳日:2024-05-14 23:10:20 公開日:2024-05-12

# EvaNet:地球画像上の標高誘導洪水のマッピング

EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery ( http://arxiv.org/abs/2404.17917v2 )

ライセンス: Link先を確認

Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou,

(参考訳) 高解像度衛星画像からの洪水範囲の正確かつタイムリーなマッピングは、被害評価や救援活動などの災害管理において重要な役割を担っている。しかし、現在の最先端のソリューションはU-Netに基づいており、スペクトル特徴のみからは直接判断できない不明瞭なピクセル(例えば、樹冠、雲)のために、洪水ピクセルを正確にセグメント化できない。米国地質調査所 (USGS) などのソースから容易に取得可能なデジタル標高モデル (DEM) データを利用し, 本研究では洪水範囲マッピングの改善を目的とした標高マップの活用を検討する。エンコーダ・デコーダアーキテクチャに基づく標高誘導セグメンテーションモデルであるEvaNetを提案し、2つの新しい手法を導入する。(1) 重力の物理則を符号化した損失関数。ある位置が浸水(乾燥)している場合、それに隣接する標高の低い(高い)位置も浸水(乾燥)していなければならない。(2) 位置依存のゲーティング機構によって標高マップを統合し、隣接層間を流れるスペクトル特徴の量を調整する新しい(逆)畳み込み演算。大規模な実験により、EvaNetはU-Netベースラインを著しく上回り、洪水範囲マッピングの既存のソリューションにおけるU-Netの完全な代替として機能することが示された。

Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectral features. Thanks to the digital elevation model (DEM) data readily available from sources such as the United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping. We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity that if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map by a location-sensitive gating mechanism to regulate how much of the spectral features flows through adjacent layers. Extensive experiments show that EvaNet significantly outperforms the U-Net baselines, and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping.

翻訳日:2024-05-14 21:13:39 公開日:2024-05-12
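
The gravity rule in the EvaNet abstract (a flooded cell implies its lower neighbours are flooded; a dry cell implies its higher neighbours are dry) can be read as a single inequality: for adjacent cells, the higher one should not be more flooded than the lower one. The sketch below turns that reading into a hinge penalty on a grid of predicted flood probabilities. The function name `elevation_violation`, the 4-neighbourhood, and the hinge form are all assumptions for illustration; the abstract does not give the actual loss, so this should not be taken as the paper's formulation.

```python
def elevation_violation(prob, elev):
    """Sum of hinge penalties over 4-adjacent cell pairs.

    prob[r][c] is a predicted flood probability, elev[r][c] an elevation.
    For a pair where one cell is strictly higher, the higher cell should not
    be more flooded than the lower one; any excess is penalised.
    Hypothetical penalty for illustration only.
    """
    height, width = len(prob), len(prob[0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr >= height or cc >= width:
                    continue
                if elev[r][c] > elev[rr][cc]:
                    hi, lo = prob[r][c], prob[rr][cc]
                elif elev[r][c] < elev[rr][cc]:
                    hi, lo = prob[rr][cc], prob[r][c]
                else:
                    continue  # equal elevation: no constraint in this sketch
                total += max(0.0, hi - lo)
    return total

# A one-row slope descending from left to right.
elev = [[3.0, 2.0, 1.0]]
consistent = [[0.1, 0.5, 0.9]]    # more flooding downhill: no penalty
inconsistent = [[0.9, 0.5, 0.1]]  # more flooding uphill: penalised
```

Here `elevation_violation(consistent, elev)` is zero, while the uphill-flooded prediction accumulates a positive penalty. In a trainable model such a term would typically be added to the usual segmentation loss with a weight.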

# M3oE: マルチドメインマルチタスク混合専門家推薦フレームワーク

M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework ( http://arxiv.org/abs/2404.18465v3 )

ライセンス: Link先を確認

Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai,

(参考訳) マルチドメインレコメンデーションとマルチタスクレコメンデーションは、異なるドメインと目的の共通情報を総合的なユーザモデリングに活用する効果を実証している。それでも、実際の推薦は通常、複数のドメインとタスクに同時に直面し、現在の手法では十分に対処できない。この目的のために,適応型マルチドメイン・マルチタスクMixture-of-Experts推薦フレームワークであるM3oEを紹介する。 M3oEはマルチドメイン情報を統合し、ドメインとタスク間で知識をマッピングし、複数の目的を最適化する。共通、ドメイン側面、タスク側面のユーザ嗜好をそれぞれ学習する3つのMixture-of-Expertsモジュールを利用して、複数のドメインとタスク間の複雑な依存関係を、分離された(disentangled)形で処理する。さらに,多様な領域やタスクをまたいだ特徴抽出と融合を正確に制御するための2段階融合機構を設計する。動的構造最適化を可能にするAutoML技術を適用することにより、フレームワークの適応性はさらに向上する。著者たちの知る限りでは、M3oEはマルチドメインのマルチタスクレコメンデーションを自己適応的に解決する最初の試みである。多様なベースラインに対する2つのベンチマークデータセットの大規模な実験は、M3oEの優れたパフォーマンスを示している。実装コードは再現性を保証するために利用可能である。

Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive Multi-domain Multi-task Mixture-of-Experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively to address the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, our M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.

翻訳日:2024-05-14 21:13:39 公開日:2024-05-12

# 3次元ガウススプラッティングによる3次元再構成シーンのブートストラップ

Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting ( http://arxiv.org/abs/2404.18669v2 )

ライセンス: Link先を確認

Yifei Gao, Jie Ou, Lei Wang, Jun Cheng,

(参考訳) ニューラルレンダリング技術の最近の進歩は、学術分野と商業分野の両方にわたって、フォトリアリスティックな3Dシーンのレンダリングを大幅に強化している。最新の手法は3D Gaussian Splatting(3D-GS)と呼ばれ、レンダリングの品質とスピードのベンチマークを新たに設定した。それでも、3D-GSの限界は新しい視点の合成において顕著となり、特にトレーニング中に見られるものとは大きく異なる視点についてである。また、ズームインやアウト時にダイレーションやエイリアスなどの問題が発生する。これらの課題はすべて、1つの根本的な問題、すなわち不十分なサンプリングに遡ることができる。本稿では,この問題に対処するブートストラップ法を提案する。このアプローチでは,3D-GSを用いた新しいビューのレンダリングを強化するために拡散モデルを用いて,トレーニングプロセスの合理化を行う。以上の結果から,ブートストレッピングはアーティファクトを効果的に削減し,評価指標の明確化を図っている。さらに,本手法は汎用性が高く,容易に統合可能であることを示し,様々な3次元再構成プロジェクトが本手法の恩恵を受けることができることを示した。

Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts and yields clear improvements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.

翻訳日:2024-05-14 21:13:38 公開日:2024-05-12

# 組織学における弱スーパービジョン対象定位モデルのソースフリー領域適応

Source-Free Domain Adaptation of Weakly-Supervised Object Localization Models for Histology ( http://arxiv.org/abs/2404.19113v2 )

ライセンス: Link先を確認

Alexis Guichemerre, Soufiane Belharbi, Tsiry Mayet, Shakeeb Murtaza, Pourya Shamsolmoali, Luke McCaffrey, Eric Granger,

(参考訳) 深層学習の出現に伴い, 組織像に基づく癌診断において, デジタル病理学が注目されている。ディープ弱教師付きオブジェクトローカライゼーション(WSOL)モデルは、安価なグローバルな画像クラスアノテーションを使用して、がんのグレードに応じて組織像を分類し、解釈のための関心領域(ROI)を特定するために訓練することができる。当初、ラベル付きソース画像データに基づいてトレーニングされたWSOLモデルは、染色、スキャナー、癌タイプの変化によって生じる大きなドメインシフトの場合に、ラベルなしのターゲットデータを使用して適応することができる。本稿では、プライバシと効率の理由から、ソースドメインデータを一切使用せずに、事前学習したソースモデルを新しいターゲットドメインに適合させるという難題である、ソースフリー(教師なし)ドメイン適応(SFDA)に焦点を当てる。 WSOLモデルのSFDAは、分類タスクとローカライゼーションタスクの両方に適応することを意図していないため、組織学におけるいくつかの課題を提起している。本報告では, 主要SFDAファミリーの代表である4つの最先端SFDA法を, 分類と位置推定の精度の観点からWSOLについて比較した。それらは SFDA-Distribution Estimation, Source HypOthesis Transfer, Cross-Domain Contrastive Learning, Adaptively Domain Statistics Alignmentである。 Glas (小, 大腸癌) とCamelyon16 (大, 乳癌) の組織学的データセットの実験結果から, これらのSFDA法は, 分類に最適化された場合, 適応後の局所化性能が一般に低いことが示唆された。

Given the emergence of deep learning, digital pathology has gained popularity for cancer diagnosis based on histology images. Deep weakly supervised object localization (WSOL) models can be trained to classify histology images according to cancer grade and identify regions of interest (ROIs) for interpretation, using inexpensive global image-class annotations. A WSOL model initially trained on some labeled source image data can be adapted using unlabeled target data in cases of significant domain shifts caused by variations in staining, scanners, and cancer type. In this paper, we focus on source-free (unsupervised) domain adaptation (SFDA), a challenging problem where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons. SFDA of WSOL models raises several challenges in histology, most notably because SFDA methods are not intended to adapt for both classification and localization tasks. In this paper, four state-of-the-art SFDA methods, each one representative of a main SFDA family, are compared for WSOL in terms of classification and localization accuracy. They are SFDA-Distribution Estimation, Source HypOthesis Transfer, Cross-Domain Contrastive Learning, and Adaptively Domain Statistics Alignment. Experimental results on the challenging Glas (smaller, colon cancer) and Camelyon16 (larger, breast cancer) histology datasets indicate that these SFDA methods typically perform poorly for localization after adaptation when optimized for classification.

翻訳日:2024-05-14 21:13:38 公開日:2024-05-12

# 不変リスク最小化は全変動モデルである

Invariant Risk Minimization Is A Total Variation Model ( http://arxiv.org/abs/2405.01389v2 )

ライセンス: Link先を確認

Zhao-Rong Lai, Weiwen Wang,

(参考訳) 不変リスク最小化(英: Invariant risk minimization、IRM)とは、機械学習において、不変の機能を様々な環境に一般化する手法である。関連するほとんどの研究は、新しいIRM設定や新しいアプリケーションシナリオに焦点を当てているが、IRMの数学的本質は、まだ適切に説明されていない。 IRM は本質的に分類器変数に関する学習リスクの $L^2$ norm (TV-$\ell_2$) に基づく総変量であることを示す。さらに,TV-$\ell_1$モデルに基づく新しいIRMフレームワークを提案する。学習リスクとして使用できる関数のクラスを拡大するだけでなく、コアレア式に基づいたデノナイズおよび不変の特徴保存における堅牢な性能も備えている。 IRM-TV-$\ell_1$のアウト・オブ・ディストリビューションの一般化の要求についても述べる。実験結果から,提案フレームワークは,いくつかのベンチマーク機械学習シナリオにおいて,競合性能を実現することが示された。

Invariant risk minimization (IRM) is an emerging approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation model based on the $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.

翻訳日:2024-05-14 21:13:38 公開日:2024-05-12
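
For orientation, the widely used IRMv1 objective of Arjovsky et al. is written out below, since its penalty is a squared $\ell_2$ norm of the gradient of the risk with respect to a classifier variable, which is the kind of term the abstract interprets as TV-$\ell_2$. The notation ($\Phi$, $w$, $R^e$, $\lambda$, $\mathcal{E}_{\mathrm{tr}}$) follows the IRMv1 formulation and is not taken from this abstract; the exact TV-$\ell_2$ and TV-$\ell_1$ functionals of the paper are not stated here and are not reproduced.

```latex
% IRMv1 objective (Arjovsky et al., 2019), shown for orientation only.
% \Phi: feature extractor, w: scalar dummy classifier fixed at 1,
% R^e: risk in training environment e, \lambda: penalty weight.
\min_{\Phi} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}}
  \Big[ R^{e}(\Phi) + \lambda \,
  \big\| \nabla_{w \mid w = 1.0} \, R^{e}(w \cdot \Phi) \big\|_{2}^{2} \Big]
```

Read this way, a TV-$\ell_1$ variant would presumably penalise the gradient magnitude without squaring it, as in classical total variation denoising, but that is an inference from the abstract's wording rather than a statement of the paper's definition.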

# プログラム自動修復のための大規模言語モデルに関する体系的文献レビュー

A Systematic Literature Review on Large Language Models for Automated Program Repair ( http://arxiv.org/abs/2405.01466v2 )

ライセンス: Link先を確認

Quanjun Zhang, Chunrong Fang, Yang Xie, YuXiang Ma, Weisong Sun, Yun Yang, Zhenyu Chen,

(参考訳) 自動プログラム修復(APR)は、ソフトウェアのバグにパッチを当て、手作業によるデバッグ作業を減らすことを目指す。最近、LLM(Large Language Models)の進歩に伴い、ソフトウェア開発とメンテナンスを容易にし、優れたパフォーマンスを示すAPR技術が数多く提案されている。しかし、LLMベースのAPR分野の探索が進行中であるため、研究者が現在の成果、課題、潜在的な機会を理解することは困難である。この研究は、2020年から2024年までのAPRにおけるLLMの応用を要約する最初の体系的な文献レビューを提供する。 LLM,APRおよびそれらの統合の観点から,127件の関連論文を分析した。まず、APRをサポートするために適用されている既存のLLMを分類し、3種類の利用戦略を概説する。さらに、セマンティックバグやセキュリティ脆弱性など、LLMの恩恵を受けるいくつかの具体的な修復シナリオについて詳述する。さらに、入力形式やオープンサイエンスなど、LLMをAPR研究に統合する際のいくつかの重要な側面について論じる。最後に,今後検討すべき課題と今後の研究ガイドラインについて紹介する。本稿は,APRコミュニティに研究状況の体系的概要を提供し,研究成果の包括的理解と今後の研究の促進を支援する。

Automated Program Repair (APR) attempts to patch software bugs and reduce manual debugging efforts. Very recently, with the advances in Large Language Models (LLMs), an increasing number of APR techniques have been proposed, facilitating software development and maintenance and demonstrating remarkable performance. However, due to ongoing explorations in the LLM-based APR field, it is challenging for researchers to understand the current achievements, challenges, and potential opportunities. This work provides the first systematic literature review to summarize the applications of LLMs in APR between 2020 and 2024. We analyze 127 relevant papers from LLMs, APR and their integration perspectives. First, we categorize existing popular LLMs that are applied to support APR and outline three types of utilization strategies for their deployment. Besides, we detail some specific repair scenarios that benefit from LLMs, e.g., semantic bugs and security vulnerabilities. Furthermore, we discuss several critical aspects of integrating LLMs into APR research, e.g., input forms and open science. Finally, we highlight a set of challenges remaining to be investigated and the potential guidelines for future research. Overall, our paper provides a systematic overview of the research landscape to the APR community, helping researchers gain a comprehensive understanding of achievements and promote future research.

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12

# SSUMamba:ハイパースペクトル画像デノイジングのための空間スペクトル選択的状態空間モデル

SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising ( http://arxiv.org/abs/2405.01726v3 )

ライセンス: Link先を確認

Guanyiman Fu, Fengchao Xiong, Jianfeng Lu, Jun Zhou, Yuntao Qian,

(参考訳) ハイパースペクトル画像(HSI)のデノイジングは、撮像機構内部や環境要因から生じるノイズのため、重要な前処理手順である。スペクトル相関,空間自己相似性,空間スペクトル相関といったHSIのドメイン固有知識を活用することは,深層学習に基づくデノイジングに不可欠である。既存の手法はしばしば、実行時間、空間複雑性、計算複雑性によって制約され、これらの事前知識を別々に探索する戦略を採用する。これらの戦略は、いくつかの冗長な情報を避けることができるが、画像復元に肯定的な影響を与える、より広く、より根底にある長距離空間スペクトル情報を見落としてしまう。本稿では,空間スペクトル選択的状態空間モデルに基づくU字型ネットワークであるSpatial-Spectral U-Mamba(SSUMamba)を提案する。状態空間モデル(SSM)計算における線形空間複雑性のおかげで,モジュール内で完全な大域的空間スペクトル相関が得られる。本研究では3次元HSIにおける複数方向の情報フローのモデル化を支援する空間スペクトル交互走査(SSAS)戦略を導入する。実験の結果,本手法は比較手法よりも優れていた。ソースコードは https://github.com/lronkitty/SSUMamba で公開予定である。

Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more underlying long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSIs, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms compared methods. The source code will be available at https://github.com/lronkitty/SSUMamba.

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12

# LLMアプリケーションにおけるタスクユーティリティの評価と検証

Assessing and Verifying Task Utility in LLM-Powered Applications ( http://arxiv.org/abs/2405.02178v2 )

ライセンス: Link先を確認

Negar Arabzadeh, Siqing Huo, Nikhil Mehta, Qinqyun Wu, Chi Wang, Ahmed Awadallah, Charles L. A. Clarke, Julia Kiseleva,

(参考訳) LLM(Large Language Models)の急速な開発は、複数のエージェント間のコラボレーションを促進し、人間の日常的な作業を支援するアプリケーションの増加につながっている。しかし、LDMを利用したアプリケーションが実際のユーザエクスペリエンスとタスク実行効率をどの程度向上させるかを評価する上で、大きなギャップが残っている。このことは、特にアプリケーションの機能とエンドユーザのニーズの整合性を確保することによって、LLMベースのアプリケーションのユーティリティを検証する必要性を強調している。 AgentEvalは,アプリケーション固有の目的に合わせた一連の基準を自動提案することで,ユーティリティ検証プロセスを簡素化する新しいフレームワークである。これにより、提案された基準に対してアプリケーションの実用性を定量化する、包括的な評価が可能になる。本稿では,AgentEval の有効性とロバスト性について,Math Problemsolving や ALFWorld House-hold 関連タスクを含む2つのオープンソースデータセットに対して包括的な解析を行った。再現性のために、データ、コード、すべてのログをhttps://bit.ly/3w3yKcSで公開しています。

The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a significant gap remains in assessing to what extent LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the need to verify the utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the effectiveness and robustness of AgentEval on two open-source datasets covering math problem solving and ALFWorld household-related tasks. For reproducibility purposes, we make the data, code and all the logs publicly available at https://bit.ly/3w3yKcS .

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12

# ポジションペーパー:Quo Vadis, Unsupervised Time Series Anomaly Detection?

Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? ( http://arxiv.org/abs/2405.02678v2 )

ライセンス: Link先を確認

M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis,

(参考訳) Timeseries Anomaly Detection (TAD)における機械学習奨学金の現在の状況は、欠陥のある評価指標の使用、一貫性のないベンチマークプラクティス、新しいディープラーニングベースのモデル設計における選択に対する適切な正当化の欠如に悩まされている。本稿は,TADにおける現状を批判的に分析し,現在の研究の誤解を招き,問題となる方法や評価の実践を明らかにする。我々の立場は、単に新しいモデル設計を追求することから、ベンチマークプラクティスの改善、非自明なデータセットの作成、より単純なベースラインに対して複雑なメソッドの有用性を批判的に評価することへと焦点を移すことを提唱している。その結果,厳密な評価プロトコルの必要性,単純なベースラインの作成,および最先端の深部異常検出モデルが線形写像を効果的に学習できることが示唆された。これらの結果から, 簡便かつ解釈可能なTAD法のさらなる探索と開発の必要性が示唆された。最先端のディープラーニングベースのモデルにおけるモデルの複雑さの増加は、残念ながら、ほとんど改善しない。この分野を前進させるための洞察と提案を提供する。コード:https://github.com/ssarfraz/QuoVadisTAD

The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our position advocates for a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increase in model complexity in state-of-the-art deep learning-based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward. Code: https://github.com/ssarfraz/QuoVadisTAD

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
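
To make the call for simple baselines concrete, the sketch below scores each multivariate sample by the Euclidean norm of its per-sensor z-scores, fitted on anomaly-free training data. This is a hypothetical baseline written for this listing, not one confirmed to appear in the paper; the names `fit_scaler` and `anomaly_scores` and the toy data are assumptions.

```python
from statistics import mean, pstdev

def fit_scaler(train):
    """Per-sensor mean and population std from (assumed normal) training rows."""
    columns = list(zip(*train))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant sensors to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def anomaly_scores(series, means, stds):
    """Score each row by the L2 norm of its standardised sensor values."""
    return [
        sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)) ** 0.5
        for row in series
    ]

# Two sensors; the second test row has a spike on sensor 0.
train = [[1.0, 10.0], [1.2, 10.5], [0.8, 9.5], [1.0, 10.0]]
test = [[1.1, 10.2], [5.0, 10.0], [0.9, 9.8]]
means, stds = fit_scaler(train)
scores = anomaly_scores(test, means, stds)
```

The spiked row receives by far the largest score. A baseline of this kind has no learned parameters beyond means and variances, which makes it a useful sanity check before crediting a deep model with an improvement.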

# Negative Prompt: 負の感情刺激による大規模言語モデル強化のための心理学の活用

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli ( http://arxiv.org/abs/2405.02814v2 )

ライセンス: Link先を確認

Xu Wang, Cheng Li, Yi Chang, Jindong Wang, Yuan Wu,

(参考訳) 大規模言語モデル(LLM)は、従来の計算タスクから高度な人工知能(AI)アプリケーションまで、幅広いアプリケーションに不可欠なものとなっている。この普及により、社会科学を含む様々な分野のLLMの研究が盛んになった。特に、LLMはポジティブな感情刺激によってさらに発展できる感情知能を持っていることが研究によって明らかにされている。この発見は興味深い疑問を提起する: 否定的な感情はLLMにも影響し、パフォーマンスを向上する可能性があるか? この問いに応えて,心理学的原則を基盤とし,特別に設計された10種類の負の感情刺激を用いる新たなアプローチであるNegativePromptを紹介する。我々は,Flan-T5-Large,Vicuna,Llama 2,ChatGPT,GPT-4の5つのLLMを,45のタスクで厳密に評価した。 NegativePromptは、Instruction Inductionタスクで12.89%、BIG-Benchタスクで46.25%の相対的な改善を示し、LLMの性能を著しく向上させる。さらに,NegativePromptの影響のメカニズムを解明するための注意可視化実験を行った。本研究は,LLMと感情の相互作用の理解に大きく貢献し,感情駆動型手法としてのNegativePromptの有効性を実証し,現実の応用におけるLLMの強化に向けた新たな洞察を提供する。コードは https://github.com/wangxu0820/NegativePrompt で公開されている。

Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12

# 実効性エクストリーム再スケーリングのための境界対応非結合流網

Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling ( http://arxiv.org/abs/2405.02941v2 )

ライセンス: Link先を確認

Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, Rizen Guo,

(参考訳) Invertible rescaling Network (IRN) ベースやgenerative adversarial Network (GAN) ベースの手法など,最近開発された生成手法は,画像再スケーリングにおいて例外的な性能を示した。しかし、IRNベースの手法は過度に平滑化された結果を生成する傾向があり、一方、GANベースの手法は偽のディテールを生成しやすく、実際の応用を妨げている。この問題に対処するため,現実的で視覚的に好ましい結果を生成するために,境界対応デカップリングフローネットワーク(BDFlow)を提案する。高周波情報を直接標準ガウス分布としてモデル化する従来の手法とは異なり、我々のBDFlowはまず、その高周波情報を境界分布に従う意味的高周波成分(semantic high-frequency)とガウス分布に従う非意味的高周波成分(non-semantic high-frequency)に分解する。具体的には、意味的な高周波部分を正確に捉えるために、境界認識マスク(BAM)を用いて、モデルを制約してリッチテクスチャを生成する一方、非意味的な高周波部分はガウス分布からランダムにサンプリングされる。総合的な実験により、BDFlowはより低い複雑さを維持しつつ、他の最先端手法を大幅に上回ることが示された。特に、我々のBDFlowは、パラメータの74%と計算の20%しか利用せずに、GRAINと比較してPSNRを平均4.4dB、SSIMを平均0.1改善する。コードは https://github.com/THU-Kingmin/BAFlow で公開予定である。

Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which hinders their real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information directly as a standard Gaussian distribution, our BDFlow first decouples the high-frequency information into a semantic high-frequency part that adheres to a Boundary distribution and a non-semantic high-frequency counterpart that adheres to a Gaussian distribution. Specifically, to capture the semantic high-frequency part accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 on average over GRAIN, utilizing only 74% of the parameters and 20% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12
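
The rescaling abstracts in this listing report gains in PSNR, measured in dB. For readers who want to relate a figure such as 4.4 dB to pixel error, here is a minimal sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. The function name `psnr`, the 8-bit peak of 255, and the toy images are assumptions for this example; the papers' exact evaluation protocols (colour space, cropping) are not given in the abstracts.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D lists."""
    flat_ref = [v for row in reference for v in row]
    flat_dis = [v for row in distorted for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_dis)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[0, 0], [0, 0]]
small_error = [[5, 5], [5, 5]]      # every pixel off by 5
large_error = [[10, 10], [10, 10]]  # every pixel off by 10

gain_db = psnr(reference, small_error) - psnr(reference, large_error)
```

Halving the per-pixel error amplitude raises PSNR by `20 * log10(2)`, roughly 6 dB, so a gain of a few dB corresponds to a substantial reduction in mean squared error.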

# 可逆的残留再スケーリングモデル

Invertible Residual Rescaling Models ( http://arxiv.org/abs/2405.02945v2 )

ライセンス: Link先を確認

Jinmin Li, Tao Dai, Yaohua Zha, Yilu Luo, Longfei Lu, Bin Chen, Zhi Wang, Shu-Tao Xia, Jingyun Zhang,

(参考訳) Invertible Rescaling Networks (IRNs)とその変種は、画像再スケーリングのような様々な画像処理タスクにおいて顕著な成果を上げている。しかし、より深いネットワークを持つIRNは訓練が難しく、IRNの表現能力が妨げられることが観察される。この問題に対処するために,高解像度画像とその低解像度画像との間の全単射を特定の分布とともに学習することにより,画像再スケーリングのための可逆残差再スケーリングモデル(IRRM)を提案する。具体的には、長いスキップ接続を持つ複数のResidual Downscaling Modules (RDM) を含むディープネットワークを構築するIRRMを提案する。それぞれのRDMは、短い接続を持ついくつかのInvertible Residual Blocks (IRB) で構成されている。このようにして、RDMはスキップ接続によって豊富な低周波情報をバイパスさせ、モデルを画像からの高周波情報の抽出に集中させる。大規模な実験により、IRRMは、はるかに少ないパラメータと複雑さで、他の最先端手法よりも大幅に優れた性能を示す。特に, IRRMは, x4再スケーリングにおいてHCFlowとIRNの両方に対して少なくとも0.3dBのPSNRゲインを有し, 60%のパラメータと50%のFLOPしか使用していない。コードは https://github.com/THU-Kingmin/IRRM で公開予定である。

Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, thus hindering the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, we propose IRRM to build a deep network, which contains several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, RDM allows rich low-frequency information to be bypassed by skip connections and forces models to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with far fewer parameters and lower complexity. In particular, in x4 rescaling our IRRM achieves PSNR gains of at least 0.3 dB over both HCFlow and IRN while using only 60% of the parameters and 50% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.

翻訳日:2024-05-14 21:03:09 公開日:2024-05-12

# Outlier Gradient Analysis: ヘシアンフリーインフルエンス関数によるディープラーニングモデルの性能向上

Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions ( http://arxiv.org/abs/2405.03869v2 )

ライセンス: Link先を確認

Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, Hongfu Liu,

(参考訳) 影響関数は、各トレーニングデータサンプルがモデル予測に与える影響を評価するための堅牢なフレームワークを提供し、データ中心学習における主要なツールとなっている。様々なタスクで広く使われているにもかかわらず、モデルに対する強い凸性仮定と、ヘッセ行列の逆行列の計算に伴う計算コストは、特に大規模な深層モデルを分析する際に制約となる。本稿では、有害サンプルの除去という古典的なデータ中心のシナリオに焦点を当て、統一されたフレームワークの中で両方の課題に対処する。具体的には、影響関数による有害トレーニングサンプルの同定と、外れ値勾配検出との間の等価変換を確立する。この変換は単純でヘッセ行列を必要としない定式化を与えるだけでなく、サンプルの影響における勾配の役割について深い洞察を与える。さらに、影響関数の凸性仮定を緩和し、その適用範囲を非凸な深層モデルに拡張する。系統的な実験評価を通じて、まず合成データセット上で提案する外れ値勾配解析の正しさを検証し、次に視覚モデルにおける誤ラベルサンプルの検出、自然言語処理向けトランスフォーマーモデルの性能向上のためのデータサンプルの選択、微調整された大規模言語モデルにおける影響力のあるサンプルの同定において、その有効性を示す。

Influence functions offer a robust framework for assessing the impact of each training data sample on model predictions, serving as a prominent tool in data-centric learning. Despite their widespread use in various tasks, the strong convexity assumption on the model and the computational cost associated with calculating the inverse of the Hessian matrix pose constraints, particularly when analyzing large deep models. This paper focuses on a classical data-centric scenario--trimming detrimental samples--and addresses both challenges within a unified framework. Specifically, we establish an equivalence transformation between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides profound insights into the role of the gradient in sample impact. Moreover, it relaxes the convexity assumption of influence functions, extending their applicability to non-convex deep models. Through systematic empirical evaluations, we first validate the correctness of our proposed outlier gradient analysis on synthetic datasets and then demonstrate its effectiveness in detecting mislabeled samples in vision models, selecting data samples for improving performance of transformer models for natural language processing, and identifying influential samples for fine-tuned Large Language Models.

翻訳日:2024-05-14 20:52:15 公開日:2024-05-12

# DALK: LLMとKGの動的併用によるアルツハイマー病問題への科学的回答

DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature ( http://arxiv.org/abs/2405.04819v2 )

ライセンス: Link先を確認

Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、様々なアプリケーションで有望なパフォーマンスを実現している。それでも、ロングテール知識を統合するという継続的な課題が、専門分野におけるLLMのシームレスな採用を妨げ続けている。本研究では、この限界に対処するために、DALK (Dynamic Co-Augmentation of LLMs and KG) を導入し、バイオメディシンの専門的サブフィールドであり世界的な健康上の優先課題でもあるアルツハイマー病(AD)の研究におけるその能力を実証する。LLMとKGが相互に強化し合う相乗的フレームワークのもとで、まずLLMを利用して、AD関連の科学文献から得られる、発展し続けるAD固有の知識グラフ(KG)を構築する。次に、新しい自己認識型の知識検索アプローチを伴う粗から細へのサンプリング手法を用いて、KGから適切な知識を選択し、LLMの推論能力を強化する。我々が構築したAD質問応答(ADQA)ベンチマーク上での実験結果は、DALKの有効性を裏付けている。さらに、KGとLLMの相互強化という新たなトピックについて、貴重な洞察とガイドラインを提供しうる一連の詳細な分析を行う。コードとデータはhttps://github.com/David-Li0406/DALK で公開する予定である。

Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.

翻訳日:2024-05-14 20:52:15 公開日:2024-05-12

# エントロピー・エニグマ:エントロピー最小化の成功と失敗

The Entropy Enigma: Success and Failure of Entropy Minimization ( http://arxiv.org/abs/2405.05012v2 )

ライセンス: Link先を確認

Ori Press, Ravid Shwartz-Ziv, Yann LeCun, Matthias Bethge,

(参考訳) エントロピー最小化(EM)は、テスト時に新しいデータに直面した場合に、分類モデルの精度を高めるために頻繁に使用される。EMは、分類器が上位の予測クラスにさらに高い確率を割り当てるように最適化する自己教師型学習手法である。本稿では、数ステップの適応ではEMがうまく機能する理由と、多くのステップで適応した後に最終的に失敗する理由を解析する。EMは最初、テスト画像をトレーニング画像の近くに埋め込むようにモデルを変化させ、それによってモデルの精度を向上させることを示す。多くの最適化ステップの後、EMはテスト画像をトレーニング画像の埋め込みから遠く離れた位置に埋め込むようにし、その結果精度が低下する。これらの知見に基づき、ラベルにアクセスせずに任意のデータセット上でモデルの精度を推定するという実用的な問題を解く手法を提案する。提案手法は、エントロピーを最小化するようにモデルが最適化されるにつれて、入力画像の埋め込みがどう変化するかを調べることで精度を推定する。23の挑戦的なデータセットでの実験では、我々の手法は平均絶対誤差5.75%でSoTAを達成し、このタスクにおける従来のSoTAに対して29.62%の改善となることが示されている。私たちのコードはhttps://github.com/oripress/EntropyEnigma で利用可能です。

Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to their top predicted classes. In this paper, we analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for many steps. We show that, at first, EM causes the model to embed test images close to training images, thereby increasing model accuracy. After many steps of optimization, EM makes the model embed test images far away from the embeddings of training images, which results in a degradation of accuracy. Building upon our insights, we present a method for solving a practical problem: estimating a model's accuracy on a given arbitrary dataset without having access to its labels. Our method estimates accuracy by looking at how the embeddings of input images change as the model is optimized to minimize entropy. Experiments on 23 challenging datasets show that our method sets the SoTA with a mean absolute error of $5.75\%$, an improvement of $29.62\%$ over the previous SoTA on this task. Our code is available at https://github.com/oripress/EntropyEnigma
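
As a rough illustration of the mechanism this abstract describes, the following minimal Python sketch applies one gradient-descent step on the softmax entropy of a single logit vector. It is not the authors' code, and it updates logits directly rather than model weights; the function names, the example logits, and the learning rate are assumptions chosen for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_min_step(logits, lr=0.5):
    """One gradient-descent step on the entropy with respect to the logits.

    For p = softmax(z) and H = -sum_j p_j log p_j, the gradient is
    dH/dz_i = -p_i * (log p_i + H), so descending adds lr * p_i * (log p_i + H).
    """
    probs = softmax(logits)
    h = entropy(probs)
    return [z + lr * p * (math.log(p) + h) for z, p in zip(logits, probs)]

# Hypothetical logits for a three-class prediction.
logits = [2.0, 1.0, 0.1]
updated = entropy_min_step(logits)
print(entropy(softmax(logits)), entropy(softmax(updated)))  # entropy before and after
print(max(softmax(logits)), max(softmax(updated)))          # top-class probability before and after
```

With these example values the step lowers the entropy and raises the probability of the class that was already predicted, which is the "even higher probabilities to their top predicted classes" behaviour mentioned in the abstract.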

翻訳日:2024-05-14 20:41:54 公開日:2024-05-12

# CoViews: コントラスト学習強化のための協調視点を用いた適応的拡張

CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning ( http://arxiv.org/abs/2405.07116v1 )

ライセンス: Link先を確認

Nazim Bendib,

(参考訳) データ拡張は、効果的なコントラスト学習に必要な高品質な正と負のペアを生成する上で重要な役割を果たす。しかしながら、一般的なプラクティスでは、複数のビューを生成するために単一の拡張ポリシーを繰り返し使用しており、ビュー間の協調の欠如によって非効率なトレーニングペアにつながる可能性がある。さらに、最適な拡張の組を見つけるために、既存の多くの手法は広範な教師付き評価を必要とし、トレーニングを通して異なる拡張を必要としうるモデルの進化的な性質を見落としている。他のアプローチは微分可能な拡張生成器を訓練するため、文献にある微分不可能な変換関数の使用が制限される。本稿では、最小限の計算オーバーヘッドでコントラスト学習のための効率的な適応的データ拡張ポリシーを学習するフレームワークを提案し、これらの課題に対処する。我々のアプローチは、トレーニング中に新たなデータ拡張ポリシーを継続的に生成し、教師なしで効果的な正例・負例を生成する。このフレームワークの中で、すべてのビューで共通に使用される拡張ポリシーを生成するIndepViewsと、各ビューに対して相互に依存する拡張ポリシーを生成するCoViewsの2つの方法を提案する。これにより、各ビューに適用される変換間の依存関係を学習し、異なるビューに適用される拡張戦略が相互に補完し合い、より有意義で識別的な表現につながることを保証する。複数のデータセットとコントラスト学習フレームワークにおける広範な実験を通じて、我々の手法がベースラインを一貫して上回ること、およびビューに依存した拡張ポリシーによるトレーニングが、ビュー間で共有される独立したポリシーによるトレーニングよりも優れていることを示し、コントラスト学習性能の向上における有効性を示す。

Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate multiple views, potentially leading to inefficient training pairs due to a lack of cooperation between views. Furthermore, to find the optimal set of augmentations, many existing methods require extensive supervised evaluation, overlooking the evolving nature of the model that may require different augmentations throughout the training. Other approaches train differentiable augmentation generators, thus limiting the use of non-differentiable transformation functions from the literature. In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead. Our approach continuously generates new data augmentation policies during training and produces effective positives/negatives without any supervision. Within this framework, we present two methods: IndepViews, which generates augmentation policies used across all views, and CoViews, which generates dependent augmentation policies for each view. This enables us to learn dependencies between the transformations applied to each view and ensures that the augmentation strategies applied to different views complement each other, leading to more meaningful and discriminative representations. Through extensive experimentation on multiple datasets and contrastive learning frameworks, we demonstrate that our method consistently outperforms baseline solutions and that training with a view-dependent augmentation policy outperforms training with an independent policy shared across views, showcasing its effectiveness in enhancing contrastive learning performance.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# コンテキストニューラルネットワーク:時系列予測のためのスケーラブルな多変量モデル

Context Neural Networks: A Scalable Multivariate Model for Time Series Forecasting ( http://arxiv.org/abs/2405.07117v1 )

ライセンス: Link先を確認

Abishek Sriramulu, Christoph Bergmeir, Slawek Smyl,

(参考訳) 実世界の時系列は、しばしば単独では捉えられない複雑な相互依存性を示す。関連する複数の時系列の過去データをグローバルにモデル化しつつ、系列固有の予測をローカルに生成するグローバルモデルは、今や一般的である。しかし、個々の系列に対する予測は依然として孤立しており、近隣の系列の現在の状態を考慮できない。多変量アテンションやグラフニューラルネットワークのような多変量モデルは、系列間の情報を明示的に組み込むことができ、グローバルモデルの欠点に対処できる。しかし、これらの手法は時間ステップごとに2次の計算量を示し、スケーラビリティが制限される。本稿では、大きな計算オーバーヘッドを伴わずに、近隣の時系列から得られる関連する文脈的知見で時系列モデルを拡張する、効率的な線形計算量の手法であるContext Neural Networkを紹介する。提案手法は、近隣の系列からのリアルタイム情報をターゲット系列に提供することで予測モデルを強化し、グローバルモデルの限界に対処しつつ、大規模データセットに対しても計算上扱いやすいままである。

Real-world time series often exhibit complex interdependencies that cannot be captured in isolation. Global models that model past data from multiple related time series globally while producing series-specific forecasts locally are now common. However, their forecasts for each individual series remain isolated, failing to account for the current state of its neighbouring series. Multivariate models like multivariate attention and graph neural networks can explicitly incorporate inter-series information, thus addressing the shortcomings of global models. However, these techniques exhibit quadratic complexity per timestep, limiting scalability. This paper introduces the Context Neural Network, an efficient linear complexity approach for augmenting time series models with relevant contextual insights from neighbouring time series without significant computational overhead. The proposed method enriches predictive models by providing the target series with real-time information from its neighbours, addressing the limitations of global models, yet remaining computationally tractable for large datasets.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# 円形の皿とボウルに対する実環境(In The Wild)での楕円パラメータ推定

In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls ( http://arxiv.org/abs/2405.07121v1 )

ライセンス: Link先を確認

Akil Pathiranage, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong,

(参考訳) 楕円推定は、皿やボウルのパラメータ化に利用でき、それがさらにカメラの視野角や食品の分量の推定に利用できるため、食品画像処理において重要なトピックである。皿やボウルの楕円形の縁を自動的に検出し、実環境(in-the-wild)のデータに対してその楕円パラメータを推定することは困難である。撮影には多様なカメラアングルや皿の形状が用いられ、背景にはノイズが多く、画像中に複数の不均一な皿やボウルが存在しうるためである。基盤モデルの最近の進歩は、ゼロショットの意味理解とオブジェクトセグメンテーションに有望な能力を提供する。しかし、これらのモデルが生成する皿やボウルの出力マスクの境界は、従来の楕円フィッティング法に比べて一貫性と精度に欠けることが多い。本稿では、楕円フィッティングと、ゼロショット基盤モデルによって抽出された意味情報とを組み合わせ、皿とボウルの楕円形の縁を検出・推定する手法であるWildEllipseFitを提案する。提案するYummly-ellipseデータセット上での評価は、実世界のシナリオにおけるその有効性とゼロショット能力を示す。

Ellipse estimation is an important topic in food image processing because it can be leveraged to parameterize plates and bowls, which in turn can be used to estimate camera view angles and food portion sizes. Automatically detecting the elliptical rim of plates and bowls and estimating their ellipse parameters for data "in-the-wild" is challenging: diverse camera angles and plate shapes could have been used for capture, noisy background, multiple non-uniform plates and bowls in the image could be present. Recent advancements in foundational models offer promising capabilities for zero-shot semantic understanding and object segmentation. However, the output mask boundaries for plates and bowls generated by these models often lack consistency and precision compared to traditional ellipse fitting methods. In this paper, we combine ellipse fitting with semantic information extracted by zero-shot foundational models and propose WildEllipseFit, a method to detect and estimate the elliptical rim for plate and bowl. Evaluation on the proposed Yummly-ellipse dataset demonstrates its efficacy and zero-shot capability in real-world scenarios.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# トラベリングセールスマン問題の解法のための2ステップ量子探索アルゴリズムの回路設計

Circuit Design of Two-Step Quantum Search Algorithm for Solving Traveling Salesman Problems ( http://arxiv.org/abs/2405.07129v1 )

ライセンス: Link先を確認

Rei Sato, Gordon Cui, Kazuhiro Saito, Hideyuki Kawashima, Tetsuro Nikuni, Shohei Watabe,

(参考訳) グローバーのアルゴリズムのような量子探索アルゴリズムは、制約付き組合せ最適化問題を効率的に解くことが期待されている。しかし、巡回セールスマン問題(TSP)を解く量子探索アルゴリズムを回路上に実装することは、潜在的に困難である。現在のTSP向け量子探索アルゴリズムは、制約を満たす実行可能解状態の等しい重ね合わせという初期状態が、あらかじめ用意されていると仮定しているためである。総当たりによる初期状態の準備の時間計算量は、実行可能解の階乗的な増加に伴って指数関数的に増大し、大規模TSPのための量子回路を設計する上でかなりの障害となる。この問題を克服するために、初期状態の準備とTSPの求解のための2つの異なる演算子を持つ2段階の量子探索アルゴリズムを提案する。このアルゴリズムはまず、TSPのすべての実行可能解の等しい重ね合わせ状態を増幅し、その後、これらの実行可能解状態の中から最適解状態を増幅する。我々のアルゴリズムは、高次制約なし二値最適化(HOBO)表現で符号化されており、必要な量子ビット数を顕著に削減し、統一された回路設計による初期状態の効率的な準備と、実行可能解の事前知識がない状況での2次の高速化を伴うTSPの求解を可能にする。

Quantum search algorithms, such as Grover's algorithm, are expected to efficiently solve constrained combinatorial optimization problems. However, implementing a quantum search algorithm for solving the traveling salesman problem (TSP) on a circuit poses a potential challenge because current quantum search algorithms for TSP assume that an initial state of equal superposition of feasible solution states satisfying the constraint is already prepared a priori. The time complexity of brute-force preparation of the initial state increases exponentially with the factorial growth of feasible solutions, posing a considerable obstacle in designing quantum circuits for large-scale TSP. To overcome this problem, we propose a two-step quantum search algorithm with two distinct operators for preparing the initial state and solving TSP. The algorithm first amplifies an equal superposition state of all feasible solutions of TSP and subsequently amplifies the optimal solution states among these feasible solution states. Our algorithm, encoded in the higher-order unconstrained binary optimization (HOBO) representation, notably reduces the required number of qubits, enabling efficient preparation of the initial state with a unified circuit design and solving TSP with a quadratic speedup in the absence of prior knowledge of feasible solutions.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# 振動モードギャップ:開量子多体系における相転移の指標

Oscillating-mode gap: an indicator of phase transition in open quantum many-body systems ( http://arxiv.org/abs/2405.07132v1 )

ライセンス: Link先を確認

Taiki Haga,

(参考訳) 開放量子多体系の相と、密度行列の時間発展を決定するリウヴィリアンのスペクトル構造との関係を解明することは、重要な課題である。これまでの研究は、最も遅く減衰するモードの減衰率として定義されるリウヴィリアンギャップを散逸相転移の鍵となる指標として注目し、対称性の破れた相ではギャップが閉じ、無秩序相ではギャップが開くことを指摘してきた。本研究では、最も遅く減衰する振動モードの減衰率として定義される、振動モードギャップと呼ぶ追加のスペクトルギャップを提案する。典型的な散逸ボソン系の解析を通じて、系の相とその間の転移を包括的に特徴づけるには、リウヴィリアンギャップと振動モードギャップの両方が必要であることを示す。

It presents a significant challenge to elucidate the relationship between the phases of open quantum many-body systems and the spectral structure of their governing Liouvillian, which determines how the density matrix evolves. Previous studies have focused on the Liouvillian gap, defined as the decay rate of the most slowly-decaying mode, as a key indicator of dissipative phase transition, noting its closure in symmetry-broken phases and opening in disordered phases. In this work, we propose an additional spectral gap, termed the oscillating-mode gap, defined as the decay rate of the most slowly-decaying oscillating mode. Through the analysis of a prototype dissipative boson system, we demonstrate the necessity of both the Liouvillian gap and the oscillating-mode gap for the comprehensive characterization of the system's phases and the transitions between them.
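
The two gaps defined in this abstract can be read off a Liouvillian spectrum as follows. This is only a sketch under stated assumptions, not the paper's method: it assumes the eigenvalues are already available as complex numbers with non-positive real part, that the stationary state corresponds to eigenvalue 0 and is excluded from the Liouvillian gap, and that an "oscillating mode" is one with non-zero imaginary part. The function names, the tolerance, and the toy spectrum are hypothetical.

```python
def liouvillian_gap(eigenvalues, tol=1e-9):
    """Decay rate -Re(lambda) of the slowest-decaying non-stationary mode."""
    rates = [-ev.real for ev in eigenvalues if abs(ev) > tol]
    return min(rates)

def oscillating_mode_gap(eigenvalues, tol=1e-9):
    """Decay rate -Re(lambda) of the slowest-decaying mode with Im(lambda) != 0."""
    rates = [-ev.real for ev in eigenvalues if abs(ev.imag) > tol]
    return min(rates)

# Toy spectrum: a stationary mode, two purely decaying modes,
# and one complex-conjugate pair of oscillating modes.
spectrum = [0j, -0.1 + 0j, -0.4 + 2j, -0.4 - 2j, -1.0 + 0j]

print(liouvillian_gap(spectrum))       # 0.1
print(oscillating_mode_gap(spectrum))  # 0.4
```

In this toy spectrum the two gaps differ, so tracking only the Liouvillian gap would say nothing about how quickly the oscillating pair decays.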

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# 最も効率的な量子化LLMを実現するために複数のポストトレーニング手法を組み合わせる

Combining multiple post-training techniques to achieve most efficient quantized LLMs ( http://arxiv.org/abs/2405.07135v1 )

ライセンス: Link先を確認

Sayeh Sharify, Zifei Xu, Wanzin Yazar, Xin Wang,

(参考訳) LLM(Large Language Models)は、複雑な言語モデリングタスクにおいて卓越した性能を示すが、計算とストレージの面で重大な課題を伴う。本稿では、これらの課題を緩和する量子化の可能性について検討する。SmoothQuantとGPTQという2つのよく知られたポストトレーニング手法の併用を体系的に研究し、それらの相互作用とLLM量子化の進展に対する含意を包括的に分析する。マイクロスケーリング(MX)フォーマットへの量子化を可能にすることで両手法の汎用性を高め、当初の対象であった固定小数点フォーマットを超えて適用範囲を広げる。GPTQとSmoothQuantを適用し、MXフォーマットを用いてモデルを量子化することにより、パープレキシティの増加を1-3%という無視できる程度に抑えつつ、OPTモデルのサイズを最大4倍、LLaMAモデルのサイズを最大3倍削減できることを示す。

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of two well-known post-training techniques, SmoothQuant and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of both techniques by enabling quantization to microscaling (MX) formats, expanding their applicability beyond their initial fixed-point format targets. We show that by applying GPTQ and SmoothQuant, and employing MX formats for quantizing models, we can achieve a significant reduction in the size of OPT models by up to 4x and LLaMA models by up to 3x with a negligible perplexity increase of 1-3%.
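
To make the idea of block-wise formats with a shared scale concrete, here is a simplified Python sketch of quantizing one block of values with a single shared power-of-two scale. It is an illustration only, not the SmoothQuant or GPTQ procedure and not an exact implementation of any MX format: the signed-integer element type, the block size of 32, the 8-bit shared scale, and all function names are assumptions made for this sketch.

```python
import math

def quantize_block(values, elem_bits=8):
    """Quantize one block to signed integer codes sharing a power-of-two scale."""
    qmax = 2 ** (elem_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent whose scale keeps the largest magnitude within qmax.
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return shared_exp, codes

def dequantize_block(shared_exp, codes):
    """Reconstruct approximate values from the shared exponent and the codes."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]

def bits_per_element(elem_bits, scale_bits=8, block_size=32):
    """Average storage cost per element, including the amortized shared scale."""
    return elem_bits + scale_bits / block_size

block = [0.5, -1.25, 3.0, 0.01]
exp, codes = quantize_block(block)
print(exp, codes)                    # -5 [16, -40, 96, 0]
print(dequantize_block(exp, codes))  # [0.5, -1.25, 3.0, 0.0]
print(bits_per_element(4))           # 4.25
```

Under these assumptions a 4-bit element costs 4.25 bits on average once the shared scale is amortized over the block, which is why block formats can come close to, but not exactly reach, the nominal ratio of the element width.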

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# ノイズのある量子多項式時間と多項式階層のオラクル分離

Oracle Separation between Noisy Quantum Polynomial Time and the Polynomial Hierarchy ( http://arxiv.org/abs/2405.07137v1 )

ライセンス: Link先を確認

Nai-Hui Chia, Min-Hsiu Hsieh, Shih-Han Hung, En-Jui Kuo,

(参考訳) 本研究は、Chen, Cotler, Huang, Li (2022) などが提示した定義に着想を得た、物理的に動機付けられたノイズのある量子回路の計算量クラスに関するオラクル分離について研究する。一定の誤り率のもとで、NPに関して分離が達成できることを示す。誤り率が$\Omega(\log n/n)$の場合、この結果をPHとの分離にまで拡張できる。特筆すべきは、すべての量子回路が定数深さであるため、いずれの分離においても我々のオラクルは誤り訂正方式やフォールトトレランスを必要としない点である。これは、誤り訂正を行わない、わずかな誤りを伴う量子コンピュータでさえ、様々なシナリオや仮定のもとで古典的な計算量クラスを超えうることを示している。また、様々な一般的なノイズ設定を検討し、RazとTal (2022) およびBassirian, Bouland, Fefferman, Gunn, Tal (2021) の研究で得られた結果を一般化する、独立した興味の対象となる新しい古典的困難性の結果を示す。

This work investigates the oracle separation between the physically motivated complexity class of noisy quantum circuits, inspired by definitions such as those presented by Chen, Cotler, Huang, and Li (2022). We establish that with a constant error rate, separation can be achieved in terms of NP. When the error rate is $\Omega(\log n/n)$, we can extend this result to the separation of PH. Notably, our oracles, in all separations, do not necessitate error correction schemes or fault tolerance, as all quantum circuits are of constant depth. This indicates that even quantum computers with minor errors, without error correction, may surpass classical complexity classes under various scenarios and assumptions. We also explore various common noise settings and present new classical hardness results, generalizing those found in studies by Raz and Tal (2022) and Bassirian, Bouland, Fefferman, Gunn, and Tal (2021), which are of independent interest.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# バッチと量子化を用いた大規模言語モデル推論のためのエッジインテリジェンス最適化

Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization ( http://arxiv.org/abs/2405.07140v1 )

ライセンス: Link先を確認

Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang,

(参考訳) Generative Artificial Intelligence(GAI)は、比類のないコンテンツ生成能力で世界を席巻している。大規模言語モデル(LLM)はこの動きの最前線にある。しかし、LLMは多大なリソースを要求するためクラウドホスティングを必要とすることが多く、プライバシ、レイテンシ、利用制限に関する問題が生じる。エッジインテリジェンスは、データソースに近いユビキタスなエッジリソース上でリアルタイムのAI計算を可能にすることで、こうした課題の解決に長年利用されてきた。しかし、ほとんどの研究は従来のAIモデルに焦点を当てており、大きなモデルサイズ、自己回帰プロセス、自己注意機構といったLLM推論に固有の特徴への対処にはギャップが残されている。本稿では、LLM推論に適したエッジインテリジェンス最適化問題を提示する。具体的には、リソースの限られたエッジデバイス上でのバッチ処理手法とモデル量子化の導入を前提に、トランスフォーマーデコーダベースのLLMの推論モデルを定式化する。さらに、エッジリソースの制約と、レイテンシおよび精度に関するユーザ要求のばらつきを考慮しつつ、バッチスケジューリングと通信・計算資源の同時割り当てによって推論スループットを最大化することを目指す。このNP困難な問題に対処するため、実行可能な時間計算量で動作する、オンライン木枝刈りを伴う最適な深さ優先木探索アルゴリズム(DFTSP)を開発した。シミュレーションの結果、DFTSPは多様なユーザ設定や量子化技術において他のバッチ処理ベンチマークをスループットで上回り、総当たり探索法と比較して時間計算量を45%以上低減することが示された。

Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# デコヒーレンス効果を有する宇宙ベル試験

Cosmological Bell Tests with Decoherence Effects ( http://arxiv.org/abs/2405.07141v1 )

ライセンス: Link先を確認

Chon Man Sou, Junqi Wang, Yi Wang,

(参考訳) インフレーション宇宙は粒子対を生成し、それらは運動量保存のために運動量について量子もつれの状態にある。ゆらぎの運動量を含む演算子は、Gour-Khanna-Mann-Revzen (GKMR) 擬スピンのような擬スピン演算子に書き換えることができる。これらの擬スピン演算子を利用することで、宇宙論的ベル不等式を定式化できる。これらのベル不等式の破れは、原始ゆらぎの量子的性質を示す。本研究では、原始曲率摂動に着目する。曲率摂動は重力から生じるため、その作用はギボンズ・ホーキング・ヨーク境界項を含む。線形摂動に適した初期条件の選択における境界項の役割を明らかにする。その後、バルクおよび境界の相互作用項を含む宇宙論的摂動の相互作用へと進む。これらの相互作用はデコヒーレンス効果をもたらす。デコヒーレンス効果はベル演算子の期待値を変化させ、ベル不等式を徐々に回復させる。この過程を「Bell test curve」で記述し、これは宇宙論的摂動の量子起源を検証するための窓を提供する。また、Bell test curveからデコヒーレンス率の情報と原始相互作用の構造を抽出する可能性についても検討する。

The inflationary universe creates particle pairs, which are entangled in their momenta due to momentum conservation. Operators involving the momenta of the fluctuations can be rewritten into pseudo-spin operators, such as the Gour-Khanna-Mann-Revzen (GKMR) pseudo-spin. Making use of these pseudo-spin operators, cosmological Bell inequalities can be formulated. The violation of these Bell inequalities indicates the quantum nature of primordial fluctuations. In this work, we focus on primordial curvature perturbations. Since curvature perturbations arise from gravity, their action includes the Gibbons-Hawking-York boundary term. We clarify the role of the boundary term in selecting suitable initial conditions for linear perturbations. After that, we proceed to the interactions of cosmological perturbations, including the bulk and boundary interaction terms, which introduce decoherence effects. These decoherence effects change the expectation value of the Bell operator, and gradually restore the Bell inequality. We describe this process by a "Bell test curve", which offers a window for testing the quantum origin of cosmological perturbations. We also explore the possibility of extracting the information of the decoherence rate and the structure of primordial interactions from the Bell test curve.

翻訳日:2024-05-14 18:18:14 公開日:2024-05-12

# CLAMPによるクロスドメイン連続学習

Cross-Domain Continual Learning via CLAMP ( http://arxiv.org/abs/2405.07142v1 )

ライセンス: Link先を確認

Weiwei Weng, Mahardhika Pratama, Jie Zhang, Chen Chen, Edward Yapp Kien Yee, Ramasamy Savitha,

(参考訳) 人工ニューラルネットワークは、人間のような認知学習能力で知られているが、以前に獲得した知識に関する能力を失うという、よく知られた破滅的忘却(CF)問題にしばしば直面する。CFを緩和するための多くの努力にもかかわらず、特に複雑に変化する環境においては、これは依然として重要な課題である。この課題は、継続学習(CL)の設定に従うクロスドメイン適応においてさらに顕著になる。これはより困難で現実的でありながら、十分に研究されていないシナリオである。この目的のために、本稿では、追加のラベリングコストなしにそのような環境で単一モデルをデプロイすることを可能にするクロスドメインCLアプローチを提案する。提案手法であるcontinual learning approach for many processes (CLAMP) は、クラスを考慮した敵対的ドメイン適応戦略を統合して、ソースドメインとターゲットドメインを整合させる。評価器に誘導される学習プロセスを導入し、各サンプルに重みの組を割り当てることでベースモデルの学習を導く。これらの重みは各サンプルの影響と各損失関数の相互作用を制御し、安定性と可塑性のジレンマのバランスを取ることでCF問題を防ぐ。第1の評価器は負の転移の問題に焦点を当ててソースドメインの無関係なサンプルを棄却し、第2の評価器はターゲットドメインのノイズの多い擬似ラベルを防ぐ。どちらの評価器も、ランダム変換技術とソースドメインの類似サンプルを用いて、メタラーニングのアプローチで訓練される。理論解析と広範な数値検証により、CLAMPはすべての実験において、確立されたベースラインアルゴリズムを少なくとも10%の差で大幅に上回ることが示された。

Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, where the neural networks lose the proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex changing environments. This challenge is even more pronounced in cross-domain adaptation following the continual learning (CL) setting, which is a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach that makes it possible to deploy a single model in such environments without additional labelling costs. Our approach, namely continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. An assessor-guided learning process is put forward to navigate the learning process of a base model: it assigns a set of weights to every sample, controlling the influence of every sample and the interactions of each loss function so as to balance the stability-plasticity dilemma and thus prevent the CF problem. The first assessor focuses on the negative transfer problem, rejecting irrelevant samples of the source domain, while the second assessor prevents noisy pseudo labels of the target domain. Both assessors are trained in the meta-learning approach using random transformation techniques and similar samples of the source domain. Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by a margin of at least $10\%$.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# シリコンT中心の光遷移パラメータ

Optical transition parameters of the silicon T centre ( http://arxiv.org/abs/2405.07144v1 )

ライセンス: Link先を確認

Chloe Clear, Sara Hosseini, Amirhossein AlizadehKhaledi, Nicholas Brunelle, Austin Woolverton, Joshua Kanaganayagam, Moein Kazemi, Camille Chartrand, Mehdi Keshavarz, Yihuang Xiong, Oney O. Soykal, Geoffroy Hautier, Valentin Karassiouk, Mike Thewalt, Daniel Higginbottom, Stephanie Simmons,

(参考訳) シリコンT中心は、線幅の狭い通信波長帯の発光、長いスピンコヒーレンス、フォトニクスへの直接統合といった特徴を持ち、分散量子コンピューティングおよびネットワークのためのスピン光子インターフェースとして関心を集めている。しかし、T中心のスピン選択的な光学遷移の重要なパラメータは、文献において未決定または曖昧なままである。本稿では、T中心のTX状態のハミルトニアンを示し、公表された結果、密度汎関数理論、および新しい分光測定を組み合わせた解析から、T$_0$からTX$_0$への光学遷移の重要なパラメータを決定する。文献における内部欠陥ポテンシャルの曖昧な値を解消し、電気的に調整されたT中心発光の初めての測定結果を示す。その結果として、ひずみ、電場、磁場のもとでのT中心の光学特性とスピン特性のモデルを提供する。このモデルは量子技術の実現に利用できる。

The silicon T centre's narrow, telecommunications-band optical emission, long spin coherence, and direct photonic integration have spurred interest in this emitter as a spin-photon interface for distributed quantum computing and networking. However, key parameters of the T centre's spin-selective optical transitions remain undetermined or ambiguous in literature. In this paper we present a Hamiltonian of the T centre TX state and determine key parameters of the optical transition from T$_0$ to TX$_0$ from a combined analysis of published results, density functional theory, and new spectroscopy. We resolve ambiguous values of the internal defect potential in the literature, and we present the first measurements of electrically tuned T centre emission. As a result, we provide a model of the T centre's optical and spin properties under strain, electric, and magnetic fields that can be utilized for realizing quantum technologies.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# Stable Signatureは不安定: 拡散モデルから画像の透かしを取り除く

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models ( http://arxiv.org/abs/2405.07145v1 )

ライセンス: Link先を確認

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong,

(参考訳) 透かし(watermark)は、AI生成画像を検出するために業界で広く導入されている。最近の透かしフレームワークであるStable Signature(Metaが提案)は、拡散モデルのデコーダのパラメータに透かしを埋め込み、生成される画像に本質的に透かしが入るようにする。Stable Signatureは、オープンソースの拡散モデルによって生成された画像への透かしの付与を可能にし、除去攻撃に対して堅牢であると主張されていた。本研究では、拡散モデルを微調整することによって透かしを除去する新たな攻撃を提案する。我々の結果は、生成画像の視覚的品質を維持しつつ、生成画像に透かしが入らなくなるように、拡散モデルから透かしを効果的に除去できることを示している。我々の結果は、Stable Signatureが以前考えられていたほど安定ではないことを浮き彫りにしている。

Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called Stable Signature (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by open-source diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# 光誘起非ガウス交絡ボース-アインシュタイン凝縮体における光子損失効果と異なる光子測定結果

Photon loss effects on light-mediated non-Gaussian entangled Bose-Einstein condensates projecting with different photon measurement outcomes ( http://arxiv.org/abs/2405.07153v1 )

ライセンス: Link先を確認

Shuai Gao, Manish Chaudhary, Alexey N. Pyrkov, Ebubechukwu O. Ilo-Okeke, Xin Meng, Jingyan Feng, Muhammad Jamil Khan, Tim Byners, Chaogang Lou,

(参考訳) マクロな量子ビットに対する量子情報処理の理論は、すべてのマクロな量子ビットが保存された粒子数を持つという事実に基づいている。しかしながら、実験的な観点からは、そのような量子ビットはいずれもデコヒーレンスの過程を経験し、それが量子ビット間のもつれ生成の可能性や、量子情報処理での効率的な利用に影響を及ぼす。離れた原子BEC間のもつれを生成する最も有望な方法の1つは、量子非破壊測定である。本稿では、光子損失によるデコヒーレンスを含めた場合に、光子測定の効果がもつれにどのように影響するかを調べる。光子損失チャネルにおける正確な密度行列を得るために、熱的にもつれた状態表現(TESR)と、順序付けられた演算子内での積分(IWOP)の手法を用いる。光子数測定の結果が異なると、それぞれ固有の特性を示す異なるもつれ状態が生成されることを示す。このような設定では、ホフマン・タケウチ基準とDuan-Giedke-Cirac-Zoller基準を用いることで、ワインランドのスクイージング基準やEPRステアリング基準と比較して、もつれ検出において優位性が得られることを見出した。

The theory of quantum information processing for macroscopic qubits is based on the fact that every macroscopic qubit has a conserved number of particles. However, from an experimental point of view, every such qubit experiences processes of decoherence that impact the possibilities for entanglement generation between such qubits and their efficient use in quantum information processing. One of the most promising methods for generating entanglement between distant atomic BECs is quantum nondemolition measurements. Here, we study how the effects of photon measurement impact the entanglement when photon loss decoherence is included. We employ the thermally entangled state representation (TESR) and integral within the ordered operator (IWOP) approach to obtain the accurate density matrix in a photon loss channel. We demonstrate that varying outcomes of photon number measurements lead to the generation of distinct entangled states, each exhibiting unique characteristics. We find that using the Hofmann-Takeuchi and Duan-Giedke-Cirac-Zoller criteria provides advantages in entanglement detection compared to the Wineland squeezing and EPR steering criteria in such settings.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# マルチモーダル・ラーニングの強化:メタ学習型クロスモーダル・ナレッジ蒸留

Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities ( http://arxiv.org/abs/2405.07155v1 )

ライセンス: Link先を確認

Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro,

(参考訳) マルチモーダル学習では、いくつかのモダリティは他のモダリティよりも影響を受けており、それらの欠如は分類・分類精度に大きな影響を及ぼす可能性がある。したがって、トレーニングされたマルチモーダルモデルが、入力データから影響力のあるモダリティが欠如している場合でも、高い精度を持つことができるかどうかが重要な研究課題である。本稿では,メタ学習型クロスモーダル知識蒸留(MCKD)と呼ばれる新しい手法を提案する。 MCKDはメタラーニングプロセスを通じて各モードの重要性重みを適応的に推定する。これらの動的に学習されたモダリティの重要性重みは、重みの大きいモダリティから重みの低いモダリティへ知識を移すために、対方向のクロスモーダルな知識蒸留プロセスで使用される。このクロスモーダルな知識蒸留は、影響力のあるモダリティがなくても非常に正確なモデルを生成する。従来の手法と異なり、本手法は最小限の適応で複数のタスク(例えば、セグメンテーションや分類)で機能するように設計されている。 Brain tumor Segmentation Dataset 2018 (BraTS2018)とAudiovision-MNIST分類データセットの実験結果は、現在の最先端モデルよりもMCKDの方が優れていることを示している。特に BraTS2018 では, 腫瘍増強率 3.51 %, 腫瘍コア率 2.19 %, 腫瘍全体に対する 1.14 % が平均セグメンテーションDice スコアで有意に改善した。

In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is whether trained multi-modal models can retain high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer the knowledge from the modalities with higher importance weight to the modalities with lower importance weight. This cross-modal knowledge distillation produces a highly accurate model even in the absence of influential modalities. Unlike previous methods in the field, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. Particularly in BraTS2018, we achieve substantial improvements of 3.51% for enhancing tumor, 2.19% for tumor core, and 1.14% for the whole tumor in terms of average segmentation Dice score.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
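The segmentation results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch in Python, assuming binary masks flattened to lists of 0/1 values. The function name `dice_score` and the convention that two empty masks score 1.0 are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Overlap: positions where both masks are 1.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 5-pixel masks: 2 overlapping pixels, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_score(pred, target))
```

A score of 1.0 means the predicted and reference masks coincide, and 0.0 means they do not overlap at all.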

# 半自己監督型ドメイン適応:小麦頭セグメンテーションのための限定アノテートデータを用いたディープラーニングモデルの開発

Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head Segmentation ( http://arxiv.org/abs/2405.07157v1 )

ライセンス: Link先を確認

Alireza Ghanbari, Gholamhassan Shirdel, Farhad Maleki,

(参考訳) 精密農業は、廃棄物や環境への影響を最小限に抑えつつ、農業の生産性、効率、利益性を向上させるための先進技術の適用を含む。ディープラーニングアプローチは、多くの視覚的タスクに対して自動意思決定を可能にする。しかし、農業領域では、成長段階の変動と天候や照明などの環境条件が、異なる条件をまたいで一般化する深層学習技術を開発する上で大きな課題となっている。これらの変数をキャプチャする広範なアノテートデータセットを作成するというリソース集約的な性質は、これらのアプローチの広範な採用を妨げる。これらの課題に対処するために,確率的拡散過程を持つ深層畳み込みニューラルネットワークに基づく半自己教師付きドメイン適応手法を導入し,手動データアノテーションの最小化を求める。 3つの手動アノテート画像とコムギ畑からのビデオクリップの選択を用いて,画像マスク対の大規模アノテートデータセットとビデオフレームから抽出した非アノテート画像の大規模データセットを生成した。合成画像-マスクペアと無注釈画像の両方を用いた2分岐畳み込みエンコーダ・デコーダモデルアーキテクチャを開発し,実画像への効果的な適応を実現した。提案したモデルは、内部テストデータセットのDiceスコア80.7\%、外部テストセットのDiceスコア64.8\%を達成し、5つの国からの画像で構成され、18のドメインにまたがる。

Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7% on an internal test dataset and a Dice score of 64.8% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# 光ポンピング下のガリウムヒ素におけるコヒーレント振動発生に対する縦光学フォノンの二重の役割

Dual role of longitudinal optical phonons for generation of coherent oscillations in gallium arsenide under optical pumping ( http://arxiv.org/abs/2405.07159v1 )

ライセンス: Link先を確認

Itsuki Takagi, Yuma Konno, Yosuke Kayanuma, Kazutaka G. Nakamura,

(参考訳) 低温近似を用いたガリウムヒ素(GaAs)の超高速赤外ポンプパルスによるコヒーレント縦光学(LO)フォノンとLO-フォノンプラズモンカップリング(LOPC)モードの生成ダイナミクスの新規かつ簡便な図式を示す。 LOフォノンは、GaAsの励起状態にある光励起電子によって形成されるプラズモンと顕著に結合している。この結合は、励起状態におけるLOPCモードのコヒーレント振動をもたらす。ポンプパルスはまた、誘導ラマン散乱を引き起こし、基底状態でコヒーレントなLOフォノン振動を発生させる。この図は単純化されたモデルに組み込まれ、密度演算子の時間発展はリンドブラッド型量子マスター方程式を用いて計算される。理論的な結果は、過渡反射測定により観測されたLOフォノンとLOPCモードのコヒーレント振動に関する報告実験結果をよく説明できる。さらに,我々のモデルは,LOフォノンとLOPCモードの同時出現の自然な理由を提供する。

We present a novel and simple picture of the generation dynamics of coherent longitudinal optical (LO) phonons and LO-phonon-plasmon-coupled (LOPC) modes by the ultrafast infrared pump-pulses in gallium arsenide (GaAs) employing the low-temperature approximation. LO phonons exhibit a pronounced coupling with plasmons formed by the optically excited electrons in the excited states of GaAs. This coupling results in the coherent oscillation of the LOPC modes in the excited states. The pump pulse also induces stimulated Raman scattering, which generates the coherent LO-phonon oscillation in the ground state. This picture is incorporated into a simplified model, and the time evolution of the density operator is calculated using the Lindblad-type quantum master equation. The theoretical results explain well the reported experimental results on the coherent oscillation of LO phonons and LOPC modes observed through transient reflection measurements. Above all, our model provides a natural reason for the simultaneous manifestation of the LO phonons and the LOPC modes.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# 自己アライメントによる大規模言語モデルを用いたロボットスキルの学習

Learning Reward for Robot Skills Using Large Language Models via Self-Alignment ( http://arxiv.org/abs/2405.07162v1 )

ライセンス: Link先を確認

Yuwei Zeng, Yao Mu, Lin Shao,

(参考訳) 報酬関数の学習は、幅広いスキルのレパートリーを持つロボットを装備する上で、依然としてボトルネックとなっている。大規模言語モデル(LLM)には、報酬関数の学習を支援する可能性のある、貴重なタスク関連の知識が含まれている。しかし,提案した報酬関数は不正確であり,環境情報にさらに根ざす必要がある。ヒトがいない場合に報酬をより効率的に学習する方法を提案した。まず、LLMを用いて報酬の特徴とパラメータ化を提案し、次に反復的な自己調整プロセスを通じてパラメータを更新する。特に、このプロセスは、実行フィードバックに基づいてLLMと学習報酬関数とのランキングの不整合を最小化する。この手法は2つのシミュレーション環境で9つのタスクで検証された。トレーニングの有効性と効率性に対して一貫した改善が示される一方で、代替の突然変異ベースの方法と比較して、GPTトークンをはるかに少なく消費する。

Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, so it needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
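The abstract above describes minimizing a ranking inconsistency between the LLM and the learned reward, but does not give its exact form. The sketch below shows one generic way such a quantity could be measured: the fraction of trajectory pairs that two scorers order differently. The function name `ranking_inconsistency` and the example scores are hypothetical, and this should not be read as the authors' actual objective.

```python
from itertools import combinations


def ranking_inconsistency(reference_scores, learned_scores):
    """Fraction of item pairs that the two score lists order in opposite ways."""
    if len(reference_scores) != len(learned_scores):
        raise ValueError("score lists must have the same length")
    pairs = list(combinations(range(len(reference_scores)), 2))
    if not pairs:
        return 0.0
    discordant = 0
    for i, j in pairs:
        ref_diff = reference_scores[i] - reference_scores[j]
        learned_diff = learned_scores[i] - learned_scores[j]
        # Opposite signs mean the two scorers disagree on this pair; ties are ignored.
        if ref_diff * learned_diff < 0:
            discordant += 1
    return discordant / len(pairs)


# Hypothetical example: four rollouts ranked by an LLM (higher is better)
# and scored by a learned reward function.
llm_preference = [3, 2, 1, 0]
learned_reward = [0.9, 0.2, 0.5, 0.1]
print(ranking_inconsistency(llm_preference, learned_reward))
```

In this example only the pair of rollouts 1 and 2 is ordered differently, so one of the six pairs is discordant. An iterative procedure would adjust the reward parameters to drive this fraction toward zero.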

# 教育のための視覚的質問応答の実現:マルチモーダルAIとしてのGPT-4V

Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI ( http://arxiv.org/abs/2405.07163v1 )

ライセンス: Link先を確認

Gyeong-Geon Lee, Xiaoming Zhai,

(参考訳) 教育学者は、教室のダイナミクスを示す写真、学習内容に関する学生の図面、教科書のイラストなど、教育や学習の状況から得られた様々な画像データを分析してきた。必然的に、画像データの質的な分析と説明は、機械による自動化なしに人間の研究者によって行われてきた。それは、ほとんどの画像処理人工知能モデルは、一般の教育学者がアクセスできなかったり、複雑なディープニューラルネットワークアーキテクチャのために説明ができなかったためである。しかし、近年のVQA(Visual Question Answering)技術は、ユーザから与えられた画像に関する質問を受け取り、自然言語で回答を返す、使用可能なビジュアル言語モデルを実現している。特にOpenAIがリリースしたGPT-4Vは、VQAを様々な目的で使用できるように、最先端のビジュアル言語モデルサービスを大きく開放した。しかしながら、VQAとGPT-4Vは、まだ教育研究にはあまり適用されていない。本稿では,GPT-4Vが教育用VQAの実現に寄与することを提案する。ここでVQAの「実現」には2つの意味がある。 (1)GPT-4Vは、技術・アクセシビリティ障壁のない教育学者によるVQA技術の利用を実現し、(2)GPT-4Vは、教育学者に教育研究におけるVQAの有用性を認識させる。これらのことから,本論文は教育研究のためのVQAの導入を目標とし,教育研究方法論のマイルストーンを提供する。本稿では,第2章でGPT-4VのリリースにともなうVQA技術開発について概説する。第3章は、教育研究における画像分析の利用についてレビューする。第4章では、第3章でレビューされた各研究使用法において、GPT-4Vをどのように使用できるかを示し、オペレーティングプロンプトを提供している。最後に、第5章は将来の意味について論じている。

Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that show classroom dynamics, students' drawings related to the learning content, textbook illustrations, etc. Unquestionably, most qualitative analysis and explanation of image data have been conducted by human researchers, without machine-based automation. This was partly because most image processing artificial intelligence models were neither accessible to general educational scholars nor explainable, due to their complex deep neural network architectures. However, the recent development of Visual Question Answering (VQA) techniques is producing usable visual language models, which receive from the user a question about a given image and return an answer, both in natural language. In particular, GPT-4V, released by OpenAI, has opened up a state-of-the-art visual language model service so that VQA can be used for a variety of purposes. However, VQA and GPT-4V have not yet been widely applied to educational studies. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the utilization of VQA techniques by any educational scholar without technical or accessibility barriers, and (2) GPT-4V makes educational scholars realize the usefulness of VQA to educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. In this paper, Chapter II reviews the development of VQA techniques, which primes with the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with operating prompts provided. Finally, Chapter V discusses the future implications.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# エネルギー計画による多モーダル確率軌道予測のための歩行者固有の不確かさのモデル化

Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising ( http://arxiv.org/abs/2405.07164v1 )

ライセンス: Link先を確認

Yao Liu, Quan Z. Sheng, Lina Yao,

(参考訳) 歩行者の軌道予測は、自動運転とスマートシティの領域において重要な役割を果たす。シーケンスモデルと生成モデルを用いた広範な先行研究にもかかわらず、歩行者の予測不可能な性質は、社会的相互作用や個人の嗜好に影響され、不確実性とマルチモーダル性によって特徴づけられる課題を提示する。そこで本研究では,確率的軌道予測のためのエネルギー計画デノイジング(EPD)モデルを提案する。 EPDはまず、ランジュバンエネルギーモデル(Langevin Energy Model)を用いて、プランと呼ばれる将来の軌道の分布を粗く推定する。その後、確率拡散モデルによるデノイジングにより、この推定を洗練する。プランからデノイジングを開始することにより、EPDは反復的なステップの必要性を効果的に低減し、効率を向上する。さらに、EPDは個々の軌跡の代わりに軌跡の分布をモデル化することで従来の手法と異なる。これにより、歩行者固有の不確実性の明示的なモデリングが可能になり、複数のデノイジング操作の必要性を排除できる。単一のデノイジング操作は、複数のサンプルを描画できる分布を生成し、効率を大幅に向上させる。さらに、EPDによるプランの微調整はモデル性能の向上に寄与する。 2つの公開データセットでEPDを検証し、最先端の結果を得た。さらに、アブレーション実験は個々のモジュールの寄与を裏付け、提案手法の有効性を裏付けるものである。

Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# ビジョンシステムのための資源効率のよい認識

Resource Efficient Perception for Vision Systems ( http://arxiv.org/abs/2405.07166v1 )

ライセンス: Link先を確認

A V Subramanyam, Niyati Singal, Vinay K Verma,

(参考訳) 画像認識分野の急速な進歩にもかかわらず、高解像度画像の処理は依然として計算上の課題である。しかし、この処理は、自律走行車ナビゲーションから医療画像解析まで幅広い領域における詳細な物体の洞察を抽出する上で重要である。本研究では,高解像度画像に対するメモリ効率のパッチベース処理を活用することにより,これらの課題を軽減するためのフレームワークを提案する。ローカルなパッチ情報と共にグローバルなコンテキスト表現が組み込まれており、画像の内容の包括的な理解を可能にする。メモリ制約によって制限される従来のトレーニング手法とは対照的に,本手法は超高解像度画像のトレーニングを可能にする。分類,オブジェクト検出,セグメンテーションにまたがる7つのベンチマークにおいて,本手法の有効性を示す。提案手法は,Jetson Nanoのような資源制約のあるデバイスでも高い性能を実現する。私たちのコードはhttps://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systemsで利用可能です。

Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory-efficient patch-based processing for high-resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods, which are limited by memory constraints, our method enables training on ultra-high-resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at https://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systems.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# カメラ空間における単眼RGBからの3Dハンドメッシュの回収

3D Hand Mesh Recovery from Monocular RGB in Camera Space ( http://arxiv.org/abs/2405.07167v1 )

ライセンス: Link先を確認

Haonan Li, Patrick P. K. Chen, Yitong Zhou,

(参考訳) 仮想現実、拡張現実、ジェスチャーコントロールなどの技術の急速な進歩により、ユーザはコンピュータインターフェースとのインタラクションがより自然で直感的なものになることを期待している。既存のビジュアルアルゴリズムは、高精度で信頼性の高い絶対的な空間予測手法を必要とする、高度な人間とコンピュータのインタラクションタスクを達成するのに苦労することが多い。さらに、単眼画像における複雑なシーンやオクルージョンを扱うことは、全く新しい課題をもたらす。本研究では,ルート相対格子とルート回復タスクの並列処理を行うネットワークモデルを提案する。このモデルにより、モノクロRGB画像からカメラ空間における3Dハンドメッシュの復元が可能となる。エンド・ツー・エンドのトレーニングを容易にするために、2Dヒートマップに暗黙的な学習アプローチを用い、異なるサブタスク間の2Dキューの互換性を向上させる。インセプションの概念をスペクトルグラフ畳み込みネットワークに組み込んで、根の相対メッシュを探索し、根の回復探索のために設計された局所的詳細かつ世界的な注意深い手法と統合する。このアプローチは、複雑な環境や自己排除シーンにおけるモデルの予測性能を改善する。大規模ハンドデータセットFreiHANDの評価を通じて,提案モデルが最先端モデルに匹敵することを示した。本研究は,様々な人-コンピュータインタラクションアプリケーションにおいて,高精度かつ信頼性の高い絶対空間予測技術の発展に寄与する。

With the rapid advancement of technologies such as virtual reality, augmented reality, and gesture control, users expect interactions with computer interfaces to be more natural and intuitive. Existing visual algorithms often struggle to accomplish advanced human-computer interaction tasks, necessitating accurate and reliable absolute spatial prediction methods. Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges. This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. The model enables the recovery of 3D hand meshes in camera space from monocular RGB images. To facilitate end-to-end training, we utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. We incorporate the Inception concept into a spectral graph convolutional network to explore the root-relative mesh, and integrate it with the locally detailed and globally attentive method designed for root recovery exploration. This approach improves the model's predictive performance in complex environments and self-occluded scenes. Through evaluation on the large-scale hand dataset FreiHAND, we demonstrate that our proposed model is comparable with state-of-the-art models. This study contributes to the advancement of techniques for accurate and reliable absolute spatial prediction in various human-computer interaction applications.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# 特徴量コサインアライメントによるオンラインテスト時間適応の強化

Enhanced Online Test-time Adaptation with Feature-Weight Cosine Alignment ( http://arxiv.org/abs/2405.07171v1 )

ライセンス: Link先を確認

WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar,

(参考訳) オンラインテスト時間適応(OTTA)は、ソースデータを必要とせずに、事前学習されたモデルを新しいターゲットドメインにオンザフライで適応させる、分散シフトを処理する効果的な戦略として登場した。 OTTAのエントロピー最小化法は,決定境界付近のあいまいさや誤った低エントロピー予測によるノイズ勾配に悩まされていることがわかった。このような制約を克服するために,クラス予測の精度と新しい領域への適応性を向上する二目的損失関数を用いたコサインアライメント最適化手法を提案する。具体的には、特徴ベクトルとクラス重みベクトルのコサイン類似性を最適化し、クラス予測の精度を高め、新しい領域へのモデルの適応性を向上する。 CIFAR-10-C、CIFAR-100-C、ImageNet-C、Office-Home、DomainNetデータセットなど、最先端の手法より優れており、多様な腐敗やドメインシフトに対して高い精度と堅牢性を示す。

Online Test-Time Adaptation (OTTA) has emerged as an effective strategy to handle distributional shifts, allowing on-the-fly adaptation of pre-trained models to new target domains during inference, without the need for source data. We found that the widely studied entropy minimization (EM) method for OTTA suffers from noisy gradients due to ambiguity near decision boundaries and incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function that refines the precision of class predictions and adaptability to novel domains. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, enhancing the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark on multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12
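The abstract above says the method optimizes the cosine similarity between feature vectors and class weight vectors. The sketch below illustrates that quantity only, in plain Python. The loss form `1 - cos`, the use of the model's own prediction as a pseudo-label, and all function names are assumptions made for illustration. The paper's dual-objective loss is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(feature, class_weights):
    """Pick the class whose weight vector is best aligned with the feature."""
    sims = [cosine_similarity(feature, w) for w in class_weights]
    return max(range(len(sims)), key=sims.__getitem__)


def cosine_alignment_loss(feature, class_weights, label):
    """Assumed loss: 1 - cos(feature, weight vector of the given class)."""
    return 1.0 - cosine_similarity(feature, class_weights[label])


# Hypothetical 2-class classifier head and one test-time feature vector.
class_weights = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.6, 0.8]
pseudo_label = predict(feature, class_weights)
loss = cosine_alignment_loss(feature, class_weights, pseudo_label)
print(pseudo_label, loss)
```

Because the loss depends only on the angle between the feature and the class weight, it is insensitive to feature magnitude, which is one plausible reason an angular objective could behave differently from entropy minimization near decision boundaries.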

# オントロジーに基づくログモニタリングを用いたマネージドサーバーレス環境における観測可能性とインシデント応答

Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log Monitoring ( http://arxiv.org/abs/2405.07172v1 )

ライセンス: Link先を確認

Lavi Ben-Shimol, Edita Grolman, Aviad Elyashar, Inbar Maimon, Dudu Mimran, Oleg Brodt, Martin Strassmann, Heiko Lehmann, Yuval Elovici, Asaf Shabtai,

(参考訳) フルマネージドなサーバレス環境では、クラウドサービスプロバイダがクラウドインフラストラクチャの確保に責任を持ち、アプリケーション開発者の運用とメンテナンスの労力を削減します。しかし、この環境は既存のサイバーセキュリティフレームワークやツールの使用を制限するため、監視可能性や状況認識能力(リスク評価、インシデント対応など)が低下する。加えて、サーバーレスアプリケーションの既存のセキュリティフレームワークは、すべてのアプリケーションアーキテクチャにうまく一般化せず、完全に管理されたサーバーレス環境での使用には、適応、専門的な専門知識などが必要である。本稿では,フルマネージドなサーバレス環境にデプロイされたアプリケーションに対して,3層セキュリティ方式を提案する。最初の2つのレイヤには、サーバレスログのみに基づくユニークなオントロジーが含まれており、それらを統合されたアプリケーションアクティビティ知識グラフに変換するために使用される。第3のレイヤでは、グラフベースの表現を利用する2つの状況認識ツールを実装することにより、可観測性と状況認識能力の必要性に対処する。 1) オントロジーを利用したインシデント対応ダッシュボードにより,サイバーセキュリティアラートのコンテキストにおけるアプリケーションアクティビティログの可視化と調査を行う。ユーザ調査では、ダッシュボードによって、参加者はベースラインツールよりも、より正確に、迅速に新しいセキュリティアラートに応答できることがわかった。 2)サイバーセキュリティの文脈における専門家による効果的な優先順位付けを実現するためのリスクアセスメントフレームワーク(CoA)の批判性。

In a fully managed serverless environment, the cloud service provider is responsible for securing the cloud infrastructure, thereby reducing the operational and maintenance efforts of application developers. However, this environment limits the use of existing cybersecurity frameworks and tools, which reduces observability and situational awareness capabilities (e.g., risk assessment, incident response). In addition, existing security frameworks for serverless applications do not generalize well to all application architectures and usually require adaptation, specialized expertise, etc. for use in fully managed serverless environments. In this paper, we introduce a three-layer security scheme for applications deployed in fully managed serverless environments. The first two layers involve a unique ontology based solely on serverless logs, which is used to transform them into a unified application activity knowledge graph. In the third layer, we address the need for observability and situational awareness capabilities by implementing two situational awareness tools that utilize the graph-based representation: 1) An incident response dashboard that leverages the ontology to visualize and examine application activity logs in the context of cybersecurity alerts. Our user study showed that the dashboard enabled participants to respond more accurately and quickly to new security alerts than the baseline tool. 2) A criticality of asset (CoA) risk assessment framework that enables efficient expert-based prioritization in cybersecurity contexts.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# CRSFL: 継続的認証のためのクラスタベースのリソース認識型フェデレーション学習

CRSFL: Cluster-based Resource-aware Split Federated Learning for Continuous Authentication ( http://arxiv.org/abs/2405.07174v1 )

ライセンス: Link先を確認

Mohamad Wazzeh, Mohamad Arafeh, Hani Sami, Hakima Ould-Slimane, Chamseddine Talhi, Azzam Mourad, Hadi Otrok,

(参考訳) 絶え間なく変化するテクノロジーの世界では、デバイスとのユーザーインタラクションにおいて、継続的な認証と包括的なアクセス管理が不可欠である。分散学習(SL)とフェデレート学習(FL)は、最近、分散機械学習(ML)モデルをトレーニングするための有望な技術として登場した。スマートフォンとIoT(Internet of Things)デバイスの利用が増加する中、これらの分散技術により、限られたリソースを持つユーザは、サーバーアシストによるニューラルネットワークモデルのトレーニングを完了し、異なるノード間の知識を協調的に組み合わせることができる。本研究では,ユーザプライバシ保護とデバイスリソース使用制限を両立させながら,これらの技術を組み合わせて継続的な認証課題に対処することを提案する。しかし、SLシーケンシャルなトレーニングと、異なる仕様のIoTデバイス間のリソース差のため、モデルのトレーニングは遅くなっている。したがって、クラスタベースのアプローチを用いて、同様の機能を持つデバイスをグループ化し、遅いデバイスの影響を軽減すると同時に、モデルをトレーニングできないデバイスをフィルタリングする。さらに、SLおよびFL技術を用いて機械学習モデルの学習効率とロバスト性を改善し、プロセスのオーバーヘッドを解析しながらクライアントを同時に訓練する。クラスタリングに続いて、慎重に設計された目的のリストに最適化された遺伝的アルゴリズム(GA)を用いて、トレーニングに参加するための最良のクライアント群を選択する。提案手法の性能をベースライン法と比較し,実生活型 UMDAA-02-FD 顔検出データセットを用いてその利点を実証した。その結果,提案手法であるCRSFLは,ユーザのプライバシを保ちながら高い精度を維持し,継続的な認証シナリオのオーバヘッドを低減できることが示唆された。

In the ever-changing world of technology, continuous authentication and comprehensive access management are essential during user interactions with a device. Split Learning (SL) and Federated Learning (FL) have recently emerged as promising technologies for training a decentralized Machine Learning (ML) model. With the increasing use of smartphones and Internet of Things (IoT) devices, these distributed technologies enable users with limited resources to complete neural network model training with server assistance and collaboratively combine knowledge between different nodes. In this study, we propose combining these technologies to address the continuous authentication challenge while protecting user privacy and limiting device resource usage. However, the model's training is slowed due to SL sequential training and resource differences between IoT devices with different specifications. Therefore, we use a cluster-based approach to group devices with similar capabilities to mitigate the impact of slow devices while filtering out the devices incapable of training the model. In addition, we address the efficiency and robustness of training ML models by using SL and FL techniques to train the clients simultaneously while analyzing the overhead burden of the process. Following clustering, we select the best set of clients to participate in training through a Genetic Algorithm (GA) optimized on a carefully designed list of objectives. The performance of our proposed framework is compared to baseline methods, and the advantages are demonstrated using a real-life UMDAA-02-FD face detection dataset. The results show that CRSFL, our proposed approach, maintains high accuracy and reduces the overhead burden in continuous authentication scenarios while preserving user privacy.

翻訳日:2024-05-14 18:08:19 公開日:2024-05-12

# 深層強化学習によるフェデレーション学習におけるオンデマンドモデルとクライアント展開

On-Demand Model and Client Deployment in Federated Learning with Deep Reinforcement Learning ( http://arxiv.org/abs/2405.07175v1 )

ライセンス: Link先を確認

Mario Chahoud, Hani Sami, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Mohsen Guizani,

(参考訳) フェデレートラーニング(FL)では、多様な場所やユーザタイプからのデータへのアクセスが制限されているため、ユーザの参加が制限されているため、大きな課題となる。クライアントアクセスの拡大とデータの多様化により、さまざまな視点を取り入れてモデルを強化し、適応性を向上させる。しかし、あるデバイスがFLクライアントとしてアクセス不能になり、データの可用性やクライアントの選択方法に影響を及ぼすような動的およびモバイル環境では、課題が生じる。これを解決するために、Docker Containersをオンザフライで使用する新しいクライアントをデプロイするOn-Demandソリューションを提案します。当社のオンデマンドソリューションは、Deep Reinforcement Learning(DRL)を採用して、データシフトやコンテナデプロイメントの複雑さを考慮して、クライアントの可用性と選択を目標としています。モデルデプロイメントとクライアント選択を処理するために、自律的なエンドツーエンドソリューションを採用している。 DRL戦略はMarkov Decision Process(MDP)フレームワークを使用し、Master LearnerとJoiner Learnerを使用する。設計されたコスト関数は、動的クライアントの配置と選択の複雑さを表している。シミュレーションテストは、アーキテクチャが環境の変化に容易に対応し、オン・デマンド・リクエストに応答できることを示しています。これにより、クライアントの可用性、能力、正確性、学習効率を向上し、ヒューリスティックで表型的な強化学習ソリューションを超えることができる。

In Federated Learning (FL), the limited accessibility of data from diverse locations and user types poses a significant challenge due to restricted user participation. Expanding client access and diversifying data enhance models by incorporating diverse perspectives, thereby enhancing adaptability. However, challenges arise in dynamic and mobile environments where certain devices may become inaccessible as FL clients, impacting data availability and client selection methods. To address this, we propose an On-Demand solution, deploying new clients using Docker Containers on-the-fly. Our On-Demand solution, employing Deep Reinforcement Learning (DRL), targets client availability and selection, while considering data shifts, and container deployment complexities. It employs an autonomous end-to-end solution for handling model deployment and client selection. The DRL strategy uses a Markov Decision Process (MDP) framework, with a Master Learner and a Joiner Learner. The designed cost functions represent the complexity of the dynamic client deployment and selection. Simulated tests show that our architecture can easily adjust to changes in the environment and respond to On-Demand requests. This underscores its ability to improve client availability, capability, accuracy, and learning efficiency, surpassing heuristic and tabular reinforcement learning solutions.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# ホログラム:LiDARによるリアルタイムホログラフィーオーバーレイ

Hologram: Realtime Holographic Overlays via LiDAR Augmented Reconstruction ( http://arxiv.org/abs/2405.07178v1 )

ライセンス: Link先を確認

Ekansh Agrawal,

(参考訳) 悪名高いスター・ウォーズシリーズのホログラム技術を用いて、LiDARによる3D再構成によるリアルタイムホログラムオーバーレイを作成するアプリケーションを提案する。以前の試みではSLAMやNeRFは高度に調整されたシーンを必要とするか、急激な計算コストがかかるか、動的シーンのレンダリングに失敗する。本稿では,iPhone 14 Proなどの携帯端末上で動作可能な3つの高忠実度再構築ツールを提案する。私のシステムはインタラクティブで没入的なホログラフィック体験を可能にし、拡張現実、テレプレゼンス、エンターテイメントなど幅広い用途に利用できる。

Guided by the hologram technology of the infamous Star Wars franchise, I present an application that creates real-time holographic overlays using LiDAR augmented 3D reconstruction. Prior attempts involve SLAM or NeRFs, which either require highly calibrated scenes, incur steep computation costs, or fail to render dynamic scenes. I propose 3 high-fidelity reconstruction tools that can run on a portable device, such as an iPhone 14 Pro, and that allow for metrically accurate facial reconstructions. My systems enable interactive and immersive holographic experiences that can be used for a wide range of applications, including augmented reality, telepresence, and entertainment.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# エントロピー不確実性関係による量子電池の抽出可能仕事の提示

Showcasing extractable work of quantum battery via entropic uncertainty relations ( http://arxiv.org/abs/2405.07185v1 )

ライセンス: Link先を確認

Meng-Long Song, Xue-Ke Song, Liu Ye, Dong Wang,

(参考訳) 本研究では,バッテリチャージャーフィールドをモデルとした量子電池(QB)のボソニックおよびフェルミオン系貯水池存在下でのエネルギー変動に対するエントロピー不確実性関係(EURs)の有効性について検討した。以上の結果から,抽出可能な作業(エクセルギーとエルゴトロピー)は異なるシナリオで多種多様であり,厳密性と抽出可能な作業との間には複雑な関係があることが示唆された。エントロピー不確実性の低い境界の厳密性は、充電QBにおけるエネルギー変換効率のよい指標となることは注目に値する。さらに,不確実性および低拘束性を含むEURがQBシステムのエネルギー変換効率にどのように寄与するかを明らかにする。これらの知見は、量子電池の性能を評価する上での量子不確実性の役割をよりよく理解するために有用であると考えられている。

In this study, we investigate the effectiveness of entropic uncertainty relations (EURs) in discerning the energy variation in quantum batteries (QBs) modelled as a battery-charger-field system in the presence of bosonic and fermionic reservoirs. Our results suggest that the extractable work (exergy and ergotropy) has versatile characteristics in different scenarios, resulting in a complex relationship between tightness and extractable work. It is worth noting that the tightness of the lower bound of entropic uncertainty can be a good indicator for energy conversion efficiency in charging QBs. Furthermore, we disclose how the EUR, including the uncertainty and its lower bound, contributes to energy conversion efficiency in the QB system. It is believed that these findings will be beneficial for better understanding the role of quantum uncertainty in evaluating quantum battery performance.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 実世界データを用いたランダム化制御試験に基づく平均処理効果の適応TMLE

Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data ( http://arxiv.org/abs/2405.07186v1 )

ライセンス: Link先を確認

Mark van der Laan, Sky Qiu, Lars van der Laan,

(参考訳) ランダム化制御試験(RCT)データと実世界データ(RWD)データの両方が利用可能である場合,平均処理効果(ATE)を推定する問題を考察する。本研究では, RCT と RWD を統合したプール時間推定器と, RCT の登録条件が結果に与える影響を推定するバイアス推定器との差として ATE 推定器を分解する。適応型最小損失ベース推定(A-TMLE)フレームワークを導入し、それらを推定する。我々は、A-TMLE推定器がルート-n-一貫性を持ち、漸近的に正規であることを証明する。さらに, 有限試料では, RCT の登録条件が結果に与える影響について, 1 つの既知のオラクルモデルを持つ超効率が得られる。その結果、RWDによって誘導されるバイアスの作用モデルが小さくなればなるほど、我々の推定器の効率は向上するが、我々の推定器は常にRCTデータのみを使用する効率的な推定器と同じくらい効率的である。 A-TMLEは平均二乗誤差が小さく、95%の信頼区間を持つことで、シミュレーションにおいて既存の手法よりも優れている。 A-TMLEは、介入効果の見積をバイアスすることなく、ランダム化試験結果の効率を向上させるためにRWDを利用するのに役立つ。このアプローチは、より小さく、より高速な治験を可能にし、患者が効果的な治療を受けるまでの時間を短縮することができる。

We consider the problem of estimating the average treatment effect (ATE) when both randomized controlled trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite samples, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of the RCT enrollment on the outcome. Consequently, the smaller the working model of the bias induced by the RWD is, the greater our estimator's efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. A-TMLE outperforms existing methods in simulations by having smaller mean squared error and narrower 95% confidence intervals. A-TMLE could help utilize RWD to improve the efficiency of randomized trial results without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# トラップを用いたフォトニックランダムウォーク

Photonic random walks with traps ( http://arxiv.org/abs/2405.07192v1 )

ライセンス: Link先を確認

Stefano Longhi,

(参考訳) ランダムウォークは古典的および量子的粒子に対して非常に異なる振る舞いをする。ここでは, 有限個のトラップが存在する場合の1次元格子内の光子のランダムウォークのユビキタスな挙動を明らかにする。古典的なランダムウォークでは、光子はトラップによって避けられないほど破壊され、量子ウォークでは光子は生き続けることができ、ウォークは永遠に続く。このような興味深い振る舞いは、制御可能なデコヒーレンスを持つ合成メッシュ格子におけるフォトニックランダムウォークを考慮し、量子的なランダムウォークから古典的なランダムウォークに切り替えることができる。

Random walks behave very differently for classical and quantum particles. Here we unveil a ubiquitous distinctive behavior of random walks of a photon in a one-dimensional lattice in the presence of a finite number of traps, at which the photon can be destroyed and the walk terminates. While for a classical random walk the photon is unavoidably destroyed by the traps, for a quantum walk the photon can remain alive and the walk continues forever. Such intriguing behavior is illustrated by considering photonic random walks in synthetic mesh lattices with controllable decoherence, which enables switching from quantum to classical random walks.
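
The classical half of this claim can be illustrated with a minimal sketch. It assumes a symmetric nearest-neighbour walk on the integers with a single perfectly absorbing trap, and it propagates the exact occupation probabilities instead of sampling trajectories. The function name `classical_survival` and the parameters `start` and `trap` are hypothetical and not taken from the paper; the quantum walk and the synthetic mesh lattice are not modelled here.

```python
import numpy as np

def classical_survival(steps: int, start: int = 5, trap: int = 0) -> np.ndarray:
    """Survival probability after 0..steps hops of a symmetric 1D classical walk.

    Assumption (illustrative only): a single trap that destroys the walker
    with certainty whenever the walker lands on it.
    """
    # Make the lattice wide enough that no probability reaches the edges.
    half_width = steps + abs(start) + abs(trap) + 1
    size = 2 * half_width + 1
    p = np.zeros(size)
    p[start + half_width] = 1.0
    trap_idx = trap + half_width

    survival = np.empty(steps + 1)
    survival[0] = p.sum()
    for t in range(1, steps + 1):
        new_p = np.zeros(size)
        new_p[1:] += 0.5 * p[:-1]   # hop one site to the right
        new_p[:-1] += 0.5 * p[1:]   # hop one site to the left
        new_p[trap_idx] = 0.0       # probability landing on the trap is lost
        p = new_p
        survival[t] = p.sum()
    return survival

if __name__ == "__main__":
    s = classical_survival(150)
    print(f"survival after 150 steps: {s[-1]:.3f}")
```

The survival probability never increases from one step to the next, which matches the statement that the classical walker is eventually destroyed.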

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 微分可能トポックを用いた微分モデルスケーリング

Differentiable Model Scaling using Differentiable Topk ( http://arxiv.org/abs/2405.07194v1 )

ライセンス: Link先を確認

Kai Liu, Ruohui Wang, Jianfei Gao, Kai Chen,

(参考訳) 過去数年間、大規模な言語モデルがインテリジェンス発生の時代に定着し、ネットワークのスケーリングに重点が置かれてきた。現在、多くのネットワークアーキテクチャは手動で設計されており、しばしばサブ最適構成をもたらす。ニューラルアーキテクチャサーチ(NAS)手法は,このプロセスを自動化するために提案されているが,探索効率の低下に悩まされている。本研究では,ネットワークの最適幅と深さを探索する効率を高めるため,微分可能モデルスケーリング(DMS)を提案する。 DMSは、幅と深さの両方を、直接的かつ完全に異なる方法でモデル化できるため、最適化が容易である。我々は、視覚タスクからNLPタスク、CNNやTransformerなど様々なネットワークアーキテクチャまで、さまざまなタスクでDMSを評価してきた。結果は,我々のDMSが改良された構造を見つけ,最先端NAS法より優れていることを一貫して示している。具体的には、ImageNet上の画像分類において、当社のDMSは、EfficientNet-B0とDeit-Tinyのトップ1の精度をそれぞれ1.4%、Deit-Tinyは0.6%改善し、検索に0.4GPU日しか必要とせず、最先端のゼロショットNASであるZiCoを1.3%上回っている。 COCO上の物体検出では、DMSはYolo-v8-nのmAPを2.0%改善する。言語モデリングでは,Llama-7Bは従来の手法よりも低いパープレキシティと高いゼロショット分類精度で優れていた。将来、コードをリリースします。

Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), which increases the efficiency of searching for the optimal width and depth of networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated our DMS across diverse tasks, ranging from vision tasks to NLP tasks, and across various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods. Specifically, for image classification on ImageNet, our DMS improves the top-1 accuracy of EfficientNet-B0 and Deit-Tiny by 1.4% and 0.6%, respectively, and outperforms the state-of-the-art zero-shot NAS method, ZiCo, by 1.3% while requiring only 0.4 GPU days for searching. For object detection on COCO, DMS improves the mAP of Yolo-v8-n by 2.0%. For language modeling, our pruned Llama-7B outperforms the prior method with lower perplexity and higher zero-shot classification accuracy. We will release our code in the future.
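
To make the idea of a differentiable top-k concrete, here is a generic sketch of one common relaxation; it is not confirmed to be the authors' operator. It assumes that element importances are rank-normalised to (0, 1] and compared against a learnable pruning ratio through a steep sigmoid, so that the mask has a usable gradient with respect to that ratio. The names `soft_topk_mask`, `prune_ratio` and `sharpness` are hypothetical.

```python
import numpy as np

def soft_topk_mask(importance, prune_ratio, sharpness=50.0):
    """Soft keep-mask in (0, 1): close to 1 for the most important elements.

    Assumption (illustrative): importances are replaced by their rank
    fraction, so prune_ratio reads as "fraction of elements to drop".
    """
    importance = np.asarray(importance, dtype=float)
    n = importance.size
    # Rank-normalise: the i-th smallest score maps to i / n in (0, 1].
    ranks = np.argsort(np.argsort(importance)) + 1
    normalised = ranks / n
    return 1.0 / (1.0 + np.exp(-sharpness * (normalised - prune_ratio)))

def mask_grad_wrt_ratio(mask, sharpness=50.0):
    """Derivative of each mask entry with respect to prune_ratio."""
    return -sharpness * mask * (1.0 - mask)

if __name__ == "__main__":
    mask = soft_topk_mask([0.1, 0.9, 0.5, 0.3], prune_ratio=0.6)
    print(np.round(mask, 4))
    print(np.round(mask_grad_wrt_ratio(mask), 4))
```

Because the gradient with respect to `prune_ratio` is non-zero near the threshold, a width or depth budget expressed this way can be tuned by gradient descent together with the network weights, which is the property the abstract emphasises.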

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# InsightNet: 顧客からのフィードバックから構造化されたインサイトマイニング

InsightNet: Structured Insight Mining from Customer Feedback ( http://arxiv.org/abs/2405.07195v1 )

ライセンス: Link先を確認

Sandeep Sricharan Mukku, Manan Soni, Jitenkumar Rana, Chetan Aggarwal, Promod Yenigalla, Rashmi Patange, Shyam Mohan,

(参考訳) 顧客レビューから構造化された洞察を自動的に抽出する新しいアプローチであるInsightNetを提案する。私たちのエンドツーエンドの機械学習フレームワークは、特定トピックの構造の欠如、非標準アスペクト名、豊富なトレーニングデータの欠如など、現在のソリューションの限界を克服するために設計されています。提案手法は,ラベル付きデータを生成する意味的類似性ヒューリスティックアプローチである生のレビューから半教師付きマルチレベル分類法を構築し,LLMを微調整してマルチタスクの洞察抽出アーキテクチャを採用する。 InsightNetは、顧客の感情と各トピックに対する口頭で、より粒度の細かいアクション可能なトピックを特定する。実際の顧客レビューデータによる評価では、InsightNetは構造、階層、完全性の観点から既存のソリューションよりも優れたパフォーマンスを示している。我々は、InsightNetがマルチラベルのトピック分類において現在の最先端手法より優れていることを実証的に証明し、F1スコアが0.85となり、前回のベストスコアよりも11%のF1スコアが向上した。さらにInsightNetは、目に見えない側面を一般化し、分類に新たなトピックを追加することを提案している。

We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, uses a semantic similarity heuristic approach to generate labelled data, and employs a multi-task insight extraction architecture obtained by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatims for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, which is an improvement of 11% F1-score over the previous best results. Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 合成データジェネレータのランク付けを許可されたブロックチェーンベースのフレームワーク

Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators ( http://arxiv.org/abs/2405.07196v1 )

ライセンス: Link先を確認

Narasimha Raghavan Veeraragavan, Mohammad Hossein Tabatabaei, Severin Elvatun, Vibeke Binz Vallevik, Siri Larønningen, Jan F Nygård,

(参考訳) 合成データ生成は、不足、バイアス、プライバシといったデータ関連の課題に対処するための重要なソリューションとして、ますます認識されている。合成データの増加に伴い、利用可能なさまざまなオプションを考えると、合成データジェネレータを選択するための堅牢な評価フレームワークの必要性が高まっている。本研究では,2つの質問について検討する。 1) 特定の目的のための選択肢の集合から最適な合成データ生成装置をどうやって選択できるのか。 2) 選択プロセスをより透明に、説明責任を持ち、監査可能にするにはどうすればよいのか? これらの問題に対処するために、Sawtoothと呼ばれる認可されたブロックチェーンフレームワーク内で、提案されたランキングアルゴリズムをスマートコントラクトとして実装する、新たなアプローチを導入する。本フレームワークは,最先端のベースラインランキングソリューションとの総合的な実験と比較を通じて,望ましい特性と望ましくない特性の両方を考慮したランキングを提供する上で,その有効性を示す。さらに,本フレームワークは,データ保護原則の遵守を確保しつつ,特定のニーズに対して最適な合成データジェネレータを選択するための貴重なツールとして機能する。

Synthetic data generation is increasingly recognized as a crucial solution to address data-related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# Qsyn: NISQ時代以降のための開発者フレンドリーな量子回路合成フレームワーク

Qsyn: A Developer-Friendly Quantum Circuit Synthesis Framework for NISQ Era and Beyond ( http://arxiv.org/abs/2405.07197v1 )

ライセンス: Link先を確認

Mu-Te Lau, Chin-Yi Cheng, Cheng-Hua Lu, Chia-Hsu Chuang, Yi-Hsiang Kuo, Hsiang-Chun Yang, Chien-Tung Kuo, Hsin-Yu Chen, Chen-Ying Tung, Cheng-En Tsai, Guan-Hao Chen, Leng-Kai Lin, Ching-Huan Wang, Tzu-Hsu Wang, Chung-Yang Ric Huang,

(参考訳) 本稿では、新しい量子回路合成(QCS)フレームワークであるQsynを紹介し、開発者がQCSアルゴリズムとツールを研究、開発、試験、実験し、そしてフレームワークに貢献できるようにする。 1) 開発者が様々なテストシナリオを簡単に設計し、アルゴリズムで柔軟に実験できるように、リッチなコマンドラインインターフェースを設計します。 2) 開発者がアルゴリズムを極端に最適化できるように,異なる抽象レベルの量子回路上で多くのデータ表現に詳細なアクセスを提供する。 (3)私たちは,開発者が開発品質を,最新のソフトウェアエンジニアリングのベストプラクティスで確保できるように,厳格な開発フローと環境を定義します。筆者らは,T-Count Optimizationアルゴリズムの開発を実演し,最近のQCSフレームワークと同等に比較して,性能上の優位性を示す。

In this paper, we introduce a new quantum circuit synthesis (QCS) framework, Qsyn, for developers to research, develop, test, experiment, and then contribute their QCS algorithms and tools to the framework. Our framework is more developer-friendly than other modern QCS frameworks in three aspects: (1) We design a rich command-line interface so that developers can easily design various testing scenarios and flexibly conduct experiments on their algorithms. (2) We offer detailed access to many data representations on different abstraction levels of quantum circuits so that developers can optimize their algorithms to the extreme. (3) We define a rigid development flow and environment so that developers can ensure the quality of their development with the best modern software engineering practices. We illustrate the friendliness of our framework with a showcase of developing a T-Count Optimization algorithm and demonstrate our performance superiority with fair comparisons to other modern QCS frameworks.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 準結晶における劣化誘起運動エッジ

Dephasing-induced mobility edges in quasicrystals ( http://arxiv.org/abs/2405.07198v1 )

ライセンス: Link先を確認

Stefano Longhi,

(参考訳) アンダーソン局在状態と拡張状態とを分離するモビリティエッジ(ME)は、非周期的な秩序をもつある種の1次元格子の単一粒子エネルギースペクトルにおいて生じることが知られている。デファージングとデコヒーレンスの効果は、アンダーソン局在を損ない輸送を促進することが広く認められており、MEと局在はデファージングの存在下では観測できない可能性が高いことが示唆されている。ここでは、そのような通念とは対照的に、全ての状態がコヒーレントダイナミクスの下で非局在化される準結晶において、純粋なデファージング効果によってMEを生成できることが示される。デファージング効果によって引き起こされる局在状態の寿命は極端に長くなりうるので、直感に反して、デコヒーレンスによって格子内の励起の局在化が促進されうる。この結果は、合成メッシュ格子におけるフォトニック量子ウォークを考慮することで説明される。

Mobility edges (ME), separating Anderson-localized states from extended states, are known to arise in the single-particle energy spectrum of certain one-dimensional lattices with aperiodic order. Dephasing and decoherence effects are widely acknowledged to spoil Anderson localization and to enhance transport, suggesting that ME and localization are unlikely to be observable in the presence of dephasing. Here it is shown that, contrary to such wisdom, ME can be created by pure dephasing effects in quasicrystals in which all states are delocalized under coherent dynamics. Since the lifetimes of localized states induced by dephasing effects can be extremely long, rather counter-intuitively, decoherence can enhance localization of excitation in the lattice. The results are illustrated by considering photonic quantum walks in synthetic mesh lattices.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# Chebyshev Polynomial-based Kolmogorov-Arnold Networks: 非線形関数近似のための効率的なアーキテクチャ

Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation ( http://arxiv.org/abs/2405.07200v1 )

ライセンス: Link先を確認

Sidharth SS,

(参考訳) 複雑な非線形関数の正確な近似は、多くの科学および工学領域における根本的な挑戦である。従来のニューラルネットワークアーキテクチャは、高次元関数に存在する複雑なパターンや不規則を捉えるのに苦労することが多い。本稿では、Chebyshev Kolmogorov-Arnoldネットワーク(Chebyshev KAN)を紹介し、Kolmogorov-Arnoldの定理の理論的基礎とChebyshev多項式の強力な近似能力を組み合わせた新しいアプローチを提案する。

Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures often struggle to capture intricate patterns and irregularities present in high-dimensional functions. This paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a novel approach that combines the theoretical foundations of the Kolmogorov-Arnold Theorem with the powerful approximation capabilities of Chebyshev polynomials.
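
As a rough illustration of how Chebyshev polynomials can act as learnable edge functions, the sketch below evaluates the basis with the standard recurrence T_0(x) = 1, T_1(x) = x, T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x), and sums coefficient-weighted basis values over the inputs. The `tanh` squashing into [-1, 1], the coefficient layout and the function names are assumptions made for this example; the abstract does not specify the layer in this detail.

```python
import numpy as np

def chebyshev_basis(x, degree):
    """Return T_0..T_degree evaluated at x, stacked on a new last axis."""
    x = np.asarray(x, dtype=float)
    polys = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        # Three-term recurrence of Chebyshev polynomials of the first kind.
        polys.append(2.0 * x * polys[-1] - polys[-2])
    return np.stack(polys[: degree + 1], axis=-1)

def chebyshev_kan_layer(x, coeffs):
    """One hypothetical layer.

    x:      (batch, in_dim) inputs
    coeffs: (in_dim, out_dim, degree + 1) learnable coefficients
    """
    degree = coeffs.shape[-1] - 1
    # Assumption: squash inputs into [-1, 1], the natural Chebyshev domain.
    basis = chebyshev_basis(np.tanh(x), degree)      # (batch, in_dim, degree + 1)
    # Each output sums a learned univariate polynomial of every input.
    return np.einsum("bik,iok->bo", basis, coeffs)   # (batch, out_dim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    out = chebyshev_kan_layer(rng.normal(size=(2, 3)), rng.normal(size=(3, 4, 5)))
    print(out.shape)
```

Each output is a sum of univariate functions of the individual inputs, which mirrors the structure suggested by the Kolmogorov-Arnold representation.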

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 汎用3次元大規模知覚のための強力な事前学習ベースラインの構築

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception ( http://arxiv.org/abs/2405.07201v1 )

ライセンス: Link先を確認

Haoming Chen, Zhizhong Zhang, Yanyun Qu, Ruixin Zhang, Xin Tan, Yuan Xie,

(参考訳) 汎用的な3D表現を備えた効果的な事前学習フレームワークは、大規模な動的シーンを知覚するのに非常に望ましい。しかし、タスクジェネリックかつラベル効率の両方の理想的なフレームワークを確立することは、様々な場面で同じプリミティブの表現を統一する上での課題となる。現在のコントラスト的な3D事前学習法は、典型的にはフレームレベルの一貫性に従っており、各分離画像における2D-3D関係に焦点を当てている。このような不整合性は,(1)クロスシーンセマンティック・セルフ・コンフリクト,すなわち,異なるシーンからの同じ意味論の原始的セグメント間の激しい衝突,(2)クロスシーンセマンティック・セマンティック・セマンティクスを3次元表現学習に推し進めるグローバルな統一結合の欠如といった,普遍的な事前訓練の枠組みに到達するための有望な道を大いに妨げている。上記の課題に対処するために,シーンレベルのセマンティックセマンティックセマンティックセマンティクスを心臓に配置し,類似したセマンティクスセグメントの接続を様々なシーンにブリッジするCSCフレームワークを提案する。この目的を達成するために、視覚基盤モデルによって提供される一貫性のあるセマンティック・キューと、相補的なマルチモーダル情報から導かれる知識に富んだクロスシーンのプロトタイプを組み合わせる。これにより、様々な下流タスクを容易にし、微調整の少ないユニバーサルな3D事前学習モデルを訓練することができる。実験により,SOTA事前学習アプローチ(+1.4% mIoU),オブジェクト検出(+1.0% mAP),パノプティックセグメンテーション(+3.0% PQ)に対して,タスク固有3Dネットワークを用いたnuScenesで一貫した改善を実現した。コードはhttps://github.com/chenhaomingbob/CSCでリリースされ、将来の研究に刺激を与えたいと考えている。

An effective pre-training framework with universal 3D representations is highly desirable for perceiving large-scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. Current contrastive 3D pre-training methods typically follow frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path toward a universal pre-training framework in two ways: (1) cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; and (2) the lack of a globally unified bond that pushes cross-scene semantic consistency into 3D representation learning. To address the above challenges, we propose a CSC framework that puts scene-level semantic consistency at its heart, bridging the connection between similar semantic segments across various scenes. To achieve this goal, we combine the coherent semantic cues provided by the vision foundation model and the knowledge-rich cross-scene prototypes derived from the complementary multi-modality information. These allow us to train a universal 3D pre-training model that facilitates various downstream tasks with less fine-tuning effort. Empirically, we achieve consistent improvements over SOTA pre-training approaches in semantic segmentation (+1.4% mIoU), object detection (+1.0% mAP), and panoptic segmentation (+3.0% PQ) using their task-specific 3D networks on nuScenes. Code is released at https://github.com/chenhaomingbob/CSC, hoping to inspire future research.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 同期オーディオによる一元化ビデオ言語事前学習

Unified Video-Language Pre-training with Synchronized Audio ( http://arxiv.org/abs/2405.07202v1 )

ライセンス: Link先を確認

Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang,

(参考訳) ビデオ言語事前学習は,大規模データから視覚的およびテキスト的表現を自己指導的に学習することを目的とした,典型的で困難な問題である。既存の事前学習アプローチは、画像とテキストのペアの対応を捉えるか、フレームの時間的順序付けを利用するかのいずれかである。しかし、彼らは音声と他の2つのモード間の自然な同期を明示的に調べていない。本稿では,VLSAと呼ばれる同期音声によるビデオ言語事前学習のための拡張フレームワークを提案する。具体的には、VLSAは、ビデオ、テキスト、オーディオのローカルパッチとグローバルトークンの埋め込みを共同で集約します。さらに,ローカル・パッチ・マスクド・モデリングを用いてモダリティを意識した特徴を学習し,グローバル・オーディオ・マッチングを利用して映像やテキストの音声誘導機能をキャプチャする。テキスト,ビデオ,音声の検索について広範な実験を行った。 0.9Mデータのみを事前学習した簡単なモデルでは,最先端のベースラインに対する結果の改善が期待できる。さらに、定性的可視化は、識別的視覚・テクスチャ表現の学習において、VLSAの優位性を鮮明に示している。

Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either capture the correspondence of image-text pairs or utilize the temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two modalities. In this work, we propose an enhanced framework for Video-Language pre-training with Synchronized Audio, termed VLSA, that can learn tri-modal representations in a unified self-supervised transformer. Specifically, our VLSA jointly aggregates embeddings of local patches and global tokens for video, text, and audio. Furthermore, we utilize local-patch masked modeling to learn modality-aware features, and leverage global audio matching to capture audio-guided features for video and text. We conduct extensive experiments on retrieval across text, video, and audio. Our simple model pre-trained on only 0.9M data achieves improved results against state-of-the-art baselines. In addition, qualitative visualizations vividly showcase the superiority of our VLSA in learning discriminative visual-textual representations.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# C++11コードをC++03に変換してレガシーコンパイル環境をサポートする

Transforming C++11 Code to C++03 to Support Legacy Compilation Environments ( http://arxiv.org/abs/2405.07204v1 )

ライセンス: Link先を確認

Gábor Antal, Dávid Havas, István Siket, Árpád Beszédes, Rudolf Ferenc, József Mihalicza,

(参考訳) 新しい技術 - プログラミング言語、環境、ライブラリ - は急速に変化します。しかし、様々な内部および外部の制約により、プロジェクトはこれらの変更に迅速に適用できなくなることが多い。例えば、顧客は、ソフトウェアベンダーから特定のプラットフォーム互換性を必要とするかもしれません。本研究では、C++プログラミング言語の文脈におけるそのような問題に対処する。私たちの産業パートナーは、古いC++言語エディションのみをサポートするSDKを使用する必要があります。しかし彼らは、開発者がコードで最新の言語構造を使えるようにしたいと思っている。この問題に対処するため、私たちは、C++11標準に従って書かれたソースコードを、機能的に等価なC++03変種に自動的にバックポートする、ソースコード変換フレームワークを作成しました。私たちのフレームワークでは、開発者は最新の言語機能を自由に利用できます。本稿では,トランスフォーメーションエンジンの技術的詳細と,大規模な2つのコードベースと4つのオープンソースシステムに適用した経験について報告する。私たちのソリューションは無料で、オープンソースです。

Newer technologies - programming languages, environments, libraries - change very rapidly. However, various internal and external constraints often prevent projects from quickly adapting to these changes. Customers may require specific platform compatibility from a software vendor, for example. In this work, we deal with such an issue in the context of the C++ programming language. Our industrial partner is required to use SDKs that support only older C++ language editions. They, however, would like to allow their developers to use the newest language constructs in their code. To address this problem, we created a source code transformation framework to automatically backport source code written according to the C++11 standard to its functionally equivalent C++03 variant. With our framework, developers are free to exploit the latest language features, while production code is still built by using a restricted set of available language constructs. This paper reports on the technical details of the transformation engine, and our experiences in applying it to two large industrial code bases and four open-source systems. Our solution is freely available and open-source.

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 静的JavaScriptコールグラフの比較研究

Static JavaScript Call Graphs: A Comparative Study ( http://arxiv.org/abs/2405.07206v1 )

ライセンス: Link先を確認

Gábor Antal, Péter Hegedűs, Zoltán Tóth, Rudolf Ferenc, Tibor Gyimóthy,

(参考訳) クライアント側とサーバ側の両方でJavaScriptの人気と広く採用されているため、コード解析はこれまで以上に重要になっている。脆弱性分析、コーディング問題検出、型推論のアルゴリズムのほとんどは、基礎となるプログラムのコールグラフ表現に依存している。動的解析のいくつかの明らかな利点にもかかわらず、静的アルゴリズムは、プログラムの広範なテストベッドやコストのかかる実行とトレースを必要としないため、コールグラフの構築にも考慮すべきである。本稿では,npmコールグラフ,IBM WALA,Google Closure Compiler,Approximate Call Graph,Type Analyzer for JavaScriptツールによって実装された,26のWebKit SunSpiderベンチマークプログラムと6つの実世界のNode.jsモジュール上でJavaScriptコールグラフを構築するための,広く採用されている5つの静的アルゴリズムを体系的に比較する。結果の定量的,定性的な評価だけでなく,性能分析も提供する。その結果,アルゴリズム間のコールエッジの交点は比較的大きく,その精度は100%であることがわかった。しかし、ツールのほとんどは、他のすべてに見逃されたエッジを見つけました。 ACGはTAJSの直後に最も精度が高かったが,ACGの呼び出しエッジは有意に増加した。ツールの組み合わせに関して、ACGとTAJSは、すべてのアルゴリズムで見つかった真のエッジの99%をカバーし、精度は98%まで維持した。言語機能が不完全なため、最新のマルチファイルNode.jsモジュールを解析できたのは2つだけだった。彼らは約60%の呼び出しエッジに同意したが、それぞれが、もう一方が見逃した有効なエッジを見つけた。

The popularity and wide adoption of JavaScript both at the client and server side makes its code analysis more important than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the true edges found by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules due to incomplete language feature support. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.
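
The comparison described here comes down to set arithmetic on call edges. The following sketch shows how the precision and recall of a single tool, of the intersection of two tools and of their union can be computed once a set of validated edges exists. The edge sets are invented toy data rather than results from the study, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(found, truth):
    """Precision and recall of a set of call edges against validated edges."""
    found, truth = set(found), set(truth)
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Toy data (invented): each edge is a (caller, callee) pair.
truth = {("main", "parse"), ("main", "render"), ("parse", "lex"), ("render", "draw")}
tool_a = {("main", "parse"), ("parse", "lex"), ("main", "log")}
tool_b = {("main", "parse"), ("render", "draw")}

if __name__ == "__main__":
    for name, edges in [
        ("tool_a", tool_a),
        ("tool_b", tool_b),
        ("intersection", tool_a & tool_b),
        ("union", tool_a | tool_b),
    ]:
        p, r = precision_recall(edges, truth)
        print(f"{name:12s} precision={p:.2f} recall={r:.2f}")
```

In this toy case the intersection is fully precise but has low recall, while the union trades some precision for coverage, which is the same trade-off the abstract reports for combining tools.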

翻訳日:2024-05-14 17:57:54 公開日:2024-05-12

# 等変QAOAとミキサーの重複

Equivariant QAOA and the Duel of the Mixers ( http://arxiv.org/abs/2405.07211v1 )

ライセンス: Link先を確認

Boris Tsvelikhovskiy, Ilya Safro, Yuri Alexeev,

(参考訳) 量子近似最適化アルゴリズム(QAOA)の最適混合器の構築は、組合せ最適化問題の解法におけるQAOAの性能向上に不可欠である。本稿では,古典的最適化問題対象の固有対称性と整合性を確保したQAOA調整ミキサーHamiltonianを構築するための体系的方法論を提案する。このアプローチの鍵となるのは、ヒルベルト空間の根底にあるQAOA 上の対称性群の作用に可換な作用素を同定し、有効混合ハミルトニアン函数に必須の技術的基準を満たすことである。様々な組合せ最適化問題でよく見られる対称群 $S_d$ に特化して構成法を提供する。必要な特性を厳密に検証し、具体的公式とそれに対応する量子回路を実装することにより、提案したミキサーハミルトニアンの生存性を確立する。さらに、古典ミキサー$B$は、グループ自体よりもはるかに小さな$S_d$の部分群でのみ可換であることを示し、提案手法の効率性を高める。提案手法の有効性を評価するため,異なるミキサーハミルトン多様体を用いた2つのQAOA変種の比較を行った。平均値の統計的に有意な差を観測し、新しい変種は複数の独立シミュレーションにおいて常に優れた性能を示す。さらに,本研究では,近年の文献的知見を裏付ける概念的説明として,代替のウォームスタート型QAOA変種における性能低下現象を分析した。

Constructing an optimal mixer Hamiltonian for the Quantum Approximate Optimization Algorithm (QAOA) is crucial for enhancing the performance of QAOA in solving combinatorial optimization problems. We present a systematic methodology for constructing a QAOA-tailored mixer Hamiltonian, ensuring alignment with the inherent symmetries of classical optimization problem objectives. The key to our approach is to identify an operator that commutes with the action of the group of symmetries on the Hilbert space underlying QAOA and meets the essential technical criteria for effective mixer Hamiltonian functionality. We offer a construction method specifically tailored to the symmetric group $S_d$, prevalent in a variety of combinatorial optimization problems. By rigorously validating the required properties and providing a concrete formula and corresponding quantum circuit for implementation, we establish the viability of the proposed mixer Hamiltonian. Furthermore, we demonstrate that the classical mixer $B$ commutes only with a subgroup of $S_d$ of significantly smaller order than the group itself, enhancing the efficiency of the proposed approach. To evaluate the effectiveness of our methodology, we compare two QAOA variants utilizing different mixer Hamiltonians: conventional $B=\sum X_i$ and the newly proposed $H_M$ in edge coloring and graph partitioning problems across various graphs. We observe statistically significant differences in mean values, with the new variant consistently demonstrating superior performance across multiple independent simulations. Additionally, we analyze the phenomenon of poor performance in alternative warm-start QAOA variants, providing a conceptual explanation supported by recent literature findings.

翻訳日:2024-05-14 17:47:28 公開日:2024-05-12

# LLM支援推論による最適化における決定処理の強化:ニューラルネットワークの視点から

Enhancing Decision-Making in Optimization through LLM-Assisted Inference: A Neural Networks Perspective ( http://arxiv.org/abs/2405.07212v1 )

ライセンス: Link先を確認

Gaurav Singh, Kavitesh Kumar Bali,

(参考訳) 本稿では,大規模多目的最適化分野におけるジェネレーティブAI(GenAI)と進化的アルゴリズム(EA)のシームレスな統合について検討する。大規模言語モデル(LLM)の変換的役割に着目し,LLM支援推論による意思決定プロセスの自動化と向上の可能性について検討した。具体的には、進化的に最適化されたソリューションにおいて、文脈的トレードオフを明確に表現しながら、重要な決定変数を照らし出す効果を強調した。複雑な多目的最適化ソリューションを大規模に見積もることに固有の課題に対処するために,我々のアプローチはLLMの適応性を強調し,曖昧な説明を提供し,さまざまな利害関係者の専門知識レベルとドメインの嗜好とを整合させる。実世界の意思決定シナリオにおける LLM-Assisted Inference の実践的適用性と影響について実証的研究を行った。

This paper explores the seamless integration of Generative AI (GenAI) and Evolutionary Algorithms (EAs) within the domain of large-scale multi-objective optimization. Focusing on the transformative role of Large Language Models (LLMs), our study investigates the potential of LLM-Assisted Inference to automate and enhance decision-making processes. Specifically, we highlight its effectiveness in illuminating key decision variables in evolutionarily optimized solutions while articulating contextual trade-offs. Tailored to address the challenges inherent in inferring complex multi-objective optimization solutions at scale, our approach emphasizes the adaptive nature of LLMs, allowing them to provide nuanced explanations and align their language with diverse stakeholder expertise levels and domain preferences. Empirical studies underscore the practical applicability and impact of LLM-Assisted Inference in real-world decision-making scenarios.

翻訳日:2024-05-14 17:47:28 公開日:2024-05-12

# 脆弱性のあるJavaScript関数の予測における機械学習アルゴリズムの適応

Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions ( http://arxiv.org/abs/2405.07213v1 )

ライセンス: Link先を確認

Rudolf Ferenc, Péter Hegedűs, Péter Gyimesi, Gábor Antal, Dénes Bán, Tibor Gyimóthy,

(参考訳) サイバー犯罪活動の急速な増加と、それらによって脅かされるデバイスの増加は、ソフトウェアセキュリティの問題を目立たせている。攻撃の約90%が既知のセキュリティ問題を利用しており、脆弱なコンポーネントを見つけ、既存の緩和技術を適用している。本稿では,JavaScriptプログラムにおけるセキュリティ脆弱性の可能性のある関数の予測において,一般的なディープラーニングアルゴリズムを含む最先端の機械学習技術がどのように機能するかを検討する。私たちは8つの機械学習アルゴリズムを適用して、Node Security ProjectとSnykプラットフォームの公開データベースの脆弱性情報とGitHubからのコード修正パッチから、この研究のために構築された新しいデータセットを使用して、予測モデルを構築しました。静的なソースコードメトリクスを予測子として使用し、グリッド探索アルゴリズムを使って最高のパフォーマンスモデルを見つけました。また、データセットの不均衡性を扱うために、様々な再サンプリング戦略の効果についても検討した。最高性能のアルゴリズムはKNNであり、F値が0.76(精度0.91、リコール0.66)の脆弱性関数の予測モデルを作った。さらに, 深層学習, 木およびフォレストベースの分類器, SVMは0.70以上のF値で競合した。再サンプリング戦略ではF値の差は認められなかったが, 精度とリコールの分布は変化した。再サンプリングを行わない場合は精度を重視するモデルが得られる傾向にあったが、再サンプリング戦略はIR指標(精度とリコール)のバランスを保った。

The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures (precision and recall).
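
The reported F-measure can be checked directly from the stated precision and recall, because the F-measure is their harmonic mean. The small helper below is only an arithmetic check and is not part of the study's tooling; the name `f_measure` and the optional `beta` weighting parameter are added for illustration.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    # Precision and recall as reported for the KNN model in the abstract.
    print(f"{f_measure(0.91, 0.66):.3f}")
```

With precision 0.91 and recall 0.66 this gives 2 × 0.91 × 0.66 / (0.91 + 0.66) ≈ 0.765, consistent with the reported 0.76.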

翻訳日:2024-05-14 17:47:28 公開日:2024-05-12

# ニューラルネットワークによる連続変数上の局所的独立性の発見について

On Discovery of Local Independence over Continuous Variables via Neural Contextual Decomposition ( http://arxiv.org/abs/2405.07220v1 )

ライセンス: Link先を確認

Inwoo Hwang, Yunhyeok Kwak, Yeon-Ji Song, Byoung-Tak Zhang, Sanghack Lee,

(参考訳) 条件付き独立は、興味のある変数間の因果関係を理解する手段を提供する。基礎となるシステムは、特に変数とその親の間のよりきめ細かい因果関係を示す可能性があり、これは局所的な独立関係と呼ばれる。最も広く研究されているローカルな関係の1つは、条件付き変数の特定の割り当てに係わるコンテキスト特化独立(CSI)である。しかし、連続変数を許可しないため、その適用性はしばしば制限される: 連続変数の特定の値に条件付けられたデータには、ほとんどインスタンスが含まれていないが、独立性をテストすることは不可能である。本研究では,親変数の協調代入に係わる局所的独立関係を定義・特徴化し,その関係を文脈集合依存独立(CSSI)と呼ぶ。次に、CSSIの標準表現を提供し、その基本特性を証明します。理論的な結果から,システム内の複数のCSSI関係を,連立結果空間の分割として発見する問題を提起した。最後に,各セットに条件分布をモデル化してCSSIを誘導することにより,その分割を学習するニューラルコンテクスト分解(NCD)を提案する。提案手法は,実世界の物理力学を反映した合成データセットと複雑なシステムの両方において,真相の局所的独立関係の発見に成功していることを実証的に実証した。

Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships, especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds under a specific assignment of conditioned variables. However, its applicability is often limited since it does not allow continuous variables: data conditioned on a specific value of a continuous variable contains few instances, if any, making it infeasible to test independence. In this work, we define and characterize the local independence relationship that holds in a specific set of joint assignments of parental variables, which we call context-set specific independence (CSSI). We then provide a canonical representation of CSSI and prove its fundamental properties. Based on our theoretical findings, we cast the problem of discovering multiple CSSI relationships in a system as finding a partition of the joint outcome space. Finally, we propose a novel method, coined neural contextual decomposition (NCD), which learns such a partition by imposing each set to induce CSSI via modeling a conditional distribution. We empirically demonstrate that the proposed method successfully discovers the ground truth local independence relationships in both a synthetic dataset and a complex system reflecting real-world physical dynamics.

翻訳日:2024-05-14 17:47:28 公開日:2024-05-12

# 量子並列性とは何でしょう?

What is Quantum Parallelism, Anyhow? ( http://arxiv.org/abs/2405.07222v1 )

ライセンス: Link先を確認

Stefano Markidis,

(参考訳) 量子コンピューティングのパワーの中心は量子並列性の概念であり、量子システムは複数の計算経路を同時に探索して処理することができる。本稿では,従来の並列計算モデルと並列処理を併用し,その基本特性とアルゴリズム性能への影響を解明する量子並列処理の解明について論じる。まず、量子状態の重ね合わせから生じる量子並列性を定義し、複数の計算経路を並列に探索できるようにする。量子並列性の定量化と可視化のために,量子アルゴリズムとその並列実行経路のグラフィカルな表現を提供する量子データフロー図(quantum Dataflow diagrams)の概念を導入する。本稿では、量子データフロー図を用いて量子フーリエ変換(QFT)や振幅増幅(AA)といった量子アルゴリズムを解析することにより、量子並列性を計測し評価する方法を実証する。さらに、Amdahl法やGustafson法則を含む古典並列法則と量子並列法則の相互作用について検討する。これらの法則は、もともと古典的な並列計算システムのために定式化されたものであるが、量子コンピューティング領域におけるそれらの適用性を再考する。古典的並列化法則は価値ある洞察を与えるが、量子並列化の独特な性質や古典的量子I/Oの本質的な制限などにより、量子コンピューティングへの直接的な応用は限定的であると論じる。我々の分析は、量子並列性に対する理解を深めることの必要性と、アルゴリズムの設計と性能に与える影響を強調している。

Central to the power of quantum computing is the concept of quantum parallelism: quantum systems can explore and process multiple computational paths simultaneously. In this paper, we discuss the elusive nature of quantum parallelism, drawing parallels with classical parallel computing models to elucidate its fundamental characteristics and implications for algorithmic performance. We begin by defining quantum parallelism as arising from the superposition of quantum states, allowing for the exploration of multiple computational paths in parallel. To quantify and visualize quantum parallelism, we introduce the concept of quantum dataflow diagrams, which provide a graphical representation of quantum algorithms and their parallel execution paths. We demonstrate how quantum parallelism can be measured and assessed by analyzing quantum algorithms such as the Quantum Fourier Transform (QFT) and Amplitude Amplification (AA) iterations using quantum dataflow diagrams. Furthermore, we examine the interplay between quantum parallelism and classical parallelism laws, including Amdahl's and Gustafson's laws. While these laws were originally formulated for classical parallel computing systems, we reconsider their applicability in the quantum computing domain. We argue that while classical parallelism laws offer valuable insights, their direct application to quantum computing is limited due to the unique characteristics of quantum parallelism, including the role of destructive interference and the inherent limitations of classical-quantum I/O. Our analysis highlights the need for an increased understanding of quantum parallelism and its implications for algorithm design and performance.
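
For reference, the two classical laws mentioned here can be written as short functions. Amdahl's law fixes the problem size and bounds the speedup by the serial fraction, while Gustafson's law lets the parallel part of the workload grow with the number of processors. The sketch covers only the classical formulas; the paper's quantum reinterpretation is not reproduced, and the function and parameter names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Fixed-size speedup: 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled speedup: (1 - p) + p * n."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.9, n), 2))
```

With 90% of the work parallelisable, Amdahl's speedup can never exceed 10 however many workers are added, whereas Gustafson's scaled speedup keeps growing linearly.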

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning ( http://arxiv.org/abs/2405.07223v1 )

License: see the linked page

Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang

In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL has provided a promising solution by supplying an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online adaptation to several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaptation of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.
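
A tabular sketch may help show why successor representations transfer across tasks: once the discounted state-occupancy matrix is known, values for a new reward vector follow from a single matrix product. Everything below is a simplified assumption for illustration (a fixed policy, a three-state chain, two hypothetical transition estimates, and mean/min aggregation); the paper's method learns representations from offline data and builds ensemble Q functions, which this sketch does not reproduce.

```python
import numpy as np


def successor_representation(P: np.ndarray, gamma: float) -> np.ndarray:
    """Tabular SR for a fixed policy: M = (I - gamma * P)^-1."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


def ensemble_values(srs, reward):
    """Evaluate one reward vector under every SR; return mean and pessimistic (min) values."""
    stacked = np.stack([M @ reward for M in srs])
    return stacked.mean(axis=0), stacked.min(axis=0)


gamma = 0.9
# Two hypothetical transition estimates, e.g. fitted on different slices of offline data.
P_a = np.array([[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]])
P_b = np.array([[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.2, 0.0, 0.8]])
srs = [successor_representation(P, gamma) for P in (P_a, P_b)]

new_task_reward = np.array([0.0, 0.0, 1.0])  # a reward never seen when the SRs were built
mean_v, min_v = ensemble_values(srs, new_task_reward)
print(mean_v, min_v)
```

Switching tasks only changes `new_task_reward`; the successor matrices are reused as they are.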

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# A geometric decomposition of finite games: Convergence vs. recurrence under no-regret learning

A geometric decomposition of finite games: Convergence vs. recurrence under no-regret learning ( http://arxiv.org/abs/2405.07224v1 )

License: see the linked page

Davide Legacci, Panayotis Mertikopoulos, Bary Pradelski

In view of the complexity of the dynamics of no-regret learning in games, we seek to decompose a finite game into simpler components where the day-to-day behavior of the dynamics is well understood. A natural starting point for this is Helmholtz's theorem, which resolves a vector field into a potential and an incompressible component. However, the geometry of no-regret dynamics - and, in particular, the dynamics of exponential / multiplicative weights (EW) schemes - is not compatible with the Euclidean underpinnings of Helmholtz's theorem, leading us to consider a Riemannian framework based on the Shahshahani metric. Using this geometric construction, we introduce the class of incompressible games, and we prove the following results: First, in addition to being volume-preserving, the continuous-time EW dynamics in incompressible games admit a constant of motion and are Poincaré recurrent - i.e., almost every trajectory of play comes arbitrarily close to its starting point infinitely often. Second, we establish a deep connection with a well-known decomposition of games into a potential and harmonic component (where the players' objectives are aligned and anti-aligned respectively): a game is incompressible if and only if it is harmonic, implying in turn that the EW dynamics lead to Poincaré recurrence in harmonic games.
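
The recurrent behaviour described above can be explored numerically with a discrete-time exponential weights scheme. The sketch below uses Matching Pennies as a stand-in example and tracks the summed KL divergence from the uniform equilibrium, which is a known constant of motion for the continuous-time dynamics in zero-sum games with an interior equilibrium. The game, step size, and initial scores are assumptions chosen for illustration, and the fixed-step discretisation only approximates the continuous-time flow studied in the paper.

```python
import numpy as np

# Matching Pennies: payoffs of the row player; the column player receives -A.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])


def softmax(y: np.ndarray) -> np.ndarray:
    z = np.exp(y - y.max())
    return z / z.sum()


def kl_from_uniform(x: np.ndarray) -> float:
    u = np.full_like(x, 1.0 / x.size)
    return float(np.sum(u * np.log(u / x)))


def run_ew(y1, y2, eta, steps):
    """Exponential weights: accumulate payoff vectors in scores y, play softmax(y)."""
    history = []
    for _ in range(steps):
        x1, x2 = softmax(y1), softmax(y2)
        history.append(kl_from_uniform(x1) + kl_from_uniform(x2))
        y1 = y1 + eta * (A @ x2)       # payoff vector of the row player
        y2 = y2 + eta * (-A.T @ x1)    # payoff vector of the column player
    return softmax(y1), softmax(y2), np.array(history)


x1, x2, kl = run_ew(np.array([0.5, 0.0]), np.array([0.0, 0.0]), eta=0.01, steps=2000)
print(x1, x2, kl[0], kl[-1])
```

With a fixed step size the tracked quantity drifts slowly upward instead of staying constant, so smaller values of `eta` give a closer picture of the conserved, cycling continuous-time orbit.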

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem

Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem ( http://arxiv.org/abs/2405.07226v1 )

License: see the linked page

Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao

The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights into the fundamental relationship between quantum and classical learning protocols. To address this gap, we categorize a diverse array of quantum learning algorithms into three learning protocols designed for learning quantum dynamics under a specified observable and establish their NFL theorem. The exploited protocols, namely Classical Learning Protocols (CLC-LPs), Restricted Quantum Learning Protocols (ReQu-LPs), and Quantum Learning Protocols (Qu-LPs), offer varying levels of access to quantum resources. Our derived NFL theorems demonstrate quadratic reductions in sample complexity across CLC-LPs, ReQu-LPs, and Qu-LPs, contingent upon the orthogonality of quantum states and the diagonality of observables. We attribute this performance discrepancy to the unique capacity of quantum-related learning protocols to indirectly utilize information concerning the global phases of non-orthogonal quantum states, a distinctive physical feature inherent in quantum mechanics. Our findings not only deepen our understanding of quantum learning protocols' capabilities but also provide practical insights for the development of advanced quantum learning algorithms.

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Information capacity of quantum communication under natural physical assumptions

Information capacity of quantum communication under natural physical assumptions ( http://arxiv.org/abs/2405.07231v1 )

License: see the linked page

Jef Pauwels, Stefano Pironio, Armin Tavakoli

The quantum prepare-and-measure scenario has been studied under various physical assumptions on the emitted states. Here, we first discuss how different assumptions are conceptually and formally related. We then identify one that can serve as a relaxation of all others, corresponding to a limitation on the one-shot accessible information of the state ensemble. This motivates us to study the optimal state discrimination probability of a source subject to these various physical assumptions. We derive general and tight bounds for states restricted by their quantum dimension, their vacuum component, an arbitrary uniform overlap, the magnitude of higher-dimensional signals and the experimenter's trust in their device. Our results constitute a first step towards a more unified picture of semi-device-independent quantum information processing.
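
For orientation, the simplest instance of an overlap-limited discrimination bound is the textbook Helstrom formula for two equiprobable pure states. The sketch below evaluates it; the function name is hypothetical, and the formula is a standard special case, not one of the general bounds derived in the paper.

```python
import math


def helstrom_success(overlap: float) -> float:
    """Optimal success probability for two equiprobable pure states with |<psi|phi>| = overlap."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    return 0.5 * (1.0 + math.sqrt(1.0 - overlap ** 2))


for s in (0.0, 0.5, 0.9, 1.0):
    print(s, round(helstrom_success(s), 4))
```

Orthogonal states can be told apart perfectly, while identical states leave only a random guess.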

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection

A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection ( http://arxiv.org/abs/2405.07232v1 )

License: see the linked page

Raja Giryes, Lior Shafir, Avishai Wool

Distributed Denial of Service (DDoS) attacks are getting increasingly harmful to the Internet, showing no signs of slowing down. Developing an accurate detection mechanism to thwart DDoS attacks is still a big challenge due to the rich variety of these attacks and the emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than the traditional fixed-size record structure containing aggregated flow statistics. Although aggregated flow records have gained popularity over the past decade, providing an effective means for flow-based intrusion detection by inspecting only a fraction of the total traffic volume, they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads, but also by their structure, which is unable to model fine-grained inter-packet relations, such as packet order and temporal relations. Additionally, inferring aggregated flow statistics must wait for the complete flow to end. Here we show that considering flow inputs as variable-length streams composed of their associated packet headers, allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds existing machine learning techniques' accuracy, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection, e.g., with CICDDoS2019 detection based on the first 2 packets, which corresponds to an average time-saving of 99.79% and uses only 4-6% of the traffic volume.
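
To make the contrast with aggregated flow records concrete, here is a minimal sketch of how the first few packet headers of a flow could be laid out, in order, as input for a tree-based classifier. The `PacketHeader` fields, the padding value, and the fixed `first_n` cut-off are hypothetical simplifications introduced here; the abstract does not specify the paper's actual feature set or how variable-length streams are handled.

```python
from typing import List, NamedTuple


class PacketHeader(NamedTuple):
    timestamp: float   # seconds since some reference point
    length: int        # packet length in bytes
    tcp_flags: int     # raw flag bits
    direction: int     # +1 forward, -1 backward


def stream_features(packets: List[PacketHeader], first_n: int = 2,
                    pad: float = -1.0) -> List[float]:
    """Keep per-packet header fields in arrival order instead of flow-level aggregates."""
    feats: List[float] = []
    prev_ts = None
    for pkt in packets[:first_n]:
        gap = 0.0 if prev_ts is None else pkt.timestamp - prev_ts  # inter-arrival time
        feats.extend([float(pkt.length), float(pkt.tcp_flags), float(pkt.direction), gap])
        prev_ts = pkt.timestamp
    missing = first_n - min(len(packets), first_n)
    feats.extend([pad] * (4 * missing))  # pad flows shorter than first_n packets
    return feats


flow = [PacketHeader(0.0, 60, 2, 1), PacketHeader(0.5, 60, 18, -1), PacketHeader(0.75, 52, 16, 1)]
print(stream_features(flow))
```

Because only a prefix of the stream is needed, a decision can be attempted before the flow ends, which is the property the abstract highlights.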

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning ( http://arxiv.org/abs/2405.07233v1 )

License: see the linked page

Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang

Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep up with the dynamic variation caused by global warming and human activities. Moreover, because data collection is costly, historical observations are severely sparse, which poses a major challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep learning based model, to reconstruct the global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77%, demonstrating a promising potential to understand the "breathless ocean" in a data-driven manner.
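
The headline metric, mean absolute percentage error (MAPE), is straightforward to compute against in-situ observations. The sketch below uses the standard definition; skipping zero-valued observations is an assumption made here to avoid division by zero, and the sample numbers are invented purely for illustration.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, ignoring zero-valued observations."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations to compare against")
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)


# Invented dissolved-oxygen values, not data from the paper.
print(mape([100.0, 200.0], [90.0, 220.0]))
```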

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Adaptive control of recurrent neural networks using conceptors

Adaptive control of recurrent neural networks using conceptors ( http://arxiv.org/abs/2405.07236v1 )

License: see the linked page

Guillaume Pourcel, Mirko Goldmann, Ingo Fischer, Miguel C. Soriano

Recurrent Neural Networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a Machine Learning setting, the network's parameters are adapted during a training phase to match the requirements of a given task/problem increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters thereby render the network unadaptive to changing conditions, such as external or internal perturbation. In this manuscript, we demonstrate how keeping parts of the network adaptive even after the training enhances its functionality and robustness. Here, we utilize the conceptor framework and conceptualize an adaptive control loop analyzing the network's behavior continuously and adjusting its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.
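
As background for the conceptor framework mentioned above, a conceptor can be computed from collected reservoir states with the standard formula C = R (R + a^-2 I)^-1, where R is the state correlation matrix and a is the aperture. The sketch below applies it to a small random tanh reservoir driven by a sine wave. The reservoir size, spectral radius, input signal, and aperture are hypothetical choices, and the adaptive control loop proposed in the paper is not reproduced here.

```python
import numpy as np


def conceptor(states: np.ndarray, aperture: float) -> np.ndarray:
    """Conceptor C = R (R + aperture^-2 I)^-1 for states of shape (units, steps)."""
    n_units, n_steps = states.shape
    R = states @ states.T / n_steps  # state correlation matrix
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(n_units))


def drive_reservoir(W: np.ndarray, w_in: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """Collect states of x(t+1) = tanh(W x(t) + w_in u(t)) as columns."""
    x = np.zeros(W.shape[0])
    collected = []
    for u in signal:
        x = np.tanh(W @ x + w_in * u)
        collected.append(x)
    return np.array(collected).T


rng = np.random.default_rng(0)
n = 20
W = rng.normal(size=(n, n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
w_in = rng.normal(size=n)
signal = np.sin(0.3 * np.arange(500))

C = conceptor(drive_reservoir(W, w_in, signal), aperture=10.0)
singular_values = np.linalg.svd(C, compute_uv=False)
print(singular_values.round(3))
```

The singular values lie between 0 and 1 and indicate which state-space directions the driven pattern occupies; a larger aperture pushes more of them towards 1.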

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Case Study of Novelty, Complexity, and Adaptation in a Multicellular System

Case Study of Novelty, Complexity, and Adaptation in a Multicellular System ( http://arxiv.org/abs/2405.07241v1 )

License: see the linked page

Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria

Continuing generation of novelty, complexity, and adaptation are well-established as core aspects of open-ended evolution. However, it has yet to be firmly established to what extent these phenomena are coupled and by what means they interact. In this work, we track the co-evolution of novelty, complexity, and adaptation in a case study from the DISHTINY simulation system, which is designed to study the evolution of digital multicellularity. In this case study, we describe ten qualitatively distinct multicellular morphologies, several of which exhibit asymmetrical growth and distinct life stages. We contextualize the evolutionary history of these morphologies with measurements of complexity and adaptation. Our case study suggests a loose, sometimes divergent, relationship can exist among novelty, complexity, and adaptation.

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Fault-Tolerant Quantum LDPC Encoders

Fault-Tolerant Quantum LDPC Encoders ( http://arxiv.org/abs/2405.07242v1 )

License: see the linked page

Abhi Kumar Sharma, Shayan Srinivasa Garan

We propose fault-tolerant encoders for quantum low-density parity check (LDPC) codes. By grouping qubits within a quantum code over contiguous blocks and applying preshared entanglement across these blocks, we show how transversal implementation can be realized. The proposed encoder reduces the error propagation while using multi-qubit gates and is applicable for both entanglement-unassisted and entanglement-assisted quantum LDPC codes.

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics

Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics ( http://arxiv.org/abs/2405.07244v1 )

License: see the linked page

Gábor Antal, Zoltán Tóth, Péter Hegedűs, Rudolf Ferenc

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis based metric of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2-10% increase in model performances (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performances; however, using them all together yields the best results.
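
The invocation metrics can be illustrated on a toy call graph. The sketch below assumes that the hybrid graph is the union of statically and dynamically discovered call edges and that the metrics count distinct callers and callees per function. Both assumptions, along with the function names in the example, are made up for illustration and may differ from how the paper builds its hybrid call graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def invocation_counts(edges: Iterable[Edge]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (incoming, outgoing) counts of distinct call partners per function."""
    incoming: Dict[str, Set[str]] = defaultdict(set)
    outgoing: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in edges:
        outgoing[caller].add(callee)
        incoming[callee].add(caller)
    return ({f: len(s) for f, s in incoming.items()},
            {f: len(s) for f, s in outgoing.items()})


# Hypothetical edges: static analysis misses the calls made through a dynamic handler.
static_edges = {("main", "parse"), ("parse", "tokenize")}
dynamic_edges = {("main", "parse"), ("main", "handler"), ("handler", "tokenize")}

nii, noi = invocation_counts(static_edges)
hnii, hnoi = invocation_counts(static_edges | dynamic_edges)
print(nii, noi)
print(hnii, hnoi)
```

In this toy graph `tokenize` has one incoming caller statically but two in the hybrid graph, which is the kind of difference the hybrid metrics are meant to capture.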

Translated: 2024-05-14 17:47:28. Published: 2024-05-12.

# Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic Structure

Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic Structure ( http://arxiv.org/abs/2405.07245v1 )

License: see the linked page

Matthew Andres Moreno, Santiago Rodriguez-Papa, Emily Dolson

Evolutionary dynamics are shaped by a variety of fundamental, generic drivers, including spatial structure, ecology, and selection pressure. These drivers impact the trajectory of evolution, and have been hypothesized to influence phylogenetic structure. Here, we set out to assess (1) if spatial structure, ecology, and selection pressure leave detectable signatures in phylogenetic structure, (2) the extent, in particular, to which ecology can be detected and discerned in the presence of spatial structure, and (3) the extent to which these phylogenetic signatures generalize across evolutionary systems. To this end, we analyze phylogenies generated by manipulating spatial structure, ecology, and selection pressure within three computational models of varied scope and sophistication. We find that selection pressure, spatial structure, and ecology have characteristic effects on phylogenetic metrics, although these effects are complex and not always intuitive. Signatures have some consistency across systems when using equivalent taxonomic unit definitions (e.g., individual, genotype, species). Further, we find that sufficiently strong ecology can be detected in the presence of spatial structure. We also find that, while low-resolution phylogenetic reconstructions can bias some phylogenetic metrics, high-resolution reconstructions recapitulate them faithfully. Although our results suggest potential for evolutionary inference of spatial structure, ecology, and selection pressure through phylogenetic analysis, further methods development is needed to distinguish these drivers' phylometric signatures from each other and to appropriately normalize phylogenetic metrics. With such work, phylogenetic analysis could provide a versatile toolkit to study large-scale evolving populations.

Translated: 2024-05-14 17:30:59. Published: 2024-05-12.

# ZX Graphical Calculus for Continuous-Variable Quantum Processes

ZX Graphical Calculus for Continuous-Variable Quantum Processes ( http://arxiv.org/abs/2405.07246v1 )

License: see the linked page

Hironari Nagayoshi, Warit Asavanant, Ryuhoh Ide, Kosuke Fukui, Atsushi Sakaguchi, Jun-ichi Yoshikawa, Nicolas C. Menicucci, Akira Furusawa

Continuous-variable (CV) quantum information processing is a promising candidate for large-scale fault-tolerant quantum computation. However, analysis of CV quantum processes relies mostly on direct computation of the evolution of operators in the Heisenberg picture, and the features of CV space have yet to be thoroughly investigated in an intuitive manner. One key ingredient for further exploration of CV quantum computing is the construction of a computational model that brings visual intuition and new tools for analysis. In this paper, we delve into a graphical computational model, inspired by a similar model for qubit-based systems called the ZX calculus, that enables the representation of an arbitrary CV quantum process as a simple directed graph. We demonstrate the utility of our model as a graphical tool to comprehend CV processes intuitively by showing how equivalences between two distinct quantum processes can be proven as a sequence of diagrammatic transformations in certain cases. We also examine possible applications of our model, such as measurement-based quantum computing, characterization of Gaussian and non-Gaussian processes, and circuit optimization.

Translated: 2024-05-14 17:30:59. Published: 2024-05-12.

# Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis ( http://arxiv.org/abs/2405.07248v1 )

License: see the linked page

Nikolay B Petrov, Gregory Serapio-García, Jason Rentfrow

The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs, when using specific demographic profiles, show poor psychometric properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.

Translated: 2024-05-14 17:30:59. Published: 2024-05-12.

# Universal Batch Learning Under The Misspecification Setting

Universal Batch Learning Under The Misspecification Setting ( http://arxiv.org/abs/2405.07252v1 )

License: see the linked page

Shlomi Vituri, Meir Feder

In this paper we consider the problem of universal batch learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypotheses models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity-achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameter multinomial distribution while the hypothesis class $\Theta$ is only a subset of this family of distributions.
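
For readers unfamiliar with the Arimoto-Blahut iteration that the abstract builds on, the sketch below implements the classical version for the capacity of a discrete memoryless channel and its capacity-achieving input distribution. It is the standard textbook algorithm, not the constrained extension developed in the paper; the function name, tolerance, and example channels are assumptions chosen for illustration.

```python
import numpy as np


def blahut_arimoto(W: np.ndarray, tol: float = 1e-10, max_iter: int = 10_000):
    """Capacity (in bits) and optimising input distribution of a channel W[x, y] = P(y | x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])  # start from the uniform prior
    lower = 0.0
    for _ in range(max_iter):
        q = p @ W  # output distribution induced by the current prior
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(W > 0, W / q, 1.0)  # zero-probability terms contribute log(1) = 0
        d = np.sum(W * np.log(ratio), axis=1)  # KL(W[x, :] || q) for each input x, in nats
        lower, upper = float(p @ d), float(d.max())  # bounds that sandwich the capacity
        if upper - lower < tol:
            break
        p = p * np.exp(d)  # multiplicative update of the prior
        p /= p.sum()
    return lower / np.log(2), p


bsc = np.array([[0.9, 0.1], [0.1, 0.9]])        # binary symmetric channel
z_channel = np.array([[1.0, 0.0], [0.5, 0.5]])  # Z-channel
print(blahut_arimoto(bsc))
print(blahut_arimoto(z_channel))
```

The returned prior plays the role of the capacity-achieving distribution; in the paper's setting the analogous object is a prior over data-generating distributions, subject to the additional constraint described in the abstract.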

Translated: 2024-05-14 17:30:59. Published: 2024-05-12.

# Leveraging Fixed and Dynamic Pseudo-labels for Semi-supervised Medical Image Segmentation

Leveraging Fixed and Dynamic Pseudo-labels for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2405.07256v1 )

License: see the linked page

Suruchi Kumari, Pravendra Singh

Semi-supervised medical image segmentation has gained growing interest due to its ability to utilize unannotated data. The current state-of-the-art methods mostly rely on pseudo-labeling within a co-training framework. These methods depend on a single pseudo-label for training, but these labels are not as accurate as the ground truth of labeled data. Relying solely on one pseudo-label often leads to suboptimal results. To this end, we propose a novel approach where multiple pseudo-labels for the same unannotated image are used to learn from the unlabeled data: the conventional fixed pseudo-label and the newly introduced dynamic pseudo-label. By incorporating multiple pseudo-labels for the same unannotated image into the co-training framework, our approach provides a more robust training procedure that improves model performance and generalization capabilities. We validate our novel approach on three semi-supervised medical benchmark segmentation datasets: the Left Atrium dataset, the Pancreas-CT dataset, and the Brats-2019 dataset. Our approach significantly outperforms state-of-the-art methods over multiple medical benchmark segmentation datasets with different labeled data ratios. We also present several ablation experiments to demonstrate the effectiveness of various components used in our approach.

Translated: 2024-05-14 17:30:59. Published: 2024-05-12.

# Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation ( http://arxiv.org/abs/2405.07257v1 )

License: see the linked page

Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

(参考訳) 発話顔生成に関する最も初期の研究は、唇の動きと音声内容の同期に焦点を当てている。しかしながら、人間の頭部のポーズと顔の感情は、自然の人間の顔の同様に重要な特徴である。音声による発話顔生成は顕著な進歩を見せているが、既存の方法は顔の感情を見落としているか、特定の個人に限られており、任意の対象に適用できない。本稿では、感情的・姿勢的制御を可能にして、一般のトーキング・フェイス・ジェネレーションと区別するワンショットトーキング・ヘッド・ジェネレーション・フレームワーク(SPEAK)を提案する。具体的には、人間の顔の特徴を3つの潜在空間に分離するIRFD(Inter-Reconstructed Feature Disentanglement)手法を提案する。次に、音声コンテンツと顔の潜時符号を1つの潜時空間に修正する顔編集モジュールを設計する。次に、編集モジュールから派生した修正潜在コードを用いて、表情の合成における感情表現、頭部ポーズ、音声内容の制御を行う新しい生成器を提案する。本手法は, 唇の動き, 顔の表情, スムーズな頭部の動きを調整して, リアルな話し声を生成できることを, 広範囲にわたる試行錯誤により実証した。デモビデオは匿名リンクで公開されている。 https://anonymous.4open.science/r/SPEAK-F56E

Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces. We then design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method can generate realistic talking heads with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at the anonymous link: https://anonymous.4open.science/r/SPEAK-F56E

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# 適応型シンドローム同定を用いた記憶補正量子リピータ

Memory-corrected quantum repeaters with adaptive syndrome identification ( http://arxiv.org/abs/2405.07258v1 )

ライセンス: Link先を確認

Alena Romanova, Peter van Loock,

(参考訳) 符号化された量子メモリを、小型・中規模量子リピータの厳密な秘密鍵レート解析に組み込むという課題に対処する。この目的のために、チェック行列モデルを導入し、最大11量子ビットの安定化符号のパウリ雑音に対するレジリエンスを定量化し、実効的な論理誤り確率の解析式を得る。一般に、5量子ビット符号とステアン符号は、実験的に関連するパラメータ領域においてより複雑で大きな符号より優れているか、リソースオーバーヘッドが低いことが分かる。その後、メモリ量子ビット上の5量子ビット符号またはステアン符号を使用する場合、メモリ補正量子リピータにおける漸近秘密鍵レートの下界を計算するために、本結果を適用した。 5量子ビット符号は、有効メモリコヒーレンス時間を劇的に増加させ、量子ノイズチャネルに適応したエラーシンドローム識別を使用する場合、位相フリップ確率を$1\%$から$0.001\%$に下げる。さらに、不完全なベル状態測定と不完全な状態準備の影響を軽減し、8セグメントのリピータにおいて非ゼロの秘密鍵レートに最低限必要な脱分極パラメータを$98.4\%$から$96.4\%$に下げる。その結果、メモリ補正された量子リピータは、未符号化のリピータが秘密鍵を生成できない実験パラメータ領域でも、しばしば秘密鍵を生成できる。 8セグメントのリピータでは、多重化を用いることで、メモリコヒーレンス時間$t_c = 10$ s以下でも、2000 kmの距離まで非ゼロの秘密鍵レートを達成できる。ゼロ距離リンク結合効率$p_0 = 0.7$, 脱分極パラメータ$\mu = 0.99$, $t_c = 10$ s, 800 kmの総リピータ長を仮定すると、秘密鍵レートは4.85 Hzとなり、1.25 Hzの未符号化リピータと、GHzクロックレートで0.71 Hzの理想的ツインフィールド量子鍵分布の両方を上回る。

We address the challenge of incorporating encoded quantum memories into an exact secret key rate analysis for small and intermediate-scale quantum repeaters. To this end, we introduce the check matrix model and quantify the resilience of stabilizer codes of up to eleven qubits against Pauli noise, obtaining analytical expressions for effective logical error probabilities. Generally, we find that the five-qubit and Steane codes either outperform more complex, larger codes in the experimentally relevant parameter regimes or have a lower resource overhead. Subsequently, we apply our results to calculate lower bounds on the asymptotic secret key rate in memory-corrected quantum repeaters when using the five-qubit or Steane codes on the memory qubits. The five-qubit code drastically increases the effective memory coherence time, reducing a phase flip probability of $1\%$ to $0.001\%$ when employing an error syndrome identification adapted to the quantum noise channel. Furthermore, it mitigates the impact of faulty Bell state measurements and imperfect state preparation, lowering the minimally required depolarization parameter for non-zero secret key rates in an eight-segment repeater from $98.4\%$ to $96.4\%$. As a result, the memory-corrected quantum repeater can often generate secret keys in experimental parameter regimes where the unencoded repeater fails to produce a secret key. In an eight-segment repeater, one can even achieve non-vanishing secret key rates up to distances of 2000 km for memory coherence times of $t_c = 10$ s or less using multiplexing. Assuming a zero-distance link-coupling efficiency $p_0 = 0.7$, a depolarization parameter $\mu = 0.99$, $t_c = 10$ s, and an 800 km total repeater length, we obtain a secret key rate of 4.85 Hz, beating both the unencoded repeater that provides 1.25 Hz and ideal twin-field quantum key distribution with 0.71 Hz at GHz clock rates.
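The quoted drop in phase-flip probability from 1% to 0.001% can be sanity-checked with a short calculation. The sketch below assumes independent phase flips on each of the five qubits and a decoder that corrects every phase-flip pattern of weight two or less, so that only three or more simultaneous flips lead to a logical error. The noise model, the decoder assumption, and the function name `logical_phase_flip_prob` are illustrative and are not taken from the paper.

```python
from math import comb


def logical_phase_flip_prob(p: float, n: int = 5, correctable: int = 2) -> float:
    """Probability that more than `correctable` of `n` qubits suffer a phase flip.

    Assumes independent phase flips with probability `p` per qubit
    (an illustrative model, not necessarily the one used in the paper).
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(correctable + 1, n + 1)
    )


if __name__ == "__main__":
    p_logical = logical_phase_flip_prob(0.01)
    print(f"physical 1% -> logical {100 * p_logical:.5f}%")
```

Under these assumptions a 1% physical phase-flip probability yields a logical probability on the order of 0.001%, which agrees in magnitude with the figure in the abstract.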

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# 脳波に基づく感情認識のための教師あり情報強化型マルチグラニュラリティコントラスト学習フレームワーク

A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition ( http://arxiv.org/abs/2405.07260v1 )

ライセンス: Link先を確認

Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu,

(参考訳) 本研究では,脳波に基づく感情認識のための新しい教師あり情報強化型コントラスト学習フレームワーク(SI-CLEER)を提案する。 SI-CLEERは、マルチグラニュラリティコントラスト学習を用いて、堅牢なEEGコンテキスト表現を作成し、感情認識の有効性を向上させる可能性がある。分類損失のみによって導かれる既存の方法とは異なり、自己教師付きコントラスト学習損失と教師付き分類損失を組み合わせた共同学習モデルを提案する。このモデルは両方の損失関数を最適化し、感情検出に特有の微妙な脳波信号の差を捉える。大規模な実験により、SEEDデータセットにおいてSI-CLEERが最先端の手法と比べて頑健で精度に優れることを示した。さらに、電極ごとの性能を解析し、感情検出における中心前頭部および側頭部の脳波の重要性を明らかにした。本研究は、多種多様な脳波分類タスクに対する潜在的な利点を持つ普遍的なアプローチを提供する。

This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SI-CLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers a universally applicable approach with potential benefits for diverse EEG classification tasks.
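The joint objective described above, a self-supervised contrastive loss combined with a supervised classification loss, might be sketched as follows. The InfoNCE-style contrastive term, the `temperature` value, and the trade-off weight `alpha` are assumptions made for illustration; the abstract does not specify how the two losses are defined or weighted.

```python
import math


def cross_entropy(logits: list[float], label: int) -> float:
    """Supervised classification loss for one sample (softmax cross-entropy)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.5) -> float:
    """Contrastive loss: pull `positive` toward `anchor`, push `negatives` away."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    # The positive pair sits at index 0, so it plays the role of the "label".
    return cross_entropy([s / temperature for s in sims], 0)


def joint_loss(logits, label, anchor, positive, negatives, alpha=0.5) -> float:
    """Weighted sum of both losses; `alpha` is a hypothetical trade-off weight."""
    return cross_entropy(logits, label) + alpha * info_nce(anchor, positive, negatives)
```

In a real training loop both terms would be computed over mini-batches with an automatic-differentiation framework; the plain-Python version only shows how the two signals are combined into one scalar.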

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# 効果的なフレーズマイニングのためのSpan-Aggregatable, Contextualized Word Embeddings

Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining ( http://arxiv.org/abs/2405.07263v1 )

ライセンス: Link先を確認

Eyal Orbach, Lev Haikin, Nelly David, Avi Faizakof,

(参考訳) 近年, 文の複雑なベクトル表現は, 文類似性タスクで見られるように, 顕著な進歩を遂げている。一方、実世界のフレーズ検索アプリケーションは、高密度表現を効果的に活用するための課題に直面している。目的語句が1つの重み付きベクトルで全文を表す雑音のある文脈内に存在する場合,有効な句検索には不十分であることを示す。そこで我々は、複数のサブ文、連続する単語をそれぞれ自作の高密度ベクトルで表すという概念を考察する。本稿では,この手法がフレーズマイニングに有用であるが,有用なスパン表現を得るためには,かなりの計算が必要であることを示す。そこで, 任意の単語スパンに対して, スパンの意味を保ちながら, 任意の単語スパンに対して集約可能な文脈型単語/トークン埋め込みを議論する。文の埋め込みに使用される一般的なコントラスト損失の修正を導入し、単語の埋め込みにこの特性を付与する。本手法の有効性を示すために, STS-Bデータセットに基づくデータセットに生成したテキストを付加し, より広い文脈で最もよく一致するパラフレーズを検索し, 原語句と類似性の度合いを報告する。本稿では,提案手法が計算量を大幅に増加させることなく,より優れた結果が得られることを示す。

Dense vector representations for sentences have made significant progress in recent years, as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges in making effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple, sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a modification to the common contrastive loss used for sentence embeddings that encourages word embeddings to have this property. To demonstrate the effect of this method, we present a dataset based on the STS-B dataset with additional generated text, which requires finding the best matching paraphrase residing in a larger context and reporting the degree of similarity to the original phrase. We demonstrate on this dataset how our proposed method can achieve better results without a significant increase in compute.
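The idea of aggregating token embeddings over arbitrary word spans can be illustrated with a small sketch. It assumes mean pooling as the aggregation and cosine similarity as the match score; both choices, along with the helper names and the `max_len` limit, are hypothetical and are not confirmed by the abstract.

```python
import math


def mean_pool(token_vectors):
    """Aggregate token embeddings into one span vector by averaging."""
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_span(token_vectors, query_vector, max_len=4):
    """Return (start, end, score) of the contiguous span most similar to the query.

    `end` is exclusive; only spans of at most `max_len` tokens are scored.
    """
    best = (0, 1, -2.0)  # cosine is never below -1, so any span beats this
    for start in range(len(token_vectors)):
        for end in range(start + 1, min(start + max_len, len(token_vectors)) + 1):
            score = cosine(mean_pool(token_vectors[start:end]), query_vector)
            if score > best[2]:
                best = (start, end, score)
    return best
```

Because the token embeddings are computed once per sentence, scoring every candidate span only costs averaging and a dot product, which is the compute saving the abstract alludes to.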

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# UAVネットワークにおける分散認証の一手法

An Approach for Decentralized Authentication in Networks of UAVs ( http://arxiv.org/abs/2405.07265v1 )

ライセンス: Link先を確認

Nicholas Jäger, Andreas Aßmuth,

(参考訳) 無人航空機のネットワークに対する分散型認証システムを提案する。ブロックチェーンベースの公開鍵インフラストラクチャは、公開鍵暗号と公開鍵ベースの認証プロトコルの使用を可能にする。ブロックチェーンは公開鍵とその関係の共通ストレージを提供し、認証プロセスに必要な情報を提供する。さらに、無人航空機は、インターネットにアクセスできない可能性のある地域で独立して運用するために、ブロックチェーンの選ばれた部分を格納する。これにより、無人航空機は、他の無人航空機、クラウドサービス、自動車、あらゆるコンピュータのように、ネットワークの実体を認証することができる。

We propose a decentralized authentication system for networks of unmanned aerial vehicles. A blockchain-based public key infrastructure allows the usage of public key cryptography and public key based authentication protocols. The blockchain provides a common storage of the public keys and their relations and can provide the required information for the authentication process. Furthermore, the unmanned aerial vehicles store selected parts of the blockchain in order to operate independently in areas where they might not have access to the Internet. This allows unmanned aerial vehicles to authenticate entities of the network, like other unmanned aerial vehicles, cloud services, cars, and any computer.

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# MAML MOT:メタラーニングに基づく複数物体追跡

MAML MOT: Multiple Object Tracking based on Meta-Learning ( http://arxiv.org/abs/2405.07272v1 )

ライセンス: Link先を確認

Jiayi Chen, Chunhua Deng,

(参考訳) 映像解析技術の進歩に伴い、歩行者を含む複雑な場面における多目的追跡(MOT)問題の重要性が高まっている。この課題は主に、歩行者検出と再識別という2つの重要なタスクを含む。近年,歩行者検出タスクにおいて顕著な進歩がみられてきたが,再識別タスクの有効性の向上は引き続き課題である。この困難は、多目的追跡データセットにおける多数の歩行者サンプルと、個々のサンプルの不足から生じる。近年,メタ学習技術の急速な進歩により,メタ学習に基づくマルチオブジェクト追跡のトレーニング手法であるMAML MOTを導入する。このアプローチは,メタラーニングの迅速な学習能力を活用して,歩行者再識別作業におけるサンプル不足問題に対処し,モデルの一般化性能と堅牢性を向上させることを目的とする。実験の結果,提案手法はMOTチャレンジの主流データセットに対して高い精度を実現することが示された。これは、歩行者多目的追跡の分野の研究のための新しい視点と解決策を提供する。

With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# 大規模言語モデルを用いた短文の人間解釈可能なクラスタリング

Human-interpretable clustering of short-text using large language models ( http://arxiv.org/abs/2405.07278v1 )

ライセンス: Link先を確認

Justin K. Miller, Tristram J. Alexander,

(参考訳) 大規模な言語モデルは、人間のライクなコンテンツ生成能力によって、非常に人気が高まっている。これらのモデルは人間の生成したコンテンツをクラスタリングするのにも有効であり、その成功は識別性と解釈可能性の尺度によって定義される。この成功は、人間レビュアーとChatGPTによって検証され、短文クラスタリングに挑戦する‘バリデーションギャップ’を閉じるための自動化された手段を提供する。機械と人間のアプローチを比較して、それぞれに固有のバイアスを特定し、人間のコーディングへの依存を「金の標準」として疑問視する。提案手法をTwitterのバイオスに適用し,従来の専門的な研究とよく一致しているが,アイデンティティを表現するために使用される媒体の特色は興味深い。

Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# ユーモアの力学:多段階推論によるユーモア生成の促進

Humor Mechanics: Advancing Humor Generation with Multistep Reasoning ( http://arxiv.org/abs/2405.07280v1 )

ライセンス: Link先を確認

Alexey Tikhonov, Pavel Shtykovskiy,

(参考訳) 本稿では,多段階推論による一線ジョークの生成について検討する。我々の研究は、ユーモラスなワンライナーの作成プロセスの再構築と、ユーモラスな生成のための作業プロトタイプの開発であった。提案手法を,人間によるジョーク,ゼロショット GPT-4 生成ユーモア,その他のベースラインと比較した。評価は、ヒトのラベルをベンチマークとして、生成したユーモアの品質に焦点を当てた。以上の結果から,多段階推論手法は生成したユーモアの品質を継続的に改善することが示された。実験で使用したデータセットを提示し、AIによるユーモア生成の強化に関する洞察を提供する。

In this paper, we explore the generation of one-liner jokes through multi-step reasoning. Our work involved reconstructing the process behind creating humorous one-liners and developing a working prototype for humor generation. We conducted comprehensive experiments with human participants to evaluate our approach, comparing it with human-created jokes, zero-shot GPT-4 generated humor, and other baselines. The evaluation focused on the quality of humor produced, using human labeling as a benchmark. Our findings demonstrate that the multi-step reasoning approach consistently improves the quality of generated humor. We present the results and share the datasets used in our experiments, offering insights into enhancing humor generation with artificial intelligence.

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# ブランチナラティブ:文字決定点検出

Branching Narratives: Character Decision Points Detection ( http://arxiv.org/abs/2405.07282v1 )

ライセンス: Link先を確認

Alexey Tikhonov,

(参考訳) 本稿では,登場人物が物語の方向性に大きな影響を及ぼす可能性のある意思決定を行う物語内のポイントを識別するタスクであるキャラクタ決定点検出(CHADPOD)タスクを提案する。本稿では,CYOAライクなゲームグラフをベースとした新しいデータセットを提案する。本稿では,2つのLLMと複数のMLMをベースラインとし,最大89%の精度で異なるモデルの性能の比較分析を行う。このことは物語分析の複雑さを浮き彫りにし、キャラクター駆動型ストーリーダイナミクスの理解に関わる課題を示している。さらに、そのようなモデルを既存のテキストに適用して、潜在的な分岐点によって分割された線形セグメントを生成する方法を示し、物語分析における我々の発見の実践的応用を実証する。

This paper presents the Character Decision Points Detection (CHADPOD) task, a task of identification of points within narratives where characters make decisions that may significantly influence the story's direction. We propose a novel dataset based on CYOA-like games graphs to be used as a benchmark for such a task. We provide a comparative analysis of different models' performance on this task, including a couple of LLMs and several MLMs as baselines, achieving up to 89% accuracy. This underscores the complexity of narrative analysis, showing the challenges associated with understanding character-driven story dynamics. Additionally, we show how such a model can be applied to the existing text to produce linear segments divided by potential branching points, demonstrating the practical application of our findings in narrative analysis.

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# BeautyMap:グローバルマップにおける動的点除去のためのバイナリエンコード適応グラウンドマトリックス

BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps ( http://arxiv.org/abs/2405.07283v1 )

ライセンス: Link先を確認

Mingkai Jia, Qingwen Zhang, Bowen Yang, Jin Wu, Ming Liu, Patric Jensfelt,

(参考訳) 静的環境機能を正しく表現するグローバルポイントクラウドは、正確なローカライゼーションと堅牢なパス計画を容易にする。しかし、動的オブジェクトは、静的環境と混ざった望ましくないゴーストトラックを導入します。既存の動的除去法は通常、計算効率と精度のバランスをとるのに失敗する。これに対して,高忠実度グローバルマップの静的な特徴を維持しつつ,動的点を効率的に除去するBeautyMapを提案する。本手法では, 環境特徴を効率的に抽出するために, バイナリ符号化行列を用いる。各フレームの行列と対応するマップ領域をビット単位で比較することにより、ポテンシャル動的領域を抽出できる。次に、粗さを用いて、地形変動を扱うために、$z$-軸の階層的セグメンテーションを微調整する。最終的な静的復元モジュールは、各スキャンのレンジ可視性を考慮し、視界外の静的ポイントを保護する。比較実験は、他の動的点除去法と比較して、精度と効率の両面で、BeautyMapの優れた性能を示している。コードはhttps://github.com/MKJia/BeautyMapで公開されている。

Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse to fine hierarchical segmentation of the $z$-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic points removal methods. The code is open-sourced at https://github.com/MKJia/BeautyMap.
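The bit-wise comparison between a per-frame matrix and the matching map region can be sketched as follows. The sketch assumes that each (x, y) grid cell packs its occupied height bins into the bits of one integer and that bins occupied in the map but empty in the current scan are candidates for dynamic points. The bin layout, the parameter values, and the `map & ~scan` rule are illustrative assumptions; the ground handling and visibility checks mentioned in the abstract are omitted.

```python
def encode_column(z_values, z_min=0.0, resolution=0.5, n_bits=8):
    """Pack the occupied height bins of one (x, y) cell into an integer bitmask."""
    mask = 0
    for z in z_values:
        bin_index = int((z - z_min) // resolution)
        if 0 <= bin_index < n_bits:  # ignore points outside the encoded height range
            mask |= 1 << bin_index
    return mask


def potential_dynamic_bits(map_matrix, scan_matrix):
    """Bits set in the map but not in the scan, computed cell by cell."""
    return [
        [m & ~s for m, s in zip(map_row, scan_row)]
        for map_row, scan_row in zip(map_matrix, scan_matrix)
    ]
```

Encoding occupancy as bits lets one integer operation compare a whole vertical column at once, which is the kind of efficiency gain a binary-encoded matrix is meant to provide.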

翻訳日:2024-05-14 17:30:59 公開日:2024-05-12

# SLIP(SAM+CLIP)を用いたゼロショットコンテキストベースオブジェクトセグメンテーション

Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP) ( http://arxiv.org/abs/2405.07284v1 )

ライセンス: Link先を確認

Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal,

(参考訳) ゼロショットオブジェクトセグメンテーションのための拡張アーキテクチャであるSLIP(SAM+CLIP)を提案する。 SLIPはSegment Anything Model (SAM) \cite{kirillov2023segment}とContrastive Language- Image Pretraining (CLIP) \cite{radford2021learning}を組み合わせたものである。 CLIPを使ってSAMにテキストプロンプトを組み込むことで、SLIPは特定のクラスやカテゴリの事前トレーニングなしにオブジェクトセグメンテーションを可能にする。 Pokemonデータセット上でCLIPを微調整し、意味のある画像テキスト表現を学習できるようにします。 SLIPは、テキストプロンプトからコンテキスト情報に基づいて画像中のオブジェクトを認識およびセグメント化できることを示し、多目的オブジェクトセグメンテーションのためのSAMの機能を拡張する。本実験は,テキストによる画像のセグメント化におけるSLIPアーキテクチャの有効性を実証するものである。 CLIPのテキストイメージ理解機能をSAMに統合することで、元のアーキテクチャの機能を拡張し、より汎用的でコンテキスト対応のオブジェクトセグメンテーションを可能にする。

We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) \cite{kirillov2023segment} with the Contrastive Language-Image Pretraining (CLIP) \cite{radford2021learning}. By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# 少数ショットのアンラーニングによるテキスト・画像拡散モデルからの概念の消去

Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning ( http://arxiv.org/abs/2405.07288v1 )

ライセンス: Link先を確認

Masane Fuchi, Tomohiro Takagi,

(参考訳) テキストから画像を生成することは、拡散モデルのスケーリングと視覚・言語分野の進歩により容易になっている。これらのモデルは、インターネットから得た大量のデータを使って訓練されている。したがって、著作権のある素材のような望ましくない内容もしばしば含んでいる。このようなデータを取り除いてモデルを再訓練することは難しいため、事前訓練されたモデルから特定の概念を消去する方法が研究されている。本稿では,少数の実画像を用いる少数ショットのアンラーニングによってテキストエンコーダを更新する、新しい概念消去手法を提案する。概念を消去した後の生成画像に関する議論はこれまで不足していた。概念の遷移先を指定する方法はあるが,指定された概念の妥当性は明らかではない。提案手法は,モデルや画像に内在する潜在概念に遷移することで,暗黙的にこれを実現する。提案手法は10秒以内に概念を消去でき,概念の消去をこれまで以上に身近なものにする。暗黙的に関連する概念へ遷移することは、より自然な概念の消去につながる。提案手法を様々な概念に適用し, 現行の手法より数十倍から数百倍高速に概念を消去できることを確認した。更新するパラメータを変えることで、従来の研究と同様に、知識が主にテキストエンコーダのフィードフォワードネットワークに蓄積されていることを示唆する結果を得た。

Generating images from text has become easier because of the scaling of diffusion models and advancements in the field of vision and language. These models are trained using vast amounts of data from the Internet. Hence, they often contain undesirable content such as copyrighted material. As it is challenging to remove such data and retrain the models, methods for erasing specific concepts from pre-trained models have been investigated. We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning in which a few real images are used. The discussion regarding the generated images after erasing a concept has been lacking. While there are methods for specifying the transition destination for concepts, the validity of the specified concepts is unclear. Our method implicitly achieves this by transitioning to the latent concepts inherent in the model or the images. Our method can erase a concept within 10 s, making concept erasure more accessible than ever before. Implicitly transitioning to related concepts leads to more natural concept erasure. We applied the proposed method to various concepts and confirmed that concept erasure can be achieved tens to hundreds of times faster than with current methods. By varying the parameters to be updated, we obtained results suggesting that, like previous research, knowledge is primarily accumulated in the feed-forward networks of the text encoder.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# 量子力学の非線形拡張の幾何学的解釈

Geometric Interpretation of a nonlinear extension of Quantum Mechanics ( http://arxiv.org/abs/2405.07289v1 )

ライセンス: Link先を確認

Alan Chodos, Fred Cooper,

(参考訳) 我々は最近、通常の線形量子力学問題のハミルトニアンの固有値と固有関数の観点から正確に解ける性質を持つ特定の非線形量子力学の一般化を導入した。本稿では,波動関数の2つの成分が時空の2つの異なる漸近領域におけるハミルトニアンHによって記述された系を表すことを示唆し,非線型項が重力効果をもたらすと考えられることを示す。

We recently introduced a particular nonlinear generalization of quantum mechanics which has the property that it is exactly solvable in terms of the eigenvalues and eigenfunctions of the Hamiltonian of the usual linear quantum mechanics problem. In this paper we suggest that the two components of the wave function represent the system described by the Hamiltonian H in two different asymptotic regions of spacetime and we show that the non-linear terms can be viewed as giving rise to gravitational effects.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# CCTV動画における高速な逆走自転車検出にはスパースサンプリングで十分

Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV Videos ( http://arxiv.org/abs/2405.07293v1 )

ライセンス: Link先を確認

Jing Xu, Wentao Shi, Sheng Ren, Pan Gao, Peng Zhou, Jie Qin,

(参考訳) 輸送の分野では、自動車と非自動車の両方による違法行為に対処し、これを抑制することが極めて重要である。これらの行為の中でも、逆走(すなわち、自転車や電動自転車で指定された交通の流れと反対方向に走ること)は、自転車利用者と他の道路利用者の双方に重大なリスクをもたらす。そこで本論文では,CCTVビデオにおける逆走自転車の割合を検出する問題を定式化する。具体的には,直接追跡する手法の非効率性に対処し、この問題を効率的に解くために、WWC-Predictorと呼ばれるスパースサンプリング手法を提案する。本手法では,境界ボックスからの情報を利用する検出ベース情報と,画像自体に対する洞察を提供する方位ベース情報の両方を活用して,瞬時情報の取得能力を向上させる。 35分間のビデオシーケンスと分単位のアノテーションからなる提案ベンチマークデータセットにおいて,本手法はわずか1.475%の平均誤差率を達成し,同じ検出モデルの下で単純な追跡手法の19.12%のGPU時間しか要しない。この顕著な性能は、逆走自転車の事例を特定し予測する上で、我々のアプローチの有効性を示している。

In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detecting wrong-way cycling ratios in CCTV videos. Specifically, we propose a sparse sampling method called WWC-Predictor to efficiently solve this problem, addressing the inefficiencies of direct tracking methods. Our approach leverages both detection-based information, which utilizes the information from bounding boxes, and orientation-based information, which provides insights into the image itself, to enhance instantaneous information capture capability. On our proposed benchmark dataset consisting of 35 minutes of video sequences and minute-level annotation, our method achieves an average error rate of a mere 1.475% while taking only 19.12% GPU time of straightforward tracking methods under the same detection model. This remarkable performance demonstrates the effectiveness of our approach in identifying and predicting instances of wrong-way cycling.
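Estimating a wrong-way ratio from sparsely sampled frames, rather than tracking every cyclist through the video, could look roughly like the sketch below. It assumes that each sampled frame already provides a list of cyclist heading angles in degrees and that a heading more than 90 degrees away from the lane direction counts as wrong-way. The input format, the `stride` value, and the threshold are hypothetical, and the sketch does not correct for the same cyclist appearing in several sampled frames, which a full method would need to address.

```python
import math


def is_wrong_way(heading_deg: float, lane_deg: float) -> bool:
    """True if the cyclist's heading points against the lane direction."""
    diff = math.radians(heading_deg - lane_deg)
    return math.cos(diff) < 0.0


def wrong_way_ratio(frames, lane_deg: float, stride: int = 30) -> float:
    """Estimate the wrong-way ratio from every `stride`-th frame only.

    `frames` is a list of per-frame lists of cyclist heading angles in degrees.
    """
    wrong = total = 0
    for detections in frames[::stride]:
        total += len(detections)
        wrong += sum(is_wrong_way(h, lane_deg) for h in detections)
    return wrong / total if total else 0.0
```

Skipping frames reduces the number of detector calls roughly by the stride factor, which is where the GPU-time saving over per-frame tracking would come from in a scheme of this kind.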

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# 環境富化:継続学習における前向き転移の生物学的モデル

Environmental enrichment: a biological model of forward transfer in continual learning ( http://arxiv.org/abs/2405.07295v1 )

ライセンス: Link先を確認

Rajat Saxena, Bruce L. McNaughton,

(参考訳) 継続学習(Continual Learning, CL)とは、エージェントがデータの連続的なストリームから学習し、古い情報を忘れずに知識を転移する能力である。 CLの重要な側面の一つは前向き転移(forward transfer)、すなわち、事前知識の情報を活用することで新しいタスクの学習を改善し高速化することである。この能力は生物の脳には自然に備わっているが、人工知能(AI)にとっては大きな課題である。ここでは,環境富化(EE)が,人間のようなAIの開発に示唆を与える、前向き転移の研究のための生物学的モデルとして利用できることを提案する。 EEは、認知的、社会的、運動的、感覚的な刺激を高める動物研究を指し、ヒトでは「認知的予備力」と呼ばれるもののモデルである。富化された環境で育った動物は、新しいタスクにおける学習速度と成績が著しく向上し、典型的には前向き転移を示す。我々は、EE後の解剖学的、分子的、神経的変化を探り、人工ニューラルネットワーク(ANN)を用いて、豊かな経験の後の神経計算の変化をどのように予測できるかを議論する。最後に、神経科学とAI研究を相乗的に組み合わせる方法を示し、迅速かつ効率的に新しいタスクを学習できるAIの開発への道を拓く。

Continual learning (CL) refers to an agent's capability to learn from a continuous stream of data and transfer knowledge without forgetting old information. One crucial aspect of CL is forward transfer, i.e., improved and faster learning on a new task by leveraging information from prior knowledge. While this ability comes naturally to biological brains, it poses a significant challenge for artificial intelligence (AI). Here, we suggest that environmental enrichment (EE) can be used as a biological model for studying forward transfer, inspiring human-like AI development. EE refers to animal studies that enhance cognitive, social, motor, and sensory stimulation and is a model for what, in humans, is referred to as 'cognitive reserve'. Enriched animals show significant improvement in learning speed and performance on new tasks, typically exhibiting forward transfer. We explore anatomical, molecular, and neuronal changes post-EE and discuss how artificial neural networks (ANNs) can be used to predict neural computation changes after enriched experiences. Finally, we provide a synergistic way of combining neuroscience and AI research that paves the path toward developing AI capable of rapid and efficient new task learning.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# 編集可能なNeRFモデルのための点リサンプリングと光線変換

Point Resampling and Ray Transformation Aid to Editable NeRF Models ( http://arxiv.org/abs/2405.07306v1 )

ライセンス: Link先を確認

Zhenyang Li, Zilong Chen, Feifan Qu, Mingqing Wang, Yizhou Zhao, Kai Zhang, Yifan Peng,

(参考訳) NeRF支援編集作業では,物体位置の可変性の導入により,物体の動きが監視生成の困難を呈する。さらに、特定のシーンオブジェクトの除去操作は、しばしば空の領域につながり、それらを効果的に塗布する際のNeRFモデルの課題を提示する。我々は3次元物体の姿勢を直接操作できる暗黙の光線変換戦略を提案する。潜在的な空き領域を塗布するという課題に対処するため,DNRと呼ばれるプラグ・アンド・プレイの塗布モジュールが提案され,これらの領域を暗黙空間内の元の線源位置の3次元空間に補間することで,物体の除去とシーンの塗布作業が容易になる。重要なことに、DNRを用いることで、地上の真実と暗黙の特徴のギャップを効果的に狭め、光線をまたいだ特徴の相互情報(MI)を増大させる可能性がある。そして、DNRとレイ変換を利用して、点ベースの編集可能なNeRFパイプラインPR^2T-NeRFを構築する。主に3Dオブジェクトの除去および塗装タスクで評価した結果、パイプラインが最先端のパフォーマンスを達成することを示す。さらに、我々のパイプラインは、余分な監督を必要とせず、多様な編集操作のための高品質なレンダリング視覚化をサポートしています。

In NeRF-aided editing tasks, object movement presents difficulties in supervision generation due to the introduction of variability in object positions. Moreover, the removal operations of certain scene objects often lead to empty regions, presenting challenges for NeRF models in inpainting them effectively. We propose an implicit ray transformation strategy, allowing for direct manipulation of the 3D object's pose by operating on the neural-point in NeRF rays. To address the challenge of inpainting potential empty regions, we present a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), which interpolates those regions in 3D space at the original ray locations within the implicit space, thereby facilitating object removal & scene inpainting tasks. Importantly, employing DNR effectively narrows the gap between ground truth and predicted implicit features, potentially increasing the mutual information (MI) of the features across rays. Then, we leverage DNR and ray transformation to construct a point-based editable NeRF pipeline PR^2T-NeRF. Results primarily evaluated on 3D object removal & inpainting tasks indicate that our pipeline achieves state-of-the-art performance. In addition, our pipeline supports high-quality rendering visualization for diverse editing operations without necessitating extra supervision.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# DiffGen: 微分物理シミュレーション、微分レンダリング、ビジョンランゲージモデルによるロボットデモ生成

DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model ( http://arxiv.org/abs/2405.07309v1 )

ライセンス: Link先を確認

Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu,

(参考訳) シミュレーションによるロボットのデモンストレーションの生成は、ロボットデータのスケールアップに有効な方法として広く認識されている。従来の作業では、専門家のポリシーを生成するために強化学習エージェントを訓練することが多かったが、このアプローチにはサンプル効率が欠如している。近年,ロボットによる実演を微分可能シミュレーションで実現しようと試みている。これは有望だが,報酬設計に大きく依存している。本稿では,微分可能物理シミュレーションと微分可能レンダリングを統合した新しいフレームワークであるDiffGenと,自動かつ効率的なロボットデモ生成を実現するビジョン言語モデルを提案する。 DiffGenは、シミュレーションロボット操作シナリオと自然言語命令を前提として、言語命令の埋め込みと操作後のシミュレーション観察の埋め込みとの距離を最小化することにより、現実的なロボットデモを生成することができる。組込みは視覚言語モデルから得られ、微分可能シミュレーション、微分可能レンダリング、および視覚言語モデルコンポーネントを用いて勾配を計算・下降させることにより、特定のタスクを達成できる。実験によると、DiffGenを使えば、人間の努力やトレーニング時間を最小限に抑えて、ロボットデータを効率よく、効果的に生成できる。

Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward design, a labor-intensive process. In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations. Given a simulated robot manipulation scenario and a natural language instruction, DiffGen can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation after manipulation. The embeddings are obtained from the vision-language model, and the optimization is achieved by calculating and descending gradients through the differentiable simulation, differentiable rendering, and vision-language model components, thereby accomplishing the specified task. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.

翻訳日:2024-05-14 15:34:20 公開日:2024-05-12

# 非パラメトリック制御クープマン演算子学習:予測と制御のための柔軟でスケーラブルなモデル

Nonparametric Control-Koopman Operator Learning: Flexible and Scalable Models for Prediction and Control ( http://arxiv.org/abs/2405.07312v1 )

ライセンス: Link先を確認

Petar Bevanda, Bas Driessen, Lucian Cristian Iacob, Roland Toth, Stefan Sosnowski, Sandra Hirche,

(参考訳) クープマン作用素の線形性とそれらの推定器の単純さとモデル推論能力は、力学系を学習するアプリケーションにおいて大きな人気をもたらした。無限次元再生カーネルヒルベルト空間における非パラメトリッククープマン作用素の学習は、自律系においてよく理解されているが、制御系類似性はほとんど探索されていない。特に既存の手法では、表現的ヒューリスティックスや、限定的な表現性や拡張性を持つパラメトリックモデルを利用することが多いため、制御器の完全なデータ駆動学習には、制御入力によるシステムへの対処が不可欠である。本稿では,制御系においても単一演算子の直接推定が可能な制御アフィン再生カーネルによるユニバーサルフレームワークを提案することで,上記の課題に対処する。提案手法は制御・クープマン作用素回帰(英語版)(cKOR)と呼ばれ、自律の場合のクープマン作用素回帰(英語版)(Koopman operator regression)と完全に類似している。まず、制御入力次元の呪いに苦しむことのない非線形制御アフィン系のクープマン演算子表現を学習するための非パラメトリックフレームワークを提案する。これにより、有限次元空間における無限次元学習問題を、他のアプローチのように有限な関数や入力に制限されるため、アプリオリの精度を失うことなく、データのみに基づいて再構成することができる。大規模制御システムへの応用を実現するため,ランダムプロジェクション(スケッチング)を活用して制御クープマン演算子推定器のスケーラビリティを向上させる。予測タスクと制御タスクの両方において,新しいcKORアプローチの有効性を実証した。

The linearity of Koopman operators and the simplicity of their estimators, coupled with model-reduction capabilities, have led to their great popularity in applications for learning dynamical systems. While nonparametric Koopman operator learning in infinite-dimensional reproducing kernel Hilbert spaces is well understood for autonomous systems, its control system analogues are largely unexplored. Addressing systems with control inputs in a principled manner is crucial for fully data-driven learning of controllers, especially since existing approaches commonly resort to representational heuristics or parametric models of limited expressiveness and scalability. We address this challenge by proposing a universal framework via control-affine reproducing kernels that enables direct estimation of a single operator even for control systems. The proposed approach, called control-Koopman operator regression (cKOR), is thus completely analogous to Koopman operator regression in the autonomous case. For the first time in the literature, we present a nonparametric framework for learning Koopman operator representations of nonlinear control-affine systems that does not suffer from the curse of control input dimensionality. This allows the infinite-dimensional learning problem to be reformulated in a finite-dimensional space based solely on data, without an a priori loss of precision due to a restriction to a finite span of functions or inputs as in other approaches. To enable applications to large-scale control systems, we also enhance the scalability of control-Koopman operator estimators by leveraging random projections (sketching). The efficacy of our novel cKOR approach is demonstrated on both forecasting and control tasks.
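
To make the idea of a control-affine kernel more concrete, here is a minimal sketch of kernel ridge regression on a toy scalar system x_next = 0.5 x + u. It is not the cKOR estimator: it regresses the next state directly rather than estimating an operator, and the kernel form k_x(x, y) * (1 + u * v), the Gaussian base kernel, the regularization value, and all function names are assumptions chosen for illustration rather than details taken from the paper.

```python
import math
import random

def base_kernel(x, y, width=1.0):
    """Gaussian kernel on scalar states (an assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))

def control_affine_kernel(x, u, y, v):
    """Assumed kernel form: k_x(x, y) * (1 + u * v), affine in each input."""
    return base_kernel(x, y) * (1.0 + u * v)

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def fit(xs, us, ys, reg=1e-3):
    """Kernel ridge regression: solve (K + reg * I) alpha = y."""
    n = len(xs)
    gram = [[control_affine_kernel(xs[i], us[i], xs[j], us[j])
             + (reg if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(gram, ys)

def predict(alpha, xs, us, x, u):
    """Evaluate the fitted model at a new state-input pair."""
    return sum(a * control_affine_kernel(xi, ui, x, u)
               for a, xi, ui in zip(alpha, xs, us))

# Toy data from the hypothetical system x_next = 0.5 * x + u.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
us = [rng.uniform(-1.0, 1.0) for _ in range(30)]
ys = [0.5 * x + u for x, u in zip(xs, us)]
alpha = fit(xs, us, ys)
```

Calling `predict(alpha, xs, us, 0.3, 1.0)` returns the model's estimate of the next state for x = 0.3 and u = 1.0. For a fixed state, the prediction is affine in u by construction, which is the structural property the kernel is meant to encode.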

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence

VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence ( http://arxiv.org/abs/2405.07316v1 )

License: see the linked page

Mayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka, Allison Beemer, Oliver Kosut, Joerg Kliewer,

(参考訳) 異種データと対向的浸透の可能性を持つ非方向性ネットワークに対して、検証された分散学習のパラダイムを導入する。必要です。 (a)敵がいないときの世界的経験損失最小化器に収束し、 b) 敵対的構成にかかわらず,許容的コンセンサスに対する敵的収束の有無を検出すること。この目的のために、我々は、我々の知る限り、検証された学習保証を最初に達成するVALIDプロトコルを提案する。さらに、VALIDはO(1/T)収束率(関連する正則性仮定の下で)と、非逆分散確率勾配勾配に匹敵する計算と通信の複雑さを提供する。注目すべきは、VALIDは、逆境のない環境での最適なパフォーマンス指標を保持し、以前のビザンチン・ロバスト法で観察された堅牢性ペナルティをサイドステッピングすることである。本研究の特筆すべき側面は、グローバルな経験損失最小化器で計算された個々のエージェントの勾配のノルムに基づく不均一度計量である。これは、重要なビザンチン破壊を検出するための自然な統計を提供するだけでなく、VALIDの最適性を幅広い一般性で証明することを可能にする。最後に, 敵がいない場合, VALIDは最先端のビザンチン頑健なアルゴリズムよりも高速に収束する一方で, 敵が存在する場合, VALIDは各誠実さで終了するか, あるいはネットワーク内の敵の存在を宣言する許容的コンセンサスに収束する。

We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence or convergence to an admissible consensus, irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior Byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. This not only provides a natural statistic for detecting significant Byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art Byzantine-robust algorithms, while when adversaries are present, VALID terminates with each honest agent either converging to an admissible consensus or declaring adversarial presence in the network.
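
The heterogeneity metric described above can be illustrated on a toy problem. In the sketch below each agent holds a quadratic local loss 0.5 * ||w - c_i||^2, so the global minimizer is simply the mean of the centers c_i. The quadratic losses, the function names, and the decision to report the per-agent norms without aggregating them are assumptions for illustration; the abstract does not specify how VALID aggregates or thresholds these norms.

```python
import math

def global_minimizer(centers):
    """Minimizer of the average of 0.5 * ||w - c_i||^2, i.e. the mean center."""
    dim = len(centers[0])
    return [sum(c[d] for c in centers) / len(centers) for d in range(dim)]

def local_gradient(w, center):
    """Gradient of one agent's loss 0.5 * ||w - c||^2 at w."""
    return [wd - cd for wd, cd in zip(w, center)]

def heterogeneity(centers):
    """Norm of each agent's local gradient at the global minimizer."""
    w_star = global_minimizer(centers)
    norms = [math.sqrt(sum(g * g for g in local_gradient(w_star, c)))
             for c in centers]
    return w_star, norms

# Three agents with different (hypothetical) local data centers.
centers = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
w_star, norms = heterogeneity(centers)
```

When all agents hold identical data the norms are zero; the larger the norms, the more the local objectives disagree at the shared optimum. That is why such a statistic is a natural reference point for judging whether an observed deviation is plausible for an honest agent.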

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# Machine Unlearning in Contrastive Learning

Machine Unlearning in Contrastive Learning ( http://arxiv.org/abs/2405.07317v1 )

License: see the linked page

Zixin Wang, Kongyang Chen,

(参考訳) 機械学習は、モデルの精度を最小限に保ちながら、トレーニングデータの影響を減少させるために必要な複雑なプロセスである。近年の機械学習に関する多くの研究にもかかわらず、その大半は教師付き学習モデルに重点を置いており、対照的な学習モデルの研究は比較的過小評価されている。自己教師型学習は,教師型学習よりも有望な可能性を秘めており,教師型学習に匹敵するものであるという信念から,コントラスト型学習モデルを中心とした機械学習の手法を探究した。本研究では,機械学習を効果的に実現するために,モデルトレーニングのための勾配制約に基づく新しいアプローチを提案する。本手法では,学習対象データの最小限の学習エポック数と,学習対象データの同定しか必要としない。また,本手法は,コントラスト学習モデルだけでなく,教師付き学習モデルにも有能な性能を示し,その汎用性と適応性を様々な学習パラダイムで示している。

Machine unlearning is a complex process that requires a model to diminish the influence of the training data slated for unlearning while keeping the loss of accuracy to a minimum. Despite numerous studies on machine unlearning in recent years, most have focused on supervised learning models, leaving contrastive learning models relatively underexplored. Convinced that self-supervised learning holds promising potential, rivaling or surpassing that of supervised learning, we investigate machine unlearning methods for contrastive learning models. In this study, we introduce a novel gradient constraint-based approach for training the model to effectively achieve machine unlearning. Our method requires only a minimal number of training epochs and the identification of the data slated for unlearning. Remarkably, our approach performs well not only on contrastive learning models but also on supervised learning models, showcasing its versatility and adaptability across learning paradigms.

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer ( http://arxiv.org/abs/2405.07319v1 )

License: see the linked page

Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu,

(参考訳) キャラクター間の衣料品の着替えやアニメ化をめざすアニマタブルな衣料転送は難しい問題である。ほとんどの人間のアバターは、人間の体と衣服の表現をくっつけることで、仮想的な試行錯誤の難しさを招いている。さらに悪いことに、絡み合った表現は通常、衣服の滑りの動きを正確に追跡することができないのです。この制限を克服するために、我々はLayGA(Layered Gaussian Avatars)という新しい表現を紹介した。我々の表現は、ガウスの地図に基づくアバターの上に構築され、衣服の詳細の表現力に優れています。しかし、ガウス写像は実際の曲面の周りに分布する非構造的な3次元ガウス写像を生成する。スムーズな表面がないことは、正確な衣服追跡と体と衣服の衝突処理の課題を提起する。そこで本研究では,単層再構築と多層フィッティングを含む2段階のトレーニングを提案する。単層リコンストラクション段階において,スムーズな表面を再構築し,同時に体と衣服のセグメンテーションを得るための一連の幾何的制約を提案する。次に,多層フィッティング段階では,体と衣服を表すために2つの異なるモデルを訓練し,再構築された衣服のジオメトリーを3次元監視として利用し,より正確な衣服追跡を行う。さらに,高品質な幾何再構成と高忠実なレンダリングのための幾何層とレンダリング層を提案する。全体として、提案したLayGAは、フォトリアリスティックなアニメーションと仮想トライオンを実現し、他のベースライン手法よりも優れている。私たちのプロジェクトページはhttps://jsnln.github.io/layga/index.htmlです。

Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# L(u)PIN: LLM-based Political Ideology Nowcasting

L(u)PIN: LLM-based Political Ideology Nowcasting ( http://arxiv.org/abs/2405.07320v1 )

License: see the linked page

Ken Kato, Annabelle Purnomo, Christopher Cochrane, Raeid Saqur,

(参考訳) 政治的イデオロギー的立場の定量的分析は難しい課題である。過去には、政治家、政党宣言、議会演説の議決データに焦点が当てられ、様々な政治制度における政治的不一致と分極を推定した。しかし、従来の定量的政治的分析手法は、分析に利用可能なデータの量という共通の課題に悩まされていた。以前の手法では、議会全体の分極や政党全体の政治イデオロギー的立場など、より一般的な政治分析に重点を置いていた。本稿では,LLMの潜在知識を活用して,各議員のイデオロギー的立場を分析する手法を提案する。この方法により、選択の軸として政治家のスタンスを評価することができ、選択の話題・論争に関して政治家のスタンスを柔軟に測定することができる。提案手法は,一対の参照シードに対して,各代表に対する平均BERT埋め込みを投影し,各代表者の音声から意見に基づく文章を抽出するために,微調整のBERT分類器を用いて実現する。これらの参照シードは、特定のトピックに対する反対の見解を持つことが知られている手動で選択された代表者か、OpenAIのGPT-4モデルを用いて生成された文である。我々は、GPT-4モデルに特定の立場を擁護する政治家から発せられるスピーチを生成するよう促すことで、文を作成した。

The quantitative analysis of political ideological positions is a difficult task. In the past, the literature has focused on politicians' parliamentary voting data, party manifestos, and parliamentary speech to estimate political disagreement and polarization in various political systems. However, previous methods of quantitative political analysis shared a common challenge: the amount of data available for analysis. Previous methods also frequently focused on more general analyses of politics, such as the overall polarization of a parliament or party-wide ideological positions. In this paper, we present a method to analyze the ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs. The method allows us to evaluate the stance of politicians on an axis of our choice, so that we can flexibly measure their stance with regard to a topic or controversy of our choice. We achieve this by using a fine-tuned BERT classifier to extract opinion-based sentences from the speeches of representatives and projecting the average BERT embedding for each representative onto a pair of reference seeds. These reference seeds are either manually chosen representatives known to have opposing views on a particular topic, or generated sentences created using OpenAI's GPT-4 model. We created the sentences by prompting GPT-4 to generate a speech that would come from a politician defending a particular position.
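
The projection step can be sketched with plain vectors. The example below averages a representative's sentence embeddings and places the result on the axis running from one reference seed to the other, so that 0 corresponds to the first seed and 1 to the second. The projection formula, the two-dimensional toy vectors, and the function names are assumptions for illustration; the abstract does not give the exact formula, and real BERT embeddings would have several hundred dimensions.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def project_on_axis(embedding, seed_a, seed_b):
    """Scalar position of `embedding` on the axis from seed_a (0) to seed_b (1)."""
    axis = [b - a for a, b in zip(seed_a, seed_b)]
    offset = [e - a for e, a in zip(embedding, seed_a)]
    axis_sq = sum(x * x for x in axis)
    return sum(o * x for o, x in zip(offset, axis)) / axis_sq

# Hypothetical embeddings: two opposing seeds and two opinion sentences.
seed_a = [0.0, 0.0]
seed_b = [2.0, 0.0]
sentence_embeddings = [[0.5, 1.0], [1.5, -1.0]]

representative = mean_vector(sentence_embeddings)
score = project_on_axis(representative, seed_a, seed_b)
```

Here the representative's mean embedding is [1.0, 0.0], which lies midway between the seeds, so `score` is 0.5. Values below 0 or above 1 would indicate a position more extreme than the corresponding seed along this axis.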

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# Computational analysis of US Congressional speeches reveals a shift from evidence to intuition

Computational analysis of US Congressional speeches reveals a shift from evidence to intuition ( http://arxiv.org/abs/2405.07323v1 )

License: see the linked page

Segun Taofeek Aroyehun, Almog Simchon, Fabio Carrella, Jana Lasser, Stephan Lewandowsky, David Garcia,

(参考訳) 誠実で誠実な意思決定を求めることは、民主主義における統治と説明責任にとって不可欠である。しかし、正直であることの意味と、誠実さを追求する方法について、人々は異なる視点を取ることがある。ここでは、確証可能な事実とデータに根ざしたエビデンスに基づく推論から、感情や主観的な解釈によって引き起こされる直感的な決定まで、視点の連続性を探る。我々は1879年から2022年までの議会演説において、対照的な視点の言語的痕跡を分析した。 1970年代中頃からエビデンスベースの言語は、立法生産性の低下とともに減少し続けています。この減少は、議会における党派偏極の増大と、社会における所得格差の増大に伴った。結果は、政治意思決定における証拠に基づく言語の重要性を浮き彫りにする。

Pursuit of honest and truthful decision-making is crucial for governance and accountability in democracies. However, people sometimes take different perspectives of what it means to be honest and how to pursue truthfulness. Here we explore a continuum of perspectives from evidence-based reasoning, rooted in ascertainable facts and data, at one end, to intuitive decisions that are driven by feelings and subjective interpretations, at the other. We analyze the linguistic traces of those contrasting perspectives in Congressional speeches from 1879 to 2022. We find that evidence-based language has continued to decline since the mid-1970s, together with a decline in legislative productivity. The decline was accompanied by increasing partisan polarization in Congress and rising income inequality in society. Results highlight the importance of evidence-based language in political decision-making.

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# Liquid Ensemble Selection for Continual Learning

Liquid Ensemble Selection for Continual Learning ( http://arxiv.org/abs/2405.07327v1 )

License: see the linked page

Carter Blair, Ben Armstrong, Kate Larson,

(参考訳) 継続的学習は、機械学習モデルが、すでに学んだことを忘れずに、シフトするデータ分布から継続的に学習できるようにすることを目的としている。異なる部分集合上でアンサンブルの各メンバーを訓練することにより、アンサンブル全体の精度をナイーブモデルよりもはるかに高い精度で達成することができる。アンサンブル内のどのモデルを任意のデータで学習し、どのモデルを予測すべきかという問題に対処する。代表投票から作業を引き出すことにより,どのモデルがアクティブであるかを動的に選択するアルゴリズムを開発した。さまざまなデリゲート手法とパフォーマンス指標について検討し、最終的に分散シフトに直面した上で、デリゲートがナイーブな学習よりも大きなパフォーマンス向上を提供できることを発見した。

Continual learning aims to enable machine learning models to continually learn from a shifting data distribution without forgetting what has already been learned. Such shifting distributions can be broken into disjoint subsets of related examples; by training each member of an ensemble on a different subset it is possible for the ensemble as a whole to achieve much higher accuracy with less forgetting than a naive model. We address the problem of selecting which models within an ensemble should learn on any given data, and which should predict. By drawing on work from delegative voting we develop an algorithm for using delegation to dynamically select which models in an ensemble are active. We explore a variety of delegation methods and performance metrics, ultimately finding that delegation is able to provide a significant performance boost over naive learning in the face of distribution shifts.

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# Stochastic Bandits with ReLU Neural Networks

Stochastic Bandits with ReLU Neural Networks ( http://arxiv.org/abs/2405.07331v1 )

License: see the linked page

Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani,

(参考訳) 本稿では,ReLUニューラルネットワーク構造を用いた確率的帯域幅問題について検討する。我々は, 1層ReLUニューラルネットワークの帯域を考慮すれば, $\tilde{O}(\sqrt{T})$ 後悔の保証が達成可能であることを示す。本稿では,この上限を達成できるOFU-RELUアルゴリズムを提案する。このアルゴリズムはまず線形状態に到達するまでランダムに探索し、続いて探索と利用のバランスをとるために UCB 型線形バンドイットアルゴリズムを実装した。我々の重要な洞察は、探索段階でReLUのパラメータを相対的に正確に学習すると、ReLUアクティベーションの断片的線形構造を利用して、変換された特徴空間における問題を線形帯域に変換することができるということである。モデルパラメータへの依存を取り除くため,バッチ化戦略に基づくOFU-ReLU+アルゴリズムを設計し,同じ理論的保証を提供する。

We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
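
The key insight, that a ReLU model becomes linear once its activation pattern is known, can be checked numerically. The sketch below assumes a reward of the form sum_k max(0, theta_k . x) and builds a feature map from a rough estimate `theta_hat`; whenever the estimate switches on the same units as the true parameters, the reward equals an inner product between the flattened true parameters and those features. The reward form, the numbers, and the function names are assumptions for illustration, and this is not the OFU-ReLU algorithm itself.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def relu_reward(theta, x):
    """Assumed one-layer ReLU reward: sum over units of max(0, theta_k . x)."""
    return sum(max(0.0, dot(t, x)) for t in theta)

def linear_features(theta_hat, x):
    """Stack x once per unit, zeroed for units the estimate marks inactive."""
    feats = []
    for t in theta_hat:
        active = 1.0 if dot(t, x) > 0 else 0.0
        feats.extend(active * xi for xi in x)
    return feats

theta = [[1.0, -1.0], [0.5, 2.0]]        # true (unknown) parameters
theta_hat = [[0.9, -1.1], [0.6, 1.9]]    # rough estimate from exploration
flat_theta = [v for t in theta for v in t]

x1 = [1.0, 0.5]    # both units active
x2 = [-1.0, 0.5]   # only the second unit active
value1 = dot(flat_theta, linear_features(theta_hat, x1))
value2 = dot(flat_theta, linear_features(theta_hat, x2))
```

For both inputs the linear expression matches `relu_reward(theta, x)`, which is why a linear bandit method can take over once the estimate is accurate enough to predict the activation pattern. If the estimate and the true parameters disagree on which units are active for some input, the identity fails for that input.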

Translated: 2024-05-14 15:34:20 Published: 2024-05-12

# PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification

PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification ( http://arxiv.org/abs/2405.07332v1 )

License: see the linked page

Mohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md Hasib,

(参考訳) 深層学習技術を用いた農業病のセグメンテーションの自動化により、多くの応用がもたらされた。しかし、新しい条件を適用すると、これらのアプリケーションはオーバーフィッティングの困難に直面し、セグメンテーション性能が低下する。ジャガイモ農業では、病気が収量に大きな影響を与えているため、農業経済がこれらの病気を迅速かつ適切に識別することが重要である。回転、フリップ、翻訳といった従来のデータ拡張アプローチには制限があり、しばしば強力な一般化結果の提供に失敗する。これらの課題に対処するため,本研究では,PotatoGANと呼ばれる新しいアプローチを採用している。この新たなデータ拡張アプローチでは,2種類のGANを用いて,健康なジャガイモ画像から合成ジャガイモ病画像を生成する。このアプローチはデータセットを拡大するだけでなく、モデル一般化の強化に役立つバラエティも追加する。インセプションスコアを指標として,本実験では,PotatoGANsが生成した画像の品質と現実性が向上し,実際の疾患画像と密に類似する能力を強調した。 CycleGANモデルは、画像品質の点でPix2Pix GANモデルよりも優れており、より高いISスコアにより、CycleGANはブラックスカーフとコモンスハーブでそれぞれ1.2001と1.0900のインセプションスコア(IS)を達成している。この合成データは、大規模なニューラルネットワークのトレーニングを大幅に改善することができる。また、データの多様性と一般化能力を高めながら、データ収集コストを低減する。我々の研究は、3つのグラデーションベースのExplainable AIアルゴリズム(GradCAM, GradCAM++, ScoreCAM)と3つの異なるCNNアーキテクチャ(DenseNet169, Resnet152 V2, InceptionResNet V2)を組み合わせてジャガイモ病の分類を行うことにより、解釈可能性を向上させる。

Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently overfit, resulting in lower segmentation performance. In potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to identify these diseases quickly and correctly. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed PotatoGANs. In this data augmentation approach, two types of Generative Adversarial Networks (GANs) are used to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception Score (IS) as a measure, our experiments show the higher quality and realism of the images created by PotatoGANs, emphasizing their capacity to closely resemble real disease images. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, achieving higher IS values of 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, ResNet152 V2, InceptionResNet V2) for potato disease classification.
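
For readers unfamiliar with the metric, the Inception Score is the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y) over all generated images. The sketch below computes it from class probabilities supplied directly; in practice these probabilities come from a pretrained Inception network, which is omitted here, and the toy inputs and function name are assumptions for illustration.

```python
import math

def inception_score(probs):
    """exp(mean_x KL(p(y|x) || p(y))) for a list of class-probability rows."""
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        total_kl += sum(pj * math.log(pj / marginal[j])
                        for j, pj in enumerate(p) if pj > 0)
    return math.exp(total_kl / n)

# Every image gets the same prediction: no diversity, score at its minimum.
uniform_case = inception_score([[0.7, 0.3], [0.7, 0.3]])

# Confident predictions spread evenly over two classes: maximum for k = 2.
diverse_case = inception_score([[1.0, 0.0], [0.0, 1.0]])
```

The score ranges from 1 up to the number of classes, so the two toy cases evaluate to approximately 1 and 2 respectively.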

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Quantum Mini-Apps: A Framework for Developing and Benchmarking Quantum-HPC Applications

Quantum Mini-Apps: A Framework for Developing and Benchmarking Quantum-HPC Applications ( http://arxiv.org/abs/2405.07333v1 )

License: see the linked page

Nishant Saurabh, Pradeep Mantha, Florian J. Kiwit, Shantenu Jha, Andre Luckow,

(参考訳) 量子ハードウェアの成熟度と規模の増加とHPCシステムへの統合により、量子HPCアプリケーションやミドルウェアシステムの開発、特徴化、ベンチマークを行うための堅牢な技術を開発する必要がある。これは、量子と古典的なワークロードタスクとコンポーネント間の相互作用、結合、一般的な実行パターンをよりよく理解する必要があります。本稿では,異なる結合モードと相互作用モードを特徴とする6つの量子HPC実行モチーフを同定する。これらのモチーフは、プロダクションシステムの本質的な特性をカプセル化した、一連の量子ミニアプリ - 単純化されたアプリケーションプロトタイプの基礎を提供する。これらの開発を支援するために、異種量子HPCインフラストラクチャをまたいだミニアプリの作成と実行に必要な抽象化を提供するミニアプリケーションフレームワークを導入し、パフォーマンス評価とミドルウェア開発に有用なツールとなる。

With the increasing maturity and scale of quantum hardware and its integration into HPC systems, there is a need to develop robust techniques for developing, characterizing, and benchmarking quantum-HPC applications and middleware systems. This requires a better understanding of interaction, coupling, and common execution patterns between quantum and classical workload tasks and components. This paper identifies six quantum-HPC execution motifs - recurring execution patterns characterized by distinct coupling and interaction modes. These motifs provide the basis for a suite of quantum mini-apps - simplified application prototypes that encapsulate essential characteristics of production systems. To support these developments, we introduce a mini-app framework that offers the necessary abstractions for creating and executing mini-apps across heterogeneous quantum-HPC infrastructure, making it a valuable tool for performance characterizations and middleware development.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Tremor Reduction for Accessible Ray Based Interaction in VR Applications

Tremor Reduction for Accessible Ray Based Interaction in VR Applications ( http://arxiv.org/abs/2405.07335v1 )

License: see the linked page

Dr Corrie Green, Dr Yang Jiang, Dr John Isaacs, Dr Michael Heron,

(参考訳) 従来の2Dインタラクション手法と比較して、バーチャルリアリティ(VR)は、ユニークなインターフェースとインタラクション設計決定の機会を示す。現在、既存のインタラクション技術がすべてのユーザにとって使用できない可能性があるため、アクセス可能なVRエクスペリエンスを開発する上で、これは課題となっている。従来の2次元インタフェースのインタラクション手法の多くは、従来のカーソル用に設計されたレーザーポインターの使用など、入力機構にほとんど変更を加えることなく、VR空間で直接動作するように変換されていることが判明した。距離に依存しないミリメートルは、仮想世界でスケールするインタフェースの開発においてデザイナを支援することができると認識されている。関連して、Fittsの法則では、距離が大きくなるにつれて、ユーザーの動きは徐々に遅くなり、正確性が低下する。本稿では,低域通過フィルタを用いてユーザ入力ノイズの正規化を行い,光線による相互作用におけるモータの細かな要求を緩和する手法を提案する。このようなフィルタの実装の可能性を理解し,エンドユーザー体験への影響を探るための開発研究を行った。アルゴリズムが、不随意の手震動をフィルタリングして軽減することで、より正確で結果としてフラストレーションの少ない体験の機会を、どのように提供できるかを実証する。既存のVRデザイン哲学に関するさらなる議論も行われ、多感覚フィードバックと心理モデルを支持する証拠を分析している。完成した研究はGitHubからダウンロードできる。

Compared with conventional 2D interaction methods, virtual reality (VR) presents an opportunity for unique interface and interaction design decisions. Currently, this poses a challenge when developing an accessible VR experience, as existing interaction techniques may not be usable by all users. It was found that many traditional 2D interface interaction methods have been converted directly to work in a VR space with little alteration to the input mechanism, such as the use of a laser pointer modeled on a traditional cursor. It is recognized that distance-independent millimetres can support designers in developing interfaces that scale in virtual worlds. Relatedly, Fitts's law states that as distance increases, user movements become slower and less accurate. In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction. A development study was conducted to understand the feasibility of implementing such a filter and to explore its effects on the end-user experience. It demonstrates how an algorithm can provide an opportunity for a more accurate, and consequently less frustrating, experience by filtering and reducing involuntary hand tremors. Existing VR design philosophies are also discussed, analysing evidence that supports multisensory feedback and psychological models. The completed study can be downloaded from GitHub.
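
As a rough illustration of how a low-pass filter steadies a pointing ray, the sketch below applies first-order exponential smoothing to a stream of pointer positions. The abstract does not say which filter design the authors used, so the exponential form, the smoothing factor `alpha`, the synthetic tremor signal, and the function names are all assumptions for illustration.

```python
def low_pass(samples, alpha=0.2):
    """First-order exponential smoothing of a sequence of position tuples."""
    filtered = []
    state = None
    for sample in samples:
        if state is None:
            state = list(sample)  # start from the first observed position
        else:
            state = [alpha * new + (1.0 - alpha) * old
                     for new, old in zip(sample, state)]
        filtered.append(tuple(state))
    return filtered

def jitter(points):
    """Mean absolute frame-to-frame movement, summed over coordinates."""
    steps = [sum(abs(a - b) for a, b in zip(p, q))
             for p, q in zip(points, points[1:])]
    return sum(steps) / len(steps)

# Synthetic input: a steady aim at (1.0, 2.0) with an alternating tremor in x.
raw = [(1.0 + 0.05 * (-1) ** i, 2.0) for i in range(50)]
smooth = low_pass(raw)
```

Comparing `jitter(raw)` with `jitter(smooth)` shows the high-frequency oscillation being strongly attenuated. The trade-off is latency: a smaller `alpha` removes more tremor but makes the ray lag further behind deliberate movements.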

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Data Trading Combination Auction Mechanism based on the Exponential Mechanism

Data Trading Combination Auction Mechanism based on the Exponential Mechanism ( http://arxiv.org/abs/2405.07336v1 )

License: see the linked page

Kongyang Chen, Zeming Xu, Bing Mi,

(参考訳) 近年、機械学習技術の普及に伴い、トレーニングデータの需要が大幅に増加し、データトレーディングなどの研究分野が出現している。この分野での仕事はまだ発展段階にある。異なる購入者は様々な種類のデータに対する需要の程度が異なり、オークションはその真正さと公正さのためにこのようなシナリオで重要な役割を果たしている。近年の研究では、異なるドメインに対する組み合わせオークション機構が提案されている。しかし、こうしたメカニズムは購入者のプライバシー上の懸念に対処していない。本稿では,購入者の入札プライバシの漏洩を防止するために,指数的メカニズム(DCAE)に基づく「textit{Data Trading Combination Auction Mechanism」を設計する。本稿では,この指数的メカニズムを適用して,競売の最終決着価格を選択し,価格と収益の関係に基づいて確率分布を生成する。実験的な側面では,2つのシナリオの下で異なるメカニズムを選択することを考慮し,本手法は高いオークション収入を確保し,購入者のプライバシが侵害されるのを防ぐことができることを示した。

With the widespread application of machine learning technology in recent years, the demand for training data has increased significantly, leading to the emergence of research areas such as data trading. Work in this field is still at a developmental stage. Different buyers have varying degrees of demand for various types of data, and auctions play a role in such scenarios due to their authenticity and fairness. Recent related work has proposed combination auction mechanisms for different domains. However, such mechanisms have not addressed the privacy concerns of buyers. In this paper, we design a Data Trading Combination Auction Mechanism based on the exponential mechanism (DCAE) to protect buyers' bidding privacy from being leaked. We apply the exponential mechanism to select the final settlement price for the auction and generate a probability distribution based on the relationship between the price and the revenue. In our experiments, we consider the selection of different mechanisms under two scenarios, and the results show that this method can ensure high auction revenue while protecting buyers' privacy from being violated.
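
The price-selection step can be sketched with the standard exponential mechanism from differential privacy, in which a candidate with utility u is chosen with probability proportional to exp(epsilon * u / (2 * sensitivity)). In the sketch below the utility of a price is the revenue it would raise from a single-item posted price; that revenue model, the choice of the largest candidate price as the sensitivity, the bids, and the function names are assumptions for illustration and do not reproduce the paper's combination auction.

```python
import math
import random

def revenue(price, bids):
    """Revenue if every buyer bidding at least `price` pays `price`."""
    return price * sum(1 for b in bids if b >= price)

def exponential_mechanism_probs(prices, bids, epsilon, sensitivity):
    """Selection probabilities proportional to exp(eps * revenue / (2 * sens))."""
    scores = [epsilon * revenue(p, bids) / (2.0 * sensitivity) for p in prices]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def sample_price(prices, probs, rng):
    """Draw one settlement price according to the mechanism's distribution."""
    return rng.choices(prices, weights=probs, k=1)[0]

bids = [3, 5, 8, 10]         # hypothetical private bids
prices = [2, 4, 6, 8]        # hypothetical candidate settlement prices
# Changing one bid alters the revenue at price p by at most p.
sensitivity = max(prices)

probs = exponential_mechanism_probs(prices, bids, epsilon=1.0,
                                    sensitivity=sensitivity)
rng = random.Random(0)
price = sample_price(prices, probs, rng)
```

Higher-revenue prices are more likely to be chosen, but every candidate keeps a nonzero probability, which is what limits how much the outcome reveals about any single bid. Raising `epsilon` concentrates the distribution on the revenue-maximizing price at the cost of weaker privacy.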

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images

Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images ( http://arxiv.org/abs/2405.07338v1 )

License: see the linked page

Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah,

(参考訳) 本研究は,眼底画像における網膜血管検査による早期診断の重要領域に焦点を当てた。網膜血管の自動セグメンテーションは早期発見を約束するが、既存の方法の限界のために正確な分析は困難であり、しばしば識別能力が欠如しており、病理領域の影響を受けやすい。基礎画像解析の研究は,8つの事前学習CNNモデルを用いたディープラーニングに基づく分類を進歩させる。本研究では,Grad-CAM,Grad-CAM++,Score-CAM,Faster Score-CAM,Layer CAMなどの説明可能なAI技術を利用する。これらのテクニックは、モデルの意思決定プロセスを照らし、透明性を促進し、予測に対する信頼を高める。調査を拡大し、ResNetバックボーンを使用したTransUNet、DenseNetとResNetバックボーンによるAtention U-Net、Swin-UNETを含む10のモデルを調査しました。 ResNet50V2、ResNet101V2、ResNet152V2、DenseNet121などの多様なアーキテクチャを組み込んだ総合的研究により、ファンドス画像解析の強化のための注意機構に関する洞察を深めることができた。基礎画像分類の評価モデルのうち、ResNet101は最高精度で登場し、94.17%を達成した。一方、EfficientNetB0はモデルの中で最も精度が低く、88.33%のスコアを得た。さらに、眼底画像セグメンテーションの分野では、Swin-Unetは86.19%の平均画素精度を示し、眼底画像内の関心領域を正確に記述する効果を示した。逆に、Attention U-Net with DenseNet201 backboneは評価されたモデルの中で最も低い平均画素精度を示し、スコアは75.87%に達した。

Our research focuses on the critical field of early disease diagnosis by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging because existing methods often lack discriminative power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we use Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. By incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121, among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the models evaluated for fundus image classification, ResNet101 achieved the highest accuracy, 94.17%, while EfficientNetB0 had the lowest, 88.33%. For fundus image segmentation, Swin-UNET achieved a Mean Pixel Accuracy of 86.19%, showing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with a DenseNet201 backbone had the lowest Mean Pixel Accuracy among the evaluated models, 75.87%.
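
Mean Pixel Accuracy is commonly computed as the per-class fraction of correctly labelled pixels, averaged over classes. The sketch below follows that common definition on a tiny binary mask (vessel = 1, background = 0); the abstract does not spell out the exact variant the authors used, so the definition, the masks, and the function name are assumptions for illustration.

```python
def mean_pixel_accuracy(pred, truth, num_classes=2):
    """Average over classes of (correct pixels of class) / (pixels of class)."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total[t] += 1
            if p == t:
                correct[t] += 1
    # Skip classes that do not occur in the ground truth.
    ratios = [c / n for c, n in zip(correct, total) if n > 0]
    return sum(ratios) / len(ratios)

truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 0]]
mpa = mean_pixel_accuracy(pred, truth)
```

In this toy case two of the three background pixels and two of the three vessel pixels are correct, so `mpa` is 2/3. Averaging per class keeps thin structures such as vessels from being swamped by the much larger background.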

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Machine Consciousness as Pseudoscience: The Myth of Conscious Machines

Machine Consciousness as Pseudoscience: The Myth of Conscious Machines ( http://arxiv.org/abs/2405.07340v1 )

License: see the linked page

Eduardo C. Garrido-Merchán,

(参考訳) 意識的機械の仮説は、人工知能の概念が発明されてから議論され、システムによって達成された計算知性は、そのシステムにおいてエピノメノンとして現象的意識が出現する原因、あるいはシステムの行動的・内部的複雑さがしきい値を超えた結果である、という仮定に基づいている。その結果、マシン意識の可能性と、それをコンピュータにどのように実装するかを探求する膨大な文献が公表された。さらに、民間心理学やトランスヒューマニズム文学はこの仮説をSF文学の人気と結び付けており、知的ロボットは通常異形化され、即ち現象意識が与えられる。しかし、本研究では、これらの文献が科学的厳密さに欠けており、反対の仮説を偽造することは不可能であり、機械意識文学が公表した全てのアプローチが科学的方法によって証明できない哲学的仮定に依存していることを示す議論の一覧を示す。具体的には, 現象意識が, アルゴリズムやモデルの複雑さとは独立して, 客観的に測定あるいは定量的に定義できないことを示し, 基本的には観察者にとって主観的で内部的な現象であることを示す。これらすべての議論を踏まえると、なぜ意識的な機械という概念が現在トランスヒューマニズムとサイエンスフィクション文化の神話であるかについて論じる作業は終了する。

The hypothesis of conscious machines has been debated since the invention of the notion of artificial intelligence, powered by the assumption that the computational intelligence achieved by a system causes phenomenal consciousness to emerge in that system, either as an epiphenomenon or as a consequence of the behavioral or internal complexity of the system surpassing some threshold. As a consequence, a huge amount of literature exploring the possibility of machine consciousness and how to implement it on a computer has been published. Moreover, folk psychology and transhumanist literature have fed this hypothesis through the popularity of science fiction, where intelligent robots are usually anthropomorphized and hence given phenomenal consciousness. However, in this work, we argue that this literature lacks scientific rigour, since the opposite hypothesis is impossible to falsify, and we present a list of arguments showing that every approach published in the machine consciousness literature depends on philosophical assumptions that cannot be proven by the scientific method. Concretely, we also show that phenomenal consciousness is not computable, independently of the complexity of the algorithm or model, and that it cannot be objectively measured or quantitatively defined, being basically a phenomenon that is subjective and internal to the observer. Given all these arguments, we end the work by arguing why the idea of conscious machines is nowadays a myth of transhumanism and science fiction culture.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Graph neural networks for power grid operational risk assessment under evolving grid topology

Graph neural networks for power grid operational risk assessment under evolving grid topology ( http://arxiv.org/abs/2405.07343v1 )

License: see the linked page

Yadong Zhang, Pranav M Karve, Sankaran Mahadevan,

(参考訳) 本稿では,今後の発電機のオン/オフ状態(グリッドトポロジ)や配電決定に関する高精細な情報なしに,電力網内の危険条件を特定するグラフニューラルネットワーク(GNN)の能力について検討する。 GNNは教師付き学習を用いて訓練され、電力グリッドの集積バスレベル(ゾーンレベルまたはシステムレベル)または個々のブランチレベル状態を異なる電力供給および需要条件下で予測する。トレーニングデータに対する入力を生成しながら、確率格子変数(風/ソラ生成と負荷需要)の変動とそれらの統計的相関を厳格に考慮する。多数の混合整数線形計画法(MILP)最適電力フロー問題を解くことで得られたトレーニングデータの出力は、システムレベル、粒子レベル、伝送ラインレベルの関心量(QoIs)に対応する。 GNNが予測するQoIは、時間前、サンプリングベースの信頼性、リスクアセスメント(英語版)において、水平およびシステムレベル(ロードシェディング)および分岐レベル(オーバーロード)障害イベントの実行に使用される。提案手法は, バスサイズが118から2848の3種類の合成格子に対して実証された。以上の結果から,GNNはQoIの高速かつ高精度な予測が可能であり,計算コストのかかるMILPアルゴリズムに優れたプロキシであることを示す。 GNNに基づく信頼性とリスクアセスメントの優れた精度は、厳密な信頼性とリスク推定を迅速に提供することにより、GNNモデルが状況認識を大幅に改善できることを示唆している。

This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-level) or individual branch-level state under different power supply and demand conditions. The variability of the stochastic grid variables (wind/solar generation and load demand), and their statistical correlations, are rigorously considered while generating the inputs for the training data. The outputs in the training data, obtained by solving numerous mixed-integer linear programming (MILP) optimal power flow problems, correspond to system-level, zonal and transmission line-level quantities of interest (QoIs). The QoIs predicted by the GNNs are used to conduct hours-ahead, sampling-based reliability and risk assessment w.r.t. zonal and system-level (load shedding) as well as branch-level (overloading) failure events. The proposed methodology is demonstrated for three synthetic grids with sizes ranging from 118 to 2848 buses. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and can be good proxies for computationally expensive MILP algorithms. The excellent accuracy of GNN-based reliability and risk assessment suggests that GNN models can substantially improve situational awareness by quickly providing rigorous reliability and risk estimates.
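
The sampling-based risk assessment described above follows a simple pattern: draw many scenarios of the stochastic inputs, evaluate a fast surrogate on each, and summarize the outcomes. The sketch below shows that pattern with a one-line stand-in for the trained GNN. The surrogate formula, the independent Gaussian distributions (the paper accounts for correlations, which are omitted here), the capacity value, and all names are assumptions for illustration only.

```python
import random

def surrogate_load_shed(wind, demand, capacity=100.0):
    """Stand-in for a trained GNN: unmet demand in MW for one scenario."""
    return max(0.0, demand - wind - capacity)

def estimate_risk(n_samples, rng):
    """Monte Carlo estimate of load-shedding probability and expected amount."""
    events = 0
    total_shed = 0.0
    for _ in range(n_samples):
        wind = max(0.0, rng.gauss(30.0, 10.0))   # hypothetical wind output, MW
        demand = rng.gauss(120.0, 15.0)          # hypothetical load demand, MW
        shed = surrogate_load_shed(wind, demand)
        if shed > 0:
            events += 1
        total_shed += shed
    return events / n_samples, total_shed / n_samples

rng = random.Random(0)
shed_probability, expected_shed = estimate_risk(20000, rng)
```

Because each surrogate call is cheap, tens of thousands of scenarios can be evaluated quickly, which is the practical argument for replacing a mixed-integer solver with a learned predictor in this loop.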

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# TKAN: Temporal Kolmogorov-Arnold Networks

TKAN: Temporal Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2405.07344v1 )

License: see the linked page

Remi Genet, Hugo Inzirillo

Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that draws on both KANs and the LSTM: Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks; they are composed of Recurring Kolmogorov-Arnold Network (RKAN) layers with embedded memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring forecasts more than one step ahead.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning ( http://arxiv.org/abs/2405.07346v1 )

License: see the linked page

Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min

Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, and AI-based image generation in particular has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the need to understand and evaluate human preferences for AIGIs. To this end, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives: quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents the MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. Specifically, the MINT-IQA model first learns and evaluates human preferences for AI-generated images from multiple perspectives; then, via a vision-language instruction tuning strategy, MINT-IQA attains a powerful ability to understand and explain human visual preferences for AIGIs, which can be used as feedback to further improve its assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and that it also achieves competitive results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# MedConceptsQA -- Open Source Medical Concepts QA Benchmark

MedConceptsQA -- Open Source Medical Concepts QA Benchmark ( http://arxiv.org/abs/2405.07348v1 )

License: see the linked page

Ofir Ben Shoham, Nadav Rappoport

We present MedConceptsQA, a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions on various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We evaluated various Large Language Models on the benchmark. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at https://huggingface.co/datasets/ofir408/MedConceptsQA

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardware

WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardware ( http://arxiv.org/abs/2405.07349v1 )

License: see the linked page

Matthew Gazzard, Helen Hicks, Isibor Kennedy Ihianle, Jordan J. Bird, Md Mahmudul Hasan, Pedro Machado

Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive measure against blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces Real-Time Autonomous Black-Grass Classification and Mapping (RT-ABGCM), a cutting-edge solution tailored for real-time detection of blackgrass in support of precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Experimental demonstration of a versatile and scalable scheme for iterative generation of non-Gaussian states of light

Experimental demonstration of a versatile and scalable scheme for iterative generation of non-Gaussian states of light ( http://arxiv.org/abs/2405.07350v1 )

License: see the linked page

Hector Simon, Lucas Caron, Romaric Journet, Viviane Cotte, Rosa Tualle-Brouri

Non-Gaussian states of light, such as GKP states, are essential resources for optical continuous-variable quantum computing. The ability to efficiently produce these states would open up tremendous prospects for quantum technologies in general and fault-tolerant quantum computing in particular. This letter demonstrates a versatile method using a quantum memory cavity to overcome the probabilistic nature of the breeding protocols and generate non-Gaussian states at high rates, with prospects for scalability. The performance of our experimental setup is illustrated with the generation of Schrödinger cat states of amplitude alpha = 1.63 with a fidelity of more than 60% at a generation rate in the kHz range, which is higher than the state of the art for such states.
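
To give a feel for the reported amplitude, the sketch below evaluates a few textbook quantities for an ideal even cat state, proportional to |alpha> + |-alpha>, with real alpha. The abstract does not say which cat-state parity was generated, so the even-parity choice and the function names are assumptions, and these formulas describe an ideal state rather than the experimental one.

```python
import math


def coherent_overlap(alpha):
    """Overlap <alpha|-alpha> of two coherent states with real amplitude alpha."""
    return math.exp(-2.0 * alpha ** 2)


def even_cat_norm(alpha):
    """Normalisation N of the even cat state N * (|alpha> + |-alpha>)."""
    return 1.0 / math.sqrt(2.0 * (1.0 + coherent_overlap(alpha)))


def even_cat_mean_photons(alpha):
    """Mean photon number of the ideal even cat state: alpha^2 * tanh(alpha^2)."""
    return alpha ** 2 * math.tanh(alpha ** 2)
```

For alpha = 1.63 the overlap between the two coherent components is below one percent, so the components are nearly orthogonal and the normalisation is close to 1/sqrt(2).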

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset ( http://arxiv.org/abs/2405.07354v1 )

License: see the linked page

Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# A Value Driven Framework for Cybersecurity Innovation in Transportation & Infrastructure

A Value Driven Framework for Cybersecurity Innovation in Transportation & Infrastructure ( http://arxiv.org/abs/2405.07358v1 )

License: see the linked page

Lampis Alevizos, Lalit Bhakuni, Stefan Jaschke

This paper introduces a value-driven cybersecurity innovation framework for the transportation and infrastructure sectors, as opposed to the traditional market-centric approaches that have dominated the field. Recontextualizing innovation categories into sustaining, incremental, disruptive, and transformative, we aim to foster a culture of self-innovation within organizations, enabling a strategic focus on cybersecurity measures that directly contribute to business value and strategic goals. This approach primarily enhances the operational effectiveness and efficiency of cyber defences, while also aligning cybersecurity initiatives with mission-critical objectives. We detail a practical method for evaluating the business value of cybersecurity innovations and present a pragmatic approach for organizations to funnel innovative ideas in a structured and repeatable manner. The framework is designed to reinforce cybersecurity capabilities against an evolving cyber threat landscape while maintaining infrastructural integrity. Shifting the focus from general market appeal to sector-specific needs, our framework provides cybersecurity leaders with the strategic cyber-foresight necessary for prioritizing impactful initiatives, thereby making cybersecurity a core business enabler rather than a burden.

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential Equation

Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential Equation ( http://arxiv.org/abs/2405.07359v1 )

License: see the linked page

Antonio Malpica-Morales, Miguel A. Duran-Olivencia, Serafim Kalliadasis

Accurate prediction of electricity day-ahead prices is essential in competitive electricity markets. Although stationary electricity-price forecasting techniques have received considerable attention, research on non-stationary methods is comparatively scarce, despite the prevalence of non-stationary features in electricity markets. Specifically, existing non-stationary techniques often aim to address individual non-stationary features in isolation, leaving aside the exploration of multiple concurrent non-stationary effects. Our overarching objective here is the formulation of a framework to systematically model and forecast non-stationary electricity-price time series, encompassing the broader scope of non-stationary behavior. For this purpose we develop a data-driven model that combines an N-dimensional Langevin equation (LE) with a neural-ordinary differential equation (NODE). The LE captures fine-grained details of the electricity-price behavior in stationary regimes but is inadequate for non-stationary conditions. To overcome this inherent limitation, we adopt a NODE approach to learn, and at the same time predict, the difference between the actual electricity-price time series and the simulated price trajectories generated by the LE. By learning this difference, the NODE reconstructs the non-stationary components of the time series that the LE is not able to capture. We exemplify the effectiveness of our framework using the Spanish electricity day-ahead market as a prototypical case study. Our findings reveal that the NODE nicely complements the LE, providing a comprehensive strategy to tackle both stationary and non-stationary electricity-price behavior. The framework's dependability and robustness are demonstrated through different non-stationary scenarios by comparing it against a range of basic naive methods.
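
The split between a stationary Langevin model and a learned residual can be sketched in one dimension. The code below is an illustrative assumption rather than the authors' model: it integrates an Ornstein-Uhlenbeck-type Langevin equation with the Euler-Maruyama scheme and computes the residual series that a NODE would be trained on. No neural network is included, and all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_langevin(x0, theta, mu, sigma, dt, n_steps, rng):
    """Euler-Maruyama path of dx = -theta * (x - mu) * dt + sigma * dW."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment with variance dt
        x = x - theta * (x - mu) * dt + sigma * dw
        path.append(x)
    return path


def ensemble_mean(paths):
    """Pointwise average over a list of equally long simulated paths."""
    n_paths = len(paths)
    return [sum(p[i] for p in paths) / n_paths for i in range(len(paths[0]))]


def residual(actual, simulated_mean):
    """Difference between observed prices and the Langevin prediction.

    In the framework described above, this is the quantity the NODE learns.
    """
    return [a - s for a, s in zip(actual, simulated_mean)]
```

A typical use would simulate a few hundred paths with a shared `random.Random(seed)`, average them with `ensemble_mean`, and pass the observed series to `residual`. Any trend left in that residual is the non-stationary component the Langevin part cannot represent.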

Translated: 2024-05-14 15:24:35 Published: 2024-05-12

# Entanglement Dynamics in Quantum Continuous-Variable States

Entanglement Dynamics in Quantum Continuous-Variable States ( http://arxiv.org/abs/2405.07362v1 )

License: see the linked page

Ankit Kumar

Due to the weakness of gravitational coupling, all quantum experiments to date in which gravity plays a role have utilized the field of the Earth. Since this field undergoes practically undetectable back-action from quantum particles, it effectively admits a classical description as a fixed background Newtonian field or spacetime. This argument strongly motivates theoretical and experimental research towards a demonstration of gravitation between two quantum masses, as this is one of the most straightforward scenarios where quantum features of gravity could be observed. Several proposals have studied the possibility of generating entanglement between two massive objects. Along the same lines, with a particular focus on gravity, this thesis introduces general tools to tackle interaction-mediated entanglement and applies them to two particles prepared in continuous-variable states.