Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20241023となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 時間差学習の簡易化 Simplifying Deep Temporal Difference Learning ( http://arxiv.org/abs/2407.04811v2 ) ライセンス: Link先を確認	Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin,	(参考訳) Q-ラーニングは、フィールド強化学習(RL)において基礎的な役割を担った。しかし、Qラーニングやディープニューラルネットワークのような非線形関数近似のような非政治データを持つTDアルゴリズムは、主にリプレイバッファとターゲットネットワークのトレーニングを安定化するために、いくつかの追加のトリックを必要とする。残念なことに、ターゲットネットワークにおける凍結ネットワークパラメータの更新が遅れてサンプル効率が損なわれ、同様にリプレイバッファもメモリと実装のオーバーヘッドを発生させる。本稿では,安定性を維持しつつ,TDトレーニングの高速化と簡易化が可能であるかを検討する。我々の重要な理論的結果は、LayerNormのような正規化手法が、目的のネットワークを必要とせずに、たとえ政治外のデータであっても、確実に収束したTDアルゴリズムが得られることを初めて示している。実験的に、ベクトル化された環境によって可能とされたオンライン並列サンプリングは、リプレイバッファを必要とせずにトレーニングを安定化させる。これらの結果に触発され,より簡易なオンラインQ-LearningアルゴリズムであるPQNを提案する。意外なことに、この単純なアルゴリズムは、AtariのRainbow、HanabiのR2D2、SmaxのQMix、CraftaxのPPO-RNNといった複雑な手法と競合する。 PPOがゴーツーRLアルゴリズムになった時代に、PQNはQラーニングを実行可能な代替手段として再確立する。 Q-learning played a foundational role in the field reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need of a replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods like: Rainbow in Atari, R2D2 in Hanabi, QMix in Smax, PPO-RNN in Craftax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes Q-learning as a viable alternative.	翻訳日:2024-11-08 23:35:45 公開日:2024-10-23
# Richelieu: AI外交のための自己進化型LLMベースのエージェント Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v2 ) ライセンス: Link先を確認	Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,	(参考訳) 外交は人間社会における最も洗練された活動の1つである。複数の当事者やエージェント間の複雑な相互作用には、社会的推論、交渉術、長期戦略計画など様々な能力が含まれる。従来のAIエージェントは、複数のエージェントを含むタスクにおいて、多段階ゲームやより大きなアクションスペースを扱う能力を確実に証明している。しかし外交は、特に必要な交渉段階を考慮して、決定空間の停滞を伴う。近年, LLMエージェントは, 複雑なマルチエージェント環境において, 従来のエージェントの境界を拡張できる可能性を示しているが, 複雑なマルチエージェント環境において, 非常に長い計画期間を扱うには不十分である。最先端のLLM技術を活用して、我々は、より強力なLLMベースの社会エージェントに3つのコアと必須の機能を組み合わせることで、高度に包括的なマルチエージェントミッションのために、AIの上限を人間のようなエージェントに向けて探索する第一歩を踏み出します。 1) 記憶とリフレクションを有する戦略的プランナー 2 社会的理由づけによる目標志向の交渉 3) 自己プレイゲームによって記憶を増強し, ループ内の人間を介さずに自己進化する。 Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/ agents involve various abilities like social reasoning, negotiation arts, and long-term strategy planning. Previous AI agents surely have proved their capability of handling multi-step games and larger action spaces on tasks involving multiple agents. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. Recently, LLM agents have shown their potential for extending the boundary of previous agents on a couple of applications, however, it is still not enough to handle a very long planning period in a complex multi-agent environment. Empowered with cutting-edge LLM technology, we make the first stab to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission by combining three core and essential capabilities for stronger LLM-based societal agents: 1) strategic planner with memory and reflection; 2) goal-oriented negotiate with social reasoning; 3) augmenting memory by self-play games to self-evolving without any human in the loop.	翻訳日:2024-11-08 23:02:19 公開日:2024-10-23
# Richelieu: AI外交のための自己進化型LLMベースのエージェント Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v3 ) ライセンス: Link先を確認	Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,	(参考訳) 外交は、社会的推論、交渉、長期戦略計画のスキルを必要とする複数の当事者間の複雑な相互作用を含む、人間社会における最も洗練された活動の1つである。従来のAIエージェントは、マルチエージェントタスクにおいて、マルチステップゲームと大きなアクションスペースを扱う能力を示した。しかし外交は、特に必要な交渉段階を考慮して、決定空間の停滞を伴う。大規模言語モデル(LLM)に基づく最近のエージェントは、様々なアプリケーションに可能性を示しているが、複雑なマルチエージェント設定において、計画期間の延長に苦慮している。 LLMベースのエージェントの最近の技術を活用し、我々は3つの基本的な機能を統合することで包括的なマルチエージェントミッションを実行することができる人間のようなエージェントを作るAIの可能性を探究することを目的としている。 1) 記憶とリフレクションによる戦略的計画 2 社会的理由づけによる目標志向の交渉 3) 自己学習ゲームによって記憶を増強し, ループ内に人間がいない自己進化を図った。 Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop.	翻訳日:2024-11-08 23:02:19 公開日:2024-10-23
# Richelieu: AI外交のための自己進化型LLMベースのエージェント Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v4 ) ライセンス: Link先を確認	Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,	(参考訳) 外交は、社会的推論、交渉、長期戦略計画のスキルを必要とする複数の当事者間の複雑な相互作用を含む、人間社会における最も洗練された活動の1つである。従来のAIエージェントは、マルチエージェントタスクにおいて、マルチステップゲームと大きなアクションスペースを扱う能力を示した。しかし外交は、特に必要な交渉段階を考慮して、決定空間の停滞を伴う。大規模言語モデル(LLM)に基づく最近のエージェントは、様々なアプリケーションに可能性を示しているが、複雑なマルチエージェント設定において、計画期間の延長に苦慮している。 LLMベースのエージェントの最近の技術を活用し、我々は3つの基本的な機能を統合することで包括的なマルチエージェントミッションを実行することができる人間のようなエージェントを作るAIの可能性を探究することを目的としている。 1) 記憶とリフレクションによる戦略的計画 2 社会的理由づけによる目標志向の交渉 3) 自己学習ゲームによって記憶を増強し, ループ内に人間がいない自己進化を図った。 Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop.	翻訳日:2024-11-08 23:02:19 公開日:2024-10-23
# Attribute or Abstain: 長期ドキュメントアシスタントとしての大規模言語モデル Attribute or Abstain: Large Language Models as Long Document Assistants ( http://arxiv.org/abs/2407.07799v2 ) ライセンス: Link先を確認	Jan Buchmann, Xiao Liu, Iryna Gurevych,	(参考訳) LLMは人間が長い文書を扱うのを助けることができるが、幻覚で知られている。 LLMは、その応答を支持する証拠を提供し、検証可能性を高める。既存の属性に対するアプローチはRAG設定でのみ評価されている。これは、検索が不要な長いドキュメント設定とは大きく異なるが、助けになる可能性がある。これにより、属性の長い文書特定評価が欠落する。このギャップを埋めるために、LABは、6つの異なる長文タスクを属性付きでベンチマークし、異なるサイズの5 LLMに対して異なるアプローチで帰属する実験を行う。一つのステップで応答生成とエビデンス抽出という引用が、大規模で微調整されたモデルに最適であるのに対して、追加の検索は小さなモデルに有効であることがわかった。我々は,「中流の失われた」現象が帰属に有効であるかどうかを考察するが,この現象は見つからない。また、モデルが複雑なクレームのエビデンスの提供に苦労しているため、単純な応答を持つデータセットで応答品質を予測できるが、複雑なレスポンスには当てはまらないこともわかりました。 LLMs can help humans working with long documents, but are known to hallucinate. Attribution can increase trust in LLM responses: The LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document setting, where retrieval is not needed, but could help. Thus, a long document specific evaluation of attribution is missing. To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiments with different approaches to attribution on 5 LLMs of different sizes. We find that citation, i.e. response generation and evidence extraction in one step, performs best for large and fine-tuned models, while additional retrieval can help for small, prompted models. We investigate whether the "Lost in the Middle'' phenomenon exists for attribution, but do not find this. We also find that evidence quality can predict response quality on datasets with simple responses, but not so for complex responses, as models struggle with providing evidence for complex claims.	翻訳日:2024-11-08 22:40:08 公開日:2024-10-23
# Twitterのリアルタイム要約 Real-Time Summarization of Twitter ( http://arxiv.org/abs/2407.08125v2 ) ライセンス: Link先を確認	Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu,	(参考訳) 本稿では,Twitter のリアルタイム要約における TREC のアプローチについて述べる。我々は、リアルタイムプッシュ通知のシナリオに焦点を当て、システムはサンプルツイートのストリームを監視し、関連するツイートと新規なツイートを、特定の興味のあるプロフィールに返却する。ダイリクレスコア(ダイリクレスコア)は、非常にスムーズな(ベースライン)で、あるツイートが特定の関心プロファイルに関連するかどうかを分類するために使用される。平均利得(MAP),累積利得(CG),ディスカウント累積利得(DCG)などの指標を用いて,本手法が良好な性能を示すことを示す。また、プッシュキューから冗長なツイートを削除することも望まれる。精度の限界のため,本論文ではアルゴリズムについてのみ記述する。 In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP, cumulative gain (CG) and discount cumulative gain (DCG), the experiment indicates that our approach has a good performance. It is also desired to remove the redundant tweets from the pushing queue. Due to the precision limit, we only describe the algorithm in this paper.	翻訳日:2024-11-08 22:29:08 公開日:2024-10-23
# 算数推論のための自己学習言語モデル Self-training Language Models for Arithmetic Reasoning ( http://arxiv.org/abs/2407.08400v2 ) ライセンス: Link先を確認	Marek Kadlčík, Michal Štefánik,	(参考訳) 最近の言語モデルは、複雑な多段階推論を含むタスクにおいて印象的な結果をもたらすが、これらの機能をさらに拡張するには、より注釈付きデータの高価な収集が必要である。本研究では,算術的推論(自己学習)における予測の有効性に対する自動フィードバックを用いることで,新しいデータなしにモデルの推論能力を向上させる可能性を検討する。 6つの異なる算術推論データセットの体系的な実験では、モデルは単ラウンド(オフライン)とオンラインの自己学習の両方で大幅に改善され、それぞれ+13.9%と+25.9%のケースで正しい結果が得られる。さらに、単一ラウンドでオフラインのセルフトレーニングでは、従来の教師あり学習は好みの最適化に匹敵する効果をもたらすが、オンラインのセルフトレーニングでは、教師あり学習よりも優れた安定性と、目に見えない種類の問題に対する堅牢性により、好みの最適化手法が優れていることが判明した。 Recent language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving models' reasoning capabilities without new data, merely using automated feedback to the validity of their predictions in arithmetic reasoning (self-training). In systematic experimentation across six different arithmetic reasoning datasets, we find that models can substantially improve in both single-round (offline) and online self-training, reaching a correct result in +13.9% and +25.9% more cases, respectively, underlining the importance of actuality of self-training feedback. We further find that in the single-round, offline self-training, traditional supervised training can deliver gains comparable to preference optimization, but in online self-training, preference optimization methods largely outperform supervised training thanks to their superior stability and robustness on unseen types of problems.	翻訳日:2024-11-08 22:29:08 公開日:2024-10-23
# 算数推論のための自己学習言語モデル Self-training Language Models for Arithmetic Reasoning ( http://arxiv.org/abs/2407.08400v3 ) ライセンス: Link先を確認	Marek Kadlčík, Michal Štefánik,	(参考訳) 最近の言語モデルは、複雑な多段階推論を含むタスクにおいて印象的な結果をもたらすが、これらの機能をさらに拡張するには、より注釈付きデータの高価な収集が必要である。本研究では,算術的推論(自己学習)における予測の有効性に対する自動フィードバックを用いることで,新しいデータなしにモデルの推論能力を向上させる可能性を検討する。 6つの異なる算術推論データセットの体系的な実験では、モデルは単ラウンド(オフライン)とオンラインの自己学習の両方で大幅に改善され、それぞれ+13.9%と+25.9%のケースで正しい結果が得られる。さらに、単一ラウンドでオフラインのセルフトレーニングでは、従来の教師あり学習は好みの最適化に匹敵する効果をもたらすが、オンラインのセルフトレーニングでは、教師あり学習よりも優れた安定性と、目に見えない種類の問題に対する堅牢性により、好みの最適化手法が優れていることが判明した。 Recent language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving models' reasoning capabilities without new data, merely using automated feedback to the validity of their predictions in arithmetic reasoning (self-training). In systematic experimentation across six different arithmetic reasoning datasets, we find that models can substantially improve in both single-round (offline) and online self-training, reaching a correct result in +13.9% and +25.9% more cases, respectively, underlining the importance of actuality of self-training feedback. We further find that in the single-round, offline self-training, traditional supervised training can deliver gains comparable to preference optimization, but in online self-training, preference optimization methods largely outperform supervised training thanks to their superior stability and robustness on unseen types of problems.	翻訳日:2024-11-08 22:29:08 公開日:2024-10-23
# 言語モデルを用いたA探索の高速化のためのトレーニングデータ A Training Data Recipe to Accelerate A Search with Language Models ( http://arxiv.org/abs/2407.09985v2 ) ライセンス: Link先を確認	Devaansh Gupta, Boyang Li,	(参考訳) 大規模言語モデル(LLM)とAのようなヒューリスティック検索アルゴリズムを組み合わせることで、LLM推論の強化とスケーラブルな推論が期待できる。 LLMヒューリスティック学習のトレーニングデータのコアセット選択問題について検討する。ヒューリスティック関数を学習する手法はほとんどなく,探索アルゴリズムと機械学習モデルとの相互作用を考察する。本研究では, A 探索アルゴリズムの要件を LLM の要件から実証的に切り離して, この課題を一般化する。 A* は目標に近い探索ノードに対してより正確な予測を必要とし、LLM は効率的な一般化のために同じノードセットを必要とする。これらの知見により,LLMに基づくヒューリスティックス学習のためのデータ選択分布を導出する。迷路ナビゲーション,ソコバン,スライディングタイルパズルの3つの古典的計画領域において,我々の手法は,解を見つけるのに必要な反復回数を最大15倍に減らし,探索時間を最大5倍に短縮する。コードベースはhttps://github.com/devaansh100/a_starにある。 Combining Large Language Models (LLMs) with heuristic search algorithms like A* holds the promise of enhanced LLM reasoning and scalable inference. To accelerate training and reduce computational demands, we investigate the coreset selection problem for the training data of LLM heuristic learning. Few methods to learn the heuristic functions consider the interaction between the search algorithm and the machine learning model. In this work, we empirically disentangle the requirements of A* search algorithm from the requirements of the LLM to generalise on this task. Surprisingly, we find an overlap between their requirements; A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation. With these insights, we derive a data-selection distribution for learning LLM-based heuristics. On three classical planning domains, maze navigation, Sokoban and sliding tile puzzles, our technique reduces the number of iterations required to find the solutions by up to 15x, with a wall-clock speed-up of search up to 5x. The codebase is at https://github.com/devaansh100/a_star.	翻訳日:2024-11-08 21:43:45 公開日:2024-10-23
# AlleNoise: 実世界のラベルノイズを用いた大規模テキスト分類ベンチマークデータセット AlleNoise: large-scale text classification benchmark dataset with real-world label noise ( http://arxiv.org/abs/2407.10992v2 ) ライセンス: Link先を確認	Alicja Rączkowska, Aleksandra Osowska-Kurczab, Jacek Szczerbiński, Kalina Jasinska-Kobus, Klaudia Nazarko,	(参考訳) ラベルノイズは、ロバストな分類モデルのトレーニングにおいて依然として課題である。ラベルノイズを緩和するほとんどの方法は、主に合成ノイズを伴うデータセットを用いてベンチマークされている。現実的なノイズ分布を持つデータセットの必要性は、WebVisionやChrothing1MといったWebスクラッドベンチマークによって部分的に解決されているが、これらのベンチマークはコンピュータビジョンドメインに限定されている。 Transformerベースのモデルの重要性が高まっているため、ノイズのあるラベルで学習するためのテキスト分類ベンチマークを確立することが不可欠である。本稿では、約5600のクラスに50,000以上のサンプルを含む実世界のインスタンス依存ラベルノイズを持つ、新しいキュレートされたテキスト分類ベンチマークであるAlleNoiseについて、有意義で階層的な分類法を補完する。ノイズの分布は、主要なeコマースマーケットプレースの実際のユーザから来ており、人間のミスの意味を現実的に反映している。ノイズラベルに加えて、典型的にはフィールドで使用されるWebスクラッドデータセットとは異なり、ノイズ分布に関する深い洞察を得るのに役立つ、人間検証されたクリーンラベルも提供します。このような実環境騒音に対処するには,雑音ラベルを用いた学習方法の代表的選択が不十分であることを示す。さらに,これらのアルゴリズムが過剰な暗記を緩和しないことを示す。そこで、AlleNoiseでは、テキスト分類タスクにおける実世界のラベルノイズを処理できるラベルノイズ法の開発のために、バーを高く設定する。コードとデータセットはhttps://github.com/allegro/AlleNoise.comからダウンロードできる。 Label noise remains a challenge for training robust classification models. Most methods for mitigating label noise have been benchmarked using primarily datasets with synthetic noise. While the need for datasets with realistic noise distribution has partially been addressed by web-scraped benchmarks such as WebVision and Clothing1M, those benchmarks are restricted to the computer vision domain. With the growing importance of Transformer-based models, it is crucial to establish text classification benchmarks for learning with noisy labels. In this paper, we present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise, containing over 500,000 examples across approximately 5,600 classes, complemented with a meaningful, hierarchical taxonomy of categories. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. In addition to the noisy labels, we provide human-verified clean labels, which help to get a deeper insight into the noise distribution, unlike web-scraped datasets typically used in the field. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise. In addition, we show evidence that these algorithms do not alleviate excessive memorization. As such, with AlleNoise, we set the bar high for the development of label noise methods that can handle real-world label noise in text classification tasks. The code and dataset are available for download at https://github.com/allegro/AlleNoise.	翻訳日:2024-11-08 21:21:36 公開日:2024-10-23
# 99の問題があるが、FLOPSは一つではない I've Got 99 Problems But FLOPS Ain't One ( http://arxiv.org/abs/2407.12819v2 ) ライセンス: Link先を確認	Alexandru M. Gherghescu, Vlad-Andrei Bădoiu, Alexandru Agache, Mihai-Valentin Dumitru, Iuliu Vasilescu, Radu Mantu, Costin Raiciu,	(参考訳) ハイパースケーラは大規模なネットワーク展開の状況を支配していますが、直面する課題に関するデータや洞察を共有することはめったにありません。この優位性を考慮して、この分野で解決すべき問題は何か? 私たちは、機械学習アプリケーションのための1000億ドルのデータセンターを構築するための公開計画から始まり、関連する研究方向を見つけるために、従来からあるアプローチを取っています。法律をスケールする言語モデルを活用することで、データセンターのようなワークロードが持つものを見つけ、ネットワーク研究に焦点をあてて、その上で直面する課題を探求します。我々は、データセンターの構築とそのようなモデルの訓練は技術的に可能であると結論づけるが、これはDC間通信のための新しい広域輸送、マルチパストランスポート、データセンター内通信のための新しいデータセンタートポロジ、高速スケールアップネットワークおよびトランスポート、ネットワークコミュニティのための豊富な研究課題を概説する。 Hyperscalers dominate the landscape of large network deployments, yet they rarely share data or insights about the challenges they face. In light of this supremacy, what problems can we find to solve in this space? We take an unconventional approach to find relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications. Leveraging the language models scaling laws, we discover what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but this requires novel wide-area transports for inter-DC communication, a multipath transport and novel datacenter topologies for intra-datacenter communication, high speed scale-up networks and transports, outlining a rich research agenda for the networking community.	翻訳日:2024-11-08 20:25:29 公開日:2024-10-23
# 想像の技:少数のデモから長い水平操作課題を学習する The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations ( http://arxiv.org/abs/2407.13432v2 ) ライセンス: Link先を確認	Jan Ole von Hartz, Tim Welschehold, Abhinav Valada, Joschka Boedecker,	(参考訳) Task Parametrized Gaussian Mixture Models (TP-GMM) は、オブジェクト中心のロボット操作タスクを学習するためのサンプル効率のよい手法である。しかし、TP-GMMの適用にはいくつかのオープンな課題がある。本研究では, 相乗的に3つの重要な課題に取り組む。第一に、エンドエフェクタ速度は非ユークリッドであり、したがって標準GMMを用いたモデリングは困難である。そこで本研究では,ロボットのエンドエフェクタ速度をその方向と大きさに分解し,リーマンGMMを用いてモデル化する。第二に、複雑な実演軌跡のセグメンテーションとシーケンシャルスキルに分解速度を利用する。セグメンテーションを通じて、スキルトラジェクトリをさらに整列させ、従って時間を強力な帰納バイアスとして活用する。第3に,視覚的観察からスキル毎のタスクパラメータを自動的に検出する手法を提案する。提案手法は,RGB-D観測のみを用いて,たった5つの実演から複雑な操作タスクを学習することを可能にする。 RLBenchの大規模実験により,20倍の試料効率向上を図った。我々のポリシーは様々な環境、オブジェクトインスタンス、オブジェクトの位置にまたがって一般化され、学習スキルは再利用されます。 Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.	翻訳日:2024-11-08 20:14:30 公開日:2024-10-23
# 想像の技:少数のデモから長い水平操作課題を学習する The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations ( http://arxiv.org/abs/2407.13432v3 ) ライセンス: Link先を確認	Jan Ole von Hartz, Tim Welschehold, Abhinav Valada, Joschka Boedecker,	(参考訳) Task Parametrized Gaussian Mixture Models (TP-GMM) は、オブジェクト中心のロボット操作タスクを学習するためのサンプル効率のよい手法である。しかし、TP-GMMの適用にはいくつかのオープンな課題がある。本研究では, 相乗的に3つの重要な課題に取り組む。第一に、エンドエフェクタ速度は非ユークリッドであり、したがって標準GMMを用いたモデリングは困難である。そこで本研究では,ロボットのエンドエフェクタ速度をその方向と大きさに分解し,リーマンGMMを用いてモデル化する。第二に、複雑な実演軌跡のセグメンテーションとシーケンシャルスキルに分解速度を利用する。セグメンテーションを通じて、スキルトラジェクトリをさらに整列させ、従って時間を強力な帰納バイアスとして活用する。第3に,視覚的観察からスキル毎のタスクパラメータを自動的に検出する手法を提案する。提案手法は,RGB-D観測のみを用いて,たった5つの実演から複雑な操作タスクを学習することを可能にする。 RLBenchの大規模実験により,20倍の試料効率向上を図った。我々のポリシーは様々な環境、オブジェクトインスタンス、オブジェクトの位置にまたがって一般化され、学習スキルは再利用されます。 Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.	翻訳日:2024-11-08 20:14:30 公開日:2024-10-23
# 言語リワードモデルのための目標条件付き表現の学習 Learning Goal-Conditioned Representations for Language Reward Models ( http://arxiv.org/abs/2407.13887v2 ) ライセンス: Link先を確認	Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx,	(参考訳) 従来の強化学習(RL)では,オフラインデータや自己教師対象による表現の改善を学習する技術が目覚ましい成果を上げている。それでも、表現学習の改善が、言語モデル(LM)における人間からのフィードバック(RLHF)からの強化学習にどのような効果があるかは明らかでない。本研究は、サンプル化された好ましくない軌道に沿った将来の状態の表現類似度を高め、ランダムにサンプリングされた非推奨軌道に沿った類似度を減少させることにより、対照的に$\textit{goal-conditioned}$ファッションのトレーニング報酬モデル(RM)を提案する。この目的により、MATHやGSM8kといった挑戦的なベンチマークにおいて、RM性能は最大0.09 AUROCまで大幅に向上した。これらの結果は、Helpful-Harmlessデータセット上の一般的なアライメントにも及んでいる。報酬モデルのパフォーマンスの改善以外にも、このRM表現のトレーニング方法により、$\textit{steerability}$の改善が可能となる。この洞察を活用すれば、過半数投票中に生成したトークンの最大5,5\%をフィルタして、トラジェクトリを破棄して、結果として“誤った”状態に陥り、コストを大幅に削減できることが分かります。さらに、これらの表現は、希望する将来の目標状態に条件付けすることで、きめ細かい制御を行うことができる。例えば、Llama 3モデルを有用な世代に向けて操ることで、教師付き微調整トレーニングベースラインよりも9.6\%の利便性が向上することを示す。同様に、複雑な世代に向けてモデルをステアリングすることで、ベースラインよりも21.6\%の複雑さが向上する。全体として、この対照的な目標条件の方法でのRMのトレーニングは、性能を大幅に改善し、モデルステアビリティを実現している。 Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive, $\textit{goal-conditioned}$ fashion by increasing the representation similarity of future states along sampled preferred trajectories and decreasing the similarity along randomly sampled dispreferred trajectories. This objective significantly improves RM performance by up to 0.09 AUROC across challenging benchmarks, such as MATH and GSM8k. These findings extend to general alignment as well -- on the Helpful-Harmless dataset, we observe $2.3\%$ increase in accuracy. Beyond improving reward model performance, we show this way of training RM representations enables improved $\textit{steerability}$ because it allows us to evaluate the likelihood of an action achieving a particular goal-state (e.g., whether a solution is correct or helpful). Leveraging this insight, we find that we can filter up to $55\%$ of generated tokens during majority voting by discarding trajectories likely to end up in an "incorrect" state, which leads to significant cost savings. We additionally find that these representations can perform fine-grained control by conditioning on desired future goal-states. For example, we show that steering a Llama 3 model towards helpful generations with our approach improves helpfulness by $9.6\%$ over a supervised-fine-tuning trained baseline. Similarly, steering the model towards complex generations improves complexity by $21.6\%$ over the baseline. Overall, we find that training RMs in this contrastive, goal-conditioned fashion significantly improves performance and enables model steerability.	翻訳日:2024-11-08 20:01:00 公開日:2024-10-23
# 量子電磁場によるスピン重ね合わせ状態のデコヒーレンス Decoherence of spin superposition state caused by a quantum electromagnetic field ( http://arxiv.org/abs/2407.14581v2 ) ライセンス: Link先を確認	Kensuke Gallock-Yoshimura, Yuuki Sugiyama, Akira Matsumura, Kazuhiro Yamamoto,	(参考訳) 本研究では、ミンコフスキー時空における相対論的量子電磁場の存在下で、空間的に重畳された電気的中性スピン-$\frac12$粒子のデコヒーレンスについて検討する。スピン磁場結合によるデコヒーレンスを, 重畳軌道の各分岐に沿った2点相関関数から生じる局所的デコヒーレンスと, 重畳軌道間の相関関数から生じる非局所的デコヒーレンスに分類できることを示した。これらの効果は位相減衰と振幅減衰と関連している。また、量子場が熱状態で準備されている場合、デコヒーレンスは磁場温度とともに単調に増大することを示した。 In this study, we investigate the decoherence of a spatially superposed electrically neutral spin-$\frac12$ particle in the presence of a relativistic quantum electromagnetic field in Minkowski spacetime. We demonstrate that decoherence due to the spin-magnetic field coupling can be categorized into two distinct factors: local decoherence, originating from the two-point correlation functions along each branch of the superposed trajectories, and nonlocal decoherence, which arises from the correlation functions between the two superposed trajectories. These effects are linked to phase damping and amplitude damping. We also show that if the quantum field is prepared in a thermal state, decoherence monotonically increases with the field temperature.	翻訳日:2024-11-08 19:27:32 公開日:2024-10-23
# GPHM : 単眼頭アバター再建のためのガウスパラメトリック頭部モデル GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction ( http://arxiv.org/abs/2407.15070v2 ) ライセンス: Link先を確認	Yuelang Xu, Zhaoqi Su, Qingyao Wu, Yebin Liu,	(参考訳) 高忠実度3D人間の頭部アバターの作成は、VR/AR、デジタル人間、映画製作における応用に不可欠である。近年の進歩は、変形可能な顔モデルを利用して、容易にアクセス可能なデータからアニメーションヘッドアバターを生成し、低次元パラメトリック空間内の様々なアイデンティティと表現を表現している。しかし、既存の手法は、例えばヘアスタイルのような複雑な外観の詳細をモデル化するのに苦労し、レンダリング品質と効率の低下に悩まされることが多い。本稿では,人間の頭部の複雑さを正確に表現するために,3次元ガウス的パラメトリック頭部モデル(3D Gaussian Parametric Head Model)を提案する。ガウスモデルは複雑な詳細を扱うことができ、様々な外観や複雑な表現の現実的な表現を可能にする。さらに、スムーズな収束を保証するために、よく設計されたトレーニングフレームワークを提示し、リッチコンテンツを学ぶための堅牢な保証を提供する。提案手法は,高画質でリアルタイムな実写レンダリングを実現し,パラメトリックヘッドモデルの分野に有意義な貢献をする。最後に、3Dガウスパラメトリックヘッドモデルをモノクロビデオや数発の頭部アバター再構成タスクに適用し、入力データが極端に制限された場合でも高品質な3Dヘッドアバターの即時再構築を可能にする。 Creating high-fidelity 3D human head avatars is crucial for applications in VR/AR, digital human, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. However, existing methods often struggle with modeling complex appearance details, e.g., hairstyles, and suffer from low rendering quality and efficiency. In this paper we introduce a novel approach, 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. The Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, we presents a well-designed training framework to ensure smooth convergence, providing a robust guarantee for learning the rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models. Finally, we apply the 3D Gaussian Parametric Head Model to monocular video or few-shot head avatar reconstruction tasks, which enables instant reconstruction of high-quality 3D head avatars even when input data is extremely limited, surpassing previous methods in terms of reconstruction quality and training speed.	翻訳日:2024-11-08 15:56:37 公開日:2024-10-23
# Conditional Language Policy: ステアブルな多目的ファインタニングのための汎用フレームワーク Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning ( http://arxiv.org/abs/2407.15762v2 ) ライセンス: Link先を確認	Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent,	(参考訳) リワードベースの微調整は、言語ポリシーを意図した行動(創造性と安全性など)と整合させることに不可欠である。重要な課題は、複数の(競合する)目標を柔軟かつ効率的な方法でトレードオフする、ステアブル言語モデルを開発することである。本稿では,多目的言語モデルを微調整するための一般的なフレームワークである条件言語政策(CLP)について述べる。マルチタスクトレーニングとパラメータ効率の微調整のテクニックに基づいて、CLPは推論時に競合する目標を効果的にトレードオフするステアブルモデルを学習する。特に、目標間の異なるトレードオフを達成するために、トレーニングや複数のモデルのメンテナンスは必要ありません。 CLPは2つの要約データセットに関する広範な実験と改善を通じて,多目的ファインタニングにおける既存のアプローチを上回り,Paretoが優位に立つステアブル言語モデルを学習していることを示す。 Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learn steerable models that effectively trade-off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for multi-objective finetuning.	翻訳日:2024-11-08 15:45:25 公開日:2024-10-23
# あらゆる場所を操作するための学習:強化学習のための視覚的一般化可能なフレームワーク Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning ( http://arxiv.org/abs/2407.15815v2 ) ライセンス: Link先を確認	Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu,	(参考訳) 多様なオープンワールドシナリオで動作するための一般化機能を備えたビジュモータロボットを開発できるだろうか? 本稿では,視覚的強化学習に適した一般化可能なフレームワークである「textbf{Maniwhere}」を提案する。具体的には,Spatial Transformer Network (STN) モジュールと融合した多視点表現学習手法を提案する。さらに、カリキュラムベースのランダム化と拡張手法を用いて、RLトレーニングプロセスの安定化と視覚一般化能力の向上を図る。 Maniwhereの有効性を示すために、3つのハードウェアプラットフォームにまたがる強力な視覚的一般化とsim2realトランスファー能力を示すために、明瞭なオブジェクト、バイマニュアル、デクスタスな手操作タスクを含む8つのタスクを慎重に設計した。実験の結果,Maniwhereは既存の最先端手法を著しく上回っていることがわかった。ビデオはhttps://gemcollector.github.io/maniwhere/で公開されている。 Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose \textbf{Maniwhere}, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen the visual generalization ability. To exhibit the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulate objects, bi-manual, and dexterous hand manipulation tasks, demonstrating Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.	翻訳日:2024-11-08 15:45:25 公開日:2024-10-23
# 視覚テキストの理解と生成の調和 Harmonizing Visual Text Comprehension and Generation ( http://arxiv.org/abs/2407.16364v2 ) ライセンス: Link先を確認	Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie,	(参考訳) 本研究では,視覚テキストの理解と生成に熟練した,統一的で汎用的な多モード生成モデルであるTextHarmonyを提案する。画像とテキストを同時に生成すると、視覚と言語モダリティの固有の矛盾により、パフォーマンスが劣化する。この課題を克服するため、既存のアプローチでは、個別のモデルインスタンスを必要とする、教師付き微調整のためのモダリティ固有のデータを活用している。マルチモーダル生成空間を部分的に分離し,モダリティ特化およびモダリティ非依存のLoRAエキスパートを動的に集約するSlide-LoRAを提案する。 Slide-LoRAは特異モデルインスタンス内の視覚と言語の生成を調和させ、より統一された生成プロセスを促進する。さらに,高品質な画像キャプションデータセットであるDetailedTextCaps-100Kを開発した。様々なベンチマークによる総合的な実験により,提案手法の有効性が示された。 Slide-LoRAにインスパイアされたTextHarmonyは、パラメータがわずか2%増加し、ビジュアルテキスト理解タスクが2.5%、ビジュアルテキスト生成タスクが4.0%改善した。本研究は,視覚テキスト領域におけるマルチモーダル生成への統合的アプローチの実現可能性について述べる。コードはhttps://github.com/bytedance/TextHarmonyで入手できる。 In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space. Slide-LoRA harmonizes the generation of vision and language within a singular model instance, thereby facilitating a more unified generative process. Additionally, we develop a high-quality image caption dataset, DetailedTextCaps-100K, synthesized with a sophisticated closed-source MLLM to enhance visual text generation capabilities further. Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach. Empowered by Slide-LoRA, TextHarmony achieves comparable performance to modality-specific fine-tuning results with only a 2% increase in parameters and shows an average improvement of 2.5% in visual text comprehension tasks and 4.0% in visual text generation tasks. Our work delineates the viability of an integrated approach to multimodal generation within the visual text domain, setting a foundation for subsequent inquiries. Code is available at https://github.com/bytedance/TextHarmony.	翻訳日:2024-11-08 15:34:26 公開日:2024-10-23
# 学習からスピン"ペン"への教訓 Lessons from Learning to Spin "Pens" ( http://arxiv.org/abs/2407.18902v2 ) ライセンス: Link先を確認	Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang,	(参考訳) ハンマーやスクリュードライバーのような道具も同じような形をしているので、ペンのような物体を手作業で操作することは私たちの日常生活にとって重要なスキルです。しかし,従来の学習手法では,高品質な実演が欠如し,シミュレーションと実世界の間に大きなギャップがあるため,この課題に苦慮している。本研究では,ペンのような物体を回転させる能力を示すことによって,学習に基づく手動操作システムの境界を推し進める。まず、強化学習を用いて、特権情報でオラクルポリシーを訓練し、シミュレーションにおいて高忠実度軌道データセットを生成する。これは2つの目的がある。 1) シミュレーションにおける感覚士政策の事前学習 2) 実世界におけるオープンループ軌道再生の実施。次に、これらの実世界の軌道を用いて感覚運動のポリシーを微調整し、実世界の力学に適応させる。 50個未満の軌道で、我々のポリシーは、複数の革命のために異なる物理的特性を持つ10個以上のペンのような物体を回転させることを学ぶ。デザイン選択の包括的分析を行い、開発中に学んだ教訓を共有します。 In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real world dynamics. With less than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.	翻訳日:2024-11-08 14:50:05 公開日:2024-10-23
# 深部ニューラルネットワークにおける特徴学習のバネブロック理論 A spring-block theory of feature learning in deep neural networks ( http://arxiv.org/abs/2407.19353v2 ) ライセンス: Link先を確認	Cheng Shi, Liming Pan, Ivan Dokmanić,	(参考訳) 特徴学習深層ネットは、定期的に低次元の幾何学にデータを徐々に崩壊させる。この現象は、非線形性、ノイズ、学習率、および力学を形成する他の選択の集合的作用からどのように生じるかは、顕微鏡神経力学から構築された第一原理理論を解明した。浅い層や深い層がより効果的に学習するレシエーションを識別するノイズ非線形位相図を示す。次に、図を再現するマクロ力学的理論を提案し、なぜいくつかのDNNが遅延でアクティブなのかを説明し、層をまたいだ特徴学習と一般化をリンクする。 Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this phenomenon emerges from collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively. We then propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-23
# AI生成画像検出のためのCLIPの逆ロバスト性探索 Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection ( http://arxiv.org/abs/2407.19553v2 ) ライセンス: Link先を確認	Vincenzo De Rosa, Fabrizio Guillaro, Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva,	(参考訳) 近年、AI生成画像の検出や悪意のある目的での使用を防止するために、多くの法医学的検知器が提案されている。畳み込みニューラルネットワーク(CNN)はこの分野で長い間支配的なアーキテクチャであり、激しい研究の対象となっている。しかし、最近提案されたTransformerベースの検出器は、特に一般化の点において、CNNベースの検出器と一致するか、さらに優れていることが示されている。本稿では,視覚変換器 (ViT) のバックボーンに依存するコントラスト言語-画像事前学習 (CLIP) 法に着目し,その性能をCNN法と比較し,AI生成画像検出器の対角的ロバスト性について検討する。種々の条件下で異なる敵攻撃に対するロバスト性について検討し、数値結果と周波数領域パターンの両方を解析する。 CLIPベースの検出器は、CNNベースの検出器と同様に、ホワイトボックス攻撃に対して脆弱である。しかし、攻撃はCNNベースのメソッドとCLIPベースのメソッド間で簡単に伝達できない。また、周波数領域における逆方向雑音パターンの異なる分布により、このことが確認される。全体として、この分析はより効果的な戦略を開発するのに役立つ法医学的検出器の特性に関する新たな洞察を提供する。 In recent years, many forensic detectors have been proposed to detect AI-generated images and prevent their use for malicious purposes. Convolutional neural networks (CNNs) have long been the dominant architecture in this field and have been the subject of intense study. However, recently proposed Transformer-based detectors have been shown to match or even outperform CNN-based detectors, especially in terms of generalization. In this paper, we study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Visual Transformer (ViT) backbones and comparing their performance with CNN-based methods. We study the robustness to different adversarial attacks under a variety of conditions and analyze both numerical results and frequency-domain patterns. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. However, attacks do not easily transfer between CNN-based and CLIP-based methods. This is also confirmed by the different distribution of the adversarial noise patterns in the frequency domain. Overall, this analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.	翻訳日:2024-11-08 14:27:29 公開日:2024-10-23
# タスクプロンプトベクトル:マルチタスクソフトプロンプト転送による効果的な初期化 Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer ( http://arxiv.org/abs/2408.01119v2 ) ライセンス: Link先を確認	Robert Belanec, Simon Ostermann, Ivan Srba, Maria Bielikova,	(参考訳) プロンプトチューニングは、大きな言語モデル(LLM)をトレーニングするための効率的なソリューションである。しかし、現在のソフトプロンプトベースの手法は、しばしばマルチタスクのモジュラリティを犠牲にし、新たに追加されたタスクごとにトレーニングプロセスを完全にあるいは部分的に繰り返す必要がある。タスクベクトルに関する最近の研究は、望まれるマルチタスク性能を達成するために、フルモデルウェイトに算術演算を適用しているが、ソフトプロンプトに対する同様のアプローチはいまだに欠落している。そこで本研究では,調整したソフトプロンプトの重みとランダム初期化との要素的差異から生成したタスクプロンプトベクトルを提案する。 12個のNLUデータセットの実験結果から、タスクプロンプトベクトルを低リソース設定で使用して、類似タスクのプロンプトチューニングを効果的に初期化できることが示されている。さらに、タスクプロンプトベクトルは、2つの異なる言語モデルアーキテクチャ上でのプロンプトチューニングのランダム初期化とは無関係であることを示す。これにより、異なるタスクから事前訓練されたベクトルで即時算術を行うことができる。このようにして、複数のタスクからタスクプロンプトベクトルを算術的に加算することで、最先端のベースラインと競合する代替手段を提供する。 Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by element-wise difference between weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning on 2 different language model architectures. This allows prompt arithmetics with the pre-trained vectors from different tasks. In this way, we provide a competitive alternative to state-of-the-art baselines by arithmetic addition of task prompt vectors from multiple tasks.	翻訳日:2024-11-08 13:18:17 公開日:2024-10-23
# MoC-System:Sparse Mixture-of-Experts Model Trainingのための効率的なフォールトトレランス MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training ( http://arxiv.org/abs/2408.04307v2 ) ライセンス: Link先を確認	Weilin Cai, Le Qin, Jiayi Huang,	(参考訳) 大きな言語モデルがスケールアップを続けるにつれて、分散トレーニングシステムは10kノードを超えて拡張され、フォールトトレランスの重要性が増している。チェックポイントは耐障害性の主要な戦略として現れ、その効率を最適化するための広範な研究がなされている。しかし,Sparse Mixture-of-Experts (MoE) モデルの出現は,高密度モデルに対する計算要求に匹敵するものの,モデルサイズが大幅に増加するために新たな課題を提起する。本研究では,分散学習システムで発生する多数のチェックポイントシャードをオーケストレーションするMixture-of-Checkpoint System (MoC-System)を提案する。 MoC-Systemは、新しい部分エキスパートチェックポイント機構(PEC)を特徴としている。これはアルゴリズムシステムの共同設計で、選択した専門家のサブセットを戦略的に保存し、MoEチェックポイントのサイズを高密度モデルに匹敵するレベルに効果的に縮小する。ハイブリッド並列戦略を取り入れたMoC-Systemは、分散階級間でワークロードを均等に分散する完全シャードなチェックポイント戦略を含む。さらに、MoC-Systemはメモリ内スナップショットと永続化プロセスを非同期に処理する2段階のチェックポイント管理手法を導入している。 We build MoC-System on the Megatron-DeepSpeed framework, achieved a 98.9% down the overhead for each checkpointing process than the original method, during MoE model training with ZeRO-2 data parallelism and expert parallelism。さらに,本手法は,下流タスクにおける平均精度が1.08%向上しても,同等のモデル精度を維持しながら効率を向上させることを実証的分析により実証した。 As large language models continue to scale up, distributed training systems have expanded beyond 10k nodes, intensifying the importance of fault tolerance. Checkpoint has emerged as the predominant fault tolerance strategy, with extensive studies dedicated to optimizing its efficiency. However, the advent of the sparse Mixture-of-Experts (MoE) model presents new challenges due to the substantial increase in model size, despite comparable computational demands to dense models. In this work, we propose the Mixture-of-Checkpoint System (MoC-System) to orchestrate the vast array of checkpoint shards produced in distributed training systems. MoC-System features a novel Partial Experts Checkpointing (PEC) mechanism, an algorithm-system co-design that strategically saves a selected subset of experts, effectively reducing the MoE checkpoint size to levels comparable with dense models. Incorporating hybrid parallel strategies, MoC-System involves fully sharded checkpointing strategies to evenly distribute the workload across distributed ranks. Furthermore, MoC-System introduces a two-level checkpointing management method that asynchronously handles in-memory snapshots and persistence processes. We build MoC-System upon the Megatron-DeepSpeed framework, achieving up to a 98.9% reduction in overhead for each checkpointing process compared to the original method, during MoE model training with ZeRO-2 data parallelism and expert parallelism. Additionally, extensive empirical analyses substantiate that our methods enhance efficiency while maintaining comparable model accuracy, even achieving an average accuracy increase of 1.08% on downstream tasks.	翻訳日:2024-11-08 12:22:45 公開日:2024-10-23
# 異常予測: 明示的遅延と水平による新しいアプローチ Anomaly Prediction: A Novel Approach with Explicit Delay and Horizon ( http://arxiv.org/abs/2408.04377v3 ) ライセンス: Link先を確認	Jiang You, Arben Cela, René Natowicz, Jacob Ouanounou, Patrick Siarry,	(参考訳) 時系列データの異常検出は、さまざまな領域において重要な課題である。従来の手法では、通常、後続のステップで異常を識別することに集中しており、多くの場合、遅延時間や異常の地平線といった時間的ダイナミクスの重要性を過小評価している。本稿では,時系列異常予測に時間的情報を直接組み込んだ新しい手法を提案する。本稿では,このアプローチの評価と,いくつかの最先端手法を用いた包括的実験を行うために設計された新しいデータセットを提案する。本研究は, 時間的, 正確な異常予測を行う上で, 提案手法の有効性を実証し, 今後の研究のための新しいベンチマークを設定した。 Anomaly detection in time series data is a critical challenge across various domains. Traditional methods typically focus on identifying anomalies in immediate subsequent steps, often underestimating the significance of temporal dynamics such as delay time and horizons of anomalies, which generally require extensive post-analysis. This paper introduces a novel approach for time series anomaly prediction, incorporating temporal information directly into the prediction results. We propose a new dataset specifically designed to evaluate this approach and conduct comprehensive experiments using several state-of-the-art methods. Our results demonstrate the efficacy of our approach in providing timely and accurate anomaly predictions, setting a new benchmark for future research in this field.	翻訳日:2024-11-08 12:22:45 公開日:2024-10-23
# 固定予算ベイズ型ベストアーム識別のためのUCB探索 UCB Exploration for Fixed-Budget Bayesian Best Arm Identification ( http://arxiv.org/abs/2408.04869v2 ) ライセンス: Link先を確認	Rong J. B. Zhu, Yanqi Qiu,	(参考訳) 固定予算設定におけるベストアーム識別(BAI)について検討した。 UCBEのような上位信頼境界(UCB)に基づく適応的アロケーションは、BAIでうまく機能することが知られている。しかし、その最適後悔が理論的にインスタンスに依存していることはよく知られており、これは多くの固定予算のBAI問題においてアーティファクトであることが示されている。本稿では, ベイズ条件下での固定予算BAI問題に対して, 理論的かつ実験的に効率的なUPB探索アルゴリズムを提案する。鍵となる考え方は事前情報を学習することであり、これは累積的後悔の最小化問題において行ったような UCB ベースの BAI アルゴリズムの性能を向上させることができる。我々は、失敗確率とベイズ的BAI問題に対する単純な後悔の限界を確立し、次数 $\tilde{O}(\sqrt{K/n})$ の上限を対数因子まで与え、$n$ は予算を表し、$K$ は武器の数を表す。さらに,本手法が最先端のベースラインを一貫して上回ることを示す実証実験を行った。 We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, it is well-known that its optimal regret is theoretically dependent on instances, which we show to be an artifact in many fixed-budget BAI problems. In this paper we propose an UCB exploration algorithm that is both theoretically and empirically efficient for the fixed budget BAI problem under a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithm as it has done in the cumulative regret minimization problem. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ represents the budget and $K$ denotes the number of arms. Furthermore, we demonstrate through empirical results that our approach consistently outperforms state-of-the-art baselines.	翻訳日:2024-11-08 12:11:36 公開日:2024-10-23
# 固定予算ベイズ型ベストアーム識別のためのUCB探索 UCB Exploration for Fixed-Budget Bayesian Best Arm Identification ( http://arxiv.org/abs/2408.04869v3 ) ライセンス: Link先を確認	Rong J. B. Zhu, Yanqi Qiu,	(参考訳) 固定予算設定におけるベストアーム識別(BAI)について検討した。 UCBEのような上位信頼境界(UCB)に基づく適応的アロケーションは、BAIでうまく機能することが知られている。しかし、その最適後悔が理論的にインスタンスに依存していることはよく知られており、これは多くの固定予算のBAI問題においてアーティファクトであることが示されている。本稿では, ベイズ条件下での固定予算BAI問題に対して, 理論的かつ実験的に効率的なUPB探索アルゴリズムを提案する。鍵となる考え方は事前情報を学習することであり、これは累積的後悔の最小化問題において行ったような UCB ベースの BAI アルゴリズムの性能を向上させることができる。我々は、失敗確率とベイズ的BAI問題に対する単純な後悔の限界を確立し、次数 $\tilde{O}(\sqrt{K/n})$ の上限を対数因子まで与え、$n$ は予算を表し、$K$ は武器の数を表す。さらに,本手法が最先端のベースラインを一貫して上回ることを示す実証実験を行った。 We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, it is well-known that its optimal regret is theoretically dependent on instances, which we show to be an artifact in many fixed-budget BAI problems. In this paper we propose an UCB exploration algorithm that is both theoretically and empirically efficient for the fixed budget BAI problem under a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithm as it has done in the cumulative regret minimization problem. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ represents the budget and $K$ denotes the number of arms. Furthermore, we demonstrate through empirical results that our approach consistently outperforms state-of-the-art baselines.	翻訳日:2024-11-08 12:11:36 公開日:2024-10-23
# 混乱したパイロット:RAGによるLCMの混乱した副次的リスク ConfusedPilot: Confused Deputy Risks in RAG-based LLMs ( http://arxiv.org/abs/2408.04870v5 ) ライセンス: Link先を確認	Ayush RoyChowdhury, Mulong Luo, Prateek Sahu, Sarbartha Banerjee, Mohit Tiwari,	(参考訳) 検索拡張生成(英: Retrieval augmented generation、RAG)とは、大規模言語モデル(LLM)がデータベースから有用な情報を検索し、応答を生成するプロセスである。日常業務の企業環境では人気が高まっている。例えば、Copilot for Microsoft 365は数百万のビジネスを蓄積している。しかしながら、RAGベースのシステムを採用する際のセキュリティへの影響は明らかでない。本稿では、Copilotを混乱させ、その応答に完全性と機密性を侵害するRAGシステムのセキュリティ脆弱性のクラスであるConfusedPilotを紹介する。まず、RAGの修正プロンプトに悪意のあるテキストを埋め込んだ脆弱性を調査し、LSMが生成した応答を破損させる。第2に、シークレットデータを漏洩する脆弱性を示し、検索時にキャッシュ機構を活用する。第3に,企業内の誤情報を伝播し,最終的に販売や製造といった業務に影響を及ぼすために,両脆弱性をどのように活用するかを検討する。また、RAGベースのシステムのアーキテクチャを調べることにより、これらの攻撃の根本原因についても論じる。本研究は,現在のRAGベースのシステムにおけるセキュリティ脆弱性を強調し,今後のRAGベースのシステムを保護するための設計ガイドラインを提案する。 Retrieval augmented generation (RAG) is a process where a large language model (LLM) retrieves useful information from a database and then generates the responses. It is becoming popular in enterprise settings for daily business operations. For example, Copilot for Microsoft 365 has accumulated millions of businesses. However, the security implications of adopting such RAG-based systems are unclear. In this paper, we introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the modified prompt in RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data, which leverages the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately impact its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by investigating the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.	翻訳日:2024-11-08 12:11:36 公開日:2024-10-23
# レーザー添加物製造における機械学習を用いたその場監視のための音響・視覚的クロスモーダル知識伝達 Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing ( http://arxiv.org/abs/2408.05307v2 ) ライセンス: Link先を確認	Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao,	(参考訳) レーザー添加物製造(LAM)プロセスにおける異常や欠陥を検出するために、機械学習(ML)に基づく様々なin-situモニタリングシステムを開発した。視覚、オーディオ、その他のモダリティからのデータを統合するマルチモーダルフュージョンは、監視性能を向上させることができるが、複数のセンサータイプを使用することにより、ハードウェア、計算、運用コストも向上する。本稿では,LAMインサイトモニタリングのための相互モダリティ・ナレッジ・トランスファー(CMKT)手法を提案する。 CMKTは、目標モダリティから抽出した特徴の表現性を向上し、予測中のソースモダリティセンサの除去を可能にする。本稿では,意味的アライメント,完全教師付きマッピング,半教師付きマッピングという3つのCMKT手法を提案する。セマンティックアライメント法は、モダリティ間の共有符号化空間を確立し、知識伝達を容易にする。これは、同じグループ(例えば、視覚的および音声的欠陥群)の分布を整合させる意味的アライメント損失と、異なるグループ(例えば、視覚的欠陥と音声的欠陥のないグループ)を区別する分離損失を用いる。この2つのマッピング手法は、完全な教師付きおよび半教師付き学習アプローチを用いて、特徴を1つのモダリティから別のモダリティへと導出することで知識を伝達する。 LAMインサイト欠陥検出のケーススタディでは,CMKT法とマルチモーダル・オーディオ・ビジュアル・フュージョンを比較した。セマンティックアライメント法は、予測フェーズ中に音質を除去しながら98.7%の精度を達成し、これはマルチモーダル融合によって得られる98.2%の精度に匹敵する。説明可能な人工知能を用いて,意味的アライメントCMKTは,モーダル間の固有相関を利用して,雑音を低減しつつ,より代表的な特徴を抽出できることを発見した。 Various machine learning (ML)-based in-situ monitoring systems have been developed to detect anomalies and defects in laser additive manufacturing (LAM) processes. While multimodal fusion, which integrates data from visual, audio, and other modalities, can improve monitoring performance, it also increases hardware, computational, and operational costs due to the use of multiple sensor types. This paper introduces a cross-modality knowledge transfer (CMKT) methodology for LAM in-situ monitoring, which transfers knowledge from a source modality to a target modality. CMKT enhances the representativeness of the features extracted from the target modality, allowing the removal of source modality sensors during prediction. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. The semantic alignment method establishes a shared encoded space between modalities to facilitate knowledge transfer. It employs a semantic alignment loss to align the distributions of identical groups (e.g., visual and audio defective groups) and a separation loss to distinguish different groups (e.g., visual defective and audio defect-free groups). The two mapping methods transfer knowledge by deriving features from one modality to another using fully supervised and semi-supervised learning approaches. In a case study for LAM in-situ defect detection, the proposed CMKT methods were compared with multimodal audio-visual fusion. The semantic alignment method achieved an accuracy of 98.7% while removing the audio modality during the prediction phase, which is comparable to the 98.2% accuracy obtained through multimodal fusion. Using explainable artificial intelligence, we discovered that semantic alignment CMKT can extract more representative features while reducing noise by leveraging the inherent correlations between modalities.	翻訳日:2024-11-08 12:00:35 公開日:2024-10-23
# ランプ素子2分割インピーダンス整合SNAILパラメトリック増幅器 Lumped-element two-section impedance-matched SNAIL parametric amplifier ( http://arxiv.org/abs/2408.06154v2 ) ライセンス: Link先を確認	D. Moskaleva, N. Smirnov, D. Moskalev, A. Ivanov, A. Matanin, D. Baklykov, M. Teleganov, V. Polozov, V. Echeistov, E. Malevannaya, I. Korobenko. A. Kuguk, G. Nikerov, J. Agafonova, I. Rodionov,	(参考訳) 広帯域インピーダンス整合ジョセフソンパラメトリック増幅器は、高忠実度シングルショットマルチキュービット読み出しの鍵となる要素である。現在、いくつかのインピーダンス整合パラメトリック増幅器が提案されており、第1はクロップフェンシュタインテーパーに基づくインピーダンス整合パラメトリック増幅器、第2は補助共振器に基づくインピーダンス整合パラメトリック増幅器である。本稿では,2ユニットインピーダンス整合変換器を用いた量子制限型3波混合ラム素子SNAILパラメトリック増幅器を提案する。パラレルプレートコンデンサと超伝導平面コイルに基づく共振器付き2極チェビシェフ整合回路フラックスポンプモードで動作し、600MHzの帯域で平均15dB$、平均飽和電力で平均107dBm$、量子制限ノイズ温度で平均15dB$を実証した。 Broadband impedance-matched Josephson parametric amplifiers are key components for high-fidelity single-shot multi-qubit readout. Nowadays, several types of impedance matched parametric amplifiers have been proposed: the first is an impedance-matched parametric amplifier based on a Klopfenstein taper, and the second is an impedance-matched parametric amplifier based on auxiliary resonators. Here, we present the quantum-limited 3-wave-mixing lumped-element SNAIL parametric amplifier with two-units impedance matching transformer. A two-pole Chebyshev matching network with shunted resonators based on parallel-plate capacitors and superconducting planar coil. Operating in a flux-pumped mode, we experimentally demonstrate an average gain of $15dB$ across a $600MHz$ bandwidth, along with an average saturation power of $-107dBm$ and quantum-limited noise temperature.	翻訳日:2024-11-08 11:38:16 公開日:2024-10-23
# OWL2Vec4OA: オントロジーアライメントのための知識グラフ埋め込みの設計 OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment ( http://arxiv.org/abs/2408.06310v2 ) ライセンス: Link先を確認	Sevinj Teymurova, Ernesto Jiménez-Ruiz, Tillman Weyde, Jiaoyan Chen,	(参考訳) 交差するドメインをカバーするオントロジーの数が増えるにつれて、オントロジーのアライメントはセマンティック・インターオペラビリティの実現に不可欠である。本稿では、オントロジー埋め込みシステムOWL2Vecの拡張であるOWL2Vec4OAを提案する。 OWL2Vecは、オントロジー埋め込みの強力なテクニックとして登場したが、現在、オントロジーアライメントタスクへの埋め込みを調整するためのメカニズムが欠如している。 OWL2Vec4OAは、種子マッピングからのエッジ信頼値を組み込んでランダムウォーク戦略を導出する。本稿では,提案する拡張の理論的基礎,実装の詳細,および実験的評価を行い,オントロジーアライメントタスクの有効性を実証する。 Ontology alignment is integral to achieving semantic interoperability as the number of available ontologies covering intersecting domains is increasing. This paper proposes OWL2Vec4OA, an extension of the ontology embedding system OWL2Vec. While OWL2Vec has emerged as a powerful technique for ontology embedding, it currently lacks a mechanism to tailor the embedding to the ontology alignment task. OWL2Vec4OA incorporates edge confidence values from seed mappings to guide the random walk strategy. We present the theoretical foundations, implementation details, and experimental evaluation of our proposed extension, demonstrating its potential effectiveness for ontology alignment tasks.	翻訳日:2024-11-08 11:26:46 公開日:2024-10-23
# クラスバリアを壊す:クラス間特徴補償器による効率的なデータセット蒸留 Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator ( http://arxiv.org/abs/2408.06927v2 ) ライセンス: Link先を確認	Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou,	(参考訳) データセット蒸留は、大規模で自然なデータセットから情報的特徴をコンパクトで合成的な形式に凝縮する技術として登場した。近年の進歩により、この技術は洗練されているが、その性能は一般的なクラス固有の合成パラダイムによってボトルネックになっている。このパラダイムの下では、合成データは事前に割り当てられた1ホットラベルにのみ最適化され、特徴凝縮における暗黙のクラスバリアを生成する。このことは, 蒸留予算の非効率利用, クラス間特徴分布の監視につながり, 最終的に有効性と効率を損なう結果となった。これらの制約を克服するために,本論文では,現在のデータセット蒸留法で広く利用されているクラス固有のデータラベルフレームワークを超越した,革新的な蒸留手法であるInter-class Feature Compensator (INFER)を提案する。特に、INFERはUniversal Feature Compensator (UFC)を活用して、クラス間の機能統合を強化し、単一のUFC入力から複数の合成インスタンスを生成する。これにより蒸留予算の効率が大幅に向上する。さらに、INFERは、蒸留中のクラス間相互作用を強化し、蒸留データの有効性と一般化性を高める。従来のデータセットと同様のラベルの線形補間を可能にすることにより、INFERは、合成データを厳密に最適化し、合成データセットのソフトラベルのサイズをほぼゼロに減らし、データセットの蒸留における効率と有効性のための新しいベンチマークを確立する。 Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation.	翻訳日:2024-11-08 07:53:35 公開日:2024-10-23
# 準金属SWCNTにおける量子輸送ストレイントロニクスとメカニカルアハロノフ・ボーム効果 Quantum Transport Straintronics and Mechanical Aharonov-Bohm Effect in Quasi-metallic SWCNTs ( http://arxiv.org/abs/2408.10355v2 ) ライセンス: Link先を確認	L. Huang, G. Wei, A. R. Champagne,	(参考訳) 単層カーボンナノチューブ(SWCNT)は、原子的に精密なエッジを持つ2次元材料の効果的に狭いリボンである。量子輸送ストレトロニクス(QTS)、すなわち量子輸送を制御するための機械的ひずみを利用するのに理想的なシステムである。彼らの大きなサブバンドエネルギー間隔($ 0.8 eV)は、単一の量子輸送チャネルを持つトランジスタにつながる。一軸拘束された準金属-SWCNTトランジスタにおけるQTSの研究に応用モデルを適用した。デバイスパラメータは既存の実験プラットフォームに基づいており、チャネル長は$L=$50 nm、直径は$d\approx$1.5 nm、歪は$\varepsilon_{\text{tot}}\approx$7$\%である。電荷キャリアの伝搬角 $\Theta$ が $\varepsilon_{\text{tot}}$ で完全に調整可能であることを示す。 $\Theta$が90$^o$に達すると、コンダクタンス$G$は完全に抑制される。ひずみ発生バンドギャップは、$\approx$400 meVまで調整できる。機械ひずみはスカラー$\phi_{\varepsilon}$とベクトル$\textbf{A}$ゲージポテンシャルをトランジスタのハミルトニアンに付加する。これらのポテンシャルは、メカニカルなアハロノフ・ボーム効果として記述できる、$G$の量子干渉のスペクトルを豊富に生成する。電荷キャリアの量子相は純粋に機械的な手段で制御できる。例えば、フル2$\pi$の位相シフトは(12,9)チューブで0.7$\%のひずみ変化によって引き起こされる。この研究は、2D材料とそのナノチューブに基づく量子技術のツールボックスに定量的な量子輸送ひずみ効果を加える機会を開く。 Single-wall carbon nanotubes (SWCNTs) are effectively narrow ribbons of 2D materials with atomically precise edges. They are ideal systems to harness quantum transport straintronics (QTS), i.e. using mechanical strain to control quantum transport. Their large subband energy spacing ($\sim$ 0.8 eV) leads to transistors with a single quantum transport channel. We adapt an applied model to study QTS in uniaxially-strained quasi-metallic-SWCNT transistors. The device parameters are based on an existing experimental platform, with channel lengths of $L=$ 50 nm, diameters $d\approx$ 1.5 nm, and strains up to $\varepsilon_{\text{tot}}\approx$ 7 $\%$. We demonstrate that the charge carrier's propagation angle $\Theta$ is fully tunable with $\varepsilon_{\text{tot}}$. When $\Theta$ reaches 90$^o$, the conductance $G$ is completely suppressed. A strain-generated band gap can be tuned up to $\approx$ 400 meV. Mechanical strain adds both scalar $\phi_{\varepsilon}$ and vector $\textbf{A}$ gauge potentials to the transistor's Hamiltonian. These potentials create a rich spectrum of quantum interferences in $G$, which can be described as a mechanical Aharonov-Bohm effect. The charge carriers' quantum phase can be controlled by purely mechanical means. For instance, a full 2$\pi$ phase shift can be induced in a (12,9) tube by a 0.7 $\%$ strain change. This work opens opportunities to add quantitative quantum transport strain effects to the tools box of quantum technologies based on 2D materials and their nanotubes.	翻訳日:2024-11-08 06:44:48 公開日:2024-10-23
# KeySpace:惑星間ネットワークにおける公開鍵インフラストラクチャの考察 KeySpace: Public Key Infrastructure Considerations in Interplanetary Networks ( http://arxiv.org/abs/2408.10963v2 ) ライセンス: Link先を確認	Joshua Smailes, Sebastian Köhler, Simon Birnbach, Martin Strohmeier, Ivan Martinovic,	(参考訳) 衛星ネットワークが拡大し、惑星間通信を取り入れ始めるにつれ、これらの条件下でPKIにアプローチする方法に関する未解決問題への関心が高まっている。本稿では,メガコンステレーションと惑星間ネットワークに着目し,衛星ネットワークにおける鍵管理システムの実現に向けた目標と要件について検討する。我々は、特定のネットワークトポロジにおいて、システムとシステムを比較するのに使用できる標準化された実験のセットを設計する。これらを用いて、高度に分散した惑星間ネットワークにおいて地球上のPKI技術が実現可能であることを実証し、効率的な低遅延接続を実現するためにPKIシステムを構成でき、効果的な再起動による攻撃の影響を最小限に抑えることができることを示した。我々は,大規模な宇宙ネットワークの効率的なシミュレーションを目的とした新しいネットワークシミュレータであるDeep Space Network Simulator (DSNS) を構築し,これを評価した。広範囲なPKI構成で接続確立とキーの取り消しを評価するシミュレーションを実行する。最後に、OCSP Hybridとリレーノードをファイアウォールとして使用する2つの追加構成オプションを提案し、評価する。これらの組み合わせにより、攻撃者が妥協鍵で到達できるネットワークの範囲を最小化し、惑星間リレーリンクに対する攻撃者の負荷を低減できる。 As satellite networks grow larger and begin to incorporate interplanetary communication, there is an increasing interest in the unsolved problem of how to approach PKI in these conditions. In this paper we explore the goals and requirements for implementing key management systems in satellite networks, focusing on megaconstellations and interplanetary networks. We design a set of standardized experiments which can be used to compare systems against one another for particular network topologies. Using these, we demonstrate that terrestrial PKI techniques are feasible in highly distributed interplanetary networks, showing that it is possible to configure PKI systems to achieve efficient low-latency connection establishment, and minimize the impact of attacks through effective revocations. We evaluate this by building the Deep Space Network Simulator (DSNS), a novel network simulator aimed at efficient simulation of large space networks. We run simulations evaluating connection establishment and key revocation under a wide range of PKI configurations. Finally, we propose and evaluate two additional configuration options: OCSP Hybrid, and the use of relay nodes as a firewall. Together these minimize the extent of the network an attacker can reach with a compromised key, and reduce the attacker's load on interplanetary relay links.	翻訳日:2024-11-08 06:22:37 公開日:2024-10-23
# 機能選択のための大規模言語モデル探索:データ中心の視点 Exploring Large Language Models for Feature Selection: A Data-centric Perspective ( http://arxiv.org/abs/2408.12025v2 ) ライセンス: Link先を確認	Dawei Li, Zhen Tan, Huan Liu,	(参考訳) LLM(Large Language Models)の急速な進歩は様々な領域に大きく影響を与え、例外的な少数ショットとゼロショットの学習能力を活用している。本研究では,データ中心の観点からLLMに基づく特徴選択手法を探索し,理解することを目的としている。まず, LLM を用いた既存の特徴選択手法を, 統計的推測を行うためにサンプルの数値値を必要とするデータ駆動特徴選択と, 記述的文脈を用いた意味的関連付けを行うために LLM の事前知識を利用するテキストベースの特徴選択の2つのグループに分類することから始める。我々は, LLM の分類と回帰作業において, 様々なサイズ (例えば , GPT-4, ChatGPT, LLaMA-2) で実験を行った。本研究は,テキストベースの特徴選択手法の有効性とロバスト性を強調し,実世界の医療応用を用いてその可能性を示す。また,LLMを機能選択に活用する上での課題と今後の可能性についても論じ,この新興分野におけるさらなる研究・開発のための洞察を提供する。 The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, leveraging their exceptional few-shot and zero-shot learning capabilities. In this work, we aim to explore and understand the LLMs-based feature selection methods from a data-centric perspective. We begin by categorizing existing feature selection methods with LLMs into two groups: data-driven feature selection which requires numerical values of samples to do statistical inference and text-based feature selection which utilizes prior knowledge of LLMs to do semantical associations using descriptive context. We conduct experiments in both classification and regression tasks with LLMs in various sizes (e.g., GPT-4, ChatGPT and LLaMA-2). Our findings emphasize the effectiveness and robustness of text-based feature selection methods and showcase their potentials using a real-world medical application. We also discuss the challenges and future opportunities in employing LLMs for feature selection, offering insights for further research and development in this emerging field.	翻訳日:2024-11-08 05:49:00 公開日:2024-10-23
# TensorOpera Router: 効率的なLLM推論のためのマルチモデルルータ TensorOpera Router: A Multi-Model Router for Efficient LLM Inference ( http://arxiv.org/abs/2408.12320v3 ) ライセンス: Link先を確認	Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He,	(参考訳) 様々なドメインにわたる大規模言語モデル(LLM)の急速な成長に伴い、多くの新しいLLMが出現し、それぞれがドメイン固有の専門知識を持っている。この増殖は、高速で高品質で費用対効果の高いLCMクエリ応答方法の必要性を強調している。しかし、このトリレンマを効率的にバランスさせるLLMは存在しない。一部のモデルは強力だが非常に高価であり、他のモデルは高速で安価だが質的に劣る。この課題に対処するために,TO-Routerを提案する。TO-RouterはモノリシックなLLMクエリシステムで,多様なLLM専門家をシームレスに単一のクエリインターフェースに統合し,クエリの要求に応じて,入力クエリを最も高性能なエキスパートに動的にルーティングする。大規模な実験により,TO-Routerは,スタンドアロンのエキスパートモデルと比較してクエリ効率を最大40%向上し,最大30%のコスト削減を実現し,モデル性能を最大10%向上させることを示した。 With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present TO-Router, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the most high-performant expert based on query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, TO-Router improves query efficiency by up to 40\%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.	翻訳日:2024-11-08 05:49:00 公開日:2024-10-23
# 単光子検出器アレイを用いた線形多重光子数分解 Linearly Multiplexed Photon Number Resolving Single-photon Detectors Array ( http://arxiv.org/abs/2408.12345v2 ) ライセンス: Link先を確認	Leonardo Limongi, Francesco Martini, Thu Ha Dao, Alessandro Gaggero, Hamza Hasnaoui, Igor Lopez-Gonzalez, Fabio Chiarello, Fabio de Matteis, Alberto Quaranta, Andrea Salamon, Francesco Mattioli, Martino Bernard, Mirko Lobino,	(参考訳) 光子数分解検出器(英: Photon Number Resolving Detector、PNRD)は、入射光ビームに存在する光子数を測定する装置であり、光を量子レベルで測定し、特徴付けることができる。本稿では, 単一モード導波路上に集積された線形多重光子数分解型単一光子検出器アレイの性能と設計について考察する。本研究は, 種々の条件下でのアレーの忠実度の定義と解析に焦点をあて, 実装のための実用的な設計を提案する。理論的解析と数値シミュレーションにより, 伝搬損失や暗黒数の増加がシステムの性能にどのような影響を及ぼすかを示し, 実用化においてこれらの効果を緩和することの重要性を強調した。 Photon Number Resolving Detectors (PNRDs) are devices capable of measuring the number of photons present in an incident optical beam, enabling light sources to be measured and characterized at the quantum level. In this paper, we explore the performance and design considerations of a linearly multiplexed photon number-resolving single-photon detector array, integrated on a single mode waveguide. Our investigation focus on defining and analyzing the fidelity of such an array under various conditions and proposing practical designs for its implementation. Through theoretical analysis and numerical simulations, we show how propagation losses and dark counts may have a strong impact on the performance of the system and highlight the importance of mitigating these effects in practical implementations.	翻訳日:2024-11-08 05:37:29 公開日:2024-10-23
# 再検討によるプルーニング:CNNとトランスフォーマーの属性最適化 Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers ( http://arxiv.org/abs/2408.12568v2 ) ライセンス: Link先を確認	Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin,	(参考訳) より複雑な問題を解決するために、Deep Neural Networksは数十億のパラメータにスケールされ、膨大な計算コストがかかる。計算要求の削減と効率の向上のための効果的なアプローチは、しばしば過パラメータ化されたネットワークの不要なコンポーネントを創り出すことである。これまでの研究では、eXplainable AIの分野からの帰属法が、最も関係の低いネットワークコンポーネントを数ショットで抽出し、プルークする効果的な手段であることが示された。我々は、刈り取り作業における属性法のハイパーパラメーターを明示的に最適化し、解析にトランスフォーマーベースのネットワークを含めることにより、現在の状態を拡張した。提案手法は,ImageNet分類タスクにおいて高い性能を保ちながら,大規模トランスフォーマーおよび畳み込み型アーキテクチャ(VGG, ResNet, ViT)のモデル圧縮率を向上させる。ここでは, 畳み込みニューラルネットワークと比較して, トランスフォーマーの過度パラメータ化の度合いが高いことを示す。コードはhttps://github.com/erfanhatefi/Pruning-by-eXplaining-in-PyTorchで入手できる。 To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state by proposing to explicitly optimize hyperparameters of attribution methods for the task of pruning, and further include transformer-based networks in our analysis. Our approach yields higher model compression rates of large transformer- and convolutional architectures (VGG, ResNet, ViT) compared to previous works, while still attaining high performance on ImageNet classification tasks. Here, our experiments indicate that transformers have a higher degree of over-parameterization compared to convolutional neural networks. Code is available at https://github.com/erfanhatefi/Pruning-by-eXplaining-in-PyTorch.	翻訳日:2024-11-08 05:37:29 公開日:2024-10-23
# 通勤予測のための説明可能な階層型都市表現学習 Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction ( http://arxiv.org/abs/2408.14762v3 ) ライセンス: Link先を確認	Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto,	(参考訳) 通勤フロー予測は、現実の自治体の業務に欠かせない課題である。従来の研究では、複数の補助データを用いて都市内における通勤起因決定(OD)需要を推定することは可能であることが明らかになっている。しかし、既存の方法の多くは、維持すべき地理的単位の増加により、都道府県や全国で同様の業務を大規模にこなすには適していない。さらに、地域表現学習は、多様な都市下流課題に対する都市知識獲得のための普遍的なアプローチである。多くの研究者がマルチソースデータから都市単位を記述するための包括的枠組みを開発してきたが、選択した地理的要素の関係は明らかになっていない。さらに、都心部は、都市やその包括地区などの格付け構造を自然に保存しており、都市単位間の関係を解明する必要がある。そこで我々は,複数の空間解像度で有意な領域埋め込みを生成できる不均一なグラフベースモデルを構築し,異なるタイプのODフローを予測する。提案手法の有効性を実証するために,静岡県から収集した実世界の携帯電話データを用いた広範な実験を行った。その結果,提案モデルが一様都市構造の観点から既存モデルより優れていたことが示唆された。モデルの信頼性を高めるために、合理的な説明を用いて予測結果の理解を拡大する。 Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the increased number of geographical units that need to be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationship between the selected geographical elements. Furthermore, metropolitan areas naturally preserve ranked structures, like cities and their inclusive districts, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We extend the understanding of predicted results using reasonable explanations to enhance the credibility of the model.	翻訳日:2024-11-08 04:52:58 公開日:2024-10-23
# 通勤予測のための説明可能な階層型都市表現学習 Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction ( http://arxiv.org/abs/2408.14762v4 ) ライセンス: Link先を確認	Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto,	(参考訳) 通勤フロー予測は、現実の自治体の業務に欠かせない課題である。従来の研究では、複数の補助データを用いて都市内における通勤起因決定(OD)需要を推定することは可能であることが明らかになっている。しかし、既存の方法の多くは、維持すべき地理的単位の増加により、都道府県や全国で同様の業務を大規模にこなすには適していない。さらに、地域表現学習は、多様な都市下流課題に対する都市知識獲得のための普遍的なアプローチである。多くの研究者がマルチソースデータから都市単位を記述するための包括的枠組みを開発してきたが、選択した地理的要素の関係は明らかになっていない。さらに、都心部は、都市やその包括地区などの格付け構造を自然に保存しており、都市単位間の関係を解明する必要がある。そこで我々は,複数の空間解像度で有意な領域埋め込みを生成できる不均一なグラフベースモデルを構築し,異なるタイプのODフローを予測する。提案手法の有効性を実証するために,静岡県から収集した実世界の携帯電話データを用いた広範な実験を行った。その結果,提案モデルが一様都市構造の観点から既存モデルより優れていたことが示唆された。モデルの信頼性を高めるために、合理的な説明を用いて予測結果の理解を拡大する。 Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the increased number of geographical units that need to be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationship between the selected geographical elements. Furthermore, metropolitan areas naturally preserve ranked structures, like cities and their inclusive districts, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We extend the understanding of predicted results using reasonable explanations to enhance the credibility of the model.	翻訳日:2024-11-08 04:52:58 公開日:2024-10-23
# 部分的フォールトトレラント量子コンピューティングアーキテクチャのためのトロッター時間進化のコンパイル Compilation of Trotter-Based Time Evolution for Partially Fault-Tolerant Quantum Computing Architecture ( http://arxiv.org/abs/2408.14929v2 ) ライセンス: Link先を確認	Yutaro Akahoshi, Riki Toshio, Jun Fujisaki, Hirotaka Oshima, Shintaro Sato, Keisuke Fujii,	(参考訳) 限られた資源で実用的な量子スピードアップを実現することは、学術と工業の両方において重要な課題である。これを解決するために,「時空効率的なアナログ回転量子コンピューティングアーキテクチャ(STARアーキテクチャ)」と呼ばれる部分的にフォールトトレラントな量子コンピューティングアーキテクチャが最近提案されている。このアーキテクチャは、リソース要件の最小化と、普遍的な量子計算に不可欠な非クリフォードゲートの精度の最大化に焦点を当てている。しかし、リピート・アンティル・サクセス(RUS)プロトコルや状態注入のような非決定論的プロセスは、計算オーバーヘッドを著しく引き起こす可能性がある。したがって、効率的なフォールトトレラント演算を用いることで、このオーバーヘッドを最小限に抑えるために論理回路を最適化することが不可欠である。本稿では,STARアーキテクチャの有望な応用である2次元ハバードモデルハミルトンの時間発展をシミュレーションする効率的な手法を提案する。並列インジェクションプロトコルとアダプティブインジェクション領域の更新という2つの手法を提案する。これらを既存のfSWAP手法と統合することにより、2D Hubbardモデルのための効率的なTrotterベースの時間進化演算を開発する。解析の結果, 単純直列コンパイルに比べて10倍以上の高速化が得られた。この最適化されたコンパイルにより、2次元ハバードモデルの量子位相推定に必要な計算資源を推定できる。物理誤差率が$p_{\rm phys} = 10^{-4}$のデバイスの場合、古典計算と比較して8\times 8$ Hubbardモデルよりも高速な基底状態エネルギー推定を実現するために約6.5 \times 10^4$ physical qubitsが必要であると推定する。 Achieving practical quantum speedup with limited resources is a crucial challenge in both academic and industrial communities. To address this, a partially fault-tolerant quantum computing architecture called ``space-time efficient analog rotation quantum computing architecture (STAR architecture)'' has been recently proposed. This architecture focuses on minimizing resource requirements while maximizing the precision of non-Clifford gates, essential for universal quantum computation. However, non-deterministic processes such as the repeat-until-success (RUS) protocol and state injection can introduce significant computational overhead. Therefore, optimizing the logical circuit to minimize this overhead by using efficient fault-tolerant operations is essential. This paper presents an efficient method for simulating the time evolution of the 2D Hubbard model Hamiltonian, a promising application of the STAR architecture. We present two techniques, parallel injection protocol and adaptive injection region updating, to reduce unnecessary time overhead specific to our architecture. By integrating these with the existing fSWAP technique, we develop an efficient Trotter-based time evolution operation for the 2D Hubbard model. Our analysis reveals an acceleration of over 10 times compared to naive serial compilation. This optimized compilation enables us to estimate the computational resources required for quantum phase estimation of the 2D Hubbard model. For devices with a physical error rate of $p_{\rm phys} = 10^{-4}$, we estimate that approximately $6.5 \times 10^4$ physical qubits are required to achieve faster ground state energy estimation of the $8\times8$ Hubbard model compared to classical computation.	翻訳日:2024-11-08 04:52:58 公開日:2024-10-23
# 手動のプロンプト依存性を低減するための幻覚の活用 : 即時セグメンテーション Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation ( http://arxiv.org/abs/2408.15205v2 ) ライセンス: Link先を確認	Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong,	(参考訳) プロンプタブルなセグメンテーションは、典型的には、各所望のオブジェクトのセグメンテーションを導くために、インスタンス固有の手動プロンプトを必要とする。このようなニーズを最小限に抑えるために、単一のタスクジェネリックプロンプトを使用して、異なるオブジェクトの様々なイメージを同じタスクに分割するタスクジェネリックプロンプトが導入された。現在の手法では、MLLM(Multimodal Large Language Models)を使用して、タスクジェネリックプロンプトから詳細なインスタンス固有のプロンプトを推論し、セグメンテーション精度を向上させる。このセグメンテーションの有効性は、これらの導出されたプロンプトの精度に大きく依存する。しかし、MLLMは推論中に幻覚に悩まされ、不正確なプロンプトを引き起こす。既存の手法では、モデルを改善するための幻覚の除去に重点を置いているが、MLLMの幻覚は、個々の画像を超えて訓練済みの大規模知識を表現するため、正しく活用された場合、貴重な文脈的洞察を明らかにすることができると論じている。本稿では,画像からタスク関連情報を抽出するために幻覚を利用し,その精度を検証し,生成したプロンプトの精度を向上させる。具体的には、プロンプト・マスクサイクル生成フレームワーク(ProMaC)をプロンプトジェネレータとマスクジェネレータで導入し、プロンプトジェネレータは、最初はテスト画像上の拡張文脈知識を抽出するための幻覚を探索し、これらの幻覚を正確にインスタンス固有のプロンプトに還元し、マスクジェネレータにマスクセマンティックアライメントによるタスクセマンティクスと整合したマスクを生成するよう指示する。生成されたマスクは、プロンプトジェネレータを反復的にタスク関連の画像領域に集中させ、無関係な幻覚を減らし、より良いプロンプトとマスクを共同で生成する。 5つのベンチマークの実験では、ProMaCの有効性が示されている。 https://lwpyh.github.io/ProMaC/ Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt for improving segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy for enhancing precision of the generated prompts. Specifically, we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator.The prompt generator uses a multi-scale chain of thought prompting, initially exploring hallucinations for extracting extended contextual knowledge on a test image.These hallucinations are then reduced to formulate precise instance-specific prompts, directing the mask generator to produce masks that are consistent with task semantics by mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks. Experiments on 5 benchmarks demonstrate the effectiveness of ProMaC. Code given in https://lwpyh.github.io/ProMaC/.	翻訳日:2024-11-08 04:41:58 公開日:2024-10-23
# LLaVA-MoD: MoEナレッジ蒸留によるLLaVAタイニー製造 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation ( http://arxiv.org/abs/2408.15881v2 ) ライセンス: Link先を確認	Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang,	(参考訳) LLaVA-MoDは,大規模MLLM(l-MLLM)の知識を抽出することで,小規模マルチモーダル言語モデルの効率的な訓練を可能にする新しいフレームワークである。本手法はMLLM蒸留における2つの基本的な課題に対処する。まず,スパース・ミックス・オブ・エキスパートズ(MoE)アーキテクチャを言語モデルに統合することにより,s-MLLMのネットワーク構造を最適化し,計算効率とモデル表現性のバランスをとる。第2に,包括的知識移動を保証するための進歩的知識移動戦略を提案する。この戦略は、学生モデルが教師ネットワークの理解をエミュレートできるように、出力分布間のKL(Kullback-Leibler)のばらつきを最小限に抑えるため、模擬蒸留から始まる。次に,l-MLLMを参照モデルとして扱う上で鍵となるDPO(Direct Preference Optimization)による嗜好蒸留を導入する。この段階において、s-MLLMの優良例と劣悪な例を区別する能力は、l-MLLMを超えて著しく向上し、特に幻覚ベンチマークにおいて、教師を超越したより良い学生に繋がる。大規模な実験により、LLaVA-MoDは、活性化パラメータの最小数と計算コストを抑えながら、様々なマルチモーダルベンチマークで既存のモデルより優れていることが示された。注目すべきは、LLaVA-MoDは2Bのアクティベートパラメータだけで、Qwen-VL-Chat-7Bを平均8.8%上回り、トレーニングデータのわずか0.3%、トレーニング可能なパラメータは23%である。これらの結果は、LLaVA-MoDの教師モデルから包括的知識を効果的に抽出する能力を強調し、より効率的なMLLMの開発への道を開いた。コードは、https://github.com/shufangxun/LLaVA-MoD.comで入手できる。 We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters. These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.	翻訳日:2024-11-08 04:30:58 公開日:2024-10-23
# LLaVA-MoD: MoEナレッジ蒸留によるLLaVAタイニー製造 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation ( http://arxiv.org/abs/2408.15881v3 ) ライセンス: Link先を確認	Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang,	(参考訳) LLaVA-MoDは,大規模MLLM(l-MLLM)の知識を抽出することで,小規模マルチモーダル言語モデルの効率的な訓練を可能にする新しいフレームワークである。本手法はMLLM蒸留における2つの基本的な課題に対処する。まず,スパース・ミックス・オブ・エキスパートズ(MoE)アーキテクチャを言語モデルに統合することにより,s-MLLMのネットワーク構造を最適化し,計算効率とモデル表現性のバランスをとる。第2に,包括的知識移動を保証するための進歩的知識移動戦略を提案する。この戦略は、学生モデルが教師ネットワークの理解をエミュレートできるように、出力分布間のKL(Kullback-Leibler)のばらつきを最小限に抑えるため、模擬蒸留から始まる。次に,l-MLLMを参照モデルとして扱う上で鍵となるDPO(Direct Preference Optimization)による嗜好蒸留を導入する。この段階において、s-MLLMの優良例と劣悪な例を区別する能力は、l-MLLMを超えて著しく向上し、特に幻覚ベンチマークにおいて、教師を超越したより良い学生に繋がる。大規模な実験により、LLaVA-MoDは、活性化パラメータの最小数と計算コストを抑えながら、様々なマルチモーダルベンチマークで既存のモデルより優れていることが示された。注目すべきは、LLaVA-MoDは2Bのアクティベートパラメータだけで、Qwen-VL-Chat-7Bを平均8.8%上回り、トレーニングデータのわずか0.3%、トレーニング可能なパラメータは23%である。これらの結果は、LLaVA-MoDの教師モデルから包括的知識を効果的に抽出する能力を強調し、より効率的なMLLMの開発への道を開いた。コードは、https://github.com/shufangxun/LLaVA-MoD.comで入手できる。 We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters. These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.	翻訳日:2024-11-08 04:30:58 公開日:2024-10-23
# 傾斜2次元離散格子における量子粒子のリッサホスダイナミクス Lissajous dynamics of a quantum particle in a tilted two-dimensional discrete lattice ( http://arxiv.org/abs/2409.02268v2 ) ライセンス: Link先を確認	Grzegorz Jaczewski, Tomasz Sowiński,	(参考訳) 離散2次元傾斜格子における単一粒子の量子力学を古典量子対応の観点から解析する。格子の傾きが振動力学をもたらすという事実を利用して、格子のパラメータと粒子の初期状態が、進化の過程で、その中心が古典力学で知られているリッサジョウス曲線の軌跡に従っている間に、その確率分布が形を変えないように調整できることを示す。 The quantum dynamics of a single particle in a discrete two-dimensional tilted lattice is analyzed from the perspective of the classical-quantum correspondence. Utilizing the fact that tilting the lattice results in oscillatory dynamics, we show how the parameters of the lattice and the initial state of the particle can be tuned so that during evolution the probability distribution does not change its shape while its center follows the trajectory known in classical mechanics as Lissajous curves.	翻訳日:2024-11-07 23:56:04 公開日:2024-10-23
# VILA-U:ビジュアル理解と生成を統合した統一ファンデーションモデル VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ( http://arxiv.org/abs/2409.04429v2 ) ライセンス: Link先を確認	Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu,	(参考訳) VILA-Uは、ビデオ、画像、言語理解、生成を統合する統一基盤モデルである。従来の視覚言語モデル(VLM)は、視覚コンテンツを理解し、生成するために別々のモジュールを使用する。対照的に、VILA-Uは両方のタスクに単一の自己回帰的次トーケン予測フレームワークを採用しており、拡散モデルのような追加のコンポーネントは不要である。このアプローチは、モデルを簡単にするだけでなく、ビジュアル言語理解と生成における最先端のパフォーマンスも達成する。 VILA-Uの成功は2つの主な要因に起因している: 個別の視覚トークンを事前学習中にテキスト入力と整列する統合視覚タワー。これによってVILA-Uは、完全なトークンベースの自動回帰フレームワークを使用して、より複雑なモデルに互換性を持って実行することができる。 VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation. Traditional visual language models (VLMs) use separate modules for understanding and generating visual content, which can lead to misalignment and increased complexity. In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components like diffusion models. This approach not only simplifies the model but also achieves near state-of-the-art performance in visual language understanding and generation. The success of VILA-U is attributed to two main factors: the unified vision tower that aligns discrete visual tokens with textual inputs during pretraining, which enhances visual perception, and autoregressive image generation can achieve similar quality as diffusion models with high-quality dataset. This allows VILA-U to perform comparably to more complex models using a fully token-based autoregressive framework.	翻訳日:2024-11-07 23:00:54 公開日:2024-10-23
# 一般化拡張不確実性原理、リウヴィルの定理と状態密度:スナイダー・ド・シッターとヤン模型 Generalized Extended Uncertainty Principles, Liouville theorem and density of states: Snyder-de Sitter and Yang models ( http://arxiv.org/abs/2409.05110v2 ) ライセンス: Link先を確認	A. Pachoł,	(参考訳) 量子力学的位相空間の修正によってハイゼンベルクの不確実性原理が変化し、これは一般化不確実性原理(英語版)(GUP)や拡張不確実性原理(英語版)(EUP)をもたらす。 GUPとEUPの組み合わせにより、一般拡張不確実性原理(GEUPまたはEGUP)は、座標とモータの両方に非可換性を導入することでこれらの修正をさらに一般化する。本稿では,GEUPが統計物理学におけるリウヴィルの定理および非相対論的量子力学の枠組みにおける状態密度に与える影響について検討する。我々は、Snyder-de Sitter と Yang のモデルの場合において、無限小時間進化の下で不変な重み付き位相空間体積要素を発見し、GEUP が状態の密度を変化させ、物理的(熱力学的な)性質に影響を及ぼすことを示した。上記のモデルから一定の制限で得られた特別事例についても論じる。 GEUPとEUPの新しい高次タイプも提案されている。 Modifications in quantum mechanical phase space lead to changes in the Heisenberg uncertainty principle, which can result in the Generalized Uncertainty Principle (GUP) or the Extended Uncertainty Principle (EUP), introducing quantum gravitational effects at small and large distances, respectively. A combination of GUP and EUP, the Generalized Extended Uncertainty Principle (GEUP or EGUP), further generalizes these modifications by incorporating noncommutativity in both coordinates and momenta. This paper examines the impact of GEUP on the Liouville theorem in statistical physics and density of states within non-relativistic quantum mechanics framework. We find a weighted phase space volume element, invariant under the infinitesimal time evolution, in the cases of Snyder-de Sitter and Yang models, presenting how GEUP alters the density of states, potentially affecting physical (thermodynamical) properties. Special cases, obtained in certain limits from the above models are also discussed. New higher order types of GEUP and EUP are also proposed.	翻訳日:2024-11-07 22:49:49 公開日:2024-10-23
# CD-NGP:動的シーンのための高速でスケーラブルな連続表現 CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v2 ) ライセンス: Link先を確認	Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,	(参考訳) 動的シーンにおける3次元再構成と新しいビュー合成のための高速でスケーラブルな表現であるCD-NGPを提案する。連続学習にインスパイアされた本手法は,まず入力ビデオを複数のチャンクに分割し,次にモデルのチャンクをチャンクで訓練し,最後に,第1枝とその後の枝の特徴を融合させる。 DyNeRFデータセットを用いた実験により、提案した新しい表現は、メモリ消費、モデルサイズ、トレーニング速度、レンダリング品質との大きなバランスに達することが示された。具体的には、オフライン方式よりもトレーニングメモリ(<14$GB)を85\%以上消費し、他のオンライン方式に比べてストリーミング帯域(<0.4$MB/frame)を大幅に削減する必要がある。 We present CD-NGP, which is a fast and scalable representation for 3D reconstruction and novel view synthesis in dynamic scenes. Inspired by continual learning, our method first segments input videos into multiple chunks, followed by training the model chunk by chunk, and finally, fuses features of the first branch and subsequent branches. Experiments on the prevailing DyNeRF dataset demonstrate that our proposed novel representation reaches a great balance between memory consumption, model size, training speed, and rendering quality. Specifically, our method consumes $85\%$ less training memory ($<14$GB) than offline methods and requires significantly lower streaming bandwidth ($<0.4$MB/frame) than other online alternatives.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-23
# CD-NGP:動的シーンのための高速でスケーラブルな連続表現 CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v3 ) ライセンス: Link先を確認	Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,	(参考訳) 動的シーンにおける3次元再構成と新しいビュー合成のための高速でスケーラブルな表現であるCD-NGPを提案する。連続学習にインスパイアされた本手法は,まず入力ビデオを複数のチャンクに分割し,次にモデルのチャンクをチャンクで訓練し,最後に,第1枝とその後の枝の特徴を融合させる。 DyNeRFデータセットを用いた実験により、提案した新しい表現は、メモリ消費、モデルサイズ、トレーニング速度、レンダリング品質との大きなバランスに達することが示された。具体的には、オフライン方式よりもトレーニングメモリ(<14$GB)を85\%以上消費し、他のオンライン方式に比べてストリーミング帯域(<0.4$MB/frame)を大幅に削減する必要がある。 We present CD-NGP, which is a fast and scalable representation for 3D reconstruction and novel view synthesis in dynamic scenes. Inspired by continual learning, our method first segments input videos into multiple chunks, followed by training the model chunk by chunk, and finally, fuses features of the first branch and subsequent branches. Experiments on the prevailing DyNeRF dataset demonstrate that our proposed novel representation reaches a great balance between memory consumption, model size, training speed, and rendering quality. Specifically, our method consumes $85\%$ less training memory ($<14$GB) than offline methods and requires significantly lower streaming bandwidth ($<0.4$MB/frame) than other online alternatives.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-23
# CD-NGP:動的シーンのための高速でスケーラブルな連続表現 CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v4 ) ライセンス: Link先を確認	Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,	(参考訳) ダイナミックシーンにおける新しいビュー合成(NVS)の方法論は、メモリ消費の調和、モデルの複雑さ、トレーニング効率、レンダリング忠実度といった重要な課題に直面している。既存のオフライン技術は、高品質な結果を提供する一方で、かなりのメモリ要求と限られたスケーラビリティによって特徴付けられることが多い。対照的に、オンライン手法は、迅速な収束とモデルのコンパクトさのバランスをとるという課題に対処する。これらの問題に対処するため,我々は連続的動的グラフィックスプリミティブ(CD-NGP)を提案する。提案手法では,時間的および空間的ハッシュエンコーディングの機能を相乗化して高いレンダリング品質を実現し,拡張性を高めるためにパラメータ再利用を採用し,メモリオーバーヘッドを軽減するために連続的な学習フレームワークを活用する。さらに,厳密かつ非剛性な動きを持つ多視点,例外的に長いビデオシーケンスからなる新しいデータセットを導入し,提案手法のスケーラビリティを実証する。 Current methodologies for novel view synthesis (NVS) in dynamic scenes encounter significant challenges in harmonizing memory consumption, model complexity, training efficiency, and rendering fidelity. Existing offline techniques, while delivering high-quality results, are often characterized by substantial memory demands and limited scalability. In contrast, online methods grapple with the challenge of balancing rapid convergence with model compactness. To address these issues, we propose continual dynamic neural graphics primitives (CD-NGP). Our approach synergizes features from both temporal and spatial hash encodings to achieve high rendering quality, employs parameter reuse to enhance scalability, and leverages a continual learning framework to mitigate memory overhead. Furthermore, we introduce a novel dataset comprising multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion, thereby substantiating the scalability of our method.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-23
# RotCAtt-TransUNet++: 高度心分離のための新しいディープニューラルネットワーク RotCAtt-TransUNet++: Novel Deep Neural Network for Sophisticated Cardiac Segmentation ( http://arxiv.org/abs/2409.05280v2 ) ライセンス: Link先を確認	Quoc-Bao Nguyen-Le, Tuan-Hy Le, Anh-Triet Do, Quoc-Huy Trinh,	(参考訳) 心臓血管疾患は依然として世界的な健康上の問題であり、世界の死亡率のかなりの部分を占めている。心臓画像データの正確なセグメンテーションは、心血管疾患に伴う死亡率の軽減に重要である。しかし、CNNベースのアプローチとTransformerベースのアプローチを含む既存の最先端(SOTA)ニューラルネットワークは、スライス内情報とともにスライス間接続を効果的にキャプチャできないため、実用性に限界がある。この欠損は、軸方向の冠動脈など、z軸に沿った複雑な長距離の詳細を特徴とするデータセットで特に顕著である。さらに、SOTA法は心筋のセグメンテーションと非心筋成分の区別に失敗し、「スライディング」現象へと繋がる。これらの課題に対処するために、複雑な心構造の堅牢なセグメンテーションに適した新しいアーキテクチャであるRotCAtt-TransUNet++を提案する。提案手法では,エンコーダ内のネストスキップ接続でマルチスケール機能を集約することで,グローバルコンテキストのモデリングを強調する。トランスフォーマー層を統合してパッチ間のインタラクションをキャプチャし、ロータリーアテンション機構を使用して複数のスライス間の接続(インタースライス情報)をキャプチャする。さらに、チャネルワイドのクロスアテンションゲートは、融合したマルチスケールのチャネルワイド情報とデコーダステージからブリッジセマンティックギャップへ特徴を導く。実験の結果,提案モデルでは,4つの心的データセットと1つの腹部的データセットにまたがる既存のSOTAアプローチよりも優れていた。重要なことは、冠状動脈と心筋は、推論中にほぼ完全な精度でアノテートされることである。アブレーション研究では、回転注意機構が意味次元空間に埋め込まれたベクトル化されたパッチを効果的に変換し、セグメンテーション精度を高めることが示されている。 Cardiovascular disease remains a predominant global health concern, responsible for a significant portion of mortality worldwide. Accurate segmentation of cardiac medical imaging data is pivotal in mitigating fatality rates associated with cardiovascular conditions. However, existing state-of-the-art (SOTA) neural networks, including both CNN-based and Transformer-based approaches, exhibit limitations in practical applicability due to their inability to effectively capture inter-slice connections alongside intra-slice information. This deficiency is particularly evident in datasets featuring intricate, long-range details along the z-axis, such as coronary arteries in axial views. Additionally, SOTA methods fail to differentiate non-cardiac components from myocardium in segmentation, leading to the "spraying" phenomenon. To address these challenges, we present RotCAtt-TransUNet++, a novel architecture tailored for robust segmentation of complex cardiac structures. Our approach emphasizes modeling global contexts by aggregating multiscale features with nested skip connections in the encoder. It integrates transformer layers to capture interactions between patches and employs a rotatory attention mechanism to capture connectivity between multiple slices (inter-slice information). Additionally, a channel-wise cross-attention gate guides the fused multi-scale channel-wise information and features from decoder stages to bridge semantic gaps. Experimental results demonstrate that our proposed model outperforms existing SOTA approaches across four cardiac datasets and one abdominal dataset. Importantly, coronary arteries and myocardium are annotated with near-perfect accuracy during inference. An ablation study shows that the rotatory attention mechanism effectively transforms embedded vectorized patches in the semantic dimensional space, enhancing segmentation accuracy.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-23
# オルタナティブベルの状態とテレポーテーション Alternative Bell's states and teleportation ( http://arxiv.org/abs/2409.06885v2 ) ライセンス: Link先を確認	Juan M. Romero, Emiliano Montoya-Gonzalez, Oscar Velazquez-Alvarado,	(参考訳) ベル状態は量子コンピューティングにおいて最も有用なものの一つである。これらの状態は、2つの量子ビットを持つ感染状態の正規直交基底である。絡み合った状態の代替基底を提案する。これらの状態のいくつかは連続パラメータに依存する。これらの代替基底の量子回路と符号を示す。さらに、これらの絡み合った状態を用いて量子テレポーテーションを研究し、量子回路と関連する符号を示す。 Bell's states are among the most useful in quantum computing. These state are an orthonormal base of entagled states with two qubits. We propose alternative bases of entangled states. Some of these states depend on a continuous parameter. We present the quantum circuit and code of these alternative bases. In addition, we study quantum teleportation with these entangled states and present their quantum circuits and codes associated.	翻訳日:2024-11-07 22:05:05 公開日:2024-10-23
# オルタナティブベルの状態とテレポーテーション Alternative Bell's states and teleportation ( http://arxiv.org/abs/2409.06885v3 ) ライセンス: Link先を確認	Juan M. Romero, Emiliano Montoya-Gonzalez, Oscar Velazquez-Alvarado,	(参考訳) ベル状態は量子コンピューティングにおいて最も有用なものの一つである。これらの状態は、2つの量子ビットを持つ感染状態の正規直交基底である。絡み合った状態の代替基底を提案する。これらの状態のいくつかは連続パラメータに依存する。これらの代替基底の量子回路と符号を示す。さらに、これらの絡み合った状態を用いて量子テレポーテーションを研究し、量子回路と関連する符号を示す。 Bell's states are among the most useful in quantum computing. These state are an orthonormal base of entagled states with two qubits. We propose alternative bases of entangled states. Some of these states depend on a continuous parameter. We present the quantum circuit and code of these alternative bases. In addition, we study quantum teleportation with these entangled states and present their quantum circuits and codes associated.	翻訳日:2024-11-07 22:05:05 公開日:2024-10-23
# エージェントベースモデルにおけるエージェンシーの限界について On the limits of agency in agent-based models ( http://arxiv.org/abs/2409.10568v2 ) ライセンス: Link先を確認	Ayush Chopra, Shashank Kumar, Nurullah Giray-Kuru, Ramesh Raskar, Arnau Quera-Bofarull,	(参考訳) エージェント・ベース・モデリング(ABM)は、環境の中で動作し相互作用するエージェントの集合をシミュレートすることで、複雑なシステムの振る舞いを理解しようとする。その実用性には、現実的な環境動態と適応的なエージェントの挙動を捉えながら、百万の人口を効率的にシミュレートする必要がある。大規模言語モデル(LLMs)の最近の進歩は、適応的な振る舞いを捉えうるエージェントとしてLLMを使用することで、ABMを強化する機会を与える。しかし、LLMを多人数で使用するという計算能力の欠如は、その普及を妨げている。本稿では,AMMを数百万のエージェントに拡張するフレームワークであるAgentTorchを紹介する。 ABMエージェントとしてLLMの実用性をベンチマークし、シミュレーションスケールと個々のエージェンシー間のトレードオフを探索する。新型コロナウイルス(COVID-19)のパンデミックをケーススタディとして、AgentTorchはニューヨーク市を代表する840万人のエージェントをシミュレートし、孤立と雇用行動が健康と経済に与える影響を捉えている。我々は, ヒューリスティックエージェントとLCMエージェントをベースとした各種エージェントアーキテクチャの性能を比較し, 疾患波と失業率の予測を行った。さらに、AgentTorchの振り返り、反ファクト、そして予測分析の能力を紹介し、政策設計における歴史的データの限界を克服する上で、適応的なエージェントの振る舞いがどのように役立つかを強調した。 AgentTorchは、世界中のポリシー作成と科学的発見に積極的に利用されているオープンソースプロジェクトである。 github.com/AgentTorch/AgentTorch。 Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a collection of agents that act and interact within an environment. Their practical utility requires capturing realistic environment dynamics and adaptive agent behavior while efficiently simulating million-size populations. Recent advancements in large language models (LLMs) present an opportunity to enhance ABMs by using LLMs as agents with further potential to capture adaptive behavior. However, the computational infeasibility of using LLMs for large populations has hindered their widespread adoption. In this paper, we introduce AgentTorch -- a framework that scales ABMs to millions of agents while capturing high-resolution agent behavior using LLMs. We benchmark the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and individual agency. Using the COVID-19 pandemic as a case study, we demonstrate how AgentTorch can simulate 8.4 million agents representing New York City, capturing the impact of isolation and employment behavior on health and economic outcomes. We compare the performance of different agent architectures based on heuristic and LLM agents in predicting disease waves and unemployment rates. Furthermore, we showcase AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project actively being used for policy-making and scientific discovery around the world. The framework is available here: github.com/AgentTorch/AgentTorch.	翻訳日:2024-11-07 20:24:12 公開日:2024-10-23
# AutoSpec: ニューラルネットワーク仕様の自動生成 AutoSpec: Automated Generation of Neural Network Specifications ( http://arxiv.org/abs/2409.10897v2 ) ライセンス: Link先を確認	Shuowei Jin, Francis Y. Yan, Cheng Tan, Anuj Kalia, Xenofon Foukas, Z. Morley Mao,	(参考訳) 学習強化システムにおけるニューラルネットワークの採用の増加は、モデル安全性と堅牢性、特に安全クリティカルドメインの重要性を強調している。ニューラルネットワークの形式的検証の進展にもかかわらず、現在のプラクティスでは、さまざまなシナリオで期待されるモデルの振る舞いを規定するプロパティであるモデル仕様を手動で定義する必要がある。しかし、この手動のプロセスは人間のミスをしがちで、スコープが限られており、時間がかかります。本稿では,学習強化システムにおけるニューラルネットワークの包括的かつ正確な仕様を自動的に生成する最初のフレームワークであるAutoSpecを紹介する。また、モデル仕様の精度とカバレッジを評価するための最初のメトリクスセットを提案し、将来の比較のためのベンチマークを確立する。 4つの異なるアプリケーションで評価したところ、AutoSpecは人間の定義した仕様よりも優れており、2つのベースラインアプローチが提案されている。 The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks, current practices require users to manually define model specifications -- properties that dictate expected model behavior in various scenarios. This manual process, however, is prone to human error, limited in scope, and time-consuming. In this paper, we introduce AutoSpec, the first framework to automatically generate comprehensive and accurate specifications for neural networks in learning-augmented systems. We also propose the first set of metrics for assessing the accuracy and coverage of model specifications, establishing a benchmark for future comparisons. Our evaluation across four distinct applications shows that AutoSpec outperforms human-defined specifications as well as two baseline approaches introduced in this study.	翻訳日:2024-11-07 20:13:03 公開日:2024-10-23
# ブロックチェーンとスマートコントラクトを用いたセキュアV2Gトランザクションのためのサイバー物理認証方式 Cyber-Physical Authentication Scheme for Secure V2G Transactions Using Blockchain and Smart Contracts ( http://arxiv.org/abs/2409.14008v1 ) ライセンス: Link先を確認	Yunwang Chen, Yanmin Zhao, Siuming Yiu,	(参考訳) 電気自動車(EV)の急速な普及により、車両間通信(V2G)ネットワークにおける堅牢なサイバーセキュリティ対策の必要性が高まっている。この論文では、ブロックチェーンベースのV2Gシステム内のPnC操作を接続して充電するように設計された、サイバー物理認証プロトコルとスマートコントラクトのトレーディングを提案する。このプロトコルは、高度な暗号化技術とブロックチェーンを活用して、EVと充電ステーション間のセキュアで透明でタンパーセーフなエネルギートランザクションを保証する。主な貢献は、サイバー物理認証手法の開発、安全なエネルギー取引のためのスマートコントラクトフレームワークの実装、詳細なセキュリティとプライバシ分析である。提案プロトコルは、ユーザの匿名性とデータの整合性を保ちながら、分散型サービス拒否(DDoS)攻撃、中間者攻撃(MitM)攻撃、リプレイ攻撃などのリスクを効果的に軽減する。 The rapid adoption of electric vehicles (EVs) globally has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly being integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug and charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as distributed denial of service (DDoS) attacks, man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.	翻訳日:2024-11-07 04:06:38 公開日:2024-10-23
# ブロックチェーンとスマートコントラクトを用いたセキュアV2Gトランザクションのためのサイバー物理認証方式 Cyber-Physical Authentication Scheme for Secure V2G Transactions Using Blockchain and Smart Contracts ( http://arxiv.org/abs/2409.14008v2 ) ライセンス: Link先を確認	Yunwang Chen, Yanmin Zhao, Siuming Yiu,	(参考訳) 電気自動車(EV)の急速な普及により、車両間通信(V2G)ネットワークにおける堅牢なサイバーセキュリティ対策の必要性が高まっている。この論文では、ブロックチェーンベースのV2Gシステム内のPnC操作を接続して充電するように設計された、サイバー物理認証プロトコルとスマートコントラクトのトレーディングを提案する。このプロトコルは、高度な暗号化技術とブロックチェーンを活用して、EVと充電ステーション間のセキュアで透明でタンパーセーフなエネルギートランザクションを保証する。主な貢献は、サイバー物理認証手法の開発、安全なエネルギー取引のためのスマートコントラクトフレームワークの実装、詳細なセキュリティとプライバシ分析である。提案プロトコルは、ユーザの匿名性とデータの整合性を保ちながら、分散型サービス拒否(DDoS)攻撃、中間者攻撃(MitM)攻撃、リプレイ攻撃などのリスクを効果的に軽減する。 The rapid adoption of electric vehicles (EVs) globally has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly being integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug and charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as distributed denial of service (DDoS) attacks, man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.	翻訳日:2024-11-07 04:06:38 公開日:2024-10-23
# セキュアなV2Gトランザクションのためのサイバー物理認証方式 Cyber-Physical Authentication Scheme for Secure V2G Transactions ( http://arxiv.org/abs/2409.14008v3 ) ライセンス: Link先を確認	Yunwang Chen, Yanmin Zhao, Siuming Yiu,	(参考訳) 電気自動車(EV)の急速な普及により、車両間通信(V2G)ネットワークにおける堅牢なサイバーセキュリティ対策の必要性が高まっている。この論文では、ブロックチェーンベースのV2Gシステム内のPnC操作を接続して充電するように設計された、サイバー物理認証プロトコルとスマートコントラクトのトレーディングを提案する。このプロトコルは、高度な暗号化技術とブロックチェーンを活用して、EVと充電ステーション間のセキュアで透明でタンパーセーフなエネルギートランザクションを保証する。主な貢献は、サイバー物理認証手法の開発、安全なエネルギー取引のためのスマートコントラクトフレームワークの実装、詳細なセキュリティとプライバシ分析である。提案プロトコルは、ユーザ匿名性とデータの整合性を保ちながら、中間者攻撃やリプレイ攻撃などのリスクを効果的に軽減する。 The rapid adoption of electric vehicles (EVs) globally has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly being integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug and charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.	翻訳日:2024-11-07 04:06:38 公開日:2024-10-23
# MADial-Bench: メモリ拡張対話生成の実環境評価に向けて MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation ( http://arxiv.org/abs/2409.15240v2 ) ライセンス: Link先を確認	Junqing He, Liang Zhu, Rui Wang, Xi Wang, Reza Haffari, Jiaxing Zhang,	(参考訳) チャットボットや対話システム(DS)にとって長期記憶は、多数の発達したメモリ拡張DS(MADS)によって実証された、一貫性のある人間的な会話を生成するために重要である。このようなMADSの有効性を評価するため、検索精度やパープレキシティ(PPL)などの既存の評価指標は、主にクエリ指向の事実性や言語品質の評価に重点を置いている。しかし、これらの指標は実際的な価値を欠くことが多い。また,DSの人間的評価には評価寸法が不十分である。メモリリコールのパラダイムに関しては、現在の評価スキームは受動的メモリ検索のみを考慮しつつ、多様なメモリリコールを、感情や環境といったリッチなトリガ要因で無視する。このギャップを埋めるために,認知科学と心理学理論に基づく様々なメモリリコールパラダイムをカバーする新しいメモリ拡張ダイアログベンチマーク(MADail-Bench)を構築した。このベンチマークは2つのタスクを別々に評価する: メモリ検索とメモリ認識は、パッシブとプロアクティブの両方のメモリリコールデータを組み込んだものである。本稿では, 記憶注入, 感情支援(ES)能力, 親密性などの評価基準を新たに導入し, 生成した反応を包括的に評価する。このベンチマークにおける最先端の埋め込みモデルと大規模言語モデルの結果は、さらなる進歩の可能性を示している。広範囲なテストにより、メモリインジェクション、ES習熟度、親密さの相関が明らかになる。 Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, evidenced by numerous developed memory-augmented DS (MADS). To evaluate the effectiveness of such MADS, existing commonly used evaluation metrics, like retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality assessment. However, these metrics often lack practical value. Moreover, the evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADail-Bench) covering various memory-recalling paradigms based on cognitive science and psychology theories. The benchmark assesses two tasks separately: memory retrieval and memory recognition with the incorporation of both passive and proactive memory recall data. We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy.	翻訳日:2024-11-06 20:27:58 公開日:2024-10-23
# リッチリワードのダークサイド:VLMリワードにおけるノイズの理解と緩和 The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards ( http://arxiv.org/abs/2409.15922v2 ) ライセンス: Link先を確認	Sukai Huang, Nir Lipovetzky, Trevor Cohn,	(参考訳) VLM(Vision-Language Models)は、インボディードエージェントに指示に従うための報酬信号を生成するために使われることが多いが、本研究では、本質的な(探索駆動)報酬のみを使用するエージェントと比較して、VLM報酬によって導かれるエージェントは、近年の成果に反するものとして、しばしば性能が低下することが判明した。偽陽性報酬(意図しない軌道が誤って報酬を受ける場合)は偽陰性よりも有害である、という仮説を立てる。分析によってこの仮説が裏付けられ、広く使われているコサイン類似度測定基準が偽陽性報酬推定の傾向にあることが明らかとなった。そこで本稿では,ノイズを緩和する新しい報奨関数であるBiMI({Bi}nary {M}utual {I}nformation)を導入する。 BiMIは多様な、難易度の高いナビゲーション環境における学習効率を大幅に向上させる。我々の研究は、様々な種類の報奨ノイズの影響剤の学習方法の微妙な理解を提供し、トレーニング実施時のマルチモーダル報酬信号ノイズへの対処の重要性を強調した。 While Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions, our research reveals that agents guided by VLM rewards often underperform compared to those employing only intrinsic (exploration-driven) rewards, contradicting expectations set by recent work. We hypothesize that false positive rewards -- instances where unintended trajectories are incorrectly rewarded -- are more detrimental than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric is prone to false positive reward estimates. To address this, we introduce BiMI ({Bi}nary {M}utual {I}nformation), a novel reward function designed to mitigate noise. BiMI significantly enhances learning efficiency across diverse and challenging embodied navigation environments. Our findings offer a nuanced understanding of how different types of reward noise impact agent learning and highlight the importance of addressing multimodal reward signal noise when training embodied agents	翻訳日:2024-11-06 19:21:13 公開日:2024-10-23
# ホップ代数と可解ユニタリ回路 Hopf algebras and solvable unitary circuits ( http://arxiv.org/abs/2409.17215v2 ) ライセンス: Link先を確認	Zhiyuan Wang,	(参考訳) 量子多体力学における厳密に解決可能なモデルは、多くの興味深い物理現象に関する貴重な洞察を与え、基本的な理論的問題を厳密に研究するためのプラットフォームとして機能する。それでも、それらは極めて稀であり、既存の解決可能なモデルと解法には深刻な制限がある。本稿では、離散空間と時間における量子多体ダイナミクスをモデル化する、正確に解けるユニタリ回路の新たなファミリーを紹介する。多くの従来の可解モデルとは異なり、この新しいモデルの族における任意の行列積状態から初期化された完全な量子力学を正確に計算することができる。局所可観測物の時間進化と相関、レニイエンタングルメントエントロピーの線形成長、時空間相関、および時間外相関は、すべて正確に計算可能である。正確な解を可能にするこれらのモデルの鍵となる性質は、任意の時間発展された局所作用素が有限結合次元の正確な行列積作用素であり、任意に長い時間でも、テンソルネットワーク技術と共に基礎となる(弱)ホップ代数構造を用いて証明できることである。このモデルのファミリの構築と解法に関する一般的な枠組みを概説し、いくつかの明示的な例を挙げる。特に、PXPモデルの花束版に非常に近い弱いホップ代数から構築されたモデルについて詳細に研究し、得られた正確な結果は、量子的な多くの身体の傷跡の現象、より一般的には、制約された系の花束量子力学に光を当てる可能性がある。 Exactly solvable models in quantum many body dynamics provide valuable insights into many interesting physical phenomena, and serve as platforms to rigorously investigate fundamental theoretical questions. Nevertheless, they are extremely rare and existing solvable models and solution techniques have serious limitations. In this paper we introduce a new family of exactly solvable unitary circuits which model quantum many body dynamics in discrete space and time. Unlike many previous solvable models, one can exactly compute the full quantum dynamics initialized from any matrix product state in this new family of models. The time evolution of local observables and correlations, the linear growth of Renyi entanglement entropy, spatiotemporal correlations, and out-of-time-order correlations are all exactly computable. A key property of these models enabling the exact solution is that any time evolved local operator is an exact matrix product operator with finite bond dimension, even at arbitrarily long time, which we prove using the underlying (weak) Hopf algebra structure along with tensor network techniques. We lay down the general framework for the construction and solution of this family of models, and give several explicit examples. In particular, we study in detail a model constructed out of a weak Hopf algebra that is very close to a floquet version of the PXP model, and the exact results we obtain may shed light on the phenomenon of quantum many body scars, and more generally, floquet quantum dynamics in constrained systems.	翻訳日:2024-11-06 16:30:51 公開日:2024-10-23
# 思考の証明 : ニューロシンボリックプログラム合成はロバストと解釈可能な推論を可能にする Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning ( http://arxiv.org/abs/2409.17270v2 ) ライセンス: Link先を確認	Debargha Ganguly, Srinivasan Iyengar, Vipin Chaudhary, Shivkumar Kalyanaraman,	(参考訳) 大規模言語モデル(LLM)は自然言語処理に革命をもたらしたが、特に新しいドメインや複雑な論理配列において、一貫性のない推論に苦慮している。本研究では,LLM出力の信頼性と透明性を高めるフレームワークであるProof of Thoughtを紹介する。提案手法は LLM 生成のアイデアを形式論理検証でブリッジし,LLM の出力を 1次論理構造に変換するためのカスタムインタプリタを用いて定理証明の精査を行う。我々の手法の中心はJSONベースのドメイン特化言語であり、設計によって正確な論理構造と直感的な人間の概念のバランスをとる。このハイブリッド表現は、厳密な検証とLLM推論プロセスの人間の理解の両方を可能にする。主なコントリビューションには、論理的整合性を高めるためのソート管理を備えた堅牢な型システム、事実と推論の知識を明確に区別するためのルールの明示、さまざまなドメイン固有のアプリケーションを簡単に拡張できる柔軟なアーキテクチャが含まれる。我々は、StrategyQAと新しいマルチモーダル推論タスクをベンチマークし、オープンエンドシナリオにおける性能改善を示すことにより、思考の有効性を実証する。検証可能かつ解釈可能な結果を提供することで、AIシステムのアカウンタビリティに対する重要なニーズに対処し、ハイテイクドメインにおけるヒューマン・イン・ザ・ループの監視の基礎を設定する。 Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning, particularly in novel domains and complex logical sequences. This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs. Our approach bridges LLM-generated ideas with formal logic verification, employing a custom interpreter to convert LLM outputs into First Order Logic constructs for theorem prover scrutiny. Central to our method is an intermediary JSON-based Domain-Specific Language, which by design balances precise logical structures with intuitive human concepts. This hybrid representation enables both rigorous validation and accessible human comprehension of LLM reasoning processes. Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge, and a flexible architecture that allows for easy extension to various domain-specific applications. We demonstrate Proof of Thought's effectiveness through benchmarking on StrategyQA and a novel multimodal reasoning task, showing improved performance in open-ended scenarios. By providing verifiable and interpretable results, our technique addresses critical needs for AI system accountability and sets a foundation for human-in-the-loop oversight in high-stakes domains.	翻訳日:2024-11-06 16:30:51 公開日:2024-10-23
# Evidential Bi-Level Hardest Domain Scheduler によるオープンセットドメインの一般化の促進 Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler ( http://arxiv.org/abs/2409.17555v2 ) ライセンス: Link先を確認	Kunyu Peng, Di Wen, Kailun Yang, Ao Luo, Yufan Chen, Jia Fu, M. Saquib Sarfraz, Alina Roitberg, Rainer Stiefelhagen,	(参考訳) Open-Set Domain Generalization (OSDG)では、このモデルは、テスト時に既知のカテゴリと新しいカテゴリの両方が存在する、新しいデータ外観(ドメイン)とオープンセット条件の両方に露出する。このタスクの課題は、様々な領域をまたいで一般化し、動的環境における応用に欠かせないカテゴリの新規性を正確に定量化することによる。近年、メタ学習技術はOSDGにおいて優れた結果を示し、様々なランダムカテゴリと事前定義されたドメイン分割戦略を用いて、メタトレインとテストタスクを効果的に編成している。これらのアプローチは、主にデータ拡張と差別的特徴学習の強化に焦点を当てた従来の手法よりも、よく設計されたトレーニングスケジュールを優先する。 OSDGの一般的なメタラーニングモデルは、データパーティションを構成するために、定義済みのシーケンシャルドメインスケジューラを使用するのが一般的である。しかし、まだ十分に調査されていない重要な側面は、トレーニング中にドメインスケジューラの戦略によってもたらされた影響である。本稿では、プレフィックス付きシーケンシャルおよびランダムなドメインスケジューラと比較して、OSDGにおいて適応型ドメインスケジューラの方が有利であることを示す。適応型ドメインスケジューラを実現するために,Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS)を提案する。本手法は、従者ネットワークを利用する際の信頼度を評価し、明らかな方法で学習した信頼度をトレーニングし、最大偏差で正規化し、二段階的に最適化することで、ドメインを戦略的にシーケンスする。その結果,本手法はOSDGの性能を著しく向上し,目に見えるカテゴリと目立たないカテゴリの両方に対してより差別的な埋め込みを実現することがわかった。ソースコードはhttps://github.com/KPeng9510/EBiL-HaDSで公開されている。 In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Recently, meta-learning techniques have demonstrated superior results in OSDG, effectively orchestrating the meta-train and -test tasks by employing varied random categories and predefined domain partition strategies. These approaches prioritize a well-designed training schedule over traditional methods that focus primarily on data augmentation and the enhancement of discriminative feature learning. The prevailing meta-learning models in OSDG typically utilize a predefined sequential domain scheduler to structure data partitions. However, a crucial aspect that remains inadequately explored is the influence brought by strategies of domain schedulers during training. In this paper, we observe that an adaptive domain scheduler benefits more in OSDG compared with prefixed sequential and random domain schedulers. We propose the Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) to achieve an adaptive domain scheduler. This method strategically sequences domains by assessing their reliabilities in utilizing a follower network, trained with confidence scores learned in an evidential manner, regularized by max rebiasing discrepancy, and optimized in a bi-level manner. The results show that our method substantially improves OSDG performance and achieves more discriminative embeddings for both the seen and unseen categories. The source code is publicly available at https://github.com/KPeng9510/EBiL-HaDS.	翻訳日:2024-11-06 16:20:44 公開日:2024-10-23
# マルコフ雑音下における情報伝達 Information transmission under Markovian noise ( http://arxiv.org/abs/2409.17743v2 ) ライセンス: Link先を確認	Satvik Singh, Nilanjana Datta,	(参考訳) マルコフ力学に基づく開量子系を考えると、後者は離散時間量子マルコフ半群$(\Phi^n)_{n \in {\mathbb{N}}}$でモデル化され、量子チャネル$\Phi$と$n \in {\mathbb{N}}$が離散時間パラメータである。有限時間$n\in \mathbb{N}$と$\epsilon \in [0,1)$の場合、一発$\epsilon$-error情報伝送容量は$\Phi^n$であり、チャネル$\Phi$の周辺空間の構造は$\epsilon \in [0,1)$である。私たちは$の送信を考えます (i)$ classic information(unssisted and entanglement-assisted settings)$ (ii)$量子情報と$ (iii)私的古典情報 We consider an open quantum system undergoing Markovian dynamics, the latter being modelled by a discrete-time quantum Markov semigroup $(\Phi^n)_{n \in {\mathbb{N}}}$, resulting from the action of sequential uses of a quantum channel $\Phi$, with $n \in {\mathbb{N}}$ being the discrete time parameter. We find upper and lower bounds on the one-shot $\epsilon$-error information transmission capacities of $\Phi^n$ for a finite time $n\in \mathbb{N}$ and $\epsilon \in [0,1)$ in terms of the structure of the peripheral space of the channel $\Phi$. We consider transmission of $(i)$ classical information (both in the unassisted and entanglement-assisted settings); $(ii)$ quantum information and $(iii)$ private classical information.	翻訳日:2024-11-06 16:10:55 公開日:2024-10-23
# 政策グラディエント手法の強ポリノミカル時間と検証分析 Strongly-Polynomial Time and Validation Analysis of Policy Gradient Methods ( http://arxiv.org/abs/2409.19437v1 ) ライセンス: Link先を確認	Caleb Ju, Guanghui Lan,	(参考訳) 強化学習(Reinforcement learning)は、最適性の原則的な尺度を欠き、最適性の証明を持たないアルゴリズムからアルゴリズム、あるいはベースラインの比較に頼らざるを得ない。有限状態および作用マルコフ決定過程(MDP)に着目し、最適性ギャップ上の上界と下界の両方を提供する単純な計算可能なギャップ関数を開発する。したがって、ギャップ関数の収束は最適性ギャップの収束よりも強い収束モードであり、収束が任意の問題依存分布から独立であるような分布自由収束と呼ばれる新しい概念と同値である。基本方針ミラー降下は決定論的および確率的条件の両方に対して高速な分布自由収束を示す。分布自由収束を利用して、いくつかの新しい結果を明らかにする。第一に、決定論的政策ミラー降下は、強いポリノミアル時間で非正規化されたMDPを解くことができる。第2に、確率的ポリシーミラー降下の実行中に追加のサンプルなしで精度推定が得られ、検証ステップで検証できる終了基準として使用できる。 Reinforcement learning lacks a principled measure of optimality, causing research to rely on algorithm-to-algorithm or baselines comparisons with no certificate of optimality. Focusing on finite state and action Markov decision processes (MDP), we develop a simple, computable gap function that provides both upper and lower bounds on the optimality gap. Therefore, convergence of the gap function is a stronger mode of convergence than convergence of the optimality gap, and it is equivalent to a new notion we call distribution-free convergence, where convergence is independent of any problem-dependent distribution. We show the basic policy mirror descent exhibits fast distribution-free convergence for both the deterministic and stochastic setting. We leverage the distribution-free convergence to a uncover a couple new results. First, the deterministic policy mirror descent can solve unregularized MDPs in strongly-polynomial time. Second, accuracy estimates can be obtained with no additional samples while running stochastic policy mirror descent and can be used as a termination criteria, which can be verified in the validation step.	翻訳日:2024-11-05 23:19:24 公開日:2024-10-23
# 政策勾配法の強ポリノミカル時間と検証解析 Strongly-polynomial time and validation analysis of policy gradient methods ( http://arxiv.org/abs/2409.19437v2 ) ライセンス: Link先を確認	Caleb Ju, Guanghui Lan,	(参考訳) 本稿では,有限状態および行動マルコフ決定過程(MDP)と強化学習(RL)のための,優位ギャップ関数と呼ばれる新しい終了基準を提案する。この利点ギャップ関数をステップサイズルールの設計に組み込んで、最適政策の定常状態分布に依存しない新たな線形収束率を導出することにより、政策勾配法が強いポリノミカル時間でMDPを解けることを示す。我々の知る限りでは、政策勾配法にそのような強い収束特性が確立されたのはこれが初めてである。さらに、政策勾配の確率的推定しかできない確率的設定では、有利なギャップ関数が各状態の最適性ギャップを近似し、各状態におけるサブ線形収束率を示すことを示す。利点ギャップ関数は確率的ケースでは容易に推定でき、ポリシー値の計算が容易な上限と組み合わせれば、ポリシー勾配法によって生成される解を検証するのに便利な方法を提供する。したがって、我々の開発はRLの最適性の原理的かつ計算可能な尺度を提供する一方、現在の実践は最適性の証明を持たないアルゴリズムからアルゴリズム、あるいはベースラインの比較に依存する傾向にある。 This paper proposes a novel termination criterion, termed the advantage gap function, for finite state and action Markov decision processes (MDP) and reinforcement learning (RL). By incorporating this advantage gap function into the design of step size rules and deriving a new linear rate of convergence that is independent of the stationary state distribution of the optimal policy, we demonstrate that policy gradient methods can solve MDPs in strongly-polynomial time. To the best of our knowledge, this is the first time that such strong convergence properties have been established for policy gradient methods. Moreover, in the stochastic setting, where only stochastic estimates of policy gradients are available, we show that the advantage gap function provides close approximations of the optimality gap for each individual state and exhibits a sublinear rate of convergence at every state. The advantage gap function can be easily estimated in the stochastic case, and when coupled with easily computable upper bounds on policy values, they provide a convenient way to validate the solutions generated by policy gradient methods. Therefore, our developments offer a principled and computable measure of optimality for RL, whereas current practice tends to rely on algorithm-to-algorithm or baselines comparisons with no certificate of optimality.	翻訳日:2024-11-05 23:19:24 公開日:2024-10-23
# 遺伝子組み換え型ニューラル表現を目指して Towards Croppable Implicit Neural Representations ( http://arxiv.org/abs/2409.19472v1 ) ライセンス: Link先を確認	Maor Ashkenazi, Eran Treister,	(参考訳) Inlicit Neural Representations(INR)は、ニューラルネットワークを使って自然信号をエンコードする能力により、近年注目されている。 INRは、新しい座標の補間や信号圧縮などの有用な応用を可能にするが、そのブラックボックスの性質は、後処理の修正を困難にしている。本稿では、編集可能なINRのアイデアを探求し、特に広く使われている収穫作業に焦点を当てる。この目的のために、我々は、デザインによる収穫をサポートする新しいINRアーキテクチャであるLocal-Global SIRENsを紹介する。局所グローバルSIRENは、信号符号化のための局所的特徴抽出とグローバル的特徴抽出を組み合わせたものである。彼らの設計をユニークなものにしているのは、エンコードされた信号の特定の部分を取り除き、比例重量を減少させる能力である。これは、ネットワークの再トレーニングを必要とせずに、対応する重みをネットワークから排除することで達成される。さらに、このアーキテクチャは、以前符号化された信号の直接拡張をサポートするためにどのように使用できるかを示す。信号編集以外にも、ローカル・グローバル・アプローチがトレーニングを加速し、様々な信号のエンコーディングを強化し、下流性能を改善し、INCODEなどの現代のINRに適用し、その可能性と柔軟性を強調している。コードはhttps://github.com/maorash/Local-Global-INRsで入手できる。 Implicit Neural Representations (INRs) have peaked interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional weight decrease. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.	翻訳日:2024-11-05 23:07:28 公開日:2024-10-23
# 遺伝子組み換え型ニューラル表現を目指して Towards Croppable Implicit Neural Representations ( http://arxiv.org/abs/2409.19472v2 ) ライセンス: Link先を確認	Maor Ashkenazi, Eran Treister,	(参考訳) Inlicit Neural Representations(INR)は、ニューラルネットワークを使って自然信号をエンコードする能力により、近年注目されている。 INRは、新しい座標の補間や信号圧縮などの有用な応用を可能にするが、そのブラックボックスの性質は、後処理の修正を困難にしている。本稿では、編集可能なINRのアイデアを探求し、特に広く使われている収穫作業に焦点を当てる。この目的のために、我々は、デザインによる収穫をサポートする新しいINRアーキテクチャであるLocal-Global SIRENsを紹介する。局所グローバルSIRENは、信号符号化のための局所的特徴抽出とグローバル的特徴抽出を組み合わせたものである。彼らの設計をユニークなものにしているのは、エンコードされた信号の特定の部分を取り除き、比例重量を減少させる能力である。これは、ネットワークの再トレーニングを必要とせずに、対応する重みをネットワークから排除することで達成される。さらに、このアーキテクチャは、以前符号化された信号の直接拡張をサポートするためにどのように使用できるかを示す。信号編集以外にも、ローカル・グローバル・アプローチがトレーニングを加速し、様々な信号のエンコーディングを強化し、下流性能を改善し、INCODEなどの現代のINRに適用し、その可能性と柔軟性を強調している。コードはhttps://github.com/maorash/Local-Global-INRsで入手できる。 Implicit Neural Representations (INRs) have peaked interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional weight decrease. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.	翻訳日:2024-11-05 23:07:28 公開日:2024-10-23
# GameLabel-10K: モバイルゲームクラウドソーシングによる画像優先データ収集 GameLabel-10K: Collecting Image Preference Data Through Mobile Game Crowdsourcing ( http://arxiv.org/abs/2409.19830v1 ) ライセンス: Link先を確認	Jonathan Zhou,	(参考訳) マルチビリオンパラメータモデルの台頭は、ディープラーニングにまたがるデータに対する激しい飢餓を引き起こした。本研究は,ゲーム内通貨に報いる有償アノテータをゲームプレイヤに置き換えることによるパフォーマンス向上の可能性を検討する。私たちはモバイルの歴史戦略ゲーム、Armchair Commanderの開発者と協力して、このアイデアを試しています。より具体的には、現在の研究ではこのアイデアを、通常は微調整拡散モデルに使用されるペアワイズ画像優先データを用いて検証している。この手法を用いて,約10万のラベルと7000のユニークなプロンプトを持つデータセットであるGameLabel-10Kを作成する。これらの結果に加えて、このデータセットのいくつかの制限を分析し、オープンソースライセンス下で公開しています。 The rise of multi-billion parameter models has sparked an intense hunger for data across deep learning. This study explores the possibility of replacing paid annotators with video game players who are rewarded with in-game currency for good performance. We collaborate with the developers of a mobile historical strategy game, Armchair Commander, to test this idea. More specifically, the current study tests this idea using pairwise image preference data, typically used to fine-tune diffusion models. Using this method, we create GameLabel-10K, a dataset with slightly under 10 thousand labels and 7000 unique prompts. In addition to these results, we analyze some limitations of this dataset and publicly release it under an open-source license.	翻訳日:2024-11-05 17:29:56 公開日:2024-10-23
# GameLabel-10K: モバイルゲームクラウドソーシングによる画像優先データ収集 GameLabel-10K: Collecting Image Preference Data Through Mobile Game Crowdsourcing ( http://arxiv.org/abs/2409.19830v2 ) ライセンス: Link先を確認	Jonathan Zhou,	(参考訳) マルチビリオンパラメータモデルの台頭は、ディープラーニングにまたがるデータに対する激しい飢餓を引き起こした。本研究は,ゲーム内通貨に報いる有償アノテータをゲームプレイヤに置き換えることによるパフォーマンス向上の可能性を検討する。私たちはモバイルの歴史戦略ゲーム、Armchair Commanderの開発者と協力して、このアイデアを試しています。より具体的には、現在の研究ではこのアイデアを、通常は微調整拡散モデルに使用されるペアワイズ画像優先データを用いて検証している。この手法を用いて,約10万のラベルと7000のユニークなプロンプトを持つデータセットであるGameLabel-10Kを作成する。このデータセット上でモデルを微調整し、Flux Schnellを微調整し、その即効性を改善し、収集手法の有効性を実証する。さらに、Hugging Face上でデータセットと微調整されたモデルの両方を公開しています。 The rise of multi-billion parameter models has sparked an intense hunger for data across deep learning. This study explores the possibility of replacing paid annotators with video game players who are rewarded with in-game currency for good performance. We collaborate with the developers of a mobile historical strategy game, Armchair Commander, to test this idea. More specifically, the current study tests this idea using pairwise image preference data, typically used to fine-tune diffusion models. Using this method, we create GameLabel-10K, a dataset with slightly under 10 thousand labels and 7000 unique prompts. We fine-tune a model on this dataset, we fine-tune Flux Schnell and find an improvement in its prompt adherence, demonstrating the validity of our collection method. In addition, we publicly release both the dataset and our fine-tuned model on Hugging Face.	翻訳日:2024-11-05 17:29:56 公開日:2024-10-23
# Counter-Current Learning: ディープラーニングのための生物学的にプラザブルなデュアルネットワークアプローチ Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning ( http://arxiv.org/abs/2409.19841v1 ) ライセンス: Link先を確認	Chia-Hsiang Kao, Bharath Hariharan,	(参考訳) ニューラルネットワークで広く使われているにもかかわらず、エラーのバックプロパゲーションは生物学的な妥当性の欠如を批判され、後方ロック問題や重量輸送問題といった問題に悩まされている。これらの制限により、研究者たちはより生物学的に妥当な学習アルゴリズムを探求し、生物学的神経システムがどのように適応し、学習するかについて光を当てる可能性がある。生体システムで観測される対流交換機構に着想を得て,ニューラルネットワークにおける信用代入のための生物学的に妥当なフレームワークである対流学習(CCL)を提案する。このフレームワークは、入力データを処理するフィードフォワードネットワークと、ターゲットを処理するフィードバックネットワークを使用し、各ネットワークは反並列信号の伝搬を通じて互いに強化する。フィードバックネットワークの下位層からのより情報的な信号を利用してフィードフォワードネットワークの上位層の更新を誘導し、その逆の逆で、CCLはソース入力の同時変換を目標出力への変換とこれらの変換の動的相互影響を可能にする。 MNIST、FashionMNIST、CIFAR10、CIFAR100データセットの多層パーセプトロンと畳み込みニューラルネットワークによる実験結果は、CCLがより生物学的に現実的な学習メカニズムを提供しながら、他の生物学的にもっとも有効なアルゴリズムと同等のパフォーマンスを達成することを示した。さらに、自動エンコーダタスクへのアプローチの適用性を示し、教師なし表現学習の可能性を示す。我々の研究は、ニューラルネットワークにおける学習と適応の代替メカニズムを提供する、生物学的にインスパイアされた、そして実証可能な学習アルゴリズムの方向性を示す。 Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt and learn. Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in neural networks. This framework employs a feedforward network to process input data and a feedback network to process targets, with each network enhancing the other through anti-parallel signal propagation. By leveraging the more informative signals from the bottom layer of the feedback network to guide the updates of the top layer of the feedforward network and vice versa, CCL enables the simultaneous transformation of source inputs to target outputs and the dynamic mutual influence of these transformations. Experimental results on MNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets using multi-layer perceptrons and convolutional neural networks demonstrate that CCL achieves comparable performance to other biologically plausible algorithms while offering a more biologically realistic learning mechanism. Furthermore, we showcase the applicability of our approach to an autoencoder task, underscoring its potential for unsupervised representation learning. Our work presents a direction for biologically inspired and plausible learning algorithms, offering an alternative mechanisms of learning and adaptation in neural networks.	翻訳日:2024-11-05 17:19:55 公開日:2024-10-23
# Counter-Current Learning: ディープラーニングのための生物学的にプラザブルなデュアルネットワークアプローチ Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning ( http://arxiv.org/abs/2409.19841v2 ) ライセンス: Link先を確認	Chia-Hsiang Kao, Bharath Hariharan,	(参考訳) ニューラルネットワークで広く使われているにもかかわらず、エラーのバックプロパゲーションは生物学的な妥当性の欠如を批判され、後方ロック問題や重量輸送問題といった問題に悩まされている。これらの制限により、研究者たちはより生物学的に妥当な学習アルゴリズムを探求し、生物学的神経システムがどのように適応し、学習するかについて光を当てる可能性がある。生体システムで観測される対流交換機構に着想を得て,ニューラルネットワークにおける信用代入のための生物学的に妥当なフレームワークである対流学習(CCL)を提案する。このフレームワークは、入力データを処理するフィードフォワードネットワークと、ターゲットを処理するフィードバックネットワークを使用し、各ネットワークは反並列信号の伝搬を通じて互いに強化する。フィードバックネットワークの下位層からのより情報的な信号を利用してフィードフォワードネットワークの上位層の更新を誘導し、その逆の逆で、CCLはソース入力の同時変換を目標出力への変換とこれらの変換の動的相互影響を可能にする。 MNIST、FashionMNIST、CIFAR10、CIFAR100データセットの多層パーセプトロンと畳み込みニューラルネットワークによる実験結果は、CCLがより生物学的に現実的な学習メカニズムを提供しながら、他の生物学的にもっとも有効なアルゴリズムと同等のパフォーマンスを達成することを示した。さらに、自動エンコーダタスクへのアプローチの適用性を示し、教師なし表現学習の可能性を示す。我々の研究は、ニューラルネットワークにおける学習と適応の代替メカニズムを提供する、生物学的にインスパイアされた、そして実証可能な学習アルゴリズムの方向性を示す。 Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt and learn. Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in neural networks. This framework employs a feedforward network to process input data and a feedback network to process targets, with each network enhancing the other through anti-parallel signal propagation. By leveraging the more informative signals from the bottom layer of the feedback network to guide the updates of the top layer of the feedforward network and vice versa, CCL enables the simultaneous transformation of source inputs to target outputs and the dynamic mutual influence of these transformations. Experimental results on MNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets using multi-layer perceptrons and convolutional neural networks demonstrate that CCL achieves comparable performance to other biologically plausible algorithms while offering a more biologically realistic learning mechanism. Furthermore, we showcase the applicability of our approach to an autoencoder task, underscoring its potential for unsupervised representation learning. Our work presents a direction for biologically inspired and plausible learning algorithms, offering an alternative mechanism of learning and adaptation in neural networks.	翻訳日:2024-11-05 17:19:55 公開日:2024-10-23
# 視覚言語モデルは、視覚的手がかりとテキストのあいまいさを解決できるだろうか? Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! ( http://arxiv.org/abs/2410.01023v1 ) ライセンス: Link先を確認	Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu,	(参考訳) 人間は多モードリテラシーを持ち、様々なモダリティからの情報を積極的に統合して推論を形成することができる。テキストの語彙的曖昧さのような課題に直面して、サムネイル画像や教科書のイラストのような他のモダリティを補う。マシンが同様のマルチモーダル理解能力を実現することは可能か? そこで本研究では,語彙の曖昧さを解消する上でのマルチモーダル入力の影響を評価するための新しいベンチマークである,画像説明付き理解パン(UNPIE)を提案する。修道女は本質的な曖昧さのため、この評価の理想的な主題として機能する。私たちのデータセットには1,000の句が含まれており、それぞれに両方の意味を説明する画像が添付されています。マルチモーダルリテラシーの諸側面を評価するアノテーションとして, Pun Grounding, Disambiguation, Restructation の3つの多モーダル課題を提起する。その結果,タスクの複雑さが増大するにつれて,様々なソクラティックモデルや視覚言語モデルが視覚的コンテキストが与えられた場合に,テキストのみのモデルよりも改善することが示唆された。 Humans possess multimodal literacy, allowing them to actively integrate information from various modalities to form reasoning. Faced with challenges like lexical ambiguity in text, we supplement this with other modalities, such as thumbnail images or textbook illustrations. Is it possible for machines to achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs in resolving lexical ambiguities. Puns serve as the ideal subject for this evaluation due to their intrinsic ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. We pose three multimodal challenges with the annotations to assess different aspects of multimodal literacy; Pun Grounding, Disambiguation, and Reconstruction. The results indicate that various Socratic Models and Visual-Language Models improve over the text-only models when given visual context, particularly as the complexity of the tasks increases.	翻訳日:2024-11-04 23:40:11 公開日:2024-10-23
# 視覚言語モデルは、視覚的手がかりとテキストのあいまいさを解決できるだろうか? Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! ( http://arxiv.org/abs/2410.01023v2 ) ライセンス: Link先を確認	Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu,	(参考訳) 人間は多モードリテラシーを持ち、様々なモダリティからの情報を積極的に統合して推論を形成することができる。テキストの語彙的曖昧さのような課題に直面して、サムネイル画像や教科書のイラストのような他のモダリティを補う。マシンが同様のマルチモーダル理解能力を実現することは可能か? そこで本研究では,語彙の曖昧さを解消する上でのマルチモーダル入力の影響を評価するための新しいベンチマークである,画像説明付き理解パン(UNPIE)を提案する。修道女は本質的な曖昧さのため、この評価の理想的な主題として機能する。私たちのデータセットには1,000の句が含まれており、それぞれに両方の意味を説明する画像が添付されています。マルチモーダルリテラシーの諸側面を評価するアノテーションとして, Pun Grounding, Disambiguation, Restructation の3つの多モーダル課題を提起する。その結果,タスクの複雑さが増大するにつれて,様々なソクラティックモデルや視覚言語モデルが視覚的コンテキストが与えられた場合に,テキストのみのモデルよりも改善することが示唆された。 Humans possess multimodal literacy, allowing them to actively integrate information from various modalities to form reasoning. Faced with challenges like lexical ambiguity in text, we supplement this with other modalities, such as thumbnail images or textbook illustrations. Is it possible for machines to achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs in resolving lexical ambiguities. Puns serve as the ideal subject for this evaluation due to their intrinsic ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. We pose three multimodal challenges with the annotations to assess different aspects of multimodal literacy; Pun Grounding, Disambiguation, and Reconstruction. The results indicate that various Socratic Models and Visual-Language Models improve over the text-only models when given visual context, particularly as the complexity of the tasks increases.	翻訳日:2024-11-04 23:40:11 公開日:2024-10-23
# 知識サイロがジャーナリズムにおける責任あるAI実践に及ぼす影響 Impact of Knowledge Silos on Responsible AI Practices in Journalism ( http://arxiv.org/abs/2410.01138v1 ) ライセンス: Link先を確認	Tomás Dodds, Astrid Vandendaele, Felix M. Simon, Natali Helberger, Valeria Resendez, Wang Ngai Yeung,	(参考訳) ジャーナリズムにおける責任あるAIプラクティスの効果的な採用には、技術、編集、ジャーナリスト、管理など、さまざまな視点を橋渡しするための協力的な努力が必要である。ニュース組織内の責任あるAIに関する情報共有に影響を与える可能性のある多くの課題の1つは、知識サイロである。この研究は、ナレッジサイロがジャーナリズムにおける責任あるAIプラクティスの採用にどのように影響するかを、オランダの主要4メディアのクロスケーススタディを通じて調査することを目的としている。我々は、AI知識共有に対する個人的および組織的障壁と、知識サイロがニュースルーム内の責任あるAIイニシアチブの運用にどんな影響を及ぼすかを検討する。この問題に対処するため,我々はDe Telegraaf,de Volkskrant,Nederlandse Omroep Stichting (NOS), RTL Nederlandの編集者,マネージャ,ジャーナリストらと14回の半構造化インタビューを行った。インタビューは、知識サイロの存在、AI実践の責任ある採用に対する影響、そしてこれらのダイナミクスに影響を与える組織的プラクティスに関する洞察を明らかにすることを目的としていた。我々の結果は、ニュース組織のすべての層にまたがって、AIに関する情報を共有するためのより良い構造を構築することの重要性を強調します。 The effective adoption of responsible AI practices in journalism requires a concerted effort to bridge different perspectives, including technological, editorial, journalistic, and managerial. Among the many challenges that could impact information sharing around responsible AI inside news organizations are knowledge silos, where information is isolated within one part of the organization and not easily shared with others. This study aims to explore if, and if so, how, knowledge silos affect the adoption of responsible AI practices in journalism through a cross-case study of four major Dutch media outlets. We examine the individual and organizational barriers to AI knowledge sharing and the extent to which knowledge silos could impede the operationalization of responsible AI initiatives inside newsrooms. To address this question, we conducted 14 semi-structured interviews with editors, managers, and journalists at de Telegraaf, de Volkskrant, the Nederlandse Omroep Stichting (NOS), and RTL Nederland. The interviews aimed to uncover insights into the existence of knowledge silos, their effects on responsible AI practice adoption, and the organizational practices influencing these dynamics. Our results emphasize the importance of creating better structures for sharing information on AI across all layers of news organizations.	翻訳日:2024-11-04 23:00:28 公開日:2024-10-23
# 知識サイロがジャーナリズムにおける責任あるAI実践に及ぼす影響 The Impact of Knowledge Silos on Responsible AI Practices in Journalism ( http://arxiv.org/abs/2410.01138v2 ) ライセンス: Link先を確認	Tomás Dodds, Astrid Vandendaele, Felix M. Simon, Natali Helberger, Valeria Resendez, Wang Ngai Yeung,	(参考訳) ジャーナリズムにおける責任あるAIプラクティスの効果的な採用には、技術、編集、ジャーナリスト、管理など、さまざまな視点を橋渡しするための協力的な努力が必要である。ニュース組織内の責任あるAIに関する情報共有に影響を与える可能性のある多くの課題の1つは、知識サイロである。この研究は、ナレッジサイロがジャーナリズムにおける責任あるAIプラクティスの採用にどのように影響するかを、オランダの主要4メディアのクロスケーススタディを通じて調査することを目的としている。我々は、AI知識共有に対する個人的および組織的障壁と、知識サイロがニュースルーム内の責任あるAIイニシアチブの運用にどんな影響を及ぼすかを検討する。この問題に対処するため,我々はDe Telegraaf,de Volkskrant,Nederlandse Omroep Stichting (NOS), RTL Nederlandの編集者,マネージャ,ジャーナリストらと14回の半構造化インタビューを行った。インタビューは、知識サイロの存在、AI実践の責任ある採用に対する影響、そしてこれらのダイナミクスに影響を与える組織的プラクティスに関する洞察を明らかにすることを目的としていた。我々の結果は、ニュース組織のすべての層にまたがって、AIに関する情報を共有するためのより良い構造を構築することの重要性を強調します。 The effective adoption of responsible AI practices in journalism requires a concerted effort to bridge different perspectives, including technological, editorial, journalistic, and managerial. Among the many challenges that could impact information sharing around responsible AI inside news organizations are knowledge silos, where information is isolated within one part of the organization and not easily shared with others. This study aims to explore if, and if so, how, knowledge silos affect the adoption of responsible AI practices in journalism through a cross-case study of four major Dutch media outlets. We examine the individual and organizational barriers to AI knowledge sharing and the extent to which knowledge silos could impede the operationalization of responsible AI initiatives inside newsrooms. To address this question, we conducted 14 semi-structured interviews with editors, managers, and journalists at de Telegraaf, de Volkskrant, the Nederlandse Omroep Stichting (NOS), and RTL Nederland. The interviews aimed to uncover insights into the existence of knowledge silos, their effects on responsible AI practice adoption, and the organizational practices influencing these dynamics. Our results emphasize the importance of creating better structures for sharing information on AI across all layers of news organizations.	翻訳日:2024-11-04 23:00:28 公開日:2024-10-23
# 回帰課題に対するラプラス近似によるメタラーニングのばらつき低減 Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks ( http://arxiv.org/abs/2410.01476v1 ) ライセンス: Link先を確認	Alfredo Reichlin, Gustaf Tegnér, Miguel Vasco, Hang Yin, Mårten Björkman, Danica Kragic,	(参考訳) 有限個のサンプルポイントを与えられたメタラーニングアルゴリズムは、新しい、目に見えないタスクに対する最適な適応戦略を学ぶことを目的としている。多くの場合、このデータは異なるタスクに同時に属する可能性があるため、曖昧である。これは特にメタ回帰タスクではそうである。このような場合、推定適応戦略は各タスクに対するサポートデータの限られた量によって高いばらつきを伴い、しばしば準最適一般化性能をもたらす。本研究では,勾配に基づくメタラーニングにおける分散化の問題に対処し,それに伴う問題のクラスを形式化し,これを「emph{task overlap}」と呼ぶ。具体的には,各支持点をパラメータ上の後方の分散によって個別に重み付けすることで,勾配推定のばらつきを低減する手法を提案する。後部を推定するためにLaplace近似を使い、メタラーナーの損失景観の曲率を表現できる。実験により,提案手法の有効性を実証し,メタラーニングにおける分散化の重要性を強調した。 Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighing each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.	翻訳日:2024-11-04 17:34:40 公開日:2024-10-23
# 回帰課題に対するラプラス近似によるメタラーニングのばらつき低減 Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks ( http://arxiv.org/abs/2410.01476v2 ) ライセンス: Link先を確認	Alfredo Reichlin, Gustaf Tegnér, Miguel Vasco, Hang Yin, Mårten Björkman, Danica Kragic,	(参考訳) 有限個のサンプルポイントを与えられたメタラーニングアルゴリズムは、新しい、目に見えないタスクに対する最適な適応戦略を学ぶことを目的としている。多くの場合、このデータは異なるタスクに同時に属する可能性があるため、曖昧である。これは特にメタ回帰タスクではそうである。このような場合、推定適応戦略は各タスクに対するサポートデータの限られた量によって高いばらつきを伴い、しばしば準最適一般化性能をもたらす。本研究では,勾配に基づくメタラーニングにおける分散化の問題に対処し,それに伴う問題のクラスを形式化し,これを「emph{task overlap}」と呼ぶ。具体的には,各支持点をパラメータ上の後方の分散によって個別に重み付けすることで,勾配推定のばらつきを低減する手法を提案する。後部を推定するためにLaplace近似を使い、メタラーナーの損失景観の曲率を表現できる。実験により,提案手法の有効性を実証し,メタラーニングにおける分散化の重要性を強調した。 Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighing each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.	翻訳日:2024-11-04 17:34:40 公開日:2024-10-23
# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃 SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v1 ) ライセンス: Link先を確認	Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,	(参考訳) 制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。私たちのコードはhttps://github.com/Pan-Zihao/SCA}{https://github.com/Pan-Zihao/SCAで見られます。 Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our code can be found at https://github.com/Pan-Zihao/SCA}{https://github.com/Pan-Zihao/SCA.	翻訳日:2024-11-04 07:46:05 公開日:2024-10-23
# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃 SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v2 ) ライセンス: Link先を確認	Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,	(参考訳) 制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。私たちのコードはhttps://github.com/Pan-Zihao/SCA.orgで見られます。 Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our code can be found at https://github.com/Pan-Zihao/SCA.	翻訳日:2024-11-04 07:46:05 公開日:2024-10-23
# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃 SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v3 ) ライセンス: Link先を確認	Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,	(参考訳) センシティブな環境にデプロイされたディープニューラルネットワークベースのシステムは、敵の攻撃に対して脆弱である。制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。我々の研究はマルチメディア情報のセキュリティにさらに注意を向けることができる。 Deep neural network based systems deployed in sensitive environments are vulnerable to adversarial attacks. Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our research can further draw attention to the security of multimedia information.	翻訳日:2024-11-04 07:46:05 公開日:2024-10-23
# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃 SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v4 ) ライセンス: Link先を確認	Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,	(参考訳) センシティブな環境にデプロイされたディープニューラルネットワークベースのシステムは、敵の攻撃に対して脆弱である。制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。我々の研究はマルチメディア情報のセキュリティにさらに注意を向けることができる。 Deep neural network based systems deployed in sensitive environments are vulnerable to adversarial attacks. Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our research can further draw attention to the security of multimedia information.	翻訳日:2024-11-04 07:46:05 公開日:2024-10-23
# SafeguardはDouble-edged Sword:大規模言語モデルに対するDoS攻撃 Safeguard is a Double-edged Sword: Denial-of-service Attack on Large Language Models ( http://arxiv.org/abs/2410.02916v1 ) ライセンス: Link先を確認	Qingzhao Zhang, Ziyang Xiong, Z. Morley Mao,	(参考訳) 安全性は、オープンデプロイメントにおける大きな言語モデル(LLM)の最大の関心事である。この目的のために、安全確保法は、安全アライメントやガードレール機構を通じて、LLMの倫理的かつ責任ある使用を強制することを目的としている。しかし、悪意のある攻撃者は、セーフガードの偽陽性を悪用し、すなわち、セーフガードモデルを騙してセーフコンテンツが誤ってブロックされることを発見し、LLMに対する新たなDoS攻撃につながった。具体的には、ユーザークライアントソフトウェアに対するソフトウェアやフィッシング攻撃によって、攻撃者は構成ファイルのテンプレートに短い、一見無害な敵のプロンプトを挿入する。勾配情報と注意情報を利用する最適化プロセスの設計により、Llama Guard 3上の99%以上のユーザリクエストを普遍的にブロックする、約30文字の、一見安全な敵のプロンプトを自動的に生成できる。この攻撃は、昔ながらのジェイルブレイクと根本的に異なる偽陽性に焦点を当てたLSMのセーフガードを評価する新しい次元を示す。 Safety is a paramount concern of large language models (LLMs) in their open deployment. To this end, safeguard methods aim to enforce the ethical and responsible use of LLMs through safety alignment or guardrail mechanisms. However, we found that the malicious attackers could exploit false positives of safeguards, i.e., fooling the safeguard model to block safe content mistakenly, leading to a new denial-of-service (DoS) attack on LLMs. Specifically, by software or phishing attacks on user client software, attackers insert a short, seemingly innocuous adversarial prompt into to user prompt templates in configuration files; thus, this prompt appears in final user requests without visibility in the user interface and is not trivial to identify. By designing an optimization process that utilizes gradient and attention information, our attack can automatically generate seemingly safe adversarial prompts, approximately only 30 characters long, that universally block over 97\% of user requests on Llama Guard 3. The attack presents a new dimension of evaluating LLM safeguards focusing on false positives, fundamentally different from the classic jailbreak.	翻訳日:2024-11-03 04:55:13 公開日:2024-10-23
# SafeguardはDouble-edged Sword:大規模言語モデルに対するDoS攻撃 Safeguard is a Double-edged Sword: Denial-of-service Attack on Large Language Models ( http://arxiv.org/abs/2410.02916v2 ) ライセンス: Link先を確認	Qingzhao Zhang, Ziyang Xiong, Z. Morley Mao,	(参考訳) 安全性は、オープンデプロイメントにおける大きな言語モデル(LLM)の最大の関心事である。この目的のために、安全確保法は、安全アライメントやガードレール機構を通じて、LLMの倫理的かつ責任ある使用を強制することを目的としている。しかし、悪意のある攻撃者は、セーフガードの偽陽性を悪用し、すなわち、セーフガードモデルを騙してセーフコンテンツが誤ってブロックされることを発見し、LLMに対する新たなDoS攻撃につながった。具体的には、ユーザークライアントソフトウェアに対するソフトウェアやフィッシング攻撃によって、攻撃者は構成ファイルのテンプレートに短い、一見無害な敵のプロンプトを挿入する。勾配情報と注意情報を利用する最適化プロセスの設計により、Llama Guard 3上の99%以上のユーザリクエストを普遍的にブロックする、約30文字の、一見安全な敵のプロンプトを自動的に生成できる。この攻撃は、昔ながらのジェイルブレイクと根本的に異なる偽陽性に焦点を当てたLSMのセーフガードを評価する新しい次元を示す。 Safety is a paramount concern of large language models (LLMs) in their open deployment. To this end, safeguard methods aim to enforce the ethical and responsible use of LLMs through safety alignment or guardrail mechanisms. However, we found that the malicious attackers could exploit false positives of safeguards, i.e., fooling the safeguard model to block safe content mistakenly, leading to a new denial-of-service (DoS) attack on LLMs. Specifically, by software or phishing attacks on user client software, attackers insert a short, seemingly innocuous adversarial prompt into to user prompt templates in configuration files; thus, this prompt appears in final user requests without visibility in the user interface and is not trivial to identify. By designing an optimization process that utilizes gradient and attention information, our attack can automatically generate seemingly safe adversarial prompts, approximately only 30 characters long, that universally block over 97\% of user requests on Llama Guard 3. The attack presents a new dimension of evaluating LLM safeguards focusing on false positives, fundamentally different from the classic jailbreak.	翻訳日:2024-11-03 04:55:13 公開日:2024-10-23
# 視覚言語モデルのための一般化可能なプロンプトチューニング Generalizable Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2410.03189v1 ) ライセンス: Link先を確認	Qian Zhang,	(参考訳) CLIPのようなビジョン言語モデルのプロンプトチューニングでは、特定の下流タスクのための画像テキストペアを生成するために使用されるテキストプロンプトを最適化する。手作りのプロンプトやテンプレートベースのプロンプトは一般的に、目に見えない幅広いクラスに適用できるが、下流のタスク(例えば、目に見えないクラス)ではパフォーマンスが悪くなる傾向がある。一方、学習可能なソフトプロンプトは下流のタスクではよく機能するが、一般化性に欠ける。さらに、先行研究は主にテキストのモダリティに集中しており、視覚のモダリティからプロンプトの一般化の可能性を探究する研究はほとんどない。これらの制約を念頭に置いて、競争力のある下流性能と一般化の両方を得るために、チューニングを迅速に行う方法について検討する。本研究は,ソフトプロンプトと手作りプロンプトをテキストモダリティの双対ビューとして扱うことにより,それらの相互情報を最大化することにより,タスク特化情報と一般的な意味情報をよりうまくアンサンブルすることができることを示す。さらに、より表現力のあるプロンプトを生成するために、視覚的モダリティからのクラスワイド増強を導入し、より広い範囲の未確認クラスに顕著なロバスト性をもたらす。いくつかのベンチマークでは、提案手法はタスク固有の性能と一般的な能力の両面で競合する結果が得られると報告されている。 Prompt tuning for vision-language models such as CLIP involves optimizing the text prompts used to generate image-text pairs for specific downstream tasks. While hand-crafted or template-based prompts are generally applicable to a wider range of unseen classes, they tend to perform poorly in downstream tasks (i.e., seen classes). Learnable soft prompts, on the other hand, often perform well in downstream tasks but lack generalizability. Additionally, prior research has predominantly concentrated on the textual modality, with very few studies attempting to explore the prompt's generalization potential from the visual modality. Keeping these limitations in mind, we investigate how to prompt tuning to obtain both a competitive downstream performance and generalization. The study shows that by treating soft and hand-crafted prompts as dual views of the textual modality, and maximizing their mutual information, we can better ensemble task-specific and general semantic information. Moreover, to generate more expressive prompts, the study introduces a class-wise augmentation from the visual modality, resulting in significant robustness to a wider range of unseen classes. Extensive evaluations on several benchmarks report that the proposed approach achieves competitive results in terms of both task-specific performance and general abilities.	翻訳日:2024-11-03 03:04:25 公開日:2024-10-23
# 視覚言語モデルのための一般化可能なプロンプトチューニング Generalizable Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2410.03189v2 ) ライセンス: Link先を確認	Qian Zhang,	(参考訳) CLIPのようなビジョン言語モデルのプロンプトチューニングでは、特定の下流タスクのための画像テキストペアを生成するために使用されるテキストプロンプトを最適化する。手作りのプロンプトやテンプレートベースのプロンプトは一般的に、目に見えない幅広いクラスに適用できるが、下流のタスク(例えば、目に見えないクラス)ではパフォーマンスが悪くなる傾向がある。一方、学習可能なソフトプロンプトは下流のタスクではよく機能するが、一般化性に欠ける。さらに、先行研究は主にテキストのモダリティに集中しており、視覚のモダリティからプロンプトの一般化の可能性を探究する研究はほとんどない。これらの制約を念頭に置いて、競争力のある下流性能と一般化の両方を得るために、チューニングを迅速に行う方法について検討する。本研究は,ソフトプロンプトと手作りプロンプトをテキストモダリティの双対ビューとして扱うことにより,それらの相互情報を最大化することにより,タスク特化情報と一般的な意味情報をよりうまくアンサンブルすることができることを示す。さらに、より表現力のあるプロンプトを生成するために、視覚的モダリティからのクラスワイド増強を導入し、より広い範囲の未確認クラスに顕著なロバスト性をもたらす。いくつかのベンチマークでは、提案手法はタスク固有の性能と一般的な能力の両面で競合する結果が得られると報告されている。 Prompt tuning for vision-language models such as CLIP involves optimizing the text prompts used to generate image-text pairs for specific downstream tasks. While hand-crafted or template-based prompts are generally applicable to a wider range of unseen classes, they tend to perform poorly in downstream tasks (i.e., seen classes). Learnable soft prompts, on the other hand, often perform well in downstream tasks but lack generalizability. Additionally, prior research has predominantly concentrated on the textual modality, with very few studies attempting to explore the prompt's generalization potential from the visual modality. Keeping these limitations in mind, we investigate how to prompt tuning to obtain both a competitive downstream performance and generalization. The study shows that by treating soft and hand-crafted prompts as dual views of the textual modality, and maximizing their mutual information, we can better ensemble task-specific and general semantic information. Moreover, to generate more expressive prompts, the study introduces a class-wise augmentation from the visual modality, resulting in significant robustness to a wider range of unseen classes. Extensive evaluations on several benchmarks report that the proposed approach achieves competitive results in terms of both task-specific performance and general abilities.	翻訳日:2024-11-03 03:04:25 公開日:2024-10-23
# 多モード核融合モデルのための勾配ベースジェイルブレイク画像 Gradient-based Jailbreak Images for Multimodal Fusion Models ( http://arxiv.org/abs/2410.03489v1 ) ライセンス: Link先を確認	Javier Rando, Hannah Korevaar, Erik Brinkman, Ivan Evtimov, Florian Tramèr,	(参考訳) 画像入力による言語モデルの強化は、個別の最適化を必要とするテキスト入力とは異なり、継続的な最適化を通じてより効果的なジェイルブレイク攻撃を可能にする可能性がある。しかし、新しいマルチモーダル融合モデルでは、非微分可能関数を用いて全ての入力モダリティをトークン化し、直接攻撃を妨げている。本稿では,トークン化を連続関数と近似し,連続的な最適化を可能にするトークン化ショートカットの概念を紹介する。我々はトークンライザショートカットを用いて、マルチモーダル融合モデルに対する最初のエンドツーエンドの勾配画像アタックを生成する。我々は、Chameleonモデルに対する攻撃を評価し、72.5%のプロンプトに対して有害な情報をもたらすジェイルブレイク画像を取得する。 Jailbreakイメージは、同じ目的で最適化されたテキストジェイルブレークよりも優れており、50倍の入力トークンを最適化するためには、計算予算が3倍低い。最後に、Circuit Breakersのような表現工学の防御は、テキストアタックのみで訓練され、敵画像の入力に効果的に転送できることがわかった。 Augmenting language models with image inputs may enable more effective jailbreak attacks through continuous optimization, unlike text inputs that require discrete optimization. However, new multimodal fusion models tokenize all input modalities using non-differentiable functions, which hinders straightforward attacks. In this work, we introduce the notion of a tokenizer shortcut that approximates tokenization with a continuous function and enables continuous optimization. We use tokenizer shortcuts to create the first end-to-end gradient image attacks against multimodal fusion models. We evaluate our attacks on Chameleon models and obtain jailbreak images that elicit harmful information for 72.5% of prompts. Jailbreak images outperform text jailbreaks optimized with the same objective and require 3x lower compute budget to optimize 50x more input tokens. Finally, we find that representation engineering defenses, like Circuit Breakers, trained only on text attacks can effectively transfer to adversarial image inputs.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-23
# 多モード核融合モデルのための勾配ベースジェイルブレイク画像 Gradient-based Jailbreak Images for Multimodal Fusion Models ( http://arxiv.org/abs/2410.03489v2 ) ライセンス: Link先を確認	Javier Rando, Hannah Korevaar, Erik Brinkman, Ivan Evtimov, Florian Tramèr,	(参考訳) 画像入力による言語モデルの強化は、個別の最適化を必要とするテキスト入力とは異なり、継続的な最適化を通じてより効果的なジェイルブレイク攻撃を可能にする可能性がある。しかし、新しいマルチモーダル融合モデルでは、非微分可能関数を用いて全ての入力モダリティをトークン化し、直接攻撃を妨げている。本稿では,トークン化を連続関数と近似し,連続的な最適化を可能にするトークン化ショートカットの概念を紹介する。我々はトークンライザショートカットを用いて、マルチモーダル融合モデルに対する最初のエンドツーエンドの勾配画像アタックを生成する。我々は、Chameleonモデルに対する攻撃を評価し、72.5%のプロンプトに対して有害な情報をもたらすジェイルブレイク画像を取得する。 Jailbreakイメージは、同じ目的で最適化されたテキストジェイルブレークよりも優れており、50倍の入力トークンを最適化するためには、計算予算が3倍低い。最後に、Circuit Breakersのような表現工学の防御は、テキストアタックのみで訓練され、敵画像の入力に効果的に転送できることがわかった。 Augmenting language models with image inputs may enable more effective jailbreak attacks through continuous optimization, unlike text inputs that require discrete optimization. However, new multimodal fusion models tokenize all input modalities using non-differentiable functions, which hinders straightforward attacks. In this work, we introduce the notion of a tokenizer shortcut that approximates tokenization with a continuous function and enables continuous optimization. We use tokenizer shortcuts to create the first end-to-end gradient image attacks against multimodal fusion models. We evaluate our attacks on Chameleon models and obtain jailbreak images that elicit harmful information for 72.5% of prompts. Jailbreak images outperform text jailbreaks optimized with the same objective and require 3x lower compute budget to optimize 50x more input tokens. Finally, we find that representation engineering defenses, like Circuit Breakers, trained only on text attacks can effectively transfer to adversarial image inputs.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-23
# P1-KAN 有効コルモゴロフ・アーノルドネットワークによる関数近似 P1-KAN an effective Kolmogorov Arnold Network for function approximation ( http://arxiv.org/abs/2410.03801v1 ) ライセンス: Link先を確認	Xavier Warin,	(参考訳) 新しいコルモゴロフ・アルノルドネットワーク(KAN)は、高次元の潜在的不規則関数を近似するために提案されている。精度で多層パーセプトロンより優れ、収束が速いことを示す。また、最近提案されたネットワークであるReLU-KANと比較し、ReLU-KANよりも時間がかかりますが、より正確です。 A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimension. We show that it outperforms multilayer perceptrons in terms of accuracy and converges faster. We also compare it with ReLU-KAN, a recently proposed network: it is more time consuming than ReLU-KAN, but more accurate.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-23
# P1-KAN 有効コルモゴロフ・アーノルドネットワークによる関数近似 P1-KAN an effective Kolmogorov Arnold Network for function approximation ( http://arxiv.org/abs/2410.03801v2 ) ライセンス: Link先を確認	Xavier Warin,	(参考訳) 新しいコルモゴロフ・アルノルドネットワーク(KAN)は、高次元の潜在的不規則関数を近似するために提案されている。精度で多層パーセプトロンより優れ、収束が速いことを示す。 P1-KANネットワークは不規則関数に対してより効果的であるのに対し、元のスプラインベースkanネットワークはスムーズ関数に対してより効果的であるように思われる。 A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimension. We show that it outperforms multilayer perceptrons in terms of accuracy and converges faster. We also compare it with several proposed KAN networks: the original spline-based KAN network appears to be more effective for smooth functions, while the P1-KAN network is more effective for irregular functions.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-23
# 合成進化によるコード優先学習 Learning Code Preference via Synthetic Evolution ( http://arxiv.org/abs/2410.03837v1 ) ライセンス: Link先を確認	Jiawei Liu, Thanh Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, Zijian Wang,	(参考訳) 大規模言語モデル(LLM)は、最近顕著なコーディング機能を示した。しかし、十分に整ったプロパティに基づいてコード生成を評価し、それを開発者の好みに合わせることは依然として難しい。本稿では,コード優先学習という新たな課題の下で,2つの重要な課題について考察する。 i) コードに対する意味のある嗜好を予測するためにモデルをトレーニングするにはどうすればよいか? そして (ii)人間とLLMの嗜好は、検証可能なコードプロパティや開発者コードの嗜好とどのように一致しますか? この目的のために、コードコミットやコード批判を含む合成進化データからペアワイズなコード嗜好モデルをトレーニングするためのフレームワークであるCodeFavorを提案する。コード優先性を評価するために,1364個の厳格にキュレートされたコード優先タスクからなるベンチマークであるCodePrefBenchを紹介した。評価の結果、CodeFavorはモデルベースのコード優先の精度を最大28.8%改善した。一方、CodeFavorモデルは、34倍のコスト効率で、6-9倍のパラメータでモデルのパフォーマンスと一致させることができる。また、CodeFavorの設計選択を包括的な制御実験を通じて厳格に検証します。さらに、各タスクに23.4パーソナライズしたにもかかわらず、15.1-40.3%のタスクは未解決のままである。モデルに基づく嗜好と比較すると、人間の嗜好はコードの正確さを目標としつつ、機能的でない目的に準最適である傾向にある。 Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation based on well-formed properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) How do we train models to predict meaningful preferences for code? and (ii) How do human and LLM preferences align with verifiable code properties and developer code tastes? To this end, we propose CodeFavor, a framework for training pairwise code preference models from synthetic evolution data, including code commits and code critiques. To evaluate code preferences, we introduce CodePrefBench, a benchmark comprising 1364 rigorously curated code preference tasks to cover three verifiable properties-correctness, efficiency, and security-along with human preference. Our evaluation shows that CodeFavor holistically improves the accuracy of model-based code preferences by up to 28.8%. Meanwhile, CodeFavor models can match the performance of models with 6-9x more parameters while being 34x more cost-effective. We also rigorously validate the design choices in CodeFavor via a comprehensive set of controlled experiments. Furthermore, we discover the prohibitive costs and limitations of human-based code preference: despite spending 23.4 person-minutes on each task, 15.1-40.3% of tasks remain unsolved. Compared to model-based preference, human preference tends to be more accurate under the objective of code correctness, while being sub-optimal for non-functional objectives.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-23
# 合成進化によるコード優先学習 Learning Code Preference via Synthetic Evolution ( http://arxiv.org/abs/2410.03837v2 ) ライセンス: Link先を確認	Jiawei Liu, Thanh Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, Zijian Wang,	(参考訳) 大規模言語モデル(LLM)は、最近顕著なコーディング機能を示した。しかし、十分に整ったプロパティに基づいてコード生成を評価し、それを開発者の好みに合わせることは依然として難しい。本稿では,コード優先学習という新たな課題の下で,2つの重要な課題について考察する。 i) コードに対する意味のある嗜好を予測するためにモデルをトレーニングするにはどうすればよいか? そして (ii)人間とLLMの嗜好は、検証可能なコードプロパティや開発者コードの嗜好とどのように一致しますか? この目的のために、コードコミットやコード批判を含む合成進化データからペアワイズなコード嗜好モデルをトレーニングするためのフレームワークであるCodeFavorを提案する。コード優先性を評価するために,1364個の厳格にキュレートされたコード優先タスクからなるベンチマークであるCodePrefBenchを紹介した。評価の結果、CodeFavorはモデルベースのコード優先の精度を最大28.8%改善した。一方、CodeFavorモデルは、34倍のコスト効率で、6-9倍のパラメータでモデルのパフォーマンスと一致させることができる。また、CodeFavorの設計選択を包括的な制御実験を通じて厳格に検証します。さらに、各タスクに23.4パーソナライズしたにもかかわらず、15.1-40.3%のタスクは未解決のままである。モデルに基づく嗜好と比較すると、人間の嗜好はコードの正確さを目標としつつ、機能的でない目的に準最適である傾向にある。 Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation based on well-formed properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) How do we train models to predict meaningful preferences for code? and (ii) How do human and LLM preferences align with verifiable code properties and developer code tastes? To this end, we propose CodeFavor, a framework for training pairwise code preference models from synthetic evolution data, including code commits and code critiques. To evaluate code preferences, we introduce CodePrefBench, a benchmark comprising 1364 rigorously curated code preference tasks to cover three verifiable properties-correctness, efficiency, and security-along with human preference. Our evaluation shows that CodeFavor holistically improves the accuracy of model-based code preferences by up to 28.8%. Meanwhile, CodeFavor models can match the performance of models with 6-9x more parameters while being 34x more cost-effective. We also rigorously validate the design choices in CodeFavor via a comprehensive set of controlled experiments. Furthermore, we discover the prohibitive costs and limitations of human-based code preference: despite spending 23.4 person-minutes on each task, 15.1-40.3% of tasks remain unsolved. Compared to model-based preference, human preference tends to be more accurate under the objective of code correctness, while being sub-optimal for non-functional objectives.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-23
# DiffSpec: 自然言語仕様とコードアーチファクトを使用したLLMによる微分テスト DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts ( http://arxiv.org/abs/2410.04249v1 ) ライセンス: Link先を確認	Nikitha Rao, Elizabeth Gilbert, Tahina Ramananandro, Nikhil Swamy, Claire Le Goues, Sarah Fakhoury,	(参考訳) 差分テストは、コンパイラ、ネットワークプロトコルパーサ、言語ランタイムなど、同じ仕様に準拠した複数の実装を持つソフトウェアシステムのバグを見つける効果的な方法である。このようなシステムの仕様は、インストラクション・セット・アーキテクチャ(ISA)仕様、Wasm仕様、IETF RFCなどの自然言語文書で標準化されることが多い。大きな言語モデル(LLM)は、テストの生成と大量の自然言語テキストの処理の両方の可能性を実証しており、仕様文書、バグレポート、コード実装などのアーティファクトを活用するのに適している。本研究では、自然言語とコードアーティファクトを活用し、LLMをガイドして、バグに対応するものを含む実装間の意味のある振る舞いの違いを強調する、ターゲットとなる有意義なテストを生成する。本稿では,プロンプト連鎖を用いたLCMによる差分テストを生成するフレームワークであるDiffSpecを紹介する。本稿では,2つの異なるシステム,すなわちeBPFランタイムとWasmバリデータに対するDiffSpecの有効性を示す。 DiffSpecを使って359の差別化テストを生成し、カーネルメモリリーク、ジャンプ命令の不整合挙動、スタックポインタの使用時の未定義動作を含む、少なくとも4つの異なる、確認されたeBPFのバグを発見した。 Wasmバリデータでは279の差別化テストが発見されました。 Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, and language runtimes. Specifications for such systems are often standardized in natural language documents, like Instruction Set Architecture (ISA) specifications, Wasm specifications or IETF RFC's. Large Language Models (LLMs) have demonstrated potential in both generating tests and handling large volumes of natural language text, making them well-suited for utilizing artifacts like specification documents, bug reports, and code implementations. In this work, we leverage natural language and code artifacts to guide LLMs to generate targeted, meaningful tests that highlight meaningful behavioral differences between implementations, including those corresponding to bugs. We introduce DiffSpec, a framework for generating differential tests with LLMs using prompt chaining. We demonstrate the efficacy of DiffSpec on two different systems, namely, eBPF runtimes and Wasm validators. Using DiffSpec, we generated 359 differentiating tests, uncovering at least four distinct and confirmed bugs in eBPF, including a kernel memory leak, inconsistent behavior in jump instructions, and undefined behavior when using the stack pointer. We also found 279 differentiating tests in Wasm validators, that point to at least 2 confirmed and fixed bugs.	翻訳日:2024-11-02 08:59:37 公開日:2024-10-23
# DiffSpec: 自然言語仕様とコードアーチファクトを使用したLLMによる微分テスト DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts ( http://arxiv.org/abs/2410.04249v2 ) ライセンス: Link先を確認	Nikitha Rao, Elizabeth Gilbert, Tahina Ramananandro, Nikhil Swamy, Claire Le Goues, Sarah Fakhoury,	(参考訳) 差分テストは、コンパイラ、ネットワークプロトコルパーサ、言語ランタイムなど、同じ仕様に準拠した複数の実装を持つソフトウェアシステムのバグを見つける効果的な方法である。このようなシステムの仕様は、インストラクション・セット・アーキテクチャ(ISA)仕様、Wasm仕様、IETF RFCなどの自然言語文書で標準化されることが多い。大きな言語モデル(LLM)は、テストの生成と大量の自然言語テキストの処理の両方の可能性を実証しており、仕様文書、バグレポート、コード実装などのアーティファクトを活用するのに適している。本研究では、自然言語とコードアーティファクトを活用し、LLMをガイドして、バグに対応するものを含む実装間の意味のある振る舞いの違いを強調する、ターゲットとなる有意義なテストを生成する。本稿では,プロンプト連鎖を用いたLCMによる差分テストを生成するフレームワークであるDiffSpecを紹介する。本稿では,2つの異なるシステム,すなわちeBPFランタイムとWasmバリデータに対するDiffSpecの有効性を示す。 DiffSpecを使って359の差別化テストを生成し、カーネルメモリリーク、ジャンプ命令の不整合挙動、スタックポインタの使用時の未定義動作を含む、少なくとも4つの異なる、確認されたeBPFのバグを発見した。 Wasm Validatorでは279の差別化テストがあり、Wizard Engineでは少なくとも2つの確認済みと修正済みのバグが確認されました。 Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, and language runtimes. Specifications for such systems are often standardized in natural language documents, like Instruction Set Architecture (ISA) specifications, Wasm specifications or IETF RFC's. Large Language Models (LLMs) have demonstrated potential in both generating tests and handling large volumes of natural language text, making them well-suited for utilizing artifacts like specification documents, bug reports, and code implementations. In this work, we leverage natural language and code artifacts to guide LLMs to generate targeted, meaningful tests that highlight meaningful behavioral differences between implementations, including those corresponding to bugs. We introduce DiffSpec, a framework for generating differential tests with LLMs using prompt chaining. We demonstrate the efficacy of DiffSpec on two different systems, namely, eBPF runtimes and Wasm validators. Using DiffSpec, we generated 359 differentiating tests, uncovering at least four distinct and confirmed bugs in eBPF, including a kernel memory leak, inconsistent behavior in jump instructions, and undefined behavior when using the stack pointer. We also found 279 differentiating tests in Wasm validators, that point to at least 2 confirmed and fixed bugs in Wizard Engine.	翻訳日:2024-11-02 08:59:37 公開日:2024-10-23
# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習 Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v1 ) ライセンス: Link先を確認	Yifan Wang, Cheng Zhang, Yuanndong Zhuang, Yongming Huang,	(参考訳) 人工知能をサポートする無線ネットワークは注目され、Over-the-Air Federated Learningがそのユニークな伝送特性と分散コンピューティング特性のために重要なアプリケーションとして登場している。本稿では,セルフリーMIMOシステムにおける過空フェデレーション学習の誤差境界を導出し,電力制御とビームフォーミングの連成最適化による最適性ギャップを最小化するために最適化問題を定式化する。 MOP-LOFPCアルゴリズムを導入し、Lyapunov最適化を用いてラウンド間の長期制約を分離し、因果チャネル状態情報のみを必要とする。実験結果から,MOP-LOFPCはモデルのトレーニング損失と,既存のベースラインと比較して長期的電力制約への固執との間に,より優れた,より柔軟なトレードオフを実現することが示された。 Wireless networks supporting artificial intelligence have gained significant attention, with Over-the-Air Federated Learning emerging as a key application due to its unique transmission and distributed computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem to minimize optimality gap via joint optimization of power control and beamforming. We introduce the MOP-LOFPC algorithm, which employs Lyapunov optimization to decouple long-term constraints across rounds while requiring only causal channel state information. Experimental results demonstrate that MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to long-term power constraints compared to existing baselines.	翻訳日:2024-11-01 19:07:22 公開日:2024-10-23
# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習 Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v2 ) ライセンス: Link先を確認	Yifan Wang, Cheng Zhang, Yuanndong Zhuang, Yongming Huang,	(参考訳) 人工知能をサポートする無線ネットワークは注目され、Over-the-Air Federated Learningがそのユニークな伝送特性と分散コンピューティング特性のために重要なアプリケーションとして登場している。本稿では,セルフリーMIMOシステムにおける過空フェデレーション学習の誤差境界を導出し,電力制御とビームフォーミングの連成最適化による最適性ギャップを最小化するために最適化問題を定式化する。 MOP-LOFPCアルゴリズムを導入し、Lyapunov最適化を用いてラウンド間の長期制約を分離し、因果チャネル状態情報のみを必要とする。実験結果から,MOP-LOFPCはモデルのトレーニング損失と,既存のベースラインと比較して長期的電力制約への固執との間に,より優れた,より柔軟なトレードオフを実現することが示された。 Wireless networks supporting artificial intelligence have gained significant attention, with Over-the-Air Federated Learning emerging as a key application due to its unique transmission and distributed computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem to minimize optimality gap via joint optimization of power control and beamforming. We introduce the MOP-LOFPC algorithm, which employs Lyapunov optimization to decouple long-term constraints across rounds while requiring only causal channel state information. Experimental results demonstrate that MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to long-term power constraints compared to existing baselines.	翻訳日:2024-11-01 19:07:22 公開日:2024-10-23
# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習 Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v3 ) ライセンス: Link先を確認	Yifan Wang, Cheng Zhang, Yuanndon Zhuang, Mingzeng Dai, Haiming Wang, Yongming Huang,	(参考訳) 人工知能をサポートする無線ネットワークは注目され、Over-the-Air Federated Learningがそのユニークな伝送特性と分散コンピューティング特性のために重要なアプリケーションとして登場している。本稿では,セルフリーMIMOシステムにおける過空フェデレーション学習の誤差境界を導出し,電力制御とビームフォーミングの連成最適化による最適性ギャップを最小化するために最適化問題を定式化する。 MOP-LOFPCアルゴリズムを導入し、Lyapunov最適化を用いてラウンド間の長期制約を分離し、因果チャネル状態情報のみを必要とする。実験結果から,MOP-LOFPCはモデルのトレーニング損失と,既存のベースラインと比較して長期的電力制約への固執との間に,より優れた,より柔軟なトレードオフを実現することが示された。 Wireless networks supporting artificial intelligence have gained significant attention, with Over-the-Air Federated Learning emerging as a key application due to its unique transmission and distributed computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem to minimize optimality gap via joint optimization of power control and beamforming. We introduce the MOP-LOFPC algorithm, which employs Lyapunov optimization to decouple long-term constraints across rounds while requiring only causal channel state information. Experimental results demonstrate that MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to long-term power constraints compared to existing baselines.	翻訳日:2024-11-01 19:07:22 公開日:2024-10-23
# 勾配ブースティング分類器の理解:$γ_j$の訓練・予測・役割 Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$ ( http://arxiv.org/abs/2410.05623v1 ) ライセンス: Link先を確認	Hung-Hsuan Chen,	(参考訳) Gradient Boosting Classifier (GBC)は、二分分類のための機械学習アルゴリズムで、予測エラーを最小限に抑えるために反復的に決定木を構築する。この文書はGBCのトレーニングと予測プロセスを説明し、ロジスティック損失関数の最適化に不可欠である端末ノード値$\gamma_j$の計算に焦点を当てている。テイラー級数近似を用いて$\gamma_j$を導き、アルゴリズムの実装のためにステップバイステップの擬似コードを提供する。このガイドはGBCの理論とその実践的応用を説明し、バイナリ分類タスクにおけるその有効性を示す。私たちは、読者が理解できるように、付録にステップバイステップの例を提供します。 The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification, which builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of terminal node values $\gamma_j$, which are crucial to optimizing the logistic loss function. We derive $\gamma_j$ through a Taylor series approximation and provide a step-by-step pseudocode for the algorithm's implementation. The guide explains the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. We provide a step-by-step example in the appendix to help readers understand.	翻訳日:2024-11-01 17:38:51 公開日:2024-10-23
# 勾配ブースティング分類器の理解:$γ_j$の訓練・予測・役割 Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$ ( http://arxiv.org/abs/2410.05623v2 ) ライセンス: Link先を確認	Hung-Hsuan Chen,	(参考訳) Gradient Boosting Classifier (GBC)は、二分分類のための機械学習アルゴリズムで、予測エラーを最小限に抑えるために反復的に決定木を構築する。この文書はGBCのトレーニングと予測プロセスを説明し、ロジスティック損失関数の最適化に不可欠である端末ノード値$\gamma_j$の計算に焦点を当てている。テイラー級数近似を用いて$\gamma_j$を導き、アルゴリズムの実装のためにステップバイステップの擬似コードを提供する。このガイドはGBCの理論とその実践的応用を説明し、バイナリ分類タスクにおけるその有効性を示す。私たちは、読者が理解できるように、付録にステップバイステップの例を提供します。 The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification, which builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of terminal node values $\gamma_j$, which are crucial to optimizing the logistic loss function. We derive $\gamma_j$ through a Taylor series approximation and provide a step-by-step pseudocode for the algorithm's implementation. The guide explains the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. We provide a step-by-step example in the appendix to help readers understand.	翻訳日:2024-11-01 17:38:51 公開日:2024-10-23
# 言語モデルは間接的エビデンスから文法的知識を誘導できるか? Can Language Models Induce Grammatical Knowledge from Indirect Evidence? ( http://arxiv.org/abs/2410.06022v1 ) ライセンス: Link先を確認	Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara,	(参考訳) 文の受理性を判断するために文法的知識を誘導する言語モデルに必要なデータの種類と量。最近の言語モデルでは、人間に比べてデータ効率が向上する余地が残っている。本稿では,言語モデルが間接的データ(間接的証拠)を効率的に用いているかを検討する。対照的に、人間は間接的エビデンスを効率的に使用しており、これは効率的な言語習得に寄与する帰納的バイアスの1つと考えられている。この問題を調査するために、事前学習データと評価インスタンスに挿入されたトレーニングインスタンスからなるデータセットであるWIDET(Wug InDirect Evidence Test)を紹介した。我々は,新たに造語されたwug単語を用いた合成インスタンスを事前学習データに注入し,それらの単語に対する文法的受容性を評価する評価データにモデルの振る舞いを探索する。インジェクトされたインスタンスは、間接性と量のレベルを変化させて作成する。実験の結果, 言語モデルでは, 同じ構造を持つインスタンスに対して繰り返し露出しても文法的知識を誘導せず, 特定の言語現象における評価事例と語彙的項目でのみ異なることがわかった。本研究は,潜在的間接的証拠を用いて文法知識を誘導するモデルの構築という,今後の研究の方向性を示唆するものである。 What kinds of and how much data is necessary for language models to induce grammatical knowledge to judge sentence acceptability? Recent language models still have much room for improvement in their data efficiency compared to humans. This paper investigates whether language models efficiently use indirect data (indirect evidence), from which they infer sentence acceptability. In contrast, humans use indirect evidence efficiently, which is considered one of the inductive biases contributing to efficient language acquisition. To explore this question, we introduce the Wug InDirect Evidence Test (WIDET), a dataset consisting of training instances inserted into the pre-training data and evaluation instances. We inject synthetic instances with newly coined wug words into pretraining data and explore the model's behavior on evaluation data that assesses grammatical acceptability regarding those words. We prepare the injected instances by varying their levels of indirectness and quantity. Our experiments surprisingly show that language models do not induce grammatical knowledge even after repeated exposure to instances with the same structure but differing only in lexical items from evaluation instances in certain language phenomena. Our findings suggest a potential direction for future research: developing models that use latent indirect evidence to induce grammatical knowledge.	翻訳日:2024-11-01 11:30:40 公開日:2024-10-23
# 言語モデルは間接的エビデンスから文法的知識を誘導できるか? Can Language Models Induce Grammatical Knowledge from Indirect Evidence? ( http://arxiv.org/abs/2410.06022v2 ) ライセンス: Link先を確認	Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara,	(参考訳) 文の受理性を判断するために文法的知識を誘導する言語モデルに必要なデータの種類と量。最近の言語モデルでは、人間に比べてデータ効率が向上する余地が残っている。本稿では,言語モデルが間接的データ(間接的証拠)を効率的に用いているかを検討する。対照的に、人間は間接的エビデンスを効率的に使用しており、これは効率的な言語習得に寄与する帰納的バイアスの1つと考えられている。この問題を調査するために、事前学習データと評価インスタンスに挿入されたトレーニングインスタンスからなるデータセットであるWIDET(Wug InDirect Evidence Test)を紹介した。我々は,新たに造語されたwug単語を用いた合成インスタンスを事前学習データに注入し,それらの単語に対する文法的受容性を評価する評価データにモデルの振る舞いを探索する。インジェクトされたインスタンスは、間接性と量のレベルを変化させて作成する。実験の結果, 言語モデルでは, 同じ構造を持つインスタンスに対して繰り返し露出しても文法的知識を誘導せず, 特定の言語現象における評価事例と語彙的項目でのみ異なることがわかった。本研究は,潜在的間接的証拠を用いて文法知識を誘導するモデルの構築という,今後の研究の方向性を示唆するものである。 What kinds of and how much data is necessary for language models to induce grammatical knowledge to judge sentence acceptability? Recent language models still have much room for improvement in their data efficiency compared to humans. This paper investigates whether language models efficiently use indirect data (indirect evidence), from which they infer sentence acceptability. In contrast, humans use indirect evidence efficiently, which is considered one of the inductive biases contributing to efficient language acquisition. To explore this question, we introduce the Wug InDirect Evidence Test (WIDET), a dataset consisting of training instances inserted into the pre-training data and evaluation instances. We inject synthetic instances with newly coined wug words into pretraining data and explore the model's behavior on evaluation data that assesses grammatical acceptability regarding those words. We prepare the injected instances by varying their levels of indirectness and quantity. Our experiments surprisingly show that language models do not induce grammatical knowledge even after repeated exposure to instances with the same structure but differing only in lexical items from evaluation instances in certain language phenomena. Our findings suggest a potential direction for future research: developing models that use latent indirect evidence to induce grammatical knowledge.	翻訳日:2024-11-01 11:30:40 公開日:2024-10-23
# 量子符号化技術の比較 Comparing Quantum Encoding Techniques ( http://arxiv.org/abs/2410.09121v1 ) ライセンス: Link先を確認	Nidhi Munikote, Ang Li, Chenxu Liu, Samuel Stein,	(参考訳) 量子コンピュータの能力が向上し続ければ、その応用の可能性も高まる。例えば、量子技術は機械学習を実行するために古典的なニューラルネットワークと統合されている。このように、または量子化学シミュレーションや暗号アプリケーションのような他の広く使われるために、古典的なデータは量子符号化によって量子状態に変換する必要がある。基礎、振幅、回転の3つの基本的な符号化法と、いくつかの提案された組み合わせがある。本研究では、特にハイブリッド量子古典機械学習の文脈における符号化手法について検討する。本研究は、QuClassi量子ニューラルネットワークアーキテクチャを用いて、MNISTデータセットから `3' と `6' 桁のバイナリ分類を行い、資源使用量と計算複雑性を考慮しつつ、精度、エントロピー、損失、ノイズ耐性などのいくつかの指標を得る。 As quantum computers continue to become more capable, the possibilities of their applications increase. For example, quantum techniques are being integrated with classical neural networks to perform machine learning. In order to be used in this way, or for any other widespread use like quantum chemistry simulations or cryptographic applications, classical data must be converted into quantum states through quantum encoding. There are three fundamental encoding methods: basis, amplitude, and rotation, as well as several proposed combinations. This study explores the encoding methods, specifically in the context of hybrid quantum-classical machine learning. Using the QuClassi quantum neural network architecture to perform binary classification of the `3' and `6' digits from the MNIST datasets, this study obtains several metrics such as accuracy, entropy, loss, and resistance to noise, while considering resource usage and computational complexity to compare the three main encoding methods.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-23
# 量子符号化技術の比較 Comparing Quantum Encoding Techniques ( http://arxiv.org/abs/2410.09121v2 ) ライセンス: Link先を確認	Nidhi Munikote,	(参考訳) 量子コンピュータの能力が向上し続ければ、その応用の可能性も高まる。例えば、量子技術は機械学習を実行するために古典的なニューラルネットワークと統合されている。このように、または量子化学シミュレーションや暗号アプリケーションのような他の広く使われるために、古典的なデータは量子符号化によって量子状態に変換する必要がある。基礎、振幅、回転の3つの基本的な符号化法と、いくつかの提案された組み合わせがある。本研究では、特にハイブリッド量子古典機械学習の文脈における符号化手法について検討する。本研究は、QuClassi量子ニューラルネットワークアーキテクチャを用いて、MNISTデータセットから `3' と `6' 桁のバイナリ分類を行い、資源使用量と計算複雑性を考慮しつつ、精度、エントロピー、損失、ノイズ耐性などのいくつかの指標を得る。 As quantum computers continue to become more capable, the possibilities of their applications increase. For example, quantum techniques are being integrated with classical neural networks to perform machine learning. In order to be used in this way, or for any other widespread use like quantum chemistry simulations or cryptographic applications, classical data must be converted into quantum states through quantum encoding. There are three fundamental encoding methods: basis, amplitude, and rotation, as well as several proposed combinations. This study explores the encoding methods, specifically in the context of hybrid quantum-classical machine learning. Using the QuClassi quantum neural network architecture to perform binary classification of the `3' and `6' digits from the MNIST datasets, this study obtains several metrics such as accuracy, entropy, loss, and resistance to noise, while considering resource usage and computational complexity to compare the three main encoding methods.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-23
# 自己回帰型タブラル変圧器を用いた事象予測のための簡易ベースライン A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers ( http://arxiv.org/abs/2410.10648v1 ) ライセンス: Link先を確認	Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum,	(参考訳) 例えば、クレジットカード取引が不正であるか、顧客が小売プラットフォームに商品を割り当てる格付けがあるかなどである。イベント予測への既存のアプローチには、コスト、脆性、タイムアウェアな位置埋め込み、学習行とフィールドエンコーディング、クラス不均衡に対処するオーバーサンプリングメソッドなど、アプリケーションに依存した技術がある。さらに、これらのアプローチは、例えば、すべての歴史的なイベントのラベルを知っている、あるいは、データの特徴自体ではなく、事前に指定されたラベルだけを予測する、といった特定のユースケースを前提としています。本研究では,基本的な位置埋め込みと因果言語モデリングの目的を有する標準自己回帰型LPM型トランスフォーマを用いた,単純だが柔軟なベースラインを提案する。私たちのベースラインは、一般的なデータセットで既存のアプローチよりも優れており、さまざまなユースケースに使用することができます。我々は、同じモデルがラベルを予測したり、欠落した値をインプットしたり、イベントシーケンスをモデル化できることを示した。 Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use-cases, for example that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.	翻訳日:2024-10-29 20:25:02 公開日:2024-10-23
# 自己回帰型タブラル変圧器を用いた事象予測のための簡易ベースライン A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers ( http://arxiv.org/abs/2410.10648v2 ) ライセンス: Link先を確認	Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, C. Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum,	(参考訳) 例えば、クレジットカード取引が不正であるか、顧客が小売プラットフォームに商品を割り当てる格付けがあるかなどである。イベント予測への既存のアプローチには、コスト、脆性、タイムアウェアな位置埋め込み、学習行とフィールドエンコーディング、クラス不均衡に対処するオーバーサンプリングメソッドなど、アプリケーションに依存した技術がある。さらに、これらのアプローチは、例えば、すべての歴史的なイベントのラベルを知っている、あるいは、データの特徴自体ではなく、事前に指定されたラベルだけを予測する、といった特定のユースケースを前提としています。本研究では,基本的な位置埋め込みと因果言語モデリングの目的を有する標準自己回帰型LPM型トランスフォーマを用いた,単純だが柔軟なベースラインを提案する。私たちのベースラインは、一般的なデータセットで既存のアプローチよりも優れており、さまざまなユースケースに使用することができます。我々は、同じモデルがラベルを予測したり、欠落した値をインプットしたり、イベントシーケンスをモデル化できることを示した。 Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use-cases, for example that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.	翻訳日:2024-10-29 20:25:02 公開日:2024-10-23

Title

Authors

Abstract

論文公表日・翻訳日

# 時間差学習の簡易化

Simplifying Deep Temporal Difference Learning ( http://arxiv.org/abs/2407.04811v2 )

ライセンス: Link先を確認

Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin,

(参考訳) Q-ラーニングは、フィールド強化学習(RL)において基礎的な役割を担った。しかし、Qラーニングやディープニューラルネットワークのような非線形関数近似のような非政治データを持つTDアルゴリズムは、主にリプレイバッファとターゲットネットワークのトレーニングを安定化するために、いくつかの追加のトリックを必要とする。残念なことに、ターゲットネットワークにおける凍結ネットワークパラメータの更新が遅れてサンプル効率が損なわれ、同様にリプレイバッファもメモリと実装のオーバーヘッドを発生させる。本稿では,安定性を維持しつつ,TDトレーニングの高速化と簡易化が可能であるかを検討する。我々の重要な理論的結果は、LayerNormのような正規化手法が、目的のネットワークを必要とせずに、たとえ政治外のデータであっても、確実に収束したTDアルゴリズムが得られることを初めて示している。実験的に、ベクトル化された環境によって可能とされたオンライン並列サンプリングは、リプレイバッファを必要とせずにトレーニングを安定化させる。これらの結果に触発され,より簡易なオンラインQ-LearningアルゴリズムであるPQNを提案する。意外なことに、この単純なアルゴリズムは、AtariのRainbow、HanabiのR2D2、SmaxのQMix、CraftaxのPPO-RNNといった複雑な手法と競合する。 PPOがゴーツーRLアルゴリズムになった時代に、PQNはQラーニングを実行可能な代替手段として再確立する。

Q-learning played a foundational role in the field reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need of a replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods like: Rainbow in Atari, R2D2 in Hanabi, QMix in Smax, PPO-RNN in Craftax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes Q-learning as a viable alternative.

翻訳日:2024-11-08 23:35:45 公開日:2024-10-23

# Richelieu: AI外交のための自己進化型LLMベースのエージェント

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v2 )

ライセンス: Link先を確認

Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,

(参考訳) 外交は人間社会における最も洗練された活動の1つである。複数の当事者やエージェント間の複雑な相互作用には、社会的推論、交渉術、長期戦略計画など様々な能力が含まれる。従来のAIエージェントは、複数のエージェントを含むタスクにおいて、多段階ゲームやより大きなアクションスペースを扱う能力を確実に証明している。しかし外交は、特に必要な交渉段階を考慮して、決定空間の停滞を伴う。近年, LLMエージェントは, 複雑なマルチエージェント環境において, 従来のエージェントの境界を拡張できる可能性を示しているが, 複雑なマルチエージェント環境において, 非常に長い計画期間を扱うには不十分である。最先端のLLM技術を活用して、我々は、より強力なLLMベースの社会エージェントに3つのコアと必須の機能を組み合わせることで、高度に包括的なマルチエージェントミッションのために、AIの上限を人間のようなエージェントに向けて探索する第一歩を踏み出します。 1) 記憶とリフレクションを有する戦略的プランナー 2 社会的理由づけによる目標志向の交渉 3) 自己プレイゲームによって記憶を増強し, ループ内の人間を介さずに自己進化する。

Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/ agents involve various abilities like social reasoning, negotiation arts, and long-term strategy planning. Previous AI agents surely have proved their capability of handling multi-step games and larger action spaces on tasks involving multiple agents. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. Recently, LLM agents have shown their potential for extending the boundary of previous agents on a couple of applications, however, it is still not enough to handle a very long planning period in a complex multi-agent environment. Empowered with cutting-edge LLM technology, we make the first stab to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission by combining three core and essential capabilities for stronger LLM-based societal agents: 1) strategic planner with memory and reflection; 2) goal-oriented negotiate with social reasoning; 3) augmenting memory by self-play games to self-evolving without any human in the loop.

翻訳日:2024-11-08 23:02:19 公開日:2024-10-23

# Richelieu: AI外交のための自己進化型LLMベースのエージェント

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v3 )

ライセンス: Link先を確認

Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,

(参考訳) 外交は、社会的推論、交渉、長期戦略計画のスキルを必要とする複数の当事者間の複雑な相互作用を含む、人間社会における最も洗練された活動の1つである。従来のAIエージェントは、マルチエージェントタスクにおいて、マルチステップゲームと大きなアクションスペースを扱う能力を示した。しかし外交は、特に必要な交渉段階を考慮して、決定空間の停滞を伴う。大規模言語モデル(LLM)に基づく最近のエージェントは、様々なアプリケーションに可能性を示しているが、複雑なマルチエージェント設定において、計画期間の延長に苦慮している。 LLMベースのエージェントの最近の技術を活用し、我々は3つの基本的な機能を統合することで包括的なマルチエージェントミッションを実行することができる人間のようなエージェントを作るAIの可能性を探究することを目的としている。 1) 記憶とリフレクションによる戦略的計画 2 社会的理由づけによる目標志向の交渉 3) 自己学習ゲームによって記憶を増強し, ループ内に人間がいない自己進化を図った。

Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop.

翻訳日:2024-11-08 23:02:19 公開日:2024-10-23

# Richelieu: AI外交のための自己進化型LLMベースのエージェント

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy ( http://arxiv.org/abs/2407.06813v4 )

ライセンス: Link先を確認

Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang,

翻訳日:2024-11-08 23:02:19 公開日:2024-10-23

# Attribute or Abstain: 長期ドキュメントアシスタントとしての大規模言語モデル

Attribute or Abstain: Large Language Models as Long Document Assistants ( http://arxiv.org/abs/2407.07799v2 )

ライセンス: Link先を確認

Jan Buchmann, Xiao Liu, Iryna Gurevych,

(参考訳) LLMは人間が長い文書を扱うのを助けることができるが、幻覚で知られている。 LLMは、その応答を支持する証拠を提供し、検証可能性を高める。既存の属性に対するアプローチはRAG設定でのみ評価されている。これは、検索が不要な長いドキュメント設定とは大きく異なるが、助けになる可能性がある。これにより、属性の長い文書特定評価が欠落する。このギャップを埋めるために、LABは、6つの異なる長文タスクを属性付きでベンチマークし、異なるサイズの5 LLMに対して異なるアプローチで帰属する実験を行う。一つのステップで応答生成とエビデンス抽出という引用が、大規模で微調整されたモデルに最適であるのに対して、追加の検索は小さなモデルに有効であることがわかった。我々は,「中流の失われた」現象が帰属に有効であるかどうかを考察するが,この現象は見つからない。また、モデルが複雑なクレームのエビデンスの提供に苦労しているため、単純な応答を持つデータセットで応答品質を予測できるが、複雑なレスポンスには当てはまらないこともわかりました。

LLMs can help humans working with long documents, but are known to hallucinate. Attribution can increase trust in LLM responses: The LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document setting, where retrieval is not needed, but could help. Thus, a long document specific evaluation of attribution is missing. To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiments with different approaches to attribution on 5 LLMs of different sizes. We find that citation, i.e. response generation and evidence extraction in one step, performs best for large and fine-tuned models, while additional retrieval can help for small, prompted models. We investigate whether the "Lost in the Middle'' phenomenon exists for attribution, but do not find this. We also find that evidence quality can predict response quality on datasets with simple responses, but not so for complex responses, as models struggle with providing evidence for complex claims.

翻訳日:2024-11-08 22:40:08 公開日:2024-10-23

# Twitterのリアルタイム要約

Real-Time Summarization of Twitter ( http://arxiv.org/abs/2407.08125v2 )

ライセンス: Link先を確認

Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu,

(参考訳) 本稿では,Twitter のリアルタイム要約における TREC のアプローチについて述べる。我々は、リアルタイムプッシュ通知のシナリオに焦点を当て、システムはサンプルツイートのストリームを監視し、関連するツイートと新規なツイートを、特定の興味のあるプロフィールに返却する。ダイリクレスコア(ダイリクレスコア)は、非常にスムーズな(ベースライン)で、あるツイートが特定の関心プロファイルに関連するかどうかを分類するために使用される。平均利得(MAP),累積利得(CG),ディスカウント累積利得(DCG)などの指標を用いて,本手法が良好な性能を示すことを示す。また、プッシュキューから冗長なツイートを削除することも望まれる。精度の限界のため,本論文ではアルゴリズムについてのみ記述する。

In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP, cumulative gain (CG) and discount cumulative gain (DCG), the experiment indicates that our approach has a good performance. It is also desired to remove the redundant tweets from the pushing queue. Due to the precision limit, we only describe the algorithm in this paper.

翻訳日:2024-11-08 22:29:08 公開日:2024-10-23

# 算数推論のための自己学習言語モデル

Self-training Language Models for Arithmetic Reasoning ( http://arxiv.org/abs/2407.08400v2 )

ライセンス: Link先を確認

Marek Kadlčík, Michal Štefánik,

(参考訳) 最近の言語モデルは、複雑な多段階推論を含むタスクにおいて印象的な結果をもたらすが、これらの機能をさらに拡張するには、より注釈付きデータの高価な収集が必要である。本研究では,算術的推論(自己学習)における予測の有効性に対する自動フィードバックを用いることで,新しいデータなしにモデルの推論能力を向上させる可能性を検討する。 6つの異なる算術推論データセットの体系的な実験では、モデルは単ラウンド(オフライン)とオンラインの自己学習の両方で大幅に改善され、それぞれ+13.9%と+25.9%のケースで正しい結果が得られる。さらに、単一ラウンドでオフラインのセルフトレーニングでは、従来の教師あり学習は好みの最適化に匹敵する効果をもたらすが、オンラインのセルフトレーニングでは、教師あり学習よりも優れた安定性と、目に見えない種類の問題に対する堅牢性により、好みの最適化手法が優れていることが判明した。

Recent language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving models' reasoning capabilities without new data, merely using automated feedback to the validity of their predictions in arithmetic reasoning (self-training). In systematic experimentation across six different arithmetic reasoning datasets, we find that models can substantially improve in both single-round (offline) and online self-training, reaching a correct result in +13.9% and +25.9% more cases, respectively, underlining the importance of actuality of self-training feedback. We further find that in the single-round, offline self-training, traditional supervised training can deliver gains comparable to preference optimization, but in online self-training, preference optimization methods largely outperform supervised training thanks to their superior stability and robustness on unseen types of problems.

翻訳日:2024-11-08 22:29:08 公開日:2024-10-23

# 算数推論のための自己学習言語モデル

Self-training Language Models for Arithmetic Reasoning ( http://arxiv.org/abs/2407.08400v3 )

ライセンス: Link先を確認

Marek Kadlčík, Michal Štefánik,

翻訳日:2024-11-08 22:29:08 公開日:2024-10-23

# 言語モデルを用いたA*探索の高速化のためのトレーニングデータ

A Training Data Recipe to Accelerate A* Search with Language Models ( http://arxiv.org/abs/2407.09985v2 )

ライセンス: Link先を確認

Devaansh Gupta, Boyang Li,

(参考訳) 大規模言語モデル(LLM)とA*のようなヒューリスティック検索アルゴリズムを組み合わせることで、LLM推論の強化とスケーラブルな推論が期待できる。 LLMヒューリスティック学習のトレーニングデータのコアセット選択問題について検討する。ヒューリスティック関数を学習する手法はほとんどなく,探索アルゴリズムと機械学習モデルとの相互作用を考察する。本研究では, A* 探索アルゴリズムの要件を LLM の要件から実証的に切り離して, この課題を一般化する。 A* は目標に近い探索ノードに対してより正確な予測を必要とし、LLM は効率的な一般化のために同じノードセットを必要とする。これらの知見により,LLMに基づくヒューリスティックス学習のためのデータ選択分布を導出する。迷路ナビゲーション,ソコバン,スライディングタイルパズルの3つの古典的計画領域において,我々の手法は,解を見つけるのに必要な反復回数を最大15倍に減らし,探索時間を最大5倍に短縮する。コードベースはhttps://github.com/devaansh100/a_starにある。

Combining Large Language Models (LLMs) with heuristic search algorithms like A* holds the promise of enhanced LLM reasoning and scalable inference. To accelerate training and reduce computational demands, we investigate the coreset selection problem for the training data of LLM heuristic learning. Few methods to learn the heuristic functions consider the interaction between the search algorithm and the machine learning model. In this work, we empirically disentangle the requirements of A* search algorithm from the requirements of the LLM to generalise on this task. Surprisingly, we find an overlap between their requirements; A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation. With these insights, we derive a data-selection distribution for learning LLM-based heuristics. On three classical planning domains, maze navigation, Sokoban and sliding tile puzzles, our technique reduces the number of iterations required to find the solutions by up to 15x, with a wall-clock speed-up of search up to 5x. The codebase is at https://github.com/devaansh100/a_star.

翻訳日:2024-11-08 21:43:45 公開日:2024-10-23

# AlleNoise: 実世界のラベルノイズを用いた大規模テキスト分類ベンチマークデータセット

AlleNoise: large-scale text classification benchmark dataset with real-world label noise ( http://arxiv.org/abs/2407.10992v2 )

ライセンス: Link先を確認

Alicja Rączkowska, Aleksandra Osowska-Kurczab, Jacek Szczerbiński, Kalina Jasinska-Kobus, Klaudia Nazarko,

(参考訳) ラベルノイズは、ロバストな分類モデルのトレーニングにおいて依然として課題である。ラベルノイズを緩和するほとんどの方法は、主に合成ノイズを伴うデータセットを用いてベンチマークされている。現実的なノイズ分布を持つデータセットの必要性は、WebVisionやChrothing1MといったWebスクラッドベンチマークによって部分的に解決されているが、これらのベンチマークはコンピュータビジョンドメインに限定されている。 Transformerベースのモデルの重要性が高まっているため、ノイズのあるラベルで学習するためのテキスト分類ベンチマークを確立することが不可欠である。本稿では、約5600のクラスに50,000以上のサンプルを含む実世界のインスタンス依存ラベルノイズを持つ、新しいキュレートされたテキスト分類ベンチマークであるAlleNoiseについて、有意義で階層的な分類法を補完する。ノイズの分布は、主要なeコマースマーケットプレースの実際のユーザから来ており、人間のミスの意味を現実的に反映している。ノイズラベルに加えて、典型的にはフィールドで使用されるWebスクラッドデータセットとは異なり、ノイズ分布に関する深い洞察を得るのに役立つ、人間検証されたクリーンラベルも提供します。このような実環境騒音に対処するには,雑音ラベルを用いた学習方法の代表的選択が不十分であることを示す。さらに,これらのアルゴリズムが過剰な暗記を緩和しないことを示す。そこで、AlleNoiseでは、テキスト分類タスクにおける実世界のラベルノイズを処理できるラベルノイズ法の開発のために、バーを高く設定する。コードとデータセットはhttps://github.com/allegro/AlleNoise.comからダウンロードできる。

Label noise remains a challenge for training robust classification models. Most methods for mitigating label noise have been benchmarked using primarily datasets with synthetic noise. While the need for datasets with realistic noise distribution has partially been addressed by web-scraped benchmarks such as WebVision and Clothing1M, those benchmarks are restricted to the computer vision domain. With the growing importance of Transformer-based models, it is crucial to establish text classification benchmarks for learning with noisy labels. In this paper, we present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise, containing over 500,000 examples across approximately 5,600 classes, complemented with a meaningful, hierarchical taxonomy of categories. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. In addition to the noisy labels, we provide human-verified clean labels, which help to get a deeper insight into the noise distribution, unlike web-scraped datasets typically used in the field. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise. In addition, we show evidence that these algorithms do not alleviate excessive memorization. As such, with AlleNoise, we set the bar high for the development of label noise methods that can handle real-world label noise in text classification tasks. The code and dataset are available for download at https://github.com/allegro/AlleNoise.

翻訳日:2024-11-08 21:21:36 公開日:2024-10-23

# 99の問題があるが、FLOPSは一つではない

I've Got 99 Problems But FLOPS Ain't One ( http://arxiv.org/abs/2407.12819v2 )

ライセンス: Link先を確認

Alexandru M. Gherghescu, Vlad-Andrei Bădoiu, Alexandru Agache, Mihai-Valentin Dumitru, Iuliu Vasilescu, Radu Mantu, Costin Raiciu,

(参考訳) ハイパースケーラは大規模なネットワーク展開の状況を支配していますが、直面する課題に関するデータや洞察を共有することはめったにありません。この優位性を考慮して、この分野で解決すべき問題は何か? 私たちは、機械学習アプリケーションのための1000億ドルのデータセンターを構築するための公開計画から始まり、関連する研究方向を見つけるために、従来からあるアプローチを取っています。法律をスケールする言語モデルを活用することで、データセンターのようなワークロードが持つものを見つけ、ネットワーク研究に焦点をあてて、その上で直面する課題を探求します。我々は、データセンターの構築とそのようなモデルの訓練は技術的に可能であると結論づけるが、これはDC間通信のための新しい広域輸送、マルチパストランスポート、データセンター内通信のための新しいデータセンタートポロジ、高速スケールアップネットワークおよびトランスポート、ネットワークコミュニティのための豊富な研究課題を概説する。

Hyperscalers dominate the landscape of large network deployments, yet they rarely share data or insights about the challenges they face. In light of this supremacy, what problems can we find to solve in this space? We take an unconventional approach to find relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications. Leveraging the language models scaling laws, we discover what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but this requires novel wide-area transports for inter-DC communication, a multipath transport and novel datacenter topologies for intra-datacenter communication, high speed scale-up networks and transports, outlining a rich research agenda for the networking community.

翻訳日:2024-11-08 20:25:29 公開日:2024-10-23

# 想像の技:少数のデモから長い水平操作課題を学習する

The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations ( http://arxiv.org/abs/2407.13432v2 )

ライセンス: Link先を確認

Jan Ole von Hartz, Tim Welschehold, Abhinav Valada, Joschka Boedecker,

(参考訳) Task Parametrized Gaussian Mixture Models (TP-GMM) は、オブジェクト中心のロボット操作タスクを学習するためのサンプル効率のよい手法である。しかし、TP-GMMの適用にはいくつかのオープンな課題がある。本研究では, 相乗的に3つの重要な課題に取り組む。第一に、エンドエフェクタ速度は非ユークリッドであり、したがって標準GMMを用いたモデリングは困難である。そこで本研究では,ロボットのエンドエフェクタ速度をその方向と大きさに分解し,リーマンGMMを用いてモデル化する。第二に、複雑な実演軌跡のセグメンテーションとシーケンシャルスキルに分解速度を利用する。セグメンテーションを通じて、スキルトラジェクトリをさらに整列させ、従って時間を強力な帰納バイアスとして活用する。第3に,視覚的観察からスキル毎のタスクパラメータを自動的に検出する手法を提案する。提案手法は,RGB-D観測のみを用いて,たった5つの実演から複雑な操作タスクを学習することを可能にする。 RLBenchの大規模実験により,20倍の試料効率向上を図った。我々のポリシーは様々な環境、オブジェクトインスタンス、オブジェクトの位置にまたがって一般化され、学習スキルは再利用されます。

Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.

翻訳日:2024-11-08 20:14:30 公開日:2024-10-23

# 想像の技:少数のデモから長い水平操作課題を学習する

The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations ( http://arxiv.org/abs/2407.13432v3 )

ライセンス: Link先を確認

Jan Ole von Hartz, Tim Welschehold, Abhinav Valada, Joschka Boedecker,

翻訳日:2024-11-08 20:14:30 公開日:2024-10-23

# 言語リワードモデルのための目標条件付き表現の学習

Learning Goal-Conditioned Representations for Language Reward Models ( http://arxiv.org/abs/2407.13887v2 )

ライセンス: Link先を確認

Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx,

(参考訳) 従来の強化学習(RL)では,オフラインデータや自己教師対象による表現の改善を学習する技術が目覚ましい成果を上げている。それでも、表現学習の改善が、言語モデル(LM)における人間からのフィードバック(RLHF)からの強化学習にどのような効果があるかは明らかでない。本研究は、サンプル化された好ましくない軌道に沿った将来の状態の表現類似度を高め、ランダムにサンプリングされた非推奨軌道に沿った類似度を減少させることにより、対照的に$\textit{goal-conditioned}$ファッションのトレーニング報酬モデル(RM)を提案する。この目的により、MATHやGSM8kといった挑戦的なベンチマークにおいて、RM性能は最大0.09 AUROCまで大幅に向上した。これらの結果は、Helpful-Harmlessデータセット上の一般的なアライメントにも及んでいる。報酬モデルのパフォーマンスの改善以外にも、このRM表現のトレーニング方法により、$\textit{steerability}$の改善が可能となる。この洞察を活用すれば、過半数投票中に生成したトークンの最大5,5\%をフィルタして、トラジェクトリを破棄して、結果として“誤った”状態に陥り、コストを大幅に削減できることが分かります。さらに、これらの表現は、希望する将来の目標状態に条件付けすることで、きめ細かい制御を行うことができる。例えば、Llama 3モデルを有用な世代に向けて操ることで、教師付き微調整トレーニングベースラインよりも9.6\%の利便性が向上することを示す。同様に、複雑な世代に向けてモデルをステアリングすることで、ベースラインよりも21.6\%の複雑さが向上する。全体として、この対照的な目標条件の方法でのRMのトレーニングは、性能を大幅に改善し、モデルステアビリティを実現している。

Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive, $\textit{goal-conditioned}$ fashion by increasing the representation similarity of future states along sampled preferred trajectories and decreasing the similarity along randomly sampled dispreferred trajectories. This objective significantly improves RM performance by up to 0.09 AUROC across challenging benchmarks, such as MATH and GSM8k. These findings extend to general alignment as well -- on the Helpful-Harmless dataset, we observe $2.3\%$ increase in accuracy. Beyond improving reward model performance, we show this way of training RM representations enables improved $\textit{steerability}$ because it allows us to evaluate the likelihood of an action achieving a particular goal-state (e.g., whether a solution is correct or helpful). Leveraging this insight, we find that we can filter up to $55\%$ of generated tokens during majority voting by discarding trajectories likely to end up in an "incorrect" state, which leads to significant cost savings. We additionally find that these representations can perform fine-grained control by conditioning on desired future goal-states. For example, we show that steering a Llama 3 model towards helpful generations with our approach improves helpfulness by $9.6\%$ over a supervised-fine-tuning trained baseline. Similarly, steering the model towards complex generations improves complexity by $21.6\%$ over the baseline. Overall, we find that training RMs in this contrastive, goal-conditioned fashion significantly improves performance and enables model steerability.

翻訳日:2024-11-08 20:01:00 公開日:2024-10-23

# 量子電磁場によるスピン重ね合わせ状態のデコヒーレンス

Decoherence of spin superposition state caused by a quantum electromagnetic field ( http://arxiv.org/abs/2407.14581v2 )

ライセンス: Link先を確認

Kensuke Gallock-Yoshimura, Yuuki Sugiyama, Akira Matsumura, Kazuhiro Yamamoto,

(参考訳) 本研究では、ミンコフスキー時空における相対論的量子電磁場の存在下で、空間的に重畳された電気的中性スピン-$\frac12$粒子のデコヒーレンスについて検討する。スピン磁場結合によるデコヒーレンスを, 重畳軌道の各分岐に沿った2点相関関数から生じる局所的デコヒーレンスと, 重畳軌道間の相関関数から生じる非局所的デコヒーレンスに分類できることを示した。これらの効果は位相減衰と振幅減衰と関連している。また、量子場が熱状態で準備されている場合、デコヒーレンスは磁場温度とともに単調に増大することを示した。

In this study, we investigate the decoherence of a spatially superposed electrically neutral spin-$\frac12$ particle in the presence of a relativistic quantum electromagnetic field in Minkowski spacetime. We demonstrate that decoherence due to the spin-magnetic field coupling can be categorized into two distinct factors: local decoherence, originating from the two-point correlation functions along each branch of the superposed trajectories, and nonlocal decoherence, which arises from the correlation functions between the two superposed trajectories. These effects are linked to phase damping and amplitude damping. We also show that if the quantum field is prepared in a thermal state, decoherence monotonically increases with the field temperature.

翻訳日:2024-11-08 19:27:32 公開日:2024-10-23

# GPHM : 単眼頭アバター再建のためのガウスパラメトリック頭部モデル

GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction ( http://arxiv.org/abs/2407.15070v2 )

ライセンス: Link先を確認

Yuelang Xu, Zhaoqi Su, Qingyao Wu, Yebin Liu,

(参考訳) 高忠実度3D人間の頭部アバターの作成は、VR/AR、デジタル人間、映画製作における応用に不可欠である。近年の進歩は、変形可能な顔モデルを利用して、容易にアクセス可能なデータからアニメーションヘッドアバターを生成し、低次元パラメトリック空間内の様々なアイデンティティと表現を表現している。しかし、既存の手法は、例えばヘアスタイルのような複雑な外観の詳細をモデル化するのに苦労し、レンダリング品質と効率の低下に悩まされることが多い。本稿では,人間の頭部の複雑さを正確に表現するために,3次元ガウス的パラメトリック頭部モデル(3D Gaussian Parametric Head Model)を提案する。ガウスモデルは複雑な詳細を扱うことができ、様々な外観や複雑な表現の現実的な表現を可能にする。さらに、スムーズな収束を保証するために、よく設計されたトレーニングフレームワークを提示し、リッチコンテンツを学ぶための堅牢な保証を提供する。提案手法は,高画質でリアルタイムな実写レンダリングを実現し,パラメトリックヘッドモデルの分野に有意義な貢献をする。最後に、3Dガウスパラメトリックヘッドモデルをモノクロビデオや数発の頭部アバター再構成タスクに適用し、入力データが極端に制限された場合でも高品質な3Dヘッドアバターの即時再構築を可能にする。

Creating high-fidelity 3D human head avatars is crucial for applications in VR/AR, digital human, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. However, existing methods often struggle with modeling complex appearance details, e.g., hairstyles, and suffer from low rendering quality and efficiency. In this paper we introduce a novel approach, 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. The Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, we presents a well-designed training framework to ensure smooth convergence, providing a robust guarantee for learning the rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models. Finally, we apply the 3D Gaussian Parametric Head Model to monocular video or few-shot head avatar reconstruction tasks, which enables instant reconstruction of high-quality 3D head avatars even when input data is extremely limited, surpassing previous methods in terms of reconstruction quality and training speed.

翻訳日:2024-11-08 15:56:37 公開日:2024-10-23

# Conditional Language Policy: ステアブルな多目的ファインタニングのための汎用フレームワーク

Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning ( http://arxiv.org/abs/2407.15762v2 )

ライセンス: Link先を確認

Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent,

(参考訳) リワードベースの微調整は、言語ポリシーを意図した行動(創造性と安全性など)と整合させることに不可欠である。重要な課題は、複数の(競合する)目標を柔軟かつ効率的な方法でトレードオフする、ステアブル言語モデルを開発することである。本稿では,多目的言語モデルを微調整するための一般的なフレームワークである条件言語政策(CLP)について述べる。マルチタスクトレーニングとパラメータ効率の微調整のテクニックに基づいて、CLPは推論時に競合する目標を効果的にトレードオフするステアブルモデルを学習する。特に、目標間の異なるトレードオフを達成するために、トレーニングや複数のモデルのメンテナンスは必要ありません。 CLPは2つの要約データセットに関する広範な実験と改善を通じて,多目的ファインタニングにおける既存のアプローチを上回り,Paretoが優位に立つステアブル言語モデルを学習していることを示す。

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learn steerable models that effectively trade-off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for multi-objective finetuning.

翻訳日:2024-11-08 15:45:25 公開日:2024-10-23

# あらゆる場所を操作するための学習:強化学習のための視覚的一般化可能なフレームワーク

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning ( http://arxiv.org/abs/2407.15815v2 )

ライセンス: Link先を確認

Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu,

(参考訳) 多様なオープンワールドシナリオで動作するための一般化機能を備えたビジュモータロボットを開発できるだろうか? 本稿では,視覚的強化学習に適した一般化可能なフレームワークである「textbf{Maniwhere}」を提案する。具体的には,Spatial Transformer Network (STN) モジュールと融合した多視点表現学習手法を提案する。さらに、カリキュラムベースのランダム化と拡張手法を用いて、RLトレーニングプロセスの安定化と視覚一般化能力の向上を図る。 Maniwhereの有効性を示すために、3つのハードウェアプラットフォームにまたがる強力な視覚的一般化とsim2realトランスファー能力を示すために、明瞭なオブジェクト、バイマニュアル、デクスタスな手操作タスクを含む8つのタスクを慎重に設計した。実験の結果,Maniwhereは既存の最先端手法を著しく上回っていることがわかった。ビデオはhttps://gemcollector.github.io/maniwhere/で公開されている。

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose \textbf{Maniwhere}, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen the visual generalization ability. To exhibit the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulate objects, bi-manual, and dexterous hand manipulation tasks, demonstrating Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.

翻訳日:2024-11-08 15:45:25 公開日:2024-10-23

# 視覚テキストの理解と生成の調和

Harmonizing Visual Text Comprehension and Generation ( http://arxiv.org/abs/2407.16364v2 )

ライセンス: Link先を確認

Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie,

(参考訳) 本研究では,視覚テキストの理解と生成に熟練した,統一的で汎用的な多モード生成モデルであるTextHarmonyを提案する。画像とテキストを同時に生成すると、視覚と言語モダリティの固有の矛盾により、パフォーマンスが劣化する。この課題を克服するため、既存のアプローチでは、個別のモデルインスタンスを必要とする、教師付き微調整のためのモダリティ固有のデータを活用している。マルチモーダル生成空間を部分的に分離し,モダリティ特化およびモダリティ非依存のLoRAエキスパートを動的に集約するSlide-LoRAを提案する。 Slide-LoRAは特異モデルインスタンス内の視覚と言語の生成を調和させ、より統一された生成プロセスを促進する。さらに,高品質な画像キャプションデータセットであるDetailedTextCaps-100Kを開発した。様々なベンチマークによる総合的な実験により,提案手法の有効性が示された。 Slide-LoRAにインスパイアされたTextHarmonyは、パラメータがわずか2%増加し、ビジュアルテキスト理解タスクが2.5%、ビジュアルテキスト生成タスクが4.0%改善した。本研究は,視覚テキスト領域におけるマルチモーダル生成への統合的アプローチの実現可能性について述べる。コードはhttps://github.com/bytedance/TextHarmonyで入手できる。

In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space. Slide-LoRA harmonizes the generation of vision and language within a singular model instance, thereby facilitating a more unified generative process. Additionally, we develop a high-quality image caption dataset, DetailedTextCaps-100K, synthesized with a sophisticated closed-source MLLM to enhance visual text generation capabilities further. Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach. Empowered by Slide-LoRA, TextHarmony achieves comparable performance to modality-specific fine-tuning results with only a 2% increase in parameters and shows an average improvement of 2.5% in visual text comprehension tasks and 4.0% in visual text generation tasks. Our work delineates the viability of an integrated approach to multimodal generation within the visual text domain, setting a foundation for subsequent inquiries. Code is available at https://github.com/bytedance/TextHarmony.

翻訳日:2024-11-08 15:34:26 公開日:2024-10-23

# 学習からスピン"ペン"への教訓

Lessons from Learning to Spin "Pens" ( http://arxiv.org/abs/2407.18902v2 )

ライセンス: Link先を確認

Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang,

(参考訳) ハンマーやスクリュードライバーのような道具も同じような形をしているので、ペンのような物体を手作業で操作することは私たちの日常生活にとって重要なスキルです。しかし,従来の学習手法では,高品質な実演が欠如し,シミュレーションと実世界の間に大きなギャップがあるため,この課題に苦慮している。本研究では,ペンのような物体を回転させる能力を示すことによって,学習に基づく手動操作システムの境界を推し進める。まず、強化学習を用いて、特権情報でオラクルポリシーを訓練し、シミュレーションにおいて高忠実度軌道データセットを生成する。これは2つの目的がある。 1) シミュレーションにおける感覚士政策の事前学習 2) 実世界におけるオープンループ軌道再生の実施。次に、これらの実世界の軌道を用いて感覚運動のポリシーを微調整し、実世界の力学に適応させる。 50個未満の軌道で、我々のポリシーは、複数の革命のために異なる物理的特性を持つ10個以上のペンのような物体を回転させることを学ぶ。デザイン選択の包括的分析を行い、開発中に学んだ教訓を共有します。

In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real world dynamics. With less than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.

翻訳日:2024-11-08 14:50:05 公開日:2024-10-23

# 深部ニューラルネットワークにおける特徴学習のバネブロック理論

A spring-block theory of feature learning in deep neural networks ( http://arxiv.org/abs/2407.19353v2 )

ライセンス: Link先を確認

Cheng Shi, Liming Pan, Ivan Dokmanić,

(参考訳) 特徴学習深層ネットは、定期的に低次元の幾何学にデータを徐々に崩壊させる。この現象は、非線形性、ノイズ、学習率、および力学を形成する他の選択の集合的作用からどのように生じるかは、顕微鏡神経力学から構築された第一原理理論を解明した。浅い層や深い層がより効果的に学習するレシエーションを識別するノイズ非線形位相図を示す。次に、図を再現するマクロ力学的理論を提案し、なぜいくつかのDNNが遅延でアクティブなのかを説明し、層をまたいだ特徴学習と一般化をリンクする。

Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this phenomenon emerges from collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively. We then propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.

翻訳日:2024-11-08 14:38:53 公開日:2024-10-23

# AI生成画像検出のためのCLIPの逆ロバスト性探索

Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection ( http://arxiv.org/abs/2407.19553v2 )

ライセンス: Link先を確認

Vincenzo De Rosa, Fabrizio Guillaro, Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva,

(参考訳) 近年、AI生成画像の検出や悪意のある目的での使用を防止するために、多くの法医学的検知器が提案されている。畳み込みニューラルネットワーク(CNN)はこの分野で長い間支配的なアーキテクチャであり、激しい研究の対象となっている。しかし、最近提案されたTransformerベースの検出器は、特に一般化の点において、CNNベースの検出器と一致するか、さらに優れていることが示されている。本稿では,視覚変換器 (ViT) のバックボーンに依存するコントラスト言語-画像事前学習 (CLIP) 法に着目し,その性能をCNN法と比較し,AI生成画像検出器の対角的ロバスト性について検討する。種々の条件下で異なる敵攻撃に対するロバスト性について検討し、数値結果と周波数領域パターンの両方を解析する。 CLIPベースの検出器は、CNNベースの検出器と同様に、ホワイトボックス攻撃に対して脆弱である。しかし、攻撃はCNNベースのメソッドとCLIPベースのメソッド間で簡単に伝達できない。また、周波数領域における逆方向雑音パターンの異なる分布により、このことが確認される。全体として、この分析はより効果的な戦略を開発するのに役立つ法医学的検出器の特性に関する新たな洞察を提供する。

In recent years, many forensic detectors have been proposed to detect AI-generated images and prevent their use for malicious purposes. Convolutional neural networks (CNNs) have long been the dominant architecture in this field and have been the subject of intense study. However, recently proposed Transformer-based detectors have been shown to match or even outperform CNN-based detectors, especially in terms of generalization. In this paper, we study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Visual Transformer (ViT) backbones and comparing their performance with CNN-based methods. We study the robustness to different adversarial attacks under a variety of conditions and analyze both numerical results and frequency-domain patterns. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. However, attacks do not easily transfer between CNN-based and CLIP-based methods. This is also confirmed by the different distribution of the adversarial noise patterns in the frequency domain. Overall, this analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.

翻訳日:2024-11-08 14:27:29 公開日:2024-10-23

# タスクプロンプトベクトル:マルチタスクソフトプロンプト転送による効果的な初期化

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer ( http://arxiv.org/abs/2408.01119v2 )

ライセンス: Link先を確認

Robert Belanec, Simon Ostermann, Ivan Srba, Maria Bielikova,

(参考訳) プロンプトチューニングは、大きな言語モデル(LLM)をトレーニングするための効率的なソリューションである。しかし、現在のソフトプロンプトベースの手法は、しばしばマルチタスクのモジュラリティを犠牲にし、新たに追加されたタスクごとにトレーニングプロセスを完全にあるいは部分的に繰り返す必要がある。タスクベクトルに関する最近の研究は、望まれるマルチタスク性能を達成するために、フルモデルウェイトに算術演算を適用しているが、ソフトプロンプトに対する同様のアプローチはいまだに欠落している。そこで本研究では,調整したソフトプロンプトの重みとランダム初期化との要素的差異から生成したタスクプロンプトベクトルを提案する。 12個のNLUデータセットの実験結果から、タスクプロンプトベクトルを低リソース設定で使用して、類似タスクのプロンプトチューニングを効果的に初期化できることが示されている。さらに、タスクプロンプトベクトルは、2つの異なる言語モデルアーキテクチャ上でのプロンプトチューニングのランダム初期化とは無関係であることを示す。これにより、異なるタスクから事前訓練されたベクトルで即時算術を行うことができる。このようにして、複数のタスクからタスクプロンプトベクトルを算術的に加算することで、最先端のベースラインと競合する代替手段を提供する。

Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by element-wise difference between weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning on 2 different language model architectures. This allows prompt arithmetics with the pre-trained vectors from different tasks. In this way, we provide a competitive alternative to state-of-the-art baselines by arithmetic addition of task prompt vectors from multiple tasks.

翻訳日:2024-11-08 13:18:17 公開日:2024-10-23

# MoC-System:Sparse Mixture-of-Experts Model Trainingのための効率的なフォールトトレランス

MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training ( http://arxiv.org/abs/2408.04307v2 )

ライセンス: Link先を確認

Weilin Cai, Le Qin, Jiayi Huang,

(参考訳) 大きな言語モデルがスケールアップを続けるにつれて、分散トレーニングシステムは10kノードを超えて拡張され、フォールトトレランスの重要性が増している。チェックポイントは耐障害性の主要な戦略として現れ、その効率を最適化するための広範な研究がなされている。しかし,Sparse Mixture-of-Experts (MoE) モデルの出現は,高密度モデルに対する計算要求に匹敵するものの,モデルサイズが大幅に増加するために新たな課題を提起する。本研究では,分散学習システムで発生する多数のチェックポイントシャードをオーケストレーションするMixture-of-Checkpoint System (MoC-System)を提案する。 MoC-Systemは、新しい部分エキスパートチェックポイント機構(PEC)を特徴としている。これはアルゴリズムシステムの共同設計で、選択した専門家のサブセットを戦略的に保存し、MoEチェックポイントのサイズを高密度モデルに匹敵するレベルに効果的に縮小する。ハイブリッド並列戦略を取り入れたMoC-Systemは、分散階級間でワークロードを均等に分散する完全シャードなチェックポイント戦略を含む。さらに、MoC-Systemはメモリ内スナップショットと永続化プロセスを非同期に処理する2段階のチェックポイント管理手法を導入している。 We build MoC-System on the Megatron-DeepSpeed framework, achieved a 98.9% down the overhead for each checkpointing process than the original method, during MoE model training with ZeRO-2 data parallelism and expert parallelism。さらに,本手法は,下流タスクにおける平均精度が1.08%向上しても,同等のモデル精度を維持しながら効率を向上させることを実証的分析により実証した。

As large language models continue to scale up, distributed training systems have expanded beyond 10k nodes, intensifying the importance of fault tolerance. Checkpoint has emerged as the predominant fault tolerance strategy, with extensive studies dedicated to optimizing its efficiency. However, the advent of the sparse Mixture-of-Experts (MoE) model presents new challenges due to the substantial increase in model size, despite comparable computational demands to dense models. In this work, we propose the Mixture-of-Checkpoint System (MoC-System) to orchestrate the vast array of checkpoint shards produced in distributed training systems. MoC-System features a novel Partial Experts Checkpointing (PEC) mechanism, an algorithm-system co-design that strategically saves a selected subset of experts, effectively reducing the MoE checkpoint size to levels comparable with dense models. Incorporating hybrid parallel strategies, MoC-System involves fully sharded checkpointing strategies to evenly distribute the workload across distributed ranks. Furthermore, MoC-System introduces a two-level checkpointing management method that asynchronously handles in-memory snapshots and persistence processes. We build MoC-System upon the Megatron-DeepSpeed framework, achieving up to a 98.9% reduction in overhead for each checkpointing process compared to the original method, during MoE model training with ZeRO-2 data parallelism and expert parallelism. Additionally, extensive empirical analyses substantiate that our methods enhance efficiency while maintaining comparable model accuracy, even achieving an average accuracy increase of 1.08% on downstream tasks.

翻訳日:2024-11-08 12:22:45 公開日:2024-10-23

# 異常予測: 明示的遅延と水平による新しいアプローチ

Anomaly Prediction: A Novel Approach with Explicit Delay and Horizon ( http://arxiv.org/abs/2408.04377v3 )

ライセンス: Link先を確認

Jiang You, Arben Cela, René Natowicz, Jacob Ouanounou, Patrick Siarry,

(参考訳) 時系列データの異常検出は、さまざまな領域において重要な課題である。従来の手法では、通常、後続のステップで異常を識別することに集中しており、多くの場合、遅延時間や異常の地平線といった時間的ダイナミクスの重要性を過小評価している。本稿では,時系列異常予測に時間的情報を直接組み込んだ新しい手法を提案する。本稿では,このアプローチの評価と,いくつかの最先端手法を用いた包括的実験を行うために設計された新しいデータセットを提案する。本研究は, 時間的, 正確な異常予測を行う上で, 提案手法の有効性を実証し, 今後の研究のための新しいベンチマークを設定した。

Anomaly detection in time series data is a critical challenge across various domains. Traditional methods typically focus on identifying anomalies in immediate subsequent steps, often underestimating the significance of temporal dynamics such as delay time and horizons of anomalies, which generally require extensive post-analysis. This paper introduces a novel approach for time series anomaly prediction, incorporating temporal information directly into the prediction results. We propose a new dataset specifically designed to evaluate this approach and conduct comprehensive experiments using several state-of-the-art methods. Our results demonstrate the efficacy of our approach in providing timely and accurate anomaly predictions, setting a new benchmark for future research in this field.

翻訳日:2024-11-08 12:22:45 公開日:2024-10-23

# 固定予算ベイズ型ベストアーム識別のためのUCB探索

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification ( http://arxiv.org/abs/2408.04869v2 )

ライセンス: Link先を確認

Rong J. B. Zhu, Yanqi Qiu,

(参考訳) 固定予算設定におけるベストアーム識別(BAI)について検討した。 UCBEのような上位信頼境界(UCB)に基づく適応的アロケーションは、BAIでうまく機能することが知られている。しかし、その最適後悔が理論的にインスタンスに依存していることはよく知られており、これは多くの固定予算のBAI問題においてアーティファクトであることが示されている。本稿では, ベイズ条件下での固定予算BAI問題に対して, 理論的かつ実験的に効率的なUPB探索アルゴリズムを提案する。鍵となる考え方は事前情報を学習することであり、これは累積的後悔の最小化問題において行ったような UCB ベースの BAI アルゴリズムの性能を向上させることができる。我々は、失敗確率とベイズ的BAI問題に対する単純な後悔の限界を確立し、次数 $\tilde{O}(\sqrt{K/n})$ の上限を対数因子まで与え、$n$ は予算を表し、$K$ は武器の数を表す。さらに,本手法が最先端のベースラインを一貫して上回ることを示す実証実験を行った。

We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, it is well-known that its optimal regret is theoretically dependent on instances, which we show to be an artifact in many fixed-budget BAI problems. In this paper we propose an UCB exploration algorithm that is both theoretically and empirically efficient for the fixed budget BAI problem under a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithm as it has done in the cumulative regret minimization problem. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ represents the budget and $K$ denotes the number of arms. Furthermore, we demonstrate through empirical results that our approach consistently outperforms state-of-the-art baselines.

翻訳日:2024-11-08 12:11:36 公開日:2024-10-23

# 固定予算ベイズ型ベストアーム識別のためのUCB探索

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification ( http://arxiv.org/abs/2408.04869v3 )

ライセンス: Link先を確認

Rong J. B. Zhu, Yanqi Qiu,

翻訳日:2024-11-08 12:11:36 公開日:2024-10-23

# 混乱したパイロット:RAGによるLCMの混乱した副次的リスク

ConfusedPilot: Confused Deputy Risks in RAG-based LLMs ( http://arxiv.org/abs/2408.04870v5 )

ライセンス: Link先を確認

Ayush RoyChowdhury, Mulong Luo, Prateek Sahu, Sarbartha Banerjee, Mohit Tiwari,

(参考訳) 検索拡張生成(英: Retrieval augmented generation、RAG)とは、大規模言語モデル(LLM)がデータベースから有用な情報を検索し、応答を生成するプロセスである。日常業務の企業環境では人気が高まっている。例えば、Copilot for Microsoft 365は数百万のビジネスを蓄積している。しかしながら、RAGベースのシステムを採用する際のセキュリティへの影響は明らかでない。本稿では、Copilotを混乱させ、その応答に完全性と機密性を侵害するRAGシステムのセキュリティ脆弱性のクラスであるConfusedPilotを紹介する。まず、RAGの修正プロンプトに悪意のあるテキストを埋め込んだ脆弱性を調査し、LSMが生成した応答を破損させる。第2に、シークレットデータを漏洩する脆弱性を示し、検索時にキャッシュ機構を活用する。第3に,企業内の誤情報を伝播し,最終的に販売や製造といった業務に影響を及ぼすために,両脆弱性をどのように活用するかを検討する。また、RAGベースのシステムのアーキテクチャを調べることにより、これらの攻撃の根本原因についても論じる。本研究は,現在のRAGベースのシステムにおけるセキュリティ脆弱性を強調し,今後のRAGベースのシステムを保護するための設計ガイドラインを提案する。

Retrieval augmented generation (RAG) is a process where a large language model (LLM) retrieves useful information from a database and then generates the responses. It is becoming popular in enterprise settings for daily business operations. For example, Copilot for Microsoft 365 has accumulated millions of businesses. However, the security implications of adopting such RAG-based systems are unclear. In this paper, we introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the modified prompt in RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data, which leverages the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately impact its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by investigating the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.

翻訳日:2024-11-08 12:11:36 公開日:2024-10-23

# レーザー添加物製造における機械学習を用いたその場監視のための音響・視覚的クロスモーダル知識伝達

Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing ( http://arxiv.org/abs/2408.05307v2 )

ライセンス: Link先を確認

Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao,

(参考訳) レーザー添加物製造(LAM)プロセスにおける異常や欠陥を検出するために、機械学習(ML)に基づく様々なin-situモニタリングシステムを開発した。視覚、オーディオ、その他のモダリティからのデータを統合するマルチモーダルフュージョンは、監視性能を向上させることができるが、複数のセンサータイプを使用することにより、ハードウェア、計算、運用コストも向上する。本稿では,LAMインサイトモニタリングのための相互モダリティ・ナレッジ・トランスファー(CMKT)手法を提案する。 CMKTは、目標モダリティから抽出した特徴の表現性を向上し、予測中のソースモダリティセンサの除去を可能にする。本稿では,意味的アライメント,完全教師付きマッピング,半教師付きマッピングという3つのCMKT手法を提案する。セマンティックアライメント法は、モダリティ間の共有符号化空間を確立し、知識伝達を容易にする。これは、同じグループ(例えば、視覚的および音声的欠陥群)の分布を整合させる意味的アライメント損失と、異なるグループ(例えば、視覚的欠陥と音声的欠陥のないグループ)を区別する分離損失を用いる。この2つのマッピング手法は、完全な教師付きおよび半教師付き学習アプローチを用いて、特徴を1つのモダリティから別のモダリティへと導出することで知識を伝達する。 LAMインサイト欠陥検出のケーススタディでは,CMKT法とマルチモーダル・オーディオ・ビジュアル・フュージョンを比較した。セマンティックアライメント法は、予測フェーズ中に音質を除去しながら98.7%の精度を達成し、これはマルチモーダル融合によって得られる98.2%の精度に匹敵する。説明可能な人工知能を用いて,意味的アライメントCMKTは,モーダル間の固有相関を利用して,雑音を低減しつつ,より代表的な特徴を抽出できることを発見した。

Various machine learning (ML)-based in-situ monitoring systems have been developed to detect anomalies and defects in laser additive manufacturing (LAM) processes. While multimodal fusion, which integrates data from visual, audio, and other modalities, can improve monitoring performance, it also increases hardware, computational, and operational costs due to the use of multiple sensor types. This paper introduces a cross-modality knowledge transfer (CMKT) methodology for LAM in-situ monitoring, which transfers knowledge from a source modality to a target modality. CMKT enhances the representativeness of the features extracted from the target modality, allowing the removal of source modality sensors during prediction. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. The semantic alignment method establishes a shared encoded space between modalities to facilitate knowledge transfer. It employs a semantic alignment loss to align the distributions of identical groups (e.g., visual and audio defective groups) and a separation loss to distinguish different groups (e.g., visual defective and audio defect-free groups). The two mapping methods transfer knowledge by deriving features from one modality to another using fully supervised and semi-supervised learning approaches. In a case study for LAM in-situ defect detection, the proposed CMKT methods were compared with multimodal audio-visual fusion. The semantic alignment method achieved an accuracy of 98.7% while removing the audio modality during the prediction phase, which is comparable to the 98.2% accuracy obtained through multimodal fusion. Using explainable artificial intelligence, we discovered that semantic alignment CMKT can extract more representative features while reducing noise by leveraging the inherent correlations between modalities.

翻訳日:2024-11-08 12:00:35 公開日:2024-10-23

# ランプ素子2分割インピーダンス整合SNAILパラメトリック増幅器

Lumped-element two-section impedance-matched SNAIL parametric amplifier ( http://arxiv.org/abs/2408.06154v2 )

ライセンス: Link先を確認

D. Moskaleva, N. Smirnov, D. Moskalev, A. Ivanov, A. Matanin, D. Baklykov, M. Teleganov, V. Polozov, V. Echeistov, E. Malevannaya, I. Korobenko. A. Kuguk, G. Nikerov, J. Agafonova, I. Rodionov,

(参考訳) 広帯域インピーダンス整合ジョセフソンパラメトリック増幅器は、高忠実度シングルショットマルチキュービット読み出しの鍵となる要素である。現在、いくつかのインピーダンス整合パラメトリック増幅器が提案されており、第1はクロップフェンシュタインテーパーに基づくインピーダンス整合パラメトリック増幅器、第2は補助共振器に基づくインピーダンス整合パラメトリック増幅器である。本稿では,2ユニットインピーダンス整合変換器を用いた量子制限型3波混合ラム素子SNAILパラメトリック増幅器を提案する。パラレルプレートコンデンサと超伝導平面コイルに基づく共振器付き2極チェビシェフ整合回路フラックスポンプモードで動作し、600MHzの帯域で平均15dB$、平均飽和電力で平均107dBm$、量子制限ノイズ温度で平均15dB$を実証した。

Broadband impedance-matched Josephson parametric amplifiers are key components for high-fidelity single-shot multi-qubit readout. Nowadays, several types of impedance matched parametric amplifiers have been proposed: the first is an impedance-matched parametric amplifier based on a Klopfenstein taper, and the second is an impedance-matched parametric amplifier based on auxiliary resonators. Here, we present the quantum-limited 3-wave-mixing lumped-element SNAIL parametric amplifier with two-units impedance matching transformer. A two-pole Chebyshev matching network with shunted resonators based on parallel-plate capacitors and superconducting planar coil. Operating in a flux-pumped mode, we experimentally demonstrate an average gain of $15dB$ across a $600MHz$ bandwidth, along with an average saturation power of $-107dBm$ and quantum-limited noise temperature.

翻訳日:2024-11-08 11:38:16 公開日:2024-10-23

# OWL2Vec4OA: オントロジーアライメントのための知識グラフ埋め込みの設計

OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment ( http://arxiv.org/abs/2408.06310v2 )

ライセンス: Link先を確認

Sevinj Teymurova, Ernesto Jiménez-Ruiz, Tillman Weyde, Jiaoyan Chen,

(参考訳) 交差するドメインをカバーするオントロジーの数が増えるにつれて、オントロジーのアライメントはセマンティック・インターオペラビリティの実現に不可欠である。本稿では、オントロジー埋め込みシステムOWL2Vec*の拡張であるOWL2Vec4OAを提案する。 OWL2Vec*は、オントロジー埋め込みの強力なテクニックとして登場したが、現在、オントロジーアライメントタスクへの埋め込みを調整するためのメカニズムが欠如している。 OWL2Vec4OAは、種子マッピングからのエッジ信頼値を組み込んでランダムウォーク戦略を導出する。本稿では,提案する拡張の理論的基礎,実装の詳細,および実験的評価を行い,オントロジーアライメントタスクの有効性を実証する。

Ontology alignment is integral to achieving semantic interoperability as the number of available ontologies covering intersecting domains is increasing. This paper proposes OWL2Vec4OA, an extension of the ontology embedding system OWL2Vec*. While OWL2Vec* has emerged as a powerful technique for ontology embedding, it currently lacks a mechanism to tailor the embedding to the ontology alignment task. OWL2Vec4OA incorporates edge confidence values from seed mappings to guide the random walk strategy. We present the theoretical foundations, implementation details, and experimental evaluation of our proposed extension, demonstrating its potential effectiveness for ontology alignment tasks.

翻訳日:2024-11-08 11:26:46 公開日:2024-10-23

# クラスバリアを壊す:クラス間特徴補償器による効率的なデータセット蒸留

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator ( http://arxiv.org/abs/2408.06927v2 )

ライセンス: Link先を確認

Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou,

(参考訳) データセット蒸留は、大規模で自然なデータセットから情報的特徴をコンパクトで合成的な形式に凝縮する技術として登場した。近年の進歩により、この技術は洗練されているが、その性能は一般的なクラス固有の合成パラダイムによってボトルネックになっている。このパラダイムの下では、合成データは事前に割り当てられた1ホットラベルにのみ最適化され、特徴凝縮における暗黙のクラスバリアを生成する。このことは, 蒸留予算の非効率利用, クラス間特徴分布の監視につながり, 最終的に有効性と効率を損なう結果となった。これらの制約を克服するために,本論文では,現在のデータセット蒸留法で広く利用されているクラス固有のデータラベルフレームワークを超越した,革新的な蒸留手法であるInter-class Feature Compensator (INFER)を提案する。特に、INFERはUniversal Feature Compensator (UFC)を活用して、クラス間の機能統合を強化し、単一のUFC入力から複数の合成インスタンスを生成する。これにより蒸留予算の効率が大幅に向上する。さらに、INFERは、蒸留中のクラス間相互作用を強化し、蒸留データの有効性と一般化性を高める。従来のデータセットと同様のラベルの線形補間を可能にすることにより、INFERは、合成データを厳密に最適化し、合成データセットのソフトラベルのサイズをほぼゼロに減らし、データセットの蒸留における効率と有効性のための新しいベンチマークを確立する。

Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation.

翻訳日:2024-11-08 07:53:35 公開日:2024-10-23

# 準金属SWCNTにおける量子輸送ストレイントロニクスとメカニカルアハロノフ・ボーム効果

Quantum Transport Straintronics and Mechanical Aharonov-Bohm Effect in Quasi-metallic SWCNTs ( http://arxiv.org/abs/2408.10355v2 )

ライセンス: Link先を確認

L. Huang, G. Wei, A. R. Champagne,

(参考訳) 単層カーボンナノチューブ(SWCNT)は、原子的に精密なエッジを持つ2次元材料の効果的に狭いリボンである。量子輸送ストレトロニクス(QTS)、すなわち量子輸送を制御するための機械的ひずみを利用するのに理想的なシステムである。彼らの大きなサブバンドエネルギー間隔($ 0.8 eV)は、単一の量子輸送チャネルを持つトランジスタにつながる。一軸拘束された準金属-SWCNTトランジスタにおけるQTSの研究に応用モデルを適用した。デバイスパラメータは既存の実験プラットフォームに基づいており、チャネル長は$L=$50 nm、直径は$d\approx$1.5 nm、歪は$\varepsilon_{\text{tot}}\approx$7$\%である。電荷キャリアの伝搬角 $\Theta$ が $\varepsilon_{\text{tot}}$ で完全に調整可能であることを示す。 $\Theta$が90$^o$に達すると、コンダクタンス$G$は完全に抑制される。ひずみ発生バンドギャップは、$\approx$400 meVまで調整できる。機械ひずみはスカラー$\phi_{\varepsilon}$とベクトル$\textbf{A}$ゲージポテンシャルをトランジスタのハミルトニアンに付加する。これらのポテンシャルは、メカニカルなアハロノフ・ボーム効果として記述できる、$G$の量子干渉のスペクトルを豊富に生成する。電荷キャリアの量子相は純粋に機械的な手段で制御できる。例えば、フル2$\pi$の位相シフトは(12,9)チューブで0.7$\%のひずみ変化によって引き起こされる。この研究は、2D材料とそのナノチューブに基づく量子技術のツールボックスに定量的な量子輸送ひずみ効果を加える機会を開く。

Single-wall carbon nanotubes (SWCNTs) are effectively narrow ribbons of 2D materials with atomically precise edges. They are ideal systems to harness quantum transport straintronics (QTS), i.e. using mechanical strain to control quantum transport. Their large subband energy spacing ($\sim$ 0.8 eV) leads to transistors with a single quantum transport channel. We adapt an applied model to study QTS in uniaxially-strained quasi-metallic-SWCNT transistors. The device parameters are based on an existing experimental platform, with channel lengths of $L=$ 50 nm, diameters $d\approx$ 1.5 nm, and strains up to $\varepsilon_{\text{tot}}\approx$ 7 $\%$. We demonstrate that the charge carrier's propagation angle $\Theta$ is fully tunable with $\varepsilon_{\text{tot}}$. When $\Theta$ reaches 90$^o$, the conductance $G$ is completely suppressed. A strain-generated band gap can be tuned up to $\approx$ 400 meV. Mechanical strain adds both scalar $\phi_{\varepsilon}$ and vector $\textbf{A}$ gauge potentials to the transistor's Hamiltonian. These potentials create a rich spectrum of quantum interferences in $G$, which can be described as a mechanical Aharonov-Bohm effect. The charge carriers' quantum phase can be controlled by purely mechanical means. For instance, a full 2$\pi$ phase shift can be induced in a (12,9) tube by a 0.7 $\%$ strain change. This work opens opportunities to add quantitative quantum transport strain effects to the tools box of quantum technologies based on 2D materials and their nanotubes.

翻訳日:2024-11-08 06:44:48 公開日:2024-10-23

# KeySpace:惑星間ネットワークにおける公開鍵インフラストラクチャの考察

KeySpace: Public Key Infrastructure Considerations in Interplanetary Networks ( http://arxiv.org/abs/2408.10963v2 )

ライセンス: Link先を確認

Joshua Smailes, Sebastian Köhler, Simon Birnbach, Martin Strohmeier, Ivan Martinovic,

(参考訳) 衛星ネットワークが拡大し、惑星間通信を取り入れ始めるにつれ、これらの条件下でPKIにアプローチする方法に関する未解決問題への関心が高まっている。本稿では,メガコンステレーションと惑星間ネットワークに着目し,衛星ネットワークにおける鍵管理システムの実現に向けた目標と要件について検討する。我々は、特定のネットワークトポロジにおいて、システムとシステムを比較するのに使用できる標準化された実験のセットを設計する。これらを用いて、高度に分散した惑星間ネットワークにおいて地球上のPKI技術が実現可能であることを実証し、効率的な低遅延接続を実現するためにPKIシステムを構成でき、効果的な再起動による攻撃の影響を最小限に抑えることができることを示した。我々は,大規模な宇宙ネットワークの効率的なシミュレーションを目的とした新しいネットワークシミュレータであるDeep Space Network Simulator (DSNS) を構築し,これを評価した。広範囲なPKI構成で接続確立とキーの取り消しを評価するシミュレーションを実行する。最後に、OCSP Hybridとリレーノードをファイアウォールとして使用する2つの追加構成オプションを提案し、評価する。これらの組み合わせにより、攻撃者が妥協鍵で到達できるネットワークの範囲を最小化し、惑星間リレーリンクに対する攻撃者の負荷を低減できる。

As satellite networks grow larger and begin to incorporate interplanetary communication, there is an increasing interest in the unsolved problem of how to approach PKI in these conditions. In this paper we explore the goals and requirements for implementing key management systems in satellite networks, focusing on megaconstellations and interplanetary networks. We design a set of standardized experiments which can be used to compare systems against one another for particular network topologies. Using these, we demonstrate that terrestrial PKI techniques are feasible in highly distributed interplanetary networks, showing that it is possible to configure PKI systems to achieve efficient low-latency connection establishment, and minimize the impact of attacks through effective revocations. We evaluate this by building the Deep Space Network Simulator (DSNS), a novel network simulator aimed at efficient simulation of large space networks. We run simulations evaluating connection establishment and key revocation under a wide range of PKI configurations. Finally, we propose and evaluate two additional configuration options: OCSP Hybrid, and the use of relay nodes as a firewall. Together these minimize the extent of the network an attacker can reach with a compromised key, and reduce the attacker's load on interplanetary relay links.

翻訳日:2024-11-08 06:22:37 公開日:2024-10-23

# 機能選択のための大規模言語モデル探索:データ中心の視点

Exploring Large Language Models for Feature Selection: A Data-centric Perspective ( http://arxiv.org/abs/2408.12025v2 )

ライセンス: Link先を確認

Dawei Li, Zhen Tan, Huan Liu,

(参考訳) LLM(Large Language Models)の急速な進歩は様々な領域に大きく影響を与え、例外的な少数ショットとゼロショットの学習能力を活用している。本研究では,データ中心の観点からLLMに基づく特徴選択手法を探索し,理解することを目的としている。まず, LLM を用いた既存の特徴選択手法を, 統計的推測を行うためにサンプルの数値値を必要とするデータ駆動特徴選択と, 記述的文脈を用いた意味的関連付けを行うために LLM の事前知識を利用するテキストベースの特徴選択の2つのグループに分類することから始める。我々は, LLM の分類と回帰作業において, 様々なサイズ (例えば , GPT-4, ChatGPT, LLaMA-2) で実験を行った。本研究は,テキストベースの特徴選択手法の有効性とロバスト性を強調し,実世界の医療応用を用いてその可能性を示す。また,LLMを機能選択に活用する上での課題と今後の可能性についても論じ,この新興分野におけるさらなる研究・開発のための洞察を提供する。

The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, leveraging their exceptional few-shot and zero-shot learning capabilities. In this work, we aim to explore and understand the LLMs-based feature selection methods from a data-centric perspective. We begin by categorizing existing feature selection methods with LLMs into two groups: data-driven feature selection which requires numerical values of samples to do statistical inference and text-based feature selection which utilizes prior knowledge of LLMs to do semantical associations using descriptive context. We conduct experiments in both classification and regression tasks with LLMs in various sizes (e.g., GPT-4, ChatGPT and LLaMA-2). Our findings emphasize the effectiveness and robustness of text-based feature selection methods and showcase their potentials using a real-world medical application. We also discuss the challenges and future opportunities in employing LLMs for feature selection, offering insights for further research and development in this emerging field.

翻訳日:2024-11-08 05:49:00 公開日:2024-10-23

# TensorOpera Router: 効率的なLLM推論のためのマルチモデルルータ

TensorOpera Router: A Multi-Model Router for Efficient LLM Inference ( http://arxiv.org/abs/2408.12320v3 )

ライセンス: Link先を確認

Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He,

(参考訳) 様々なドメインにわたる大規模言語モデル(LLM)の急速な成長に伴い、多くの新しいLLMが出現し、それぞれがドメイン固有の専門知識を持っている。この増殖は、高速で高品質で費用対効果の高いLCMクエリ応答方法の必要性を強調している。しかし、このトリレンマを効率的にバランスさせるLLMは存在しない。一部のモデルは強力だが非常に高価であり、他のモデルは高速で安価だが質的に劣る。この課題に対処するために,TO-Routerを提案する。TO-RouterはモノリシックなLLMクエリシステムで,多様なLLM専門家をシームレスに単一のクエリインターフェースに統合し,クエリの要求に応じて,入力クエリを最も高性能なエキスパートに動的にルーティングする。大規模な実験により,TO-Routerは,スタンドアロンのエキスパートモデルと比較してクエリ効率を最大40%向上し,最大30%のコスト削減を実現し,モデル性能を最大10%向上させることを示した。

With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present TO-Router, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the most high-performant expert based on query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, TO-Router improves query efficiency by up to 40\%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.

翻訳日:2024-11-08 05:49:00 公開日:2024-10-23

# 単光子検出器アレイを用いた線形多重光子数分解

Linearly Multiplexed Photon Number Resolving Single-photon Detectors Array ( http://arxiv.org/abs/2408.12345v2 )

ライセンス: Link先を確認

Leonardo Limongi, Francesco Martini, Thu Ha Dao, Alessandro Gaggero, Hamza Hasnaoui, Igor Lopez-Gonzalez, Fabio Chiarello, Fabio de Matteis, Alberto Quaranta, Andrea Salamon, Francesco Mattioli, Martino Bernard, Mirko Lobino,

(参考訳) 光子数分解検出器(英: Photon Number Resolving Detector、PNRD)は、入射光ビームに存在する光子数を測定する装置であり、光を量子レベルで測定し、特徴付けることができる。本稿では, 単一モード導波路上に集積された線形多重光子数分解型単一光子検出器アレイの性能と設計について考察する。本研究は, 種々の条件下でのアレーの忠実度の定義と解析に焦点をあて, 実装のための実用的な設計を提案する。理論的解析と数値シミュレーションにより, 伝搬損失や暗黒数の増加がシステムの性能にどのような影響を及ぼすかを示し, 実用化においてこれらの効果を緩和することの重要性を強調した。

Photon Number Resolving Detectors (PNRDs) are devices capable of measuring the number of photons present in an incident optical beam, enabling light sources to be measured and characterized at the quantum level. In this paper, we explore the performance and design considerations of a linearly multiplexed photon number-resolving single-photon detector array, integrated on a single mode waveguide. Our investigation focus on defining and analyzing the fidelity of such an array under various conditions and proposing practical designs for its implementation. Through theoretical analysis and numerical simulations, we show how propagation losses and dark counts may have a strong impact on the performance of the system and highlight the importance of mitigating these effects in practical implementations.

翻訳日:2024-11-08 05:37:29 公開日:2024-10-23

# 再検討によるプルーニング:CNNとトランスフォーマーの属性最適化

Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers ( http://arxiv.org/abs/2408.12568v2 )

ライセンス: Link先を確認

Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin,

(参考訳) より複雑な問題を解決するために、Deep Neural Networksは数十億のパラメータにスケールされ、膨大な計算コストがかかる。計算要求の削減と効率の向上のための効果的なアプローチは、しばしば過パラメータ化されたネットワークの不要なコンポーネントを創り出すことである。これまでの研究では、eXplainable AIの分野からの帰属法が、最も関係の低いネットワークコンポーネントを数ショットで抽出し、プルークする効果的な手段であることが示された。我々は、刈り取り作業における属性法のハイパーパラメーターを明示的に最適化し、解析にトランスフォーマーベースのネットワークを含めることにより、現在の状態を拡張した。提案手法は,ImageNet分類タスクにおいて高い性能を保ちながら,大規模トランスフォーマーおよび畳み込み型アーキテクチャ(VGG, ResNet, ViT)のモデル圧縮率を向上させる。ここでは, 畳み込みニューラルネットワークと比較して, トランスフォーマーの過度パラメータ化の度合いが高いことを示す。コードはhttps://github.com/erfanhatefi/Pruning-by-eXplaining-in-PyTorchで入手できる。

To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state by proposing to explicitly optimize hyperparameters of attribution methods for the task of pruning, and further include transformer-based networks in our analysis. Our approach yields higher model compression rates of large transformer- and convolutional architectures (VGG, ResNet, ViT) compared to previous works, while still attaining high performance on ImageNet classification tasks. Here, our experiments indicate that transformers have a higher degree of over-parameterization compared to convolutional neural networks. Code is available at https://github.com/erfanhatefi/Pruning-by-eXplaining-in-PyTorch.

翻訳日:2024-11-08 05:37:29 公開日:2024-10-23

# 通勤予測のための説明可能な階層型都市表現学習

Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction ( http://arxiv.org/abs/2408.14762v3 )

ライセンス: Link先を確認

Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto,

(参考訳) 通勤フロー予測は、現実の自治体の業務に欠かせない課題である。従来の研究では、複数の補助データを用いて都市内における通勤起因決定(OD)需要を推定することは可能であることが明らかになっている。しかし、既存の方法の多くは、維持すべき地理的単位の増加により、都道府県や全国で同様の業務を大規模にこなすには適していない。さらに、地域表現学習は、多様な都市下流課題に対する都市知識獲得のための普遍的なアプローチである。多くの研究者がマルチソースデータから都市単位を記述するための包括的枠組みを開発してきたが、選択した地理的要素の関係は明らかになっていない。さらに、都心部は、都市やその包括地区などの格付け構造を自然に保存しており、都市単位間の関係を解明する必要がある。そこで我々は,複数の空間解像度で有意な領域埋め込みを生成できる不均一なグラフベースモデルを構築し,異なるタイプのODフローを予測する。提案手法の有効性を実証するために,静岡県から収集した実世界の携帯電話データを用いた広範な実験を行った。その結果,提案モデルが一様都市構造の観点から既存モデルより優れていたことが示唆された。モデルの信頼性を高めるために、合理的な説明を用いて予測結果の理解を拡大する。

Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the increased number of geographical units that need to be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationship between the selected geographical elements. Furthermore, metropolitan areas naturally preserve ranked structures, like cities and their inclusive districts, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We extend the understanding of predicted results using reasonable explanations to enhance the credibility of the model.

翻訳日:2024-11-08 04:52:58 公開日:2024-10-23

# 通勤予測のための説明可能な階層型都市表現学習

Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction ( http://arxiv.org/abs/2408.14762v4 )

ライセンス: Link先を確認

Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto,

翻訳日:2024-11-08 04:52:58 公開日:2024-10-23

# 部分的フォールトトレラント量子コンピューティングアーキテクチャのためのトロッター時間進化のコンパイル

Compilation of Trotter-Based Time Evolution for Partially Fault-Tolerant Quantum Computing Architecture ( http://arxiv.org/abs/2408.14929v2 )

ライセンス: Link先を確認

Yutaro Akahoshi, Riki Toshio, Jun Fujisaki, Hirotaka Oshima, Shintaro Sato, Keisuke Fujii,

(参考訳) 限られた資源で実用的な量子スピードアップを実現することは、学術と工業の両方において重要な課題である。これを解決するために,「時空効率的なアナログ回転量子コンピューティングアーキテクチャ(STARアーキテクチャ)」と呼ばれる部分的にフォールトトレラントな量子コンピューティングアーキテクチャが最近提案されている。このアーキテクチャは、リソース要件の最小化と、普遍的な量子計算に不可欠な非クリフォードゲートの精度の最大化に焦点を当てている。しかし、リピート・アンティル・サクセス(RUS)プロトコルや状態注入のような非決定論的プロセスは、計算オーバーヘッドを著しく引き起こす可能性がある。したがって、効率的なフォールトトレラント演算を用いることで、このオーバーヘッドを最小限に抑えるために論理回路を最適化することが不可欠である。本稿では,STARアーキテクチャの有望な応用である2次元ハバードモデルハミルトンの時間発展をシミュレーションする効率的な手法を提案する。並列インジェクションプロトコルとアダプティブインジェクション領域の更新という2つの手法を提案する。これらを既存のfSWAP手法と統合することにより、2D Hubbardモデルのための効率的なTrotterベースの時間進化演算を開発する。解析の結果, 単純直列コンパイルに比べて10倍以上の高速化が得られた。この最適化されたコンパイルにより、2次元ハバードモデルの量子位相推定に必要な計算資源を推定できる。物理誤差率が$p_{\rm phys} = 10^{-4}$のデバイスの場合、古典計算と比較して8\times 8$ Hubbardモデルよりも高速な基底状態エネルギー推定を実現するために約6.5 \times 10^4$ physical qubitsが必要であると推定する。

Achieving practical quantum speedup with limited resources is a crucial challenge in both academic and industrial communities. To address this, a partially fault-tolerant quantum computing architecture called ``space-time efficient analog rotation quantum computing architecture (STAR architecture)'' has been recently proposed. This architecture focuses on minimizing resource requirements while maximizing the precision of non-Clifford gates, essential for universal quantum computation. However, non-deterministic processes such as the repeat-until-success (RUS) protocol and state injection can introduce significant computational overhead. Therefore, optimizing the logical circuit to minimize this overhead by using efficient fault-tolerant operations is essential. This paper presents an efficient method for simulating the time evolution of the 2D Hubbard model Hamiltonian, a promising application of the STAR architecture. We present two techniques, parallel injection protocol and adaptive injection region updating, to reduce unnecessary time overhead specific to our architecture. By integrating these with the existing fSWAP technique, we develop an efficient Trotter-based time evolution operation for the 2D Hubbard model. Our analysis reveals an acceleration of over 10 times compared to naive serial compilation. This optimized compilation enables us to estimate the computational resources required for quantum phase estimation of the 2D Hubbard model. For devices with a physical error rate of $p_{\rm phys} = 10^{-4}$, we estimate that approximately $6.5 \times 10^4$ physical qubits are required to achieve faster ground state energy estimation of the $8\times8$ Hubbard model compared to classical computation.

翻訳日:2024-11-08 04:52:58 公開日:2024-10-23

# 手動のプロンプト依存性を低減するための幻覚の活用 : 即時セグメンテーション

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation ( http://arxiv.org/abs/2408.15205v2 )

ライセンス: Link先を確認

Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong,

(参考訳) プロンプタブルなセグメンテーションは、典型的には、各所望のオブジェクトのセグメンテーションを導くために、インスタンス固有の手動プロンプトを必要とする。このようなニーズを最小限に抑えるために、単一のタスクジェネリックプロンプトを使用して、異なるオブジェクトの様々なイメージを同じタスクに分割するタスクジェネリックプロンプトが導入された。現在の手法では、MLLM(Multimodal Large Language Models)を使用して、タスクジェネリックプロンプトから詳細なインスタンス固有のプロンプトを推論し、セグメンテーション精度を向上させる。このセグメンテーションの有効性は、これらの導出されたプロンプトの精度に大きく依存する。しかし、MLLMは推論中に幻覚に悩まされ、不正確なプロンプトを引き起こす。既存の手法では、モデルを改善するための幻覚の除去に重点を置いているが、MLLMの幻覚は、個々の画像を超えて訓練済みの大規模知識を表現するため、正しく活用された場合、貴重な文脈的洞察を明らかにすることができると論じている。本稿では,画像からタスク関連情報を抽出するために幻覚を利用し,その精度を検証し,生成したプロンプトの精度を向上させる。具体的には、プロンプト・マスクサイクル生成フレームワーク(ProMaC)をプロンプトジェネレータとマスクジェネレータで導入し、プロンプトジェネレータは、最初はテスト画像上の拡張文脈知識を抽出するための幻覚を探索し、これらの幻覚を正確にインスタンス固有のプロンプトに還元し、マスクジェネレータにマスクセマンティックアライメントによるタスクセマンティクスと整合したマスクを生成するよう指示する。生成されたマスクは、プロンプトジェネレータを反復的にタスク関連の画像領域に集中させ、無関係な幻覚を減らし、より良いプロンプトとマスクを共同で生成する。 5つのベンチマークの実験では、ProMaCの有効性が示されている。 https://lwpyh.github.io/ProMaC/

Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt for improving segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy for enhancing precision of the generated prompts. Specifically, we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator.The prompt generator uses a multi-scale chain of thought prompting, initially exploring hallucinations for extracting extended contextual knowledge on a test image.These hallucinations are then reduced to formulate precise instance-specific prompts, directing the mask generator to produce masks that are consistent with task semantics by mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks. Experiments on 5 benchmarks demonstrate the effectiveness of ProMaC. Code given in https://lwpyh.github.io/ProMaC/.

翻訳日:2024-11-08 04:41:58 公開日:2024-10-23

# LLaVA-MoD: MoEナレッジ蒸留によるLLaVAタイニー製造

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation ( http://arxiv.org/abs/2408.15881v2 )

ライセンス: Link先を確認

Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang,

(参考訳) LLaVA-MoDは,大規模MLLM(l-MLLM)の知識を抽出することで,小規模マルチモーダル言語モデルの効率的な訓練を可能にする新しいフレームワークである。本手法はMLLM蒸留における2つの基本的な課題に対処する。まず,スパース・ミックス・オブ・エキスパートズ(MoE)アーキテクチャを言語モデルに統合することにより,s-MLLMのネットワーク構造を最適化し,計算効率とモデル表現性のバランスをとる。第2に,包括的知識移動を保証するための進歩的知識移動戦略を提案する。この戦略は、学生モデルが教師ネットワークの理解をエミュレートできるように、出力分布間のKL(Kullback-Leibler)のばらつきを最小限に抑えるため、模擬蒸留から始まる。次に,l-MLLMを参照モデルとして扱う上で鍵となるDPO(Direct Preference Optimization)による嗜好蒸留を導入する。この段階において、s-MLLMの優良例と劣悪な例を区別する能力は、l-MLLMを超えて著しく向上し、特に幻覚ベンチマークにおいて、教師を超越したより良い学生に繋がる。大規模な実験により、LLaVA-MoDは、活性化パラメータの最小数と計算コストを抑えながら、様々なマルチモーダルベンチマークで既存のモデルより優れていることが示された。注目すべきは、LLaVA-MoDは2Bのアクティベートパラメータだけで、Qwen-VL-Chat-7Bを平均8.8%上回り、トレーニングデータのわずか0.3%、トレーニング可能なパラメータは23%である。これらの結果は、LLaVA-MoDの教師モデルから包括的知識を効果的に抽出する能力を強調し、より効率的なMLLMの開発への道を開いた。コードは、https://github.com/shufangxun/LLaVA-MoD.comで入手できる。

We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters. These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.

翻訳日:2024-11-08 04:30:58 公開日:2024-10-23

# LLaVA-MoD: MoEナレッジ蒸留によるLLaVAタイニー製造

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation ( http://arxiv.org/abs/2408.15881v3 )

ライセンス: Link先を確認

Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang,

翻訳日:2024-11-08 04:30:58 公開日:2024-10-23

# 傾斜2次元離散格子における量子粒子のリッサホスダイナミクス

Lissajous dynamics of a quantum particle in a tilted two-dimensional discrete lattice ( http://arxiv.org/abs/2409.02268v2 )

ライセンス: Link先を確認

Grzegorz Jaczewski, Tomasz Sowiński,

(参考訳) 離散2次元傾斜格子における単一粒子の量子力学を古典量子対応の観点から解析する。格子の傾きが振動力学をもたらすという事実を利用して、格子のパラメータと粒子の初期状態が、進化の過程で、その中心が古典力学で知られているリッサジョウス曲線の軌跡に従っている間に、その確率分布が形を変えないように調整できることを示す。

The quantum dynamics of a single particle in a discrete two-dimensional tilted lattice is analyzed from the perspective of the classical-quantum correspondence. Utilizing the fact that tilting the lattice results in oscillatory dynamics, we show how the parameters of the lattice and the initial state of the particle can be tuned so that during evolution the probability distribution does not change its shape while its center follows the trajectory known in classical mechanics as Lissajous curves.

翻訳日:2024-11-07 23:56:04 公開日:2024-10-23

# VILA-U:ビジュアル理解と生成を統合した統一ファンデーションモデル

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ( http://arxiv.org/abs/2409.04429v2 )

ライセンス: Link先を確認

Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu,

(参考訳) VILA-Uは、ビデオ、画像、言語理解、生成を統合する統一基盤モデルである。従来の視覚言語モデル(VLM)は、視覚コンテンツを理解し、生成するために別々のモジュールを使用する。対照的に、VILA-Uは両方のタスクに単一の自己回帰的次トーケン予測フレームワークを採用しており、拡散モデルのような追加のコンポーネントは不要である。このアプローチは、モデルを簡単にするだけでなく、ビジュアル言語理解と生成における最先端のパフォーマンスも達成する。 VILA-Uの成功は2つの主な要因に起因している: 個別の視覚トークンを事前学習中にテキスト入力と整列する統合視覚タワー。これによってVILA-Uは、完全なトークンベースの自動回帰フレームワークを使用して、より複雑なモデルに互換性を持って実行することができる。

VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation. Traditional visual language models (VLMs) use separate modules for understanding and generating visual content, which can lead to misalignment and increased complexity. In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components like diffusion models. This approach not only simplifies the model but also achieves near state-of-the-art performance in visual language understanding and generation. The success of VILA-U is attributed to two main factors: the unified vision tower that aligns discrete visual tokens with textual inputs during pretraining, which enhances visual perception, and autoregressive image generation can achieve similar quality as diffusion models with high-quality dataset. This allows VILA-U to perform comparably to more complex models using a fully token-based autoregressive framework.

翻訳日:2024-11-07 23:00:54 公開日:2024-10-23

# 一般化拡張不確実性原理、リウヴィルの定理と状態密度:スナイダー・ド・シッターとヤン模型

Generalized Extended Uncertainty Principles, Liouville theorem and density of states: Snyder-de Sitter and Yang models ( http://arxiv.org/abs/2409.05110v2 )

ライセンス: Link先を確認

A. Pachoł,

(参考訳) 量子力学的位相空間の修正によってハイゼンベルクの不確実性原理が変化し、これは一般化不確実性原理(英語版)(GUP)や拡張不確実性原理(英語版)(EUP)をもたらす。 GUPとEUPの組み合わせにより、一般拡張不確実性原理(GEUPまたはEGUP)は、座標とモータの両方に非可換性を導入することでこれらの修正をさらに一般化する。本稿では,GEUPが統計物理学におけるリウヴィルの定理および非相対論的量子力学の枠組みにおける状態密度に与える影響について検討する。我々は、Snyder-de Sitter と Yang のモデルの場合において、無限小時間進化の下で不変な重み付き位相空間体積要素を発見し、GEUP が状態の密度を変化させ、物理的(熱力学的な)性質に影響を及ぼすことを示した。上記のモデルから一定の制限で得られた特別事例についても論じる。 GEUPとEUPの新しい高次タイプも提案されている。

Modifications in quantum mechanical phase space lead to changes in the Heisenberg uncertainty principle, which can result in the Generalized Uncertainty Principle (GUP) or the Extended Uncertainty Principle (EUP), introducing quantum gravitational effects at small and large distances, respectively. A combination of GUP and EUP, the Generalized Extended Uncertainty Principle (GEUP or EGUP), further generalizes these modifications by incorporating noncommutativity in both coordinates and momenta. This paper examines the impact of GEUP on the Liouville theorem in statistical physics and density of states within non-relativistic quantum mechanics framework. We find a weighted phase space volume element, invariant under the infinitesimal time evolution, in the cases of Snyder-de Sitter and Yang models, presenting how GEUP alters the density of states, potentially affecting physical (thermodynamical) properties. Special cases, obtained in certain limits from the above models are also discussed. New higher order types of GEUP and EUP are also proposed.

翻訳日:2024-11-07 22:49:49 公開日:2024-10-23

# CD-NGP:動的シーンのための高速でスケーラブルな連続表現

CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v2 )

ライセンス: Link先を確認

Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,

(参考訳) 動的シーンにおける3次元再構成と新しいビュー合成のための高速でスケーラブルな表現であるCD-NGPを提案する。連続学習にインスパイアされた本手法は,まず入力ビデオを複数のチャンクに分割し,次にモデルのチャンクをチャンクで訓練し,最後に,第1枝とその後の枝の特徴を融合させる。 DyNeRFデータセットを用いた実験により、提案した新しい表現は、メモリ消費、モデルサイズ、トレーニング速度、レンダリング品質との大きなバランスに達することが示された。具体的には、オフライン方式よりもトレーニングメモリ(<14$GB)を85\%以上消費し、他のオンライン方式に比べてストリーミング帯域(<0.4$MB/frame)を大幅に削減する必要がある。

We present CD-NGP, which is a fast and scalable representation for 3D reconstruction and novel view synthesis in dynamic scenes. Inspired by continual learning, our method first segments input videos into multiple chunks, followed by training the model chunk by chunk, and finally, fuses features of the first branch and subsequent branches. Experiments on the prevailing DyNeRF dataset demonstrate that our proposed novel representation reaches a great balance between memory consumption, model size, training speed, and rendering quality. Specifically, our method consumes $85\%$ less training memory ($<14$GB) than offline methods and requires significantly lower streaming bandwidth ($<0.4$MB/frame) than other online alternatives.

翻訳日:2024-11-07 22:38:45 公開日:2024-10-23

# CD-NGP:動的シーンのための高速でスケーラブルな連続表現

CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v3 )

ライセンス: Link先を確認

Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,

翻訳日:2024-11-07 22:38:45 公開日:2024-10-23

# CD-NGP:動的シーンのための高速でスケーラブルな連続表現

CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes ( http://arxiv.org/abs/2409.05166v4 )

ライセンス: Link先を確認

Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,

(参考訳) ダイナミックシーンにおける新しいビュー合成(NVS)の方法論は、メモリ消費の調和、モデルの複雑さ、トレーニング効率、レンダリング忠実度といった重要な課題に直面している。既存のオフライン技術は、高品質な結果を提供する一方で、かなりのメモリ要求と限られたスケーラビリティによって特徴付けられることが多い。対照的に、オンライン手法は、迅速な収束とモデルのコンパクトさのバランスをとるという課題に対処する。これらの問題に対処するため,我々は連続的動的グラフィックスプリミティブ(CD-NGP)を提案する。提案手法では,時間的および空間的ハッシュエンコーディングの機能を相乗化して高いレンダリング品質を実現し,拡張性を高めるためにパラメータ再利用を採用し,メモリオーバーヘッドを軽減するために連続的な学習フレームワークを活用する。さらに,厳密かつ非剛性な動きを持つ多視点,例外的に長いビデオシーケンスからなる新しいデータセットを導入し,提案手法のスケーラビリティを実証する。

Current methodologies for novel view synthesis (NVS) in dynamic scenes encounter significant challenges in harmonizing memory consumption, model complexity, training efficiency, and rendering fidelity. Existing offline techniques, while delivering high-quality results, are often characterized by substantial memory demands and limited scalability. In contrast, online methods grapple with the challenge of balancing rapid convergence with model compactness. To address these issues, we propose continual dynamic neural graphics primitives (CD-NGP). Our approach synergizes features from both temporal and spatial hash encodings to achieve high rendering quality, employs parameter reuse to enhance scalability, and leverages a continual learning framework to mitigate memory overhead. Furthermore, we introduce a novel dataset comprising multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion, thereby substantiating the scalability of our method.

翻訳日:2024-11-07 22:38:45 公開日:2024-10-23

# RotCAtt-TransUNet++: 高度心分離のための新しいディープニューラルネットワーク

RotCAtt-TransUNet++: Novel Deep Neural Network for Sophisticated Cardiac Segmentation ( http://arxiv.org/abs/2409.05280v2 )

ライセンス: Link先を確認

Quoc-Bao Nguyen-Le, Tuan-Hy Le, Anh-Triet Do, Quoc-Huy Trinh,

(参考訳) 心臓血管疾患は依然として世界的な健康上の問題であり、世界の死亡率のかなりの部分を占めている。心臓画像データの正確なセグメンテーションは、心血管疾患に伴う死亡率の軽減に重要である。しかし、CNNベースのアプローチとTransformerベースのアプローチを含む既存の最先端(SOTA)ニューラルネットワークは、スライス内情報とともにスライス間接続を効果的にキャプチャできないため、実用性に限界がある。この欠損は、軸方向の冠動脈など、z軸に沿った複雑な長距離の詳細を特徴とするデータセットで特に顕著である。さらに、SOTA法は心筋のセグメンテーションと非心筋成分の区別に失敗し、「スライディング」現象へと繋がる。これらの課題に対処するために、複雑な心構造の堅牢なセグメンテーションに適した新しいアーキテクチャであるRotCAtt-TransUNet++を提案する。提案手法では,エンコーダ内のネストスキップ接続でマルチスケール機能を集約することで,グローバルコンテキストのモデリングを強調する。トランスフォーマー層を統合してパッチ間のインタラクションをキャプチャし、ロータリーアテンション機構を使用して複数のスライス間の接続(インタースライス情報)をキャプチャする。さらに、チャネルワイドのクロスアテンションゲートは、融合したマルチスケールのチャネルワイド情報とデコーダステージからブリッジセマンティックギャップへ特徴を導く。実験の結果,提案モデルでは,4つの心的データセットと1つの腹部的データセットにまたがる既存のSOTAアプローチよりも優れていた。重要なことは、冠状動脈と心筋は、推論中にほぼ完全な精度でアノテートされることである。アブレーション研究では、回転注意機構が意味次元空間に埋め込まれたベクトル化されたパッチを効果的に変換し、セグメンテーション精度を高めることが示されている。

Cardiovascular disease remains a predominant global health concern, responsible for a significant portion of mortality worldwide. Accurate segmentation of cardiac medical imaging data is pivotal in mitigating fatality rates associated with cardiovascular conditions. However, existing state-of-the-art (SOTA) neural networks, including both CNN-based and Transformer-based approaches, exhibit limitations in practical applicability due to their inability to effectively capture inter-slice connections alongside intra-slice information. This deficiency is particularly evident in datasets featuring intricate, long-range details along the z-axis, such as coronary arteries in axial views. Additionally, SOTA methods fail to differentiate non-cardiac components from myocardium in segmentation, leading to the "spraying" phenomenon. To address these challenges, we present RotCAtt-TransUNet++, a novel architecture tailored for robust segmentation of complex cardiac structures. Our approach emphasizes modeling global contexts by aggregating multiscale features with nested skip connections in the encoder. It integrates transformer layers to capture interactions between patches and employs a rotatory attention mechanism to capture connectivity between multiple slices (inter-slice information). Additionally, a channel-wise cross-attention gate guides the fused multi-scale channel-wise information and features from decoder stages to bridge semantic gaps. Experimental results demonstrate that our proposed model outperforms existing SOTA approaches across four cardiac datasets and one abdominal dataset. Importantly, coronary arteries and myocardium are annotated with near-perfect accuracy during inference. An ablation study shows that the rotatory attention mechanism effectively transforms embedded vectorized patches in the semantic dimensional space, enhancing segmentation accuracy.

翻訳日:2024-11-07 22:38:45 公開日:2024-10-23

# オルタナティブベルの状態とテレポーテーション

Alternative Bell's states and teleportation ( http://arxiv.org/abs/2409.06885v2 )

ライセンス: Link先を確認

Juan M. Romero, Emiliano Montoya-Gonzalez, Oscar Velazquez-Alvarado,

(参考訳) ベル状態は量子コンピューティングにおいて最も有用なものの一つである。これらの状態は、2つの量子ビットを持つ感染状態の正規直交基底である。絡み合った状態の代替基底を提案する。これらの状態のいくつかは連続パラメータに依存する。これらの代替基底の量子回路と符号を示す。さらに、これらの絡み合った状態を用いて量子テレポーテーションを研究し、量子回路と関連する符号を示す。

Bell's states are among the most useful in quantum computing. These state are an orthonormal base of entagled states with two qubits. We propose alternative bases of entangled states. Some of these states depend on a continuous parameter. We present the quantum circuit and code of these alternative bases. In addition, we study quantum teleportation with these entangled states and present their quantum circuits and codes associated.

翻訳日:2024-11-07 22:05:05 公開日:2024-10-23

# オルタナティブベルの状態とテレポーテーション

Alternative Bell's states and teleportation ( http://arxiv.org/abs/2409.06885v3 )

ライセンス: Link先を確認

Juan M. Romero, Emiliano Montoya-Gonzalez, Oscar Velazquez-Alvarado,

翻訳日:2024-11-07 22:05:05 公開日:2024-10-23

# エージェントベースモデルにおけるエージェンシーの限界について

On the limits of agency in agent-based models ( http://arxiv.org/abs/2409.10568v2 )

ライセンス: Link先を確認

Ayush Chopra, Shashank Kumar, Nurullah Giray-Kuru, Ramesh Raskar, Arnau Quera-Bofarull,

(参考訳) エージェント・ベース・モデリング(ABM)は、環境の中で動作し相互作用するエージェントの集合をシミュレートすることで、複雑なシステムの振る舞いを理解しようとする。その実用性には、現実的な環境動態と適応的なエージェントの挙動を捉えながら、百万の人口を効率的にシミュレートする必要がある。大規模言語モデル(LLMs)の最近の進歩は、適応的な振る舞いを捉えうるエージェントとしてLLMを使用することで、ABMを強化する機会を与える。しかし、LLMを多人数で使用するという計算能力の欠如は、その普及を妨げている。本稿では,AMMを数百万のエージェントに拡張するフレームワークであるAgentTorchを紹介する。 ABMエージェントとしてLLMの実用性をベンチマークし、シミュレーションスケールと個々のエージェンシー間のトレードオフを探索する。新型コロナウイルス(COVID-19)のパンデミックをケーススタディとして、AgentTorchはニューヨーク市を代表する840万人のエージェントをシミュレートし、孤立と雇用行動が健康と経済に与える影響を捉えている。我々は, ヒューリスティックエージェントとLCMエージェントをベースとした各種エージェントアーキテクチャの性能を比較し, 疾患波と失業率の予測を行った。さらに、AgentTorchの振り返り、反ファクト、そして予測分析の能力を紹介し、政策設計における歴史的データの限界を克服する上で、適応的なエージェントの振る舞いがどのように役立つかを強調した。 AgentTorchは、世界中のポリシー作成と科学的発見に積極的に利用されているオープンソースプロジェクトである。 github.com/AgentTorch/AgentTorch。

Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a collection of agents that act and interact within an environment. Their practical utility requires capturing realistic environment dynamics and adaptive agent behavior while efficiently simulating million-size populations. Recent advancements in large language models (LLMs) present an opportunity to enhance ABMs by using LLMs as agents with further potential to capture adaptive behavior. However, the computational infeasibility of using LLMs for large populations has hindered their widespread adoption. In this paper, we introduce AgentTorch -- a framework that scales ABMs to millions of agents while capturing high-resolution agent behavior using LLMs. We benchmark the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and individual agency. Using the COVID-19 pandemic as a case study, we demonstrate how AgentTorch can simulate 8.4 million agents representing New York City, capturing the impact of isolation and employment behavior on health and economic outcomes. We compare the performance of different agent architectures based on heuristic and LLM agents in predicting disease waves and unemployment rates. Furthermore, we showcase AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project actively being used for policy-making and scientific discovery around the world. The framework is available here: github.com/AgentTorch/AgentTorch.

翻訳日:2024-11-07 20:24:12 公開日:2024-10-23

# AutoSpec: ニューラルネットワーク仕様の自動生成

AutoSpec: Automated Generation of Neural Network Specifications ( http://arxiv.org/abs/2409.10897v2 )

ライセンス: Link先を確認

Shuowei Jin, Francis Y. Yan, Cheng Tan, Anuj Kalia, Xenofon Foukas, Z. Morley Mao,

(参考訳) 学習強化システムにおけるニューラルネットワークの採用の増加は、モデル安全性と堅牢性、特に安全クリティカルドメインの重要性を強調している。ニューラルネットワークの形式的検証の進展にもかかわらず、現在のプラクティスでは、さまざまなシナリオで期待されるモデルの振る舞いを規定するプロパティであるモデル仕様を手動で定義する必要がある。しかし、この手動のプロセスは人間のミスをしがちで、スコープが限られており、時間がかかります。本稿では,学習強化システムにおけるニューラルネットワークの包括的かつ正確な仕様を自動的に生成する最初のフレームワークであるAutoSpecを紹介する。また、モデル仕様の精度とカバレッジを評価するための最初のメトリクスセットを提案し、将来の比較のためのベンチマークを確立する。 4つの異なるアプリケーションで評価したところ、AutoSpecは人間の定義した仕様よりも優れており、2つのベースラインアプローチが提案されている。

The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks, current practices require users to manually define model specifications -- properties that dictate expected model behavior in various scenarios. This manual process, however, is prone to human error, limited in scope, and time-consuming. In this paper, we introduce AutoSpec, the first framework to automatically generate comprehensive and accurate specifications for neural networks in learning-augmented systems. We also propose the first set of metrics for assessing the accuracy and coverage of model specifications, establishing a benchmark for future comparisons. Our evaluation across four distinct applications shows that AutoSpec outperforms human-defined specifications as well as two baseline approaches introduced in this study.

翻訳日:2024-11-07 20:13:03 公開日:2024-10-23

# ブロックチェーンとスマートコントラクトを用いたセキュアV2Gトランザクションのためのサイバー物理認証方式

Cyber-Physical Authentication Scheme for Secure V2G Transactions Using Blockchain and Smart Contracts ( http://arxiv.org/abs/2409.14008v1 )

ライセンス: Link先を確認

Yunwang Chen, Yanmin Zhao, Siuming Yiu,

(参考訳) 電気自動車(EV)の急速な普及により、車両間通信(V2G)ネットワークにおける堅牢なサイバーセキュリティ対策の必要性が高まっている。この論文では、ブロックチェーンベースのV2Gシステム内のPnC操作を接続して充電するように設計された、サイバー物理認証プロトコルとスマートコントラクトのトレーディングを提案する。このプロトコルは、高度な暗号化技術とブロックチェーンを活用して、EVと充電ステーション間のセキュアで透明でタンパーセーフなエネルギートランザクションを保証する。主な貢献は、サイバー物理認証手法の開発、安全なエネルギー取引のためのスマートコントラクトフレームワークの実装、詳細なセキュリティとプライバシ分析である。提案プロトコルは、ユーザの匿名性とデータの整合性を保ちながら、分散型サービス拒否(DDoS)攻撃、中間者攻撃(MitM)攻撃、リプレイ攻撃などのリスクを効果的に軽減する。

The rapid adoption of electric vehicles (EVs) globally has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly being integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug and charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as distributed denial of service (DDoS) attacks, man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.

翻訳日:2024-11-07 04:06:38 公開日:2024-10-23

Yunwang Chen, Yanmin Zhao, Siuming Yiu,

翻訳日:2024-11-07 04:06:38 公開日:2024-10-23

# セキュアなV2Gトランザクションのためのサイバー物理認証方式

Cyber-Physical Authentication Scheme for Secure V2G Transactions ( http://arxiv.org/abs/2409.14008v3 )

ライセンス: Link先を確認

Yunwang Chen, Yanmin Zhao, Siuming Yiu,

(参考訳) 電気自動車(EV)の急速な普及により、車両間通信(V2G)ネットワークにおける堅牢なサイバーセキュリティ対策の必要性が高まっている。この論文では、ブロックチェーンベースのV2Gシステム内のPnC操作を接続して充電するように設計された、サイバー物理認証プロトコルとスマートコントラクトのトレーディングを提案する。このプロトコルは、高度な暗号化技術とブロックチェーンを活用して、EVと充電ステーション間のセキュアで透明でタンパーセーフなエネルギートランザクションを保証する。主な貢献は、サイバー物理認証手法の開発、安全なエネルギー取引のためのスマートコントラクトフレームワークの実装、詳細なセキュリティとプライバシ分析である。提案プロトコルは、ユーザ匿名性とデータの整合性を保ちながら、中間者攻撃やリプレイ攻撃などのリスクを効果的に軽減する。

The rapid adoption of electric vehicles (EVs) globally has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly being integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug and charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.

翻訳日:2024-11-07 04:06:38 公開日:2024-10-23

# MADial-Bench: メモリ拡張対話生成の実環境評価に向けて

MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation ( http://arxiv.org/abs/2409.15240v2 )

ライセンス: Link先を確認

Junqing He, Liang Zhu, Rui Wang, Xi Wang, Reza Haffari, Jiaxing Zhang,

(参考訳) チャットボットや対話システム(DS)にとって長期記憶は、多数の発達したメモリ拡張DS(MADS)によって実証された、一貫性のある人間的な会話を生成するために重要である。このようなMADSの有効性を評価するため、検索精度やパープレキシティ(PPL)などの既存の評価指標は、主にクエリ指向の事実性や言語品質の評価に重点を置いている。しかし、これらの指標は実際的な価値を欠くことが多い。また,DSの人間的評価には評価寸法が不十分である。メモリリコールのパラダイムに関しては、現在の評価スキームは受動的メモリ検索のみを考慮しつつ、多様なメモリリコールを、感情や環境といったリッチなトリガ要因で無視する。このギャップを埋めるために,認知科学と心理学理論に基づく様々なメモリリコールパラダイムをカバーする新しいメモリ拡張ダイアログベンチマーク(MADail-Bench)を構築した。このベンチマークは2つのタスクを別々に評価する: メモリ検索とメモリ認識は、パッシブとプロアクティブの両方のメモリリコールデータを組み込んだものである。本稿では, 記憶注入, 感情支援(ES)能力, 親密性などの評価基準を新たに導入し, 生成した反応を包括的に評価する。このベンチマークにおける最先端の埋め込みモデルと大規模言語モデルの結果は、さらなる進歩の可能性を示している。広範囲なテストにより、メモリインジェクション、ES習熟度、親密さの相関が明らかになる。

Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, evidenced by numerous developed memory-augmented DS (MADS). To evaluate the effectiveness of such MADS, existing commonly used evaluation metrics, like retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality assessment. However, these metrics often lack practical value. Moreover, the evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADail-Bench) covering various memory-recalling paradigms based on cognitive science and psychology theories. The benchmark assesses two tasks separately: memory retrieval and memory recognition with the incorporation of both passive and proactive memory recall data. We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy.

翻訳日:2024-11-06 20:27:58 公開日:2024-10-23

# リッチリワードのダークサイド:VLMリワードにおけるノイズの理解と緩和

The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards ( http://arxiv.org/abs/2409.15922v2 )

ライセンス: Link先を確認

Sukai Huang, Nir Lipovetzky, Trevor Cohn,

(参考訳) VLM(Vision-Language Models)は、インボディードエージェントに指示に従うための報酬信号を生成するために使われることが多いが、本研究では、本質的な(探索駆動)報酬のみを使用するエージェントと比較して、VLM報酬によって導かれるエージェントは、近年の成果に反するものとして、しばしば性能が低下することが判明した。偽陽性報酬(意図しない軌道が誤って報酬を受ける場合)は偽陰性よりも有害である、という仮説を立てる。分析によってこの仮説が裏付けられ、広く使われているコサイン類似度測定基準が偽陽性報酬推定の傾向にあることが明らかとなった。そこで本稿では,ノイズを緩和する新しい報奨関数であるBiMI({Bi}nary {M}utual {I}nformation)を導入する。 BiMIは多様な、難易度の高いナビゲーション環境における学習効率を大幅に向上させる。我々の研究は、様々な種類の報奨ノイズの影響剤の学習方法の微妙な理解を提供し、トレーニング実施時のマルチモーダル報酬信号ノイズへの対処の重要性を強調した。

While Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions, our research reveals that agents guided by VLM rewards often underperform compared to those employing only intrinsic (exploration-driven) rewards, contradicting expectations set by recent work. We hypothesize that false positive rewards -- instances where unintended trajectories are incorrectly rewarded -- are more detrimental than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric is prone to false positive reward estimates. To address this, we introduce BiMI ({Bi}nary {M}utual {I}nformation), a novel reward function designed to mitigate noise. BiMI significantly enhances learning efficiency across diverse and challenging embodied navigation environments. Our findings offer a nuanced understanding of how different types of reward noise impact agent learning and highlight the importance of addressing multimodal reward signal noise when training embodied agents

翻訳日:2024-11-06 19:21:13 公開日:2024-10-23

# ホップ代数と可解ユニタリ回路

Hopf algebras and solvable unitary circuits ( http://arxiv.org/abs/2409.17215v2 )

ライセンス: Link先を確認

Zhiyuan Wang,

(参考訳) 量子多体力学における厳密に解決可能なモデルは、多くの興味深い物理現象に関する貴重な洞察を与え、基本的な理論的問題を厳密に研究するためのプラットフォームとして機能する。それでも、それらは極めて稀であり、既存の解決可能なモデルと解法には深刻な制限がある。本稿では、離散空間と時間における量子多体ダイナミクスをモデル化する、正確に解けるユニタリ回路の新たなファミリーを紹介する。多くの従来の可解モデルとは異なり、この新しいモデルの族における任意の行列積状態から初期化された完全な量子力学を正確に計算することができる。局所可観測物の時間進化と相関、レニイエンタングルメントエントロピーの線形成長、時空間相関、および時間外相関は、すべて正確に計算可能である。正確な解を可能にするこれらのモデルの鍵となる性質は、任意の時間発展された局所作用素が有限結合次元の正確な行列積作用素であり、任意に長い時間でも、テンソルネットワーク技術と共に基礎となる(弱)ホップ代数構造を用いて証明できることである。このモデルのファミリの構築と解法に関する一般的な枠組みを概説し、いくつかの明示的な例を挙げる。特に、PXPモデルの花束版に非常に近い弱いホップ代数から構築されたモデルについて詳細に研究し、得られた正確な結果は、量子的な多くの身体の傷跡の現象、より一般的には、制約された系の花束量子力学に光を当てる可能性がある。

Exactly solvable models in quantum many body dynamics provide valuable insights into many interesting physical phenomena, and serve as platforms to rigorously investigate fundamental theoretical questions. Nevertheless, they are extremely rare and existing solvable models and solution techniques have serious limitations. In this paper we introduce a new family of exactly solvable unitary circuits which model quantum many body dynamics in discrete space and time. Unlike many previous solvable models, one can exactly compute the full quantum dynamics initialized from any matrix product state in this new family of models. The time evolution of local observables and correlations, the linear growth of Renyi entanglement entropy, spatiotemporal correlations, and out-of-time-order correlations are all exactly computable. A key property of these models enabling the exact solution is that any time evolved local operator is an exact matrix product operator with finite bond dimension, even at arbitrarily long time, which we prove using the underlying (weak) Hopf algebra structure along with tensor network techniques. We lay down the general framework for the construction and solution of this family of models, and give several explicit examples. In particular, we study in detail a model constructed out of a weak Hopf algebra that is very close to a floquet version of the PXP model, and the exact results we obtain may shed light on the phenomenon of quantum many body scars, and more generally, floquet quantum dynamics in constrained systems.

翻訳日:2024-11-06 16:30:51 公開日:2024-10-23

# 思考の証明 : ニューロシンボリックプログラム合成はロバストと解釈可能な推論を可能にする

Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning ( http://arxiv.org/abs/2409.17270v2 )

ライセンス: Link先を確認

Debargha Ganguly, Srinivasan Iyengar, Vipin Chaudhary, Shivkumar Kalyanaraman,

(参考訳) 大規模言語モデル(LLM)は自然言語処理に革命をもたらしたが、特に新しいドメインや複雑な論理配列において、一貫性のない推論に苦慮している。本研究では,LLM出力の信頼性と透明性を高めるフレームワークであるProof of Thoughtを紹介する。提案手法は LLM 生成のアイデアを形式論理検証でブリッジし,LLM の出力を 1次論理構造に変換するためのカスタムインタプリタを用いて定理証明の精査を行う。我々の手法の中心はJSONベースのドメイン特化言語であり、設計によって正確な論理構造と直感的な人間の概念のバランスをとる。このハイブリッド表現は、厳密な検証とLLM推論プロセスの人間の理解の両方を可能にする。主なコントリビューションには、論理的整合性を高めるためのソート管理を備えた堅牢な型システム、事実と推論の知識を明確に区別するためのルールの明示、さまざまなドメイン固有のアプリケーションを簡単に拡張できる柔軟なアーキテクチャが含まれる。我々は、StrategyQAと新しいマルチモーダル推論タスクをベンチマークし、オープンエンドシナリオにおける性能改善を示すことにより、思考の有効性を実証する。検証可能かつ解釈可能な結果を提供することで、AIシステムのアカウンタビリティに対する重要なニーズに対処し、ハイテイクドメインにおけるヒューマン・イン・ザ・ループの監視の基礎を設定する。

Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning, particularly in novel domains and complex logical sequences. This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs. Our approach bridges LLM-generated ideas with formal logic verification, employing a custom interpreter to convert LLM outputs into First Order Logic constructs for theorem prover scrutiny. Central to our method is an intermediary JSON-based Domain-Specific Language, which by design balances precise logical structures with intuitive human concepts. This hybrid representation enables both rigorous validation and accessible human comprehension of LLM reasoning processes. Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge, and a flexible architecture that allows for easy extension to various domain-specific applications. We demonstrate Proof of Thought's effectiveness through benchmarking on StrategyQA and a novel multimodal reasoning task, showing improved performance in open-ended scenarios. By providing verifiable and interpretable results, our technique addresses critical needs for AI system accountability and sets a foundation for human-in-the-loop oversight in high-stakes domains.

翻訳日:2024-11-06 16:30:51 公開日:2024-10-23

# Evidential Bi-Level Hardest Domain Scheduler によるオープンセットドメインの一般化の促進

Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler ( http://arxiv.org/abs/2409.17555v2 )

ライセンス: Link先を確認

Kunyu Peng, Di Wen, Kailun Yang, Ao Luo, Yufan Chen, Jia Fu, M. Saquib Sarfraz, Alina Roitberg, Rainer Stiefelhagen,

(参考訳) Open-Set Domain Generalization (OSDG)では、このモデルは、テスト時に既知のカテゴリと新しいカテゴリの両方が存在する、新しいデータ外観(ドメイン)とオープンセット条件の両方に露出する。このタスクの課題は、様々な領域をまたいで一般化し、動的環境における応用に欠かせないカテゴリの新規性を正確に定量化することによる。近年、メタ学習技術はOSDGにおいて優れた結果を示し、様々なランダムカテゴリと事前定義されたドメイン分割戦略を用いて、メタトレインとテストタスクを効果的に編成している。これらのアプローチは、主にデータ拡張と差別的特徴学習の強化に焦点を当てた従来の手法よりも、よく設計されたトレーニングスケジュールを優先する。 OSDGの一般的なメタラーニングモデルは、データパーティションを構成するために、定義済みのシーケンシャルドメインスケジューラを使用するのが一般的である。しかし、まだ十分に調査されていない重要な側面は、トレーニング中にドメインスケジューラの戦略によってもたらされた影響である。本稿では、プレフィックス付きシーケンシャルおよびランダムなドメインスケジューラと比較して、OSDGにおいて適応型ドメインスケジューラの方が有利であることを示す。適応型ドメインスケジューラを実現するために,Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS)を提案する。本手法は、従者ネットワークを利用する際の信頼度を評価し、明らかな方法で学習した信頼度をトレーニングし、最大偏差で正規化し、二段階的に最適化することで、ドメインを戦略的にシーケンスする。その結果,本手法はOSDGの性能を著しく向上し,目に見えるカテゴリと目立たないカテゴリの両方に対してより差別的な埋め込みを実現することがわかった。ソースコードはhttps://github.com/KPeng9510/EBiL-HaDSで公開されている。

In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Recently, meta-learning techniques have demonstrated superior results in OSDG, effectively orchestrating the meta-train and -test tasks by employing varied random categories and predefined domain partition strategies. These approaches prioritize a well-designed training schedule over traditional methods that focus primarily on data augmentation and the enhancement of discriminative feature learning. The prevailing meta-learning models in OSDG typically utilize a predefined sequential domain scheduler to structure data partitions. However, a crucial aspect that remains inadequately explored is the influence brought by strategies of domain schedulers during training. In this paper, we observe that an adaptive domain scheduler benefits more in OSDG compared with prefixed sequential and random domain schedulers. We propose the Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) to achieve an adaptive domain scheduler. This method strategically sequences domains by assessing their reliabilities in utilizing a follower network, trained with confidence scores learned in an evidential manner, regularized by max rebiasing discrepancy, and optimized in a bi-level manner. The results show that our method substantially improves OSDG performance and achieves more discriminative embeddings for both the seen and unseen categories. The source code is publicly available at https://github.com/KPeng9510/EBiL-HaDS.

翻訳日:2024-11-06 16:20:44 公開日:2024-10-23

# マルコフ雑音下における情報伝達

Information transmission under Markovian noise ( http://arxiv.org/abs/2409.17743v2 )

ライセンス: Link先を確認

Satvik Singh, Nilanjana Datta,

(参考訳) マルコフ力学に基づく開量子系を考えると、後者は離散時間量子マルコフ半群$(\Phi^n)_{n \in {\mathbb{N}}}$でモデル化され、量子チャネル$\Phi$と$n \in {\mathbb{N}}$が離散時間パラメータである。有限時間$n\in \mathbb{N}$と$\epsilon \in [0,1)$の場合、一発$\epsilon$-error情報伝送容量は$\Phi^n$であり、チャネル$\Phi$の周辺空間の構造は$\epsilon \in [0,1)$である。私たちは$の送信を考えます (i)$ classic information(unssisted and entanglement-assisted settings)$ (ii)$量子情報と$ (iii)私的古典情報

We consider an open quantum system undergoing Markovian dynamics, the latter being modelled by a discrete-time quantum Markov semigroup $(\Phi^n)_{n \in {\mathbb{N}}}$, resulting from the action of sequential uses of a quantum channel $\Phi$, with $n \in {\mathbb{N}}$ being the discrete time parameter. We find upper and lower bounds on the one-shot $\epsilon$-error information transmission capacities of $\Phi^n$ for a finite time $n\in \mathbb{N}$ and $\epsilon \in [0,1)$ in terms of the structure of the peripheral space of the channel $\Phi$. We consider transmission of $(i)$ classical information (both in the unassisted and entanglement-assisted settings); $(ii)$ quantum information and $(iii)$ private classical information.

翻訳日:2024-11-06 16:10:55 公開日:2024-10-23

# 政策グラディエント手法の強ポリノミカル時間と検証分析

Strongly-Polynomial Time and Validation Analysis of Policy Gradient Methods ( http://arxiv.org/abs/2409.19437v1 )

ライセンス: Link先を確認

Caleb Ju, Guanghui Lan,

(参考訳) 強化学習(Reinforcement learning)は、最適性の原則的な尺度を欠き、最適性の証明を持たないアルゴリズムからアルゴリズム、あるいはベースラインの比較に頼らざるを得ない。有限状態および作用マルコフ決定過程(MDP)に着目し、最適性ギャップ上の上界と下界の両方を提供する単純な計算可能なギャップ関数を開発する。したがって、ギャップ関数の収束は最適性ギャップの収束よりも強い収束モードであり、収束が任意の問題依存分布から独立であるような分布自由収束と呼ばれる新しい概念と同値である。基本方針ミラー降下は決定論的および確率的条件の両方に対して高速な分布自由収束を示す。分布自由収束を利用して、いくつかの新しい結果を明らかにする。第一に、決定論的政策ミラー降下は、強いポリノミアル時間で非正規化されたMDPを解くことができる。第2に、確率的ポリシーミラー降下の実行中に追加のサンプルなしで精度推定が得られ、検証ステップで検証できる終了基準として使用できる。

Reinforcement learning lacks a principled measure of optimality, causing research to rely on algorithm-to-algorithm or baselines comparisons with no certificate of optimality. Focusing on finite state and action Markov decision processes (MDP), we develop a simple, computable gap function that provides both upper and lower bounds on the optimality gap. Therefore, convergence of the gap function is a stronger mode of convergence than convergence of the optimality gap, and it is equivalent to a new notion we call distribution-free convergence, where convergence is independent of any problem-dependent distribution. We show the basic policy mirror descent exhibits fast distribution-free convergence for both the deterministic and stochastic setting. We leverage the distribution-free convergence to a uncover a couple new results. First, the deterministic policy mirror descent can solve unregularized MDPs in strongly-polynomial time. Second, accuracy estimates can be obtained with no additional samples while running stochastic policy mirror descent and can be used as a termination criteria, which can be verified in the validation step.

翻訳日:2024-11-05 23:19:24 公開日:2024-10-23

# 政策勾配法の強ポリノミカル時間と検証解析

Strongly-polynomial time and validation analysis of policy gradient methods ( http://arxiv.org/abs/2409.19437v2 )

ライセンス: Link先を確認

Caleb Ju, Guanghui Lan,

(参考訳) 本稿では,有限状態および行動マルコフ決定過程(MDP)と強化学習(RL)のための,優位ギャップ関数と呼ばれる新しい終了基準を提案する。この利点ギャップ関数をステップサイズルールの設計に組み込んで、最適政策の定常状態分布に依存しない新たな線形収束率を導出することにより、政策勾配法が強いポリノミカル時間でMDPを解けることを示す。我々の知る限りでは、政策勾配法にそのような強い収束特性が確立されたのはこれが初めてである。さらに、政策勾配の確率的推定しかできない確率的設定では、有利なギャップ関数が各状態の最適性ギャップを近似し、各状態におけるサブ線形収束率を示すことを示す。利点ギャップ関数は確率的ケースでは容易に推定でき、ポリシー値の計算が容易な上限と組み合わせれば、ポリシー勾配法によって生成される解を検証するのに便利な方法を提供する。したがって、我々の開発はRLの最適性の原理的かつ計算可能な尺度を提供する一方、現在の実践は最適性の証明を持たないアルゴリズムからアルゴリズム、あるいはベースラインの比較に依存する傾向にある。

This paper proposes a novel termination criterion, termed the advantage gap function, for finite state and action Markov decision processes (MDP) and reinforcement learning (RL). By incorporating this advantage gap function into the design of step size rules and deriving a new linear rate of convergence that is independent of the stationary state distribution of the optimal policy, we demonstrate that policy gradient methods can solve MDPs in strongly-polynomial time. To the best of our knowledge, this is the first time that such strong convergence properties have been established for policy gradient methods. Moreover, in the stochastic setting, where only stochastic estimates of policy gradients are available, we show that the advantage gap function provides close approximations of the optimality gap for each individual state and exhibits a sublinear rate of convergence at every state. The advantage gap function can be easily estimated in the stochastic case, and when coupled with easily computable upper bounds on policy values, they provide a convenient way to validate the solutions generated by policy gradient methods. Therefore, our developments offer a principled and computable measure of optimality for RL, whereas current practice tends to rely on algorithm-to-algorithm or baselines comparisons with no certificate of optimality.

翻訳日:2024-11-05 23:19:24 公開日:2024-10-23

# 遺伝子組み換え型ニューラル表現を目指して

Towards Croppable Implicit Neural Representations ( http://arxiv.org/abs/2409.19472v1 )

ライセンス: Link先を確認

Maor Ashkenazi, Eran Treister,

(参考訳) Inlicit Neural Representations(INR)は、ニューラルネットワークを使って自然信号をエンコードする能力により、近年注目されている。 INRは、新しい座標の補間や信号圧縮などの有用な応用を可能にするが、そのブラックボックスの性質は、後処理の修正を困難にしている。本稿では、編集可能なINRのアイデアを探求し、特に広く使われている収穫作業に焦点を当てる。この目的のために、我々は、デザインによる収穫をサポートする新しいINRアーキテクチャであるLocal-Global SIRENsを紹介する。局所グローバルSIRENは、信号符号化のための局所的特徴抽出とグローバル的特徴抽出を組み合わせたものである。彼らの設計をユニークなものにしているのは、エンコードされた信号の特定の部分を取り除き、比例重量を減少させる能力である。これは、ネットワークの再トレーニングを必要とせずに、対応する重みをネットワークから排除することで達成される。さらに、このアーキテクチャは、以前符号化された信号の直接拡張をサポートするためにどのように使用できるかを示す。信号編集以外にも、ローカル・グローバル・アプローチがトレーニングを加速し、様々な信号のエンコーディングを強化し、下流性能を改善し、INCODEなどの現代のINRに適用し、その可能性と柔軟性を強調している。コードはhttps://github.com/maorash/Local-Global-INRsで入手できる。

Implicit Neural Representations (INRs) have peaked interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional weight decrease. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.

翻訳日:2024-11-05 23:07:28 公開日:2024-10-23

# 遺伝子組み換え型ニューラル表現を目指して

Towards Croppable Implicit Neural Representations ( http://arxiv.org/abs/2409.19472v2 )

ライセンス: Link先を確認

Maor Ashkenazi, Eran Treister,

翻訳日:2024-11-05 23:07:28 公開日:2024-10-23

# GameLabel-10K: モバイルゲームクラウドソーシングによる画像優先データ収集

GameLabel-10K: Collecting Image Preference Data Through Mobile Game Crowdsourcing ( http://arxiv.org/abs/2409.19830v1 )

ライセンス: Link先を確認

Jonathan Zhou,

(参考訳) マルチビリオンパラメータモデルの台頭は、ディープラーニングにまたがるデータに対する激しい飢餓を引き起こした。本研究は,ゲーム内通貨に報いる有償アノテータをゲームプレイヤに置き換えることによるパフォーマンス向上の可能性を検討する。私たちはモバイルの歴史戦略ゲーム、Armchair Commanderの開発者と協力して、このアイデアを試しています。より具体的には、現在の研究ではこのアイデアを、通常は微調整拡散モデルに使用されるペアワイズ画像優先データを用いて検証している。この手法を用いて,約10万のラベルと7000のユニークなプロンプトを持つデータセットであるGameLabel-10Kを作成する。これらの結果に加えて、このデータセットのいくつかの制限を分析し、オープンソースライセンス下で公開しています。

The rise of multi-billion parameter models has sparked an intense hunger for data across deep learning. This study explores the possibility of replacing paid annotators with video game players who are rewarded with in-game currency for good performance. We collaborate with the developers of a mobile historical strategy game, Armchair Commander, to test this idea. More specifically, the current study tests this idea using pairwise image preference data, typically used to fine-tune diffusion models. Using this method, we create GameLabel-10K, a dataset with slightly under 10 thousand labels and 7000 unique prompts. In addition to these results, we analyze some limitations of this dataset and publicly release it under an open-source license.

翻訳日:2024-11-05 17:29:56 公開日:2024-10-23

# GameLabel-10K: モバイルゲームクラウドソーシングによる画像優先データ収集

GameLabel-10K: Collecting Image Preference Data Through Mobile Game Crowdsourcing ( http://arxiv.org/abs/2409.19830v2 )

ライセンス: Link先を確認

Jonathan Zhou,

(参考訳) マルチビリオンパラメータモデルの台頭は、ディープラーニングにまたがるデータに対する激しい飢餓を引き起こした。本研究は,ゲーム内通貨に報いる有償アノテータをゲームプレイヤに置き換えることによるパフォーマンス向上の可能性を検討する。私たちはモバイルの歴史戦略ゲーム、Armchair Commanderの開発者と協力して、このアイデアを試しています。より具体的には、現在の研究ではこのアイデアを、通常は微調整拡散モデルに使用されるペアワイズ画像優先データを用いて検証している。この手法を用いて,約10万のラベルと7000のユニークなプロンプトを持つデータセットであるGameLabel-10Kを作成する。このデータセット上でモデルを微調整し、Flux Schnellを微調整し、その即効性を改善し、収集手法の有効性を実証する。さらに、Hugging Face上でデータセットと微調整されたモデルの両方を公開しています。

The rise of multi-billion parameter models has sparked an intense hunger for data across deep learning. This study explores the possibility of replacing paid annotators with video game players who are rewarded with in-game currency for good performance. We collaborate with the developers of a mobile historical strategy game, Armchair Commander, to test this idea. More specifically, the current study tests this idea using pairwise image preference data, typically used to fine-tune diffusion models. Using this method, we create GameLabel-10K, a dataset with slightly under 10 thousand labels and 7000 unique prompts. We fine-tune a model on this dataset, we fine-tune Flux Schnell and find an improvement in its prompt adherence, demonstrating the validity of our collection method. In addition, we publicly release both the dataset and our fine-tuned model on Hugging Face.

翻訳日:2024-11-05 17:29:56 公開日:2024-10-23

# Counter-Current Learning: ディープラーニングのための生物学的にプラザブルなデュアルネットワークアプローチ

Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning ( http://arxiv.org/abs/2409.19841v1 )

ライセンス: Link先を確認

Chia-Hsiang Kao, Bharath Hariharan,

(参考訳) ニューラルネットワークで広く使われているにもかかわらず、エラーのバックプロパゲーションは生物学的な妥当性の欠如を批判され、後方ロック問題や重量輸送問題といった問題に悩まされている。これらの制限により、研究者たちはより生物学的に妥当な学習アルゴリズムを探求し、生物学的神経システムがどのように適応し、学習するかについて光を当てる可能性がある。生体システムで観測される対流交換機構に着想を得て,ニューラルネットワークにおける信用代入のための生物学的に妥当なフレームワークである対流学習(CCL)を提案する。このフレームワークは、入力データを処理するフィードフォワードネットワークと、ターゲットを処理するフィードバックネットワークを使用し、各ネットワークは反並列信号の伝搬を通じて互いに強化する。フィードバックネットワークの下位層からのより情報的な信号を利用してフィードフォワードネットワークの上位層の更新を誘導し、その逆の逆で、CCLはソース入力の同時変換を目標出力への変換とこれらの変換の動的相互影響を可能にする。 MNIST、FashionMNIST、CIFAR10、CIFAR100データセットの多層パーセプトロンと畳み込みニューラルネットワークによる実験結果は、CCLがより生物学的に現実的な学習メカニズムを提供しながら、他の生物学的にもっとも有効なアルゴリズムと同等のパフォーマンスを達成することを示した。さらに、自動エンコーダタスクへのアプローチの適用性を示し、教師なし表現学習の可能性を示す。我々の研究は、ニューラルネットワークにおける学習と適応の代替メカニズムを提供する、生物学的にインスパイアされた、そして実証可能な学習アルゴリズムの方向性を示す。

Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt and learn. Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in neural networks. This framework employs a feedforward network to process input data and a feedback network to process targets, with each network enhancing the other through anti-parallel signal propagation. By leveraging the more informative signals from the bottom layer of the feedback network to guide the updates of the top layer of the feedforward network and vice versa, CCL enables the simultaneous transformation of source inputs to target outputs and the dynamic mutual influence of these transformations. Experimental results on MNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets using multi-layer perceptrons and convolutional neural networks demonstrate that CCL achieves comparable performance to other biologically plausible algorithms while offering a more biologically realistic learning mechanism. Furthermore, we showcase the applicability of our approach to an autoencoder task, underscoring its potential for unsupervised representation learning. Our work presents a direction for biologically inspired and plausible learning algorithms, offering an alternative mechanisms of learning and adaptation in neural networks.

翻訳日:2024-11-05 17:19:55 公開日:2024-10-23

Chia-Hsiang Kao, Bharath Hariharan,

翻訳日:2024-11-05 17:19:55 公開日:2024-10-23

# 視覚言語モデルは、視覚的手がかりとテキストのあいまいさを解決できるだろうか?

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! ( http://arxiv.org/abs/2410.01023v1 )

ライセンス: Link先を確認

Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu,

(参考訳) 人間は多モードリテラシーを持ち、様々なモダリティからの情報を積極的に統合して推論を形成することができる。テキストの語彙的曖昧さのような課題に直面して、サムネイル画像や教科書のイラストのような他のモダリティを補う。マシンが同様のマルチモーダル理解能力を実現することは可能か? そこで本研究では,語彙の曖昧さを解消する上でのマルチモーダル入力の影響を評価するための新しいベンチマークである,画像説明付き理解パン(UNPIE)を提案する。修道女は本質的な曖昧さのため、この評価の理想的な主題として機能する。私たちのデータセットには1,000の句が含まれており、それぞれに両方の意味を説明する画像が添付されています。マルチモーダルリテラシーの諸側面を評価するアノテーションとして, Pun Grounding, Disambiguation, Restructation の3つの多モーダル課題を提起する。その結果,タスクの複雑さが増大するにつれて,様々なソクラティックモデルや視覚言語モデルが視覚的コンテキストが与えられた場合に,テキストのみのモデルよりも改善することが示唆された。

Humans possess multimodal literacy, allowing them to actively integrate information from various modalities to form reasoning. Faced with challenges like lexical ambiguity in text, we supplement this with other modalities, such as thumbnail images or textbook illustrations. Is it possible for machines to achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs in resolving lexical ambiguities. Puns serve as the ideal subject for this evaluation due to their intrinsic ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. We pose three multimodal challenges with the annotations to assess different aspects of multimodal literacy; Pun Grounding, Disambiguation, and Reconstruction. The results indicate that various Socratic Models and Visual-Language Models improve over the text-only models when given visual context, particularly as the complexity of the tasks increases.

翻訳日:2024-11-04 23:40:11 公開日:2024-10-23

Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu,

翻訳日:2024-11-04 23:40:11 公開日:2024-10-23

# 知識サイロがジャーナリズムにおける責任あるAI実践に及ぼす影響

Impact of Knowledge Silos on Responsible AI Practices in Journalism ( http://arxiv.org/abs/2410.01138v1 )

ライセンス: Link先を確認

Tomás Dodds, Astrid Vandendaele, Felix M. Simon, Natali Helberger, Valeria Resendez, Wang Ngai Yeung,

(参考訳) ジャーナリズムにおける責任あるAIプラクティスの効果的な採用には、技術、編集、ジャーナリスト、管理など、さまざまな視点を橋渡しするための協力的な努力が必要である。ニュース組織内の責任あるAIに関する情報共有に影響を与える可能性のある多くの課題の1つは、知識サイロである。この研究は、ナレッジサイロがジャーナリズムにおける責任あるAIプラクティスの採用にどのように影響するかを、オランダの主要4メディアのクロスケーススタディを通じて調査することを目的としている。我々は、AI知識共有に対する個人的および組織的障壁と、知識サイロがニュースルーム内の責任あるAIイニシアチブの運用にどんな影響を及ぼすかを検討する。この問題に対処するため,我々はDe Telegraaf,de Volkskrant,Nederlandse Omroep Stichting (NOS), RTL Nederlandの編集者,マネージャ,ジャーナリストらと14回の半構造化インタビューを行った。インタビューは、知識サイロの存在、AI実践の責任ある採用に対する影響、そしてこれらのダイナミクスに影響を与える組織的プラクティスに関する洞察を明らかにすることを目的としていた。我々の結果は、ニュース組織のすべての層にまたがって、AIに関する情報を共有するためのより良い構造を構築することの重要性を強調します。

The effective adoption of responsible AI practices in journalism requires a concerted effort to bridge different perspectives, including technological, editorial, journalistic, and managerial. Among the many challenges that could impact information sharing around responsible AI inside news organizations are knowledge silos, where information is isolated within one part of the organization and not easily shared with others. This study aims to explore if, and if so, how, knowledge silos affect the adoption of responsible AI practices in journalism through a cross-case study of four major Dutch media outlets. We examine the individual and organizational barriers to AI knowledge sharing and the extent to which knowledge silos could impede the operationalization of responsible AI initiatives inside newsrooms. To address this question, we conducted 14 semi-structured interviews with editors, managers, and journalists at de Telegraaf, de Volkskrant, the Nederlandse Omroep Stichting (NOS), and RTL Nederland. The interviews aimed to uncover insights into the existence of knowledge silos, their effects on responsible AI practice adoption, and the organizational practices influencing these dynamics. Our results emphasize the importance of creating better structures for sharing information on AI across all layers of news organizations.

翻訳日:2024-11-04 23:00:28 公開日:2024-10-23

# 知識サイロがジャーナリズムにおける責任あるAI実践に及ぼす影響

The Impact of Knowledge Silos on Responsible AI Practices in Journalism ( http://arxiv.org/abs/2410.01138v2 )

ライセンス: Link先を確認

Tomás Dodds, Astrid Vandendaele, Felix M. Simon, Natali Helberger, Valeria Resendez, Wang Ngai Yeung,

翻訳日:2024-11-04 23:00:28 公開日:2024-10-23

# 回帰課題に対するラプラス近似によるメタラーニングのばらつき低減

Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks ( http://arxiv.org/abs/2410.01476v1 )

ライセンス: Link先を確認

Alfredo Reichlin, Gustaf Tegnér, Miguel Vasco, Hang Yin, Mårten Björkman, Danica Kragic,

(参考訳) 有限個のサンプルポイントを与えられたメタラーニングアルゴリズムは、新しい、目に見えないタスクに対する最適な適応戦略を学ぶことを目的としている。多くの場合、このデータは異なるタスクに同時に属する可能性があるため、曖昧である。これは特にメタ回帰タスクではそうである。このような場合、推定適応戦略は各タスクに対するサポートデータの限られた量によって高いばらつきを伴い、しばしば準最適一般化性能をもたらす。本研究では,勾配に基づくメタラーニングにおける分散化の問題に対処し,それに伴う問題のクラスを形式化し,これを「emph{task overlap}」と呼ぶ。具体的には,各支持点をパラメータ上の後方の分散によって個別に重み付けすることで,勾配推定のばらつきを低減する手法を提案する。後部を推定するためにLaplace近似を使い、メタラーナーの損失景観の曲率を表現できる。実験により,提案手法の有効性を実証し,メタラーニングにおける分散化の重要性を強調した。

Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighing each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.

翻訳日:2024-11-04 17:34:40 公開日:2024-10-23

# 回帰課題に対するラプラス近似によるメタラーニングのばらつき低減

Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks ( http://arxiv.org/abs/2410.01476v2 )

ライセンス: Link先を確認

Alfredo Reichlin, Gustaf Tegnér, Miguel Vasco, Hang Yin, Mårten Björkman, Danica Kragic,

翻訳日:2024-11-04 17:34:40 公開日:2024-10-23

# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v1 )

ライセンス: Link先を確認

Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,

(参考訳) 制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。私たちのコードはhttps://github.com/Pan-Zihao/SCA}{https://github.com/Pan-Zihao/SCAで見られます。

Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our code can be found at https://github.com/Pan-Zihao/SCA}{https://github.com/Pan-Zihao/SCA.

翻訳日:2024-11-04 07:46:05 公開日:2024-10-23

# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v2 )

ライセンス: Link先を確認

Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,

(参考訳) 制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。私たちのコードはhttps://github.com/Pan-Zihao/SCA.orgで見られます。

Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our code can be found at https://github.com/Pan-Zihao/SCA.

翻訳日:2024-11-04 07:46:05 公開日:2024-10-23

# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v3 )

ライセンス: Link先を確認

Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,

(参考訳) センシティブな環境にデプロイされたディープニューラルネットワークベースのシステムは、敵の攻撃に対して脆弱である。制限のない敵攻撃は、通常、画像(例えば色やテクスチャ)のセマンティックな内容を操作して、効果的かつフォトリアリスティックな敵の例を作成する。近年の研究では、拡散反転法を用いて画像を潜時空間にマッピングし、摂動を導入して高レベルの意味論を操作している。しかし、それらはしばしば、復調された出力に実質的な意味的歪みをもたらし、低効率に悩まされる。本研究では、編集しやすいノイズマップを抽出する逆法と、その過程全体を通して意味的なガイダンスを提供するためのマルチモーダル大言語モデル(MLLM)を用いて、セマンティック・一貫性のない非制限逆攻撃(SCA)と呼ばれる新しいフレームワークを提案する。 MLLMが提供するリッチセマンティック情報の条件下では、一連の編集フレンドリなノイズマップを用いて各ステップのDDPM復調処理を行い、DPM Solver++を利用してこの処理を高速化し、セマンティック一貫性のある効率的なサンプリングを可能にする。既存の手法と比較して,本フレームワークは,最小限の識別可能な意味変化を示す敵例の効率的な生成を可能にする。その結果,セマンティック・コンスタント・アドバイサル・ケース(SCAE)を初めて紹介した。大規模な実験と可視化は、特に最先端の攻撃の12倍の速度でSCAの高効率性を実証している。我々の研究はマルチメディア情報のセキュリティにさらに注意を向けることができる。

Deep neural network based systems deployed in sensitive environments are vulnerable to adversarial attacks. Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, they often results in substantial semantic distortions in the denoised output and suffers from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and utilizes Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Under the condition of rich semantic information provided by MLLM, we perform the DDPM denoising process of each step using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes. Consequently, we for the first time introduce Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations have demonstrated the high efficiency of SCA, particularly in being on average 12 times faster than the state-of-the-art attacks. Our research can further draw attention to the security of multimedia information.

翻訳日:2024-11-04 07:46:05 公開日:2024-10-23

# SCA: 非常に効率的なセマンティック一貫性のない非制限の敵攻撃

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack ( http://arxiv.org/abs/2410.02240v4 )

ライセンス: Link先を確認

Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng,

翻訳日:2024-11-04 07:46:05 公開日:2024-10-23

# SafeguardはDouble-edged Sword:大規模言語モデルに対するDoS攻撃

Safeguard is a Double-edged Sword: Denial-of-service Attack on Large Language Models ( http://arxiv.org/abs/2410.02916v1 )

ライセンス: Link先を確認

Qingzhao Zhang, Ziyang Xiong, Z. Morley Mao,

(参考訳) 安全性は、オープンデプロイメントにおける大きな言語モデル(LLM)の最大の関心事である。この目的のために、安全確保法は、安全アライメントやガードレール機構を通じて、LLMの倫理的かつ責任ある使用を強制することを目的としている。しかし、悪意のある攻撃者は、セーフガードの偽陽性を悪用し、すなわち、セーフガードモデルを騙してセーフコンテンツが誤ってブロックされることを発見し、LLMに対する新たなDoS攻撃につながった。具体的には、ユーザークライアントソフトウェアに対するソフトウェアやフィッシング攻撃によって、攻撃者は構成ファイルのテンプレートに短い、一見無害な敵のプロンプトを挿入する。勾配情報と注意情報を利用する最適化プロセスの設計により、Llama Guard 3上の99%以上のユーザリクエストを普遍的にブロックする、約30文字の、一見安全な敵のプロンプトを自動的に生成できる。この攻撃は、昔ながらのジェイルブレイクと根本的に異なる偽陽性に焦点を当てたLSMのセーフガードを評価する新しい次元を示す。

Safety is a paramount concern of large language models (LLMs) in their open deployment. To this end, safeguard methods aim to enforce the ethical and responsible use of LLMs through safety alignment or guardrail mechanisms. However, we found that the malicious attackers could exploit false positives of safeguards, i.e., fooling the safeguard model to block safe content mistakenly, leading to a new denial-of-service (DoS) attack on LLMs. Specifically, by software or phishing attacks on user client software, attackers insert a short, seemingly innocuous adversarial prompt into to user prompt templates in configuration files; thus, this prompt appears in final user requests without visibility in the user interface and is not trivial to identify. By designing an optimization process that utilizes gradient and attention information, our attack can automatically generate seemingly safe adversarial prompts, approximately only 30 characters long, that universally block over 97\% of user requests on Llama Guard 3. The attack presents a new dimension of evaluating LLM safeguards focusing on false positives, fundamentally different from the classic jailbreak.

翻訳日:2024-11-03 04:55:13 公開日:2024-10-23

Qingzhao Zhang, Ziyang Xiong, Z. Morley Mao,

翻訳日:2024-11-03 04:55:13 公開日:2024-10-23

# 視覚言語モデルのための一般化可能なプロンプトチューニング

Generalizable Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2410.03189v1 )

ライセンス: Link先を確認

Qian Zhang,

(参考訳) CLIPのようなビジョン言語モデルのプロンプトチューニングでは、特定の下流タスクのための画像テキストペアを生成するために使用されるテキストプロンプトを最適化する。手作りのプロンプトやテンプレートベースのプロンプトは一般的に、目に見えない幅広いクラスに適用できるが、下流のタスク(例えば、目に見えないクラス)ではパフォーマンスが悪くなる傾向がある。一方、学習可能なソフトプロンプトは下流のタスクではよく機能するが、一般化性に欠ける。さらに、先行研究は主にテキストのモダリティに集中しており、視覚のモダリティからプロンプトの一般化の可能性を探究する研究はほとんどない。これらの制約を念頭に置いて、競争力のある下流性能と一般化の両方を得るために、チューニングを迅速に行う方法について検討する。本研究は,ソフトプロンプトと手作りプロンプトをテキストモダリティの双対ビューとして扱うことにより,それらの相互情報を最大化することにより,タスク特化情報と一般的な意味情報をよりうまくアンサンブルすることができることを示す。さらに、より表現力のあるプロンプトを生成するために、視覚的モダリティからのクラスワイド増強を導入し、より広い範囲の未確認クラスに顕著なロバスト性をもたらす。いくつかのベンチマークでは、提案手法はタスク固有の性能と一般的な能力の両面で競合する結果が得られると報告されている。

Prompt tuning for vision-language models such as CLIP involves optimizing the text prompts used to generate image-text pairs for specific downstream tasks. While hand-crafted or template-based prompts are generally applicable to a wider range of unseen classes, they tend to perform poorly in downstream tasks (i.e., seen classes). Learnable soft prompts, on the other hand, often perform well in downstream tasks but lack generalizability. Additionally, prior research has predominantly concentrated on the textual modality, with very few studies attempting to explore the prompt's generalization potential from the visual modality. Keeping these limitations in mind, we investigate how to prompt tuning to obtain both a competitive downstream performance and generalization. The study shows that by treating soft and hand-crafted prompts as dual views of the textual modality, and maximizing their mutual information, we can better ensemble task-specific and general semantic information. Moreover, to generate more expressive prompts, the study introduces a class-wise augmentation from the visual modality, resulting in significant robustness to a wider range of unseen classes. Extensive evaluations on several benchmarks report that the proposed approach achieves competitive results in terms of both task-specific performance and general abilities.

翻訳日:2024-11-03 03:04:25 公開日:2024-10-23

# 視覚言語モデルのための一般化可能なプロンプトチューニング

Generalizable Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2410.03189v2 )

ライセンス: Link先を確認

Qian Zhang,

翻訳日:2024-11-03 03:04:25 公開日:2024-10-23

# 多モード核融合モデルのための勾配ベースジェイルブレイク画像

Gradient-based Jailbreak Images for Multimodal Fusion Models ( http://arxiv.org/abs/2410.03489v1 )

ライセンス: Link先を確認

Javier Rando, Hannah Korevaar, Erik Brinkman, Ivan Evtimov, Florian Tramèr,

(参考訳) 画像入力による言語モデルの強化は、個別の最適化を必要とするテキスト入力とは異なり、継続的な最適化を通じてより効果的なジェイルブレイク攻撃を可能にする可能性がある。しかし、新しいマルチモーダル融合モデルでは、非微分可能関数を用いて全ての入力モダリティをトークン化し、直接攻撃を妨げている。本稿では,トークン化を連続関数と近似し,連続的な最適化を可能にするトークン化ショートカットの概念を紹介する。我々はトークンライザショートカットを用いて、マルチモーダル融合モデルに対する最初のエンドツーエンドの勾配画像アタックを生成する。我々は、Chameleonモデルに対する攻撃を評価し、72.5%のプロンプトに対して有害な情報をもたらすジェイルブレイク画像を取得する。 Jailbreakイメージは、同じ目的で最適化されたテキストジェイルブレークよりも優れており、50倍の入力トークンを最適化するためには、計算予算が3倍低い。最後に、Circuit Breakersのような表現工学の防御は、テキストアタックのみで訓練され、敵画像の入力に効果的に転送できることがわかった。

Augmenting language models with image inputs may enable more effective jailbreak attacks through continuous optimization, unlike text inputs that require discrete optimization. However, new multimodal fusion models tokenize all input modalities using non-differentiable functions, which hinders straightforward attacks. In this work, we introduce the notion of a tokenizer shortcut that approximates tokenization with a continuous function and enables continuous optimization. We use tokenizer shortcuts to create the first end-to-end gradient image attacks against multimodal fusion models. We evaluate our attacks on Chameleon models and obtain jailbreak images that elicit harmful information for 72.5% of prompts. Jailbreak images outperform text jailbreaks optimized with the same objective and require 3x lower compute budget to optimize 50x more input tokens. Finally, we find that representation engineering defenses, like Circuit Breakers, trained only on text attacks can effectively transfer to adversarial image inputs.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-23

# 多モード核融合モデルのための勾配ベースジェイルブレイク画像

Gradient-based Jailbreak Images for Multimodal Fusion Models ( http://arxiv.org/abs/2410.03489v2 )

ライセンス: Link先を確認

Javier Rando, Hannah Korevaar, Erik Brinkman, Ivan Evtimov, Florian Tramèr,

翻訳日:2024-11-02 21:59:46 公開日:2024-10-23

# P1-KAN 有効コルモゴロフ・アーノルドネットワークによる関数近似

P1-KAN an effective Kolmogorov Arnold Network for function approximation ( http://arxiv.org/abs/2410.03801v1 )

ライセンス: Link先を確認

Xavier Warin,

(参考訳) 新しいコルモゴロフ・アルノルドネットワーク(KAN)は、高次元の潜在的不規則関数を近似するために提案されている。精度で多層パーセプトロンより優れ、収束が速いことを示す。また、最近提案されたネットワークであるReLU-KANと比較し、ReLU-KANよりも時間がかかりますが、より正確です。

A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimension. We show that it outperforms multilayer perceptrons in terms of accuracy and converges faster. We also compare it with ReLU-KAN, a recently proposed network: it is more time consuming than ReLU-KAN, but more accurate.

翻訳日:2024-11-02 16:10:45 公開日:2024-10-23

# P1-KAN 有効コルモゴロフ・アーノルドネットワークによる関数近似

P1-KAN an effective Kolmogorov Arnold Network for function approximation ( http://arxiv.org/abs/2410.03801v2 )

ライセンス: Link先を確認

Xavier Warin,

(参考訳) 新しいコルモゴロフ・アルノルドネットワーク(KAN)は、高次元の潜在的不規則関数を近似するために提案されている。精度で多層パーセプトロンより優れ、収束が速いことを示す。 P1-KANネットワークは不規則関数に対してより効果的であるのに対し、元のスプラインベースkanネットワークはスムーズ関数に対してより効果的であるように思われる。

A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimension. We show that it outperforms multilayer perceptrons in terms of accuracy and converges faster. We also compare it with several proposed KAN networks: the original spline-based KAN network appears to be more effective for smooth functions, while the P1-KAN network is more effective for irregular functions.

翻訳日:2024-11-02 16:10:45 公開日:2024-10-23

# 合成進化によるコード優先学習

Learning Code Preference via Synthetic Evolution ( http://arxiv.org/abs/2410.03837v1 )

ライセンス: Link先を確認

Jiawei Liu, Thanh Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, Zijian Wang,

(参考訳) 大規模言語モデル(LLM)は、最近顕著なコーディング機能を示した。しかし、十分に整ったプロパティに基づいてコード生成を評価し、それを開発者の好みに合わせることは依然として難しい。本稿では,コード優先学習という新たな課題の下で,2つの重要な課題について考察する。 i) コードに対する意味のある嗜好を予測するためにモデルをトレーニングするにはどうすればよいか? そして (ii)人間とLLMの嗜好は、検証可能なコードプロパティや開発者コードの嗜好とどのように一致しますか? この目的のために、コードコミットやコード批判を含む合成進化データからペアワイズなコード嗜好モデルをトレーニングするためのフレームワークであるCodeFavorを提案する。コード優先性を評価するために,1364個の厳格にキュレートされたコード優先タスクからなるベンチマークであるCodePrefBenchを紹介した。評価の結果、CodeFavorはモデルベースのコード優先の精度を最大28.8%改善した。一方、CodeFavorモデルは、34倍のコスト効率で、6-9倍のパラメータでモデルのパフォーマンスと一致させることができる。また、CodeFavorの設計選択を包括的な制御実験を通じて厳格に検証します。さらに、各タスクに23.4パーソナライズしたにもかかわらず、15.1-40.3%のタスクは未解決のままである。モデルに基づく嗜好と比較すると、人間の嗜好はコードの正確さを目標としつつ、機能的でない目的に準最適である傾向にある。

Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation based on well-formed properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) How do we train models to predict meaningful preferences for code? and (ii) How do human and LLM preferences align with verifiable code properties and developer code tastes? To this end, we propose CodeFavor, a framework for training pairwise code preference models from synthetic evolution data, including code commits and code critiques. To evaluate code preferences, we introduce CodePrefBench, a benchmark comprising 1364 rigorously curated code preference tasks to cover three verifiable properties-correctness, efficiency, and security-along with human preference. Our evaluation shows that CodeFavor holistically improves the accuracy of model-based code preferences by up to 28.8%. Meanwhile, CodeFavor models can match the performance of models with 6-9x more parameters while being 34x more cost-effective. We also rigorously validate the design choices in CodeFavor via a comprehensive set of controlled experiments. Furthermore, we discover the prohibitive costs and limitations of human-based code preference: despite spending 23.4 person-minutes on each task, 15.1-40.3% of tasks remain unsolved. Compared to model-based preference, human preference tends to be more accurate under the objective of code correctness, while being sub-optimal for non-functional objectives.

翻訳日:2024-11-02 16:00:59 公開日:2024-10-23

# 合成進化によるコード優先学習

Learning Code Preference via Synthetic Evolution ( http://arxiv.org/abs/2410.03837v2 )

ライセンス: Link先を確認

Jiawei Liu, Thanh Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, Zijian Wang,

翻訳日:2024-11-02 16:00:59 公開日:2024-10-23

# DiffSpec: 自然言語仕様とコードアーチファクトを使用したLLMによる微分テスト

DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts ( http://arxiv.org/abs/2410.04249v1 )

ライセンス: Link先を確認

Nikitha Rao, Elizabeth Gilbert, Tahina Ramananandro, Nikhil Swamy, Claire Le Goues, Sarah Fakhoury,

(参考訳) 差分テストは、コンパイラ、ネットワークプロトコルパーサ、言語ランタイムなど、同じ仕様に準拠した複数の実装を持つソフトウェアシステムのバグを見つける効果的な方法である。このようなシステムの仕様は、インストラクション・セット・アーキテクチャ(ISA)仕様、Wasm仕様、IETF RFCなどの自然言語文書で標準化されることが多い。大きな言語モデル(LLM)は、テストの生成と大量の自然言語テキストの処理の両方の可能性を実証しており、仕様文書、バグレポート、コード実装などのアーティファクトを活用するのに適している。本研究では、自然言語とコードアーティファクトを活用し、LLMをガイドして、バグに対応するものを含む実装間の意味のある振る舞いの違いを強調する、ターゲットとなる有意義なテストを生成する。本稿では,プロンプト連鎖を用いたLCMによる差分テストを生成するフレームワークであるDiffSpecを紹介する。本稿では,2つの異なるシステム,すなわちeBPFランタイムとWasmバリデータに対するDiffSpecの有効性を示す。 DiffSpecを使って359の差別化テストを生成し、カーネルメモリリーク、ジャンプ命令の不整合挙動、スタックポインタの使用時の未定義動作を含む、少なくとも4つの異なる、確認されたeBPFのバグを発見した。 Wasmバリデータでは279の差別化テストが発見されました。

Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, and language runtimes. Specifications for such systems are often standardized in natural language documents, like Instruction Set Architecture (ISA) specifications, Wasm specifications or IETF RFC's. Large Language Models (LLMs) have demonstrated potential in both generating tests and handling large volumes of natural language text, making them well-suited for utilizing artifacts like specification documents, bug reports, and code implementations. In this work, we leverage natural language and code artifacts to guide LLMs to generate targeted, meaningful tests that highlight meaningful behavioral differences between implementations, including those corresponding to bugs. We introduce DiffSpec, a framework for generating differential tests with LLMs using prompt chaining. We demonstrate the efficacy of DiffSpec on two different systems, namely, eBPF runtimes and Wasm validators. Using DiffSpec, we generated 359 differentiating tests, uncovering at least four distinct and confirmed bugs in eBPF, including a kernel memory leak, inconsistent behavior in jump instructions, and undefined behavior when using the stack pointer. We also found 279 differentiating tests in Wasm validators, that point to at least 2 confirmed and fixed bugs.

翻訳日:2024-11-02 08:59:37 公開日:2024-10-23

Nikitha Rao, Elizabeth Gilbert, Tahina Ramananandro, Nikhil Swamy, Claire Le Goues, Sarah Fakhoury,

(参考訳) 差分テストは、コンパイラ、ネットワークプロトコルパーサ、言語ランタイムなど、同じ仕様に準拠した複数の実装を持つソフトウェアシステムのバグを見つける効果的な方法である。このようなシステムの仕様は、インストラクション・セット・アーキテクチャ(ISA)仕様、Wasm仕様、IETF RFCなどの自然言語文書で標準化されることが多い。大きな言語モデル(LLM)は、テストの生成と大量の自然言語テキストの処理の両方の可能性を実証しており、仕様文書、バグレポート、コード実装などのアーティファクトを活用するのに適している。本研究では、自然言語とコードアーティファクトを活用し、LLMをガイドして、バグに対応するものを含む実装間の意味のある振る舞いの違いを強調する、ターゲットとなる有意義なテストを生成する。本稿では,プロンプト連鎖を用いたLCMによる差分テストを生成するフレームワークであるDiffSpecを紹介する。本稿では,2つの異なるシステム,すなわちeBPFランタイムとWasmバリデータに対するDiffSpecの有効性を示す。 DiffSpecを使って359の差別化テストを生成し、カーネルメモリリーク、ジャンプ命令の不整合挙動、スタックポインタの使用時の未定義動作を含む、少なくとも4つの異なる、確認されたeBPFのバグを発見した。 Wasm Validatorでは279の差別化テストがあり、Wizard Engineでは少なくとも2つの確認済みと修正済みのバグが確認されました。

Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, and language runtimes. Specifications for such systems are often standardized in natural language documents, like Instruction Set Architecture (ISA) specifications, Wasm specifications or IETF RFC's. Large Language Models (LLMs) have demonstrated potential in both generating tests and handling large volumes of natural language text, making them well-suited for utilizing artifacts like specification documents, bug reports, and code implementations. In this work, we leverage natural language and code artifacts to guide LLMs to generate targeted, meaningful tests that highlight meaningful behavioral differences between implementations, including those corresponding to bugs. We introduce DiffSpec, a framework for generating differential tests with LLMs using prompt chaining. We demonstrate the efficacy of DiffSpec on two different systems, namely, eBPF runtimes and Wasm validators. Using DiffSpec, we generated 359 differentiating tests, uncovering at least four distinct and confirmed bugs in eBPF, including a kernel memory leak, inconsistent behavior in jump instructions, and undefined behavior when using the stack pointer. We also found 279 differentiating tests in Wasm validators, that point to at least 2 confirmed and fixed bugs in Wizard Engine.

翻訳日:2024-11-02 08:59:37 公開日:2024-10-23

# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習

Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v1 )

ライセンス: Link先を確認

Yifan Wang, Cheng Zhang, Yuanndong Zhuang, Yongming Huang,

(参考訳) 人工知能をサポートする無線ネットワークは注目され、Over-the-Air Federated Learningがそのユニークな伝送特性と分散コンピューティング特性のために重要なアプリケーションとして登場している。本稿では,セルフリーMIMOシステムにおける過空フェデレーション学習の誤差境界を導出し,電力制御とビームフォーミングの連成最適化による最適性ギャップを最小化するために最適化問題を定式化する。 MOP-LOFPCアルゴリズムを導入し、Lyapunov最適化を用いてラウンド間の長期制約を分離し、因果チャネル状態情報のみを必要とする。実験結果から,MOP-LOFPCはモデルのトレーニング損失と,既存のベースラインと比較して長期的電力制約への固執との間に,より優れた,より柔軟なトレードオフを実現することが示された。

Wireless networks supporting artificial intelligence have gained significant attention, with Over-the-Air Federated Learning emerging as a key application due to its unique transmission and distributed computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem to minimize optimality gap via joint optimization of power control and beamforming. We introduce the MOP-LOFPC algorithm, which employs Lyapunov optimization to decouple long-term constraints across rounds while requiring only causal channel state information. Experimental results demonstrate that MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to long-term power constraints compared to existing baselines.

翻訳日:2024-11-01 19:07:22 公開日:2024-10-23

# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習

Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v2 )

ライセンス: Link先を確認

Yifan Wang, Cheng Zhang, Yuanndong Zhuang, Yongming Huang,

翻訳日:2024-11-01 19:07:22 公開日:2024-10-23

# 長期電力制約を持つ無細胞MIMOにおける過空フェデレーション学習

Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint ( http://arxiv.org/abs/2410.05354v3 )

ライセンス: Link先を確認

Yifan Wang, Cheng Zhang, Yuanndon Zhuang, Mingzeng Dai, Haiming Wang, Yongming Huang,

翻訳日:2024-11-01 19:07:22 公開日:2024-10-23

# 勾配ブースティング分類器の理解:$γ_j$の訓練・予測・役割

Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$ ( http://arxiv.org/abs/2410.05623v1 )

ライセンス: Link先を確認

Hung-Hsuan Chen,

(参考訳) Gradient Boosting Classifier (GBC)は、二分分類のための機械学習アルゴリズムで、予測エラーを最小限に抑えるために反復的に決定木を構築する。この文書はGBCのトレーニングと予測プロセスを説明し、ロジスティック損失関数の最適化に不可欠である端末ノード値$\gamma_j$の計算に焦点を当てている。テイラー級数近似を用いて$\gamma_j$を導き、アルゴリズムの実装のためにステップバイステップの擬似コードを提供する。このガイドはGBCの理論とその実践的応用を説明し、バイナリ分類タスクにおけるその有効性を示す。私たちは、読者が理解できるように、付録にステップバイステップの例を提供します。

The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification, which builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of terminal node values $\gamma_j$, which are crucial to optimizing the logistic loss function. We derive $\gamma_j$ through a Taylor series approximation and provide a step-by-step pseudocode for the algorithm's implementation. The guide explains the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. We provide a step-by-step example in the appendix to help readers understand.

翻訳日:2024-11-01 17:38:51 公開日:2024-10-23

# 勾配ブースティング分類器の理解:$γ_j$の訓練・予測・役割

Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$ ( http://arxiv.org/abs/2410.05623v2 )

ライセンス: Link先を確認

Hung-Hsuan Chen,

翻訳日:2024-11-01 17:38:51 公開日:2024-10-23

# 言語モデルは間接的エビデンスから文法的知識を誘導できるか?

Can Language Models Induce Grammatical Knowledge from Indirect Evidence? ( http://arxiv.org/abs/2410.06022v1 )

ライセンス: Link先を確認

Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara,

(参考訳) 文の受理性を判断するために文法的知識を誘導する言語モデルに必要なデータの種類と量。最近の言語モデルでは、人間に比べてデータ効率が向上する余地が残っている。本稿では,言語モデルが間接的データ(間接的証拠)を効率的に用いているかを検討する。対照的に、人間は間接的エビデンスを効率的に使用しており、これは効率的な言語習得に寄与する帰納的バイアスの1つと考えられている。この問題を調査するために、事前学習データと評価インスタンスに挿入されたトレーニングインスタンスからなるデータセットであるWIDET(Wug InDirect Evidence Test)を紹介した。我々は,新たに造語されたwug単語を用いた合成インスタンスを事前学習データに注入し,それらの単語に対する文法的受容性を評価する評価データにモデルの振る舞いを探索する。インジェクトされたインスタンスは、間接性と量のレベルを変化させて作成する。実験の結果, 言語モデルでは, 同じ構造を持つインスタンスに対して繰り返し露出しても文法的知識を誘導せず, 特定の言語現象における評価事例と語彙的項目でのみ異なることがわかった。本研究は,潜在的間接的証拠を用いて文法知識を誘導するモデルの構築という,今後の研究の方向性を示唆するものである。

What kinds of and how much data is necessary for language models to induce grammatical knowledge to judge sentence acceptability? Recent language models still have much room for improvement in their data efficiency compared to humans. This paper investigates whether language models efficiently use indirect data (indirect evidence), from which they infer sentence acceptability. In contrast, humans use indirect evidence efficiently, which is considered one of the inductive biases contributing to efficient language acquisition. To explore this question, we introduce the Wug InDirect Evidence Test (WIDET), a dataset consisting of training instances inserted into the pre-training data and evaluation instances. We inject synthetic instances with newly coined wug words into pretraining data and explore the model's behavior on evaluation data that assesses grammatical acceptability regarding those words. We prepare the injected instances by varying their levels of indirectness and quantity. Our experiments surprisingly show that language models do not induce grammatical knowledge even after repeated exposure to instances with the same structure but differing only in lexical items from evaluation instances in certain language phenomena. Our findings suggest a potential direction for future research: developing models that use latent indirect evidence to induce grammatical knowledge.

翻訳日:2024-11-01 11:30:40 公開日:2024-10-23

# 言語モデルは間接的エビデンスから文法的知識を誘導できるか?

Can Language Models Induce Grammatical Knowledge from Indirect Evidence? ( http://arxiv.org/abs/2410.06022v2 )

ライセンス: Link先を確認

Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara,

翻訳日:2024-11-01 11:30:40 公開日:2024-10-23

# 量子符号化技術の比較

Comparing Quantum Encoding Techniques ( http://arxiv.org/abs/2410.09121v1 )

ライセンス: Link先を確認

Nidhi Munikote, Ang Li, Chenxu Liu, Samuel Stein,

(参考訳) 量子コンピュータの能力が向上し続ければ、その応用の可能性も高まる。例えば、量子技術は機械学習を実行するために古典的なニューラルネットワークと統合されている。このように、または量子化学シミュレーションや暗号アプリケーションのような他の広く使われるために、古典的なデータは量子符号化によって量子状態に変換する必要がある。基礎、振幅、回転の3つの基本的な符号化法と、いくつかの提案された組み合わせがある。本研究では、特にハイブリッド量子古典機械学習の文脈における符号化手法について検討する。本研究は、QuClassi量子ニューラルネットワークアーキテクチャを用いて、MNISTデータセットから `3' と `6' 桁のバイナリ分類を行い、資源使用量と計算複雑性を考慮しつつ、精度、エントロピー、損失、ノイズ耐性などのいくつかの指標を得る。

As quantum computers continue to become more capable, the possibilities of their applications increase. For example, quantum techniques are being integrated with classical neural networks to perform machine learning. In order to be used in this way, or for any other widespread use like quantum chemistry simulations or cryptographic applications, classical data must be converted into quantum states through quantum encoding. There are three fundamental encoding methods: basis, amplitude, and rotation, as well as several proposed combinations. This study explores the encoding methods, specifically in the context of hybrid quantum-classical machine learning. Using the QuClassi quantum neural network architecture to perform binary classification of the `3' and `6' digits from the MNIST datasets, this study obtains several metrics such as accuracy, entropy, loss, and resistance to noise, while considering resource usage and computational complexity to compare the three main encoding methods.

翻訳日:2024-10-30 16:13:24 公開日:2024-10-23

# 量子符号化技術の比較

Comparing Quantum Encoding Techniques ( http://arxiv.org/abs/2410.09121v2 )

ライセンス: Link先を確認

Nidhi Munikote,

翻訳日:2024-10-30 16:13:24 公開日:2024-10-23

# 自己回帰型タブラル変圧器を用いた事象予測のための簡易ベースライン

A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers ( http://arxiv.org/abs/2410.10648v1 )

ライセンス: Link先を確認

Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum,

(参考訳) 例えば、クレジットカード取引が不正であるか、顧客が小売プラットフォームに商品を割り当てる格付けがあるかなどである。イベント予測への既存のアプローチには、コスト、脆性、タイムアウェアな位置埋め込み、学習行とフィールドエンコーディング、クラス不均衡に対処するオーバーサンプリングメソッドなど、アプリケーションに依存した技術がある。さらに、これらのアプローチは、例えば、すべての歴史的なイベントのラベルを知っている、あるいは、データの特徴自体ではなく、事前に指定されたラベルだけを予測する、といった特定のユースケースを前提としています。本研究では,基本的な位置埋め込みと因果言語モデリングの目的を有する標準自己回帰型LPM型トランスフォーマを用いた,単純だが柔軟なベースラインを提案する。私たちのベースラインは、一般的なデータセットで既存のアプローチよりも優れており、さまざまなユースケースに使用することができます。我々は、同じモデルがラベルを予測したり、欠落した値をインプットしたり、イベントシーケンスをモデル化できることを示した。

Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use-cases, for example that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.

翻訳日:2024-10-29 20:25:02 公開日:2024-10-23

# 自己回帰型タブラル変圧器を用いた事象予測のための簡易ベースライン

A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers ( http://arxiv.org/abs/2410.10648v2 )

ライセンス: Link先を確認

Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, C. Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum,

翻訳日:2024-10-29 20:25:02 公開日:2024-10-23

PDF登録状況（公開日: 20241023）