Fugu-MT: arXiv Paper Translations

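This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.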

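The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any outcome arising from the use of this site (including all information and translations).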

These are the papers whose publication date is 20240622.

Title	Authors	Abstract	Publication date / Translation date
# Integrating Attentional Factors and Spacing in Logistic Knowledge Tracing Models to Explore the Impact of Training Sequences on Category Learning ( http://arxiv.org/abs/2407.15020v1 ) License: see link	Meng Cao, Philip I. Pavlik Jr., Wei Chu, Liang Zhang	In category learning, a growing body of literature explores the effects of interleaving in contrast to blocking. The sequential attention hypothesis posits that interleaving draws attention to the differences between categories, while blocking directs attention toward similarities within categories. Although a recent study underscores the joint influence of memory and attentional factors on sequencing effects, there remains a scarcity of effective computational models that integrate both attentional and memory considerations to comprehensively understand the effect of training sequences on students' performance. This study introduces a novel integration of attentional factors and spacing into logistic knowledge tracing (LKT) models to monitor students' performance across different training sequences (interleaving and blocking). Attentional factors were incorporated by recording the counts of comparisons between adjacent trials, considering whether they belong to the same or different categories. Several features were employed to account for temporal spacing. We used cross-validation to test the model fit and predictions on the learning session and posttest. Our findings reveal that incorporating both attentional factors and spacing features in the Additive Factors Model (AFM) significantly enhances its capacity to capture the effects of interleaving and blocking and yields superior predictive accuracy for students' learning outcomes. By bridging the gap between attentional factors and memory processes, our computational approach offers a more comprehensive framework for understanding and predicting category learning outcomes in educational settings.	Translated: 2024-07-28 18:39:09 Published: 2024-06-22
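The abstract above describes attentional factors as counts of comparisons between adjacent trials, split by whether the two trials share a category. A minimal sketch of that counting step is given below. The function name and the toy sequences are hypothetical, and the paper's actual feature construction inside the LKT models may differ.

```python
def adjacent_comparison_counts(categories):
    """Count adjacent-trial comparisons in a training sequence.

    Returns (same, different): the number of neighbouring trial pairs
    that share a category and the number that do not.
    This is an illustrative assumption, not the paper's exact feature.
    """
    same = 0
    different = 0
    # Pair each trial with the one that immediately follows it.
    for previous, current in zip(categories, categories[1:]):
        if previous == current:
            same += 1
        else:
            different += 1
    return same, different


# Toy sequences (made up for illustration): blocked vs. interleaved.
blocked = list("AAABBB")
interleaved = list("ABABAB")

print(adjacent_comparison_counts(blocked))      # (4, 1)
print(adjacent_comparison_counts(interleaved))  # (0, 5)
```

Under this reading, a blocked sequence accumulates mostly same-category comparisons and an interleaved one mostly different-category comparisons, which is the contrast the sequential attention hypothesis relies on.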
# Regulating AI Adaptation: An Analysis of AI Medical Device Updates ( http://arxiv.org/abs/2407.16900v1 ) License: see link	Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou	While the pace of AI development has progressed rapidly in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators, as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner in regulating and approving hundreds of AI medical devices. To better understand how AI is updated and its regulatory considerations, we systematically analyze the frequency and nature of updates in FDA-approved AI medical devices. We find that less than 2% of all devices report having been updated by being re-trained on new data. Meanwhile, nearly a quarter of devices report updates in the form of new functionality and marketing claims. As an illustrative case study, we analyze pneumothorax detection models and find that while model performance can degrade by as much as 0.18 AUC when evaluated on new sites, re-training on site-specific data can mitigate this performance drop, recovering up to 0.23 AUC. However, we also observed significant degradation on the original site after re-training using data from new sites, providing insight from one example that challenges the current one-model-fits-all approach to regulatory approvals. Our analysis provides an in-depth look at the current state of FDA-approved AI device updates and insights for future regulatory policies toward model updating and adaptive AI.	Translated: 2024-07-28 18:19:29 Published: 2024-06-22
# Adaptive Digital Twin and Communication-Efficient Federated Learning Network Slicing for 5G-enabled Internet of Things ( http://arxiv.org/abs/2407.10987v1 ) License: see link	Daniel Ayepah-Mensah, Guolin Sun, Yu Pang, Wei Jiang	Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has introduced digital twins to map physical systems to digital models for accurate decision-making. In our approach, we first use graph-attention networks to build a digital twin environment for network slices, enabling real-time traffic analysis, monitoring, and demand forecasting. Based on these predictions, we formulate the resource allocation problem as a federated multi-agent reinforcement learning problem and employ a deep deterministic policy gradient to determine the resource allocation policy while preserving the privacy of the slices. Our results demonstrate that the proposed approaches can improve the accuracy of demand prediction for network slices and reduce the communication overhead of dynamic network slicing.	Translated: 2024-07-22 12:39:32 Published: 2024-06-22
# LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations ( http://arxiv.org/abs/2407.02514v1 ) License: see link	Shashank Kirtania, Priyanshu Gupta, Arjun Radhakirshna	In this paper we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. While current approaches leverage formal languages as an intermediate representation of reasoning problems, they struggle with generating intermediate formal specifications and refining these representations. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and LLM-based techniques on natural language reasoning tasks on two datasets, FOLIO and AR-LSAT. Logic-LM++ shows an average improvement of 13.5% over standard prompting, 11% over chain-of-thought prompting, and 5% over Logic-LM.	Translated: 2024-07-07 13:14:55 Published: 2024-06-22
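The Logic-LM++ abstract says only that pairwise comparisons by the LLM are used to evaluate suggested refinements. One way such a loop could be organised is sketched below. The function name, the `propose` and `prefers_candidate` callables (stand-ins for LLM calls), and the rule of keeping the current formulation when the candidate is not preferred are all assumptions for illustration, not the paper's algorithm.

```python
def refine_with_pairwise_check(initial, propose, prefers_candidate, max_steps=3):
    """Iteratively refine a formulation, accepting a refinement only when a
    pairwise comparison prefers it over the current formulation.

    propose(current) -> candidate            (hypothetical stand-in for an LLM call)
    prefers_candidate(current, candidate) -> bool   (hypothetical pairwise judge)
    """
    current = initial
    for _ in range(max_steps):
        candidate = propose(current)
        if candidate == current:
            # Nothing new was proposed, so further steps would not help.
            break
        if prefers_candidate(current, candidate):
            current = candidate
        # Otherwise keep the current formulation and try again.
    return current


# Deterministic stubs in place of real model calls (purely illustrative).
result = refine_with_pairwise_check(
    initial=0,
    propose=lambda value: value + 1,
    prefers_candidate=lambda current, candidate: candidate <= 2,
    max_steps=5,
)
print(result)  # 2
```

The point of the comparison step in this sketch is that a rejected refinement never replaces the formulation that is already in hand.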
# A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge ( http://arxiv.org/abs/2406.17801v1 ) License: see link	Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu	This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers. Evaluation included both mono-lingual and cross-lingual synthesis across all seven languages, with subjective tests assessing naturalness and speaker similarity. Our system uses the VITS2 architecture, augmented with a multi-lingual ID and a BERT model to enhance contextual language comprehension. In Track 1, where no additional data usage was permitted, our model achieved a Speaker Similarity score of 4.02. In Track 2, which allowed the use of extra data, it attained a Speaker Similarity score of 4.17.	Translated: 2024-06-27 17:46:26 Published: 2024-06-22
# Understanding the Role of User Profile in the Personalization of Large Language Models ( http://arxiv.org/abs/2406.17803v1 ) License: see link	Bin Wu, Zhengyan Shi, Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz	Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance performance on a wide range of tasks. However, the precise role of user profiles and the mechanism by which they affect LLMs remain unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized response produced or approved by users that plays a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, a user profile placed closer to the beginning has a greater effect on the personalization of LLMs. Our findings reveal the role of user profiles in the personalization of LLMs and show how incorporating user profiles impacts performance, providing insight into how to leverage user profiles effectively.	Translated: 2024-06-27 17:46:26 Published: 2024-06-22
# A Review of Electromagnetic Elimination Methods for low-field portable MRI scanner ( http://arxiv.org/abs/2406.17804v1 ) License: see link	Wanyu Bian	This paper presents a comprehensive analysis of both conventional and deep learning methods for eliminating electromagnetic interference (EMI) in MRI systems. We explore the underlying principles and implementation of traditional analytical and adaptive EMI elimination techniques, as well as cutting-edge deep learning approaches. Through a detailed comparison, the strengths and limitations of each method are highlighted. Recent advancements in active EMI elimination utilizing multiple external EMI receiver coils and analytical techniques are discussed alongside the superior performance of deep learning methods, which leverage neural networks trained on extensive MRI data. While deep learning methods demonstrate significant improvements in EMI suppression, enhancing diagnostic capabilities and accessibility of MRI technology, they also introduce potential security and safety concerns, especially in production and commercial applications. This study underscores the need to address these challenges to fully realize the benefits of deep learning in EMI elimination. The findings suggest a balanced approach, combining the reliability of conventional methods with the advanced capabilities of deep learning, to develop more robust and effective EMI suppression strategies in MRI systems.	Translated: 2024-06-27 17:46:26 Published: 2024-06-22
# Can LLMs Generate Visualizations with Dataless Prompts? ( http://arxiv.org/abs/2406.17805v1 ) License: see link	Darius Coelho, Harshit Barot, Naitik Rathod, Klaus Mueller	Recent advancements in large language models have revolutionized information access, as these models harness data available on the web to address complex queries, becoming the preferred information source for many users. In certain cases, queries are about publicly available data, which can be effectively answered with data visualizations. In this paper, we investigate the ability of large language models to provide accurate data and relevant visualizations in response to such queries. Specifically, we investigate the ability of GPT-3 and GPT-4 to generate visualizations with dataless prompts, where no data accompanies the query. We evaluate the results of the models by comparing them to visualization cheat sheets created by visualization experts.	Translated: 2024-06-27 17:46:26 Published: 2024-06-22
# MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? ( http://arxiv.org/abs/2406.17806v1 ) License: see link	Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh	Humans are prone to cognitive distortions, that is, biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts. As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT). Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages (perception, intent reasoning, and safety judgement) in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications. We make our project available at https://turningpoint-ai.github.io/MOSSBench/.	Translated: 2024-06-27 17:46:26 Published: 2024-06-22
# Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics ( http://arxiv.org/abs/2002.01987v4 ) License: see link	Belinda Tzen, Maxim Raginsky	We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian" in the sense of Kullback-Leibler divergence. Our setting is the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuous ensemble. We show that the problem can be phrased as global minimization of a free energy functional on the space of (finite-length) paths over probability measures on the weights. This functional trades off the $L^2$ approximation risk of the terminal measure against the KL divergence of the path with respect to an isotropic Brownian motion prior. We characterize the unique global minimizer and examine the dynamics in the space of probability measures over weights that can achieve it. In particular, we show that the optimal path-space measure corresponds to the Föllmer drift, the solution to a McKean-Vlasov optimal control problem closely related to the classic Schrödinger bridge problem. While the Föllmer drift cannot in general be obtained in closed form, thus limiting its potential algorithmic utility, we illustrate the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization. Specifically, we show that it closely tracks the Föllmer drift when the regularization is such that the minimizing density is log-concave.	Translated: 2024-06-26 23:34:57 Published: 2024-06-22
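The trade-off described in the abstract above can be written schematically as follows. This is only a sketch of the general shape of such a free energy, not the paper's exact definition: the symbols $\mu$ (a path-space measure with terminal marginal $\mu_T$), $f$ (the target function), $\hat f_{\mu_T}$ (the mean-field network output), $\sigma$ (the neuron activation), $\beta > 0$ (an inverse-temperature weight), and $\mathsf{W}$ (the law of the isotropic Brownian motion prior), as well as the constant factors, are all assumptions introduced here.

```latex
\[
  F(\mu) \;=\;
  \underbrace{\tfrac{1}{2}\,\bigl\| f - \hat f_{\mu_T} \bigr\|_{L^2}^{2}}_{\text{approximation risk of the terminal measure}}
  \;+\;
  \underbrace{\tfrac{1}{\beta}\, D_{\mathrm{KL}}\!\bigl( \mu \,\big\|\, \mathsf{W} \bigr)}_{\text{path-space KL to the Brownian prior}},
  \qquad
  \hat f_{\mu_T}(x) \;=\; \int \sigma(w, x)\, \mu_T(\mathrm{d}w).
\]
```

In this schematic form, a larger $\beta$ weights the fit to $f$ more heavily, while a smaller $\beta$ keeps the path measure closer to the Brownian prior, which matches the "nearly Gaussian" weights described in the abstract.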
# MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication ( http://arxiv.org/abs/2406.16962v1 ) License: see link	Shubhabrata Mukherjee, Cory Beard, Sejun Song	Semantic Communication can transform the way we transmit information, prioritizing meaningful and effective content over individual symbols or bits. This evolution promises significant benefits, including reduced latency, lower bandwidth usage, and higher throughput compared to traditional communication. However, the development of Semantic Communication faces a crucial challenge: the need for universal metrics to benchmark the joint effects of semantic information loss and energy consumption. This research introduces an innovative solution: the "Energy-Optimized Semantic Loss" (EOSL) function, a novel multi-objective loss function that effectively balances semantic information loss and energy consumption. Through comprehensive experiments on transformer models, including energy benchmarking, we demonstrate the remarkable effectiveness of EOSL-based model selection. We have established that EOSL-based transformer model selection achieves up to 83% better similarity-to-power ratio (SPR) compared to BLEU score-based selection and 67% better SPR compared to selection based solely on lowest power usage. Furthermore, we extend the applicability of EOSL to diverse and varying contexts, inspired by the principles of Meta-Learning. By cumulatively applying EOSL, we enable the model selection system to adapt to this change, leveraging historical EOSL values to guide the learning process. This work lays the foundation for energy-efficient model selection and the development of green semantic communication.	Translated: 2024-06-26 19:10:10 Published: 2024-06-22
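The MetaGreen abstract reports results in terms of a similarity-to-power ratio (SPR) without defining it. The sketch below assumes the simplest reading, similarity divided by power draw, and picks the candidate with the highest ratio. It does not implement EOSL itself, and the function names, field names, and numbers are hypothetical.

```python
def similarity_to_power_ratio(similarity, power_watts):
    """Assumed definition: semantic similarity achieved per watt consumed."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return similarity / power_watts


def select_model(candidates):
    """Return the name of the candidate with the highest SPR."""
    best = max(
        candidates,
        key=lambda c: similarity_to_power_ratio(c["similarity"], c["power_watts"]),
    )
    return best["name"]


# Made-up measurements for three hypothetical transformer models.
candidates = [
    {"name": "small", "similarity": 0.80, "power_watts": 40.0},
    {"name": "medium", "similarity": 0.88, "power_watts": 80.0},
    {"name": "large", "similarity": 0.92, "power_watts": 200.0},
]

print(select_model(candidates))  # small
```

With these toy numbers the largest model has the best similarity but the worst ratio, which illustrates why selecting on quality alone and selecting on a joint quality and energy criterion can disagree.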
# Large Language Models for Link Stealing Attacks Against Graph Neural Networks ( http://arxiv.org/abs/2406.16963v1 ) License: see link	Faqian Guan, Tianqing Zhu, Hui Sun, Wanlei Zhou, Philip S. Yu	Graph data contains rich node features and unique edge information, and has been applied across various domains, such as citation networks and recommendation systems. Graph Neural Networks (GNNs) are specialized for handling such data and have shown impressive performance in many applications. However, GNNs may contain sensitive information and be susceptible to privacy attacks. For example, link stealing is a type of attack in which attackers infer whether two nodes are linked or not. Previous link stealing attacks primarily relied on posterior probabilities from the target GNN model, neglecting the significance of node features. Additionally, variations in node classes across different datasets lead to different dimensions of posterior probabilities. The handling of these varying data dimensions posed a challenge in using a single model to effectively conduct link stealing attacks on different datasets. To address these challenges, we introduce Large Language Models (LLMs) to perform link stealing attacks on GNNs. LLMs can effectively integrate textual features and exhibit strong generalizability, enabling attacks to handle diverse data dimensions across various datasets. We design two distinct LLM prompts to effectively combine textual features and posterior probabilities of graph nodes. Through these designed prompts, we fine-tune the LLM to adapt to the link stealing attack task. Furthermore, we fine-tune the LLM using multiple datasets and enable the LLM to learn features from different datasets simultaneously. Experimental results show that our approach significantly enhances the performance of existing link stealing attack tasks in both white-box and black-box scenarios. Our method can execute link stealing attacks across different datasets using only a single model, making link stealing attacks more applicable to real-world scenarios.	Translated: 2024-06-26 19:10:10 Published: 2024-06-22
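The abstract notes that earlier link stealing attacks relied on the posterior probabilities returned by the target GNN. A minimal sketch of that kind of posterior-only baseline is shown below: two nodes are guessed to be linked when their posterior vectors are very similar. This illustrates the earlier style of attack, not the paper's LLM-based method, and the function names, threshold, and example vectors are assumptions.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length posterior vectors."""
    if len(p) != len(q):
        raise ValueError("posterior vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def predict_link(p, q, threshold=0.9):
    """Guess that two nodes are linked when their posteriors are similar.

    The threshold is an arbitrary illustrative value.
    """
    return cosine_similarity(p, q) >= threshold


# Made-up posteriors over three classes for three nodes.
node_a = [0.90, 0.05, 0.05]
node_b = [0.85, 0.10, 0.05]
node_c = [0.05, 0.05, 0.90]

print(predict_link(node_a, node_b))  # True
print(predict_link(node_a, node_c))  # False
```

The dimension check makes the limitation raised in the abstract concrete: posteriors from datasets with different numbers of classes cannot be compared directly, so a single posterior-only model does not transfer across datasets.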
# Are Language Models Actually Useful for Time Series Forecasting? ( http://arxiv.org/abs/2406.16964v1 ) License: see link	Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, Thomas Hartvigsen	Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are language models actually useful for time series? After a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results; in most cases the results even improved. We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and reveal that patching and attention structures perform similarly to state-of-the-art LLM-based forecasters.	Translated: 2024-06-26 19:10:10 Published: 2024-06-22
# Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey ( http://arxiv.org/abs/2406.16965v1 ) License: see link	Abdur Rashid, Parag Biswas, Angona Biswas, MD Abdullah Al Nasim, Kishor Datta Gupta, Roy George	Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. AI algorithms are data-driven models based on statistical learning theory, used as a tool to make use of the data that the power system and its users generate. First, we perform a thorough literature analysis of AI applications related to renewable energy (RE). Next, we present a thorough analysis of renewable energy factories and assess their suitability, along with a list of the most widely used and appropriate AI algorithms. Nine AI-based strategies are identified here to assist renewable energy in contemporary power systems. This survey paper comprises an extensive review of the several AI techniques used for renewable energy as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of renewable energy. This literature review assesses the performance and outcomes of nine different research methods and aims to distill valuable insights into their strengths and limitations. This study also addresses three main topics: using AI technology for renewable power generation, utilizing AI for renewable energy forecasting, and optimizing energy systems. Additionally, it explores AI's superiority over conventional models in controllability, data handling, cyberattack prevention, smart grid implementation, and robotics, and its significance in shaping the future of the energy industry. Furthermore, this article outlines future directions in the integration of AI for renewable energy.	Translated: 2024-06-26 19:10:10 Published: 2024-06-22
# ソフトラベルを用いた合成サンプルによる雑音重畳の緩和 Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels ( http://arxiv.org/abs/2406.16966v1 ) ライセンス: Link先を確認	Yangdi Lu, Wenbo He,	(参考訳) ノイズラベルは、特にクラウドソーシングやWeb検索から派生した大規模データセットにおいて、現実世界のデータセットにおいてユビキタスである。トレーニング中にノイズラベルを過度に適合させる傾向にあるため、ノイズの多いデータセットでディープニューラルネットワークをトレーニングすることは困難であり、結果として一般化性能は低下する。早期学習期間中、深層ニューラルネットワークは、誤ってラベル付けされたサンプルを記憶する前にクリーンなサンプルに適合することが観察されている。本稿では,初期学習段階における表現分布を深く掘り下げ,ノイズラベルによらず,同じカテゴリのイメージの学習表現がいっしょに集まっていることを見出した。そこで本研究では,ノイズラベルの影響を軽減するために,新しい合成サンプルを用いてモデルを訓練するフレームワークを提案する。具体的には, 原試料を最寄りの近傍に凝集させて合成試料を合成する方法を提案し, 試料当たりの損失分布から学習した混合モデルを用いて重量を算出する。極端ラベルノイズの存在下での性能を高めるため,ノイズラベルを徐々に補正することにより,ソフトターゲットを推定する。さらに, 推定したソフトターゲットは, より正確な真実ラベルの近似を導出し, 提案手法は, より分離された, 明確に有界なクラスタを持つ学習表現の優れた品質が得られることを示した。 2つのベンチマーク(CIFAR-10とCIFAR-100)と2つの大規模実世界のデータセット(Clothing1MとWebvision)の広範な実験により、我々のアプローチは、最先端の手法と学習表現の堅牢性より優れていることが示された。 Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training, resulting in poor generalization performance. During an early learning phase, deep neural networks have been observed to fit the clean samples before memorizing the mislabeled samples. In this paper, we dig deeper into the representation distributions in the early learning phase and find that, regardless of their noisy labels, learned representations of images from the same category still congregate together. Inspired by it, we propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels. Specifically, we propose a mixing strategy to create the synthetic samples by aggregating original samples with their top-K nearest neighbours, wherein the weights are calculated using a mixture model learning from the per-sample loss distribution. 
To enhance the performance in the presence of extreme label noise, we estimate the soft targets by gradually correcting the noisy labels. Furthermore, we demonstrate that the estimated soft targets yield a more accurate approximation to ground truth labels and the proposed method produces a superior quality of learned representations with more separated and clearly bounded clusters. Extensive experiments on two benchmarks (CIFAR-10 and CIFAR-100) and two large-scale real-world datasets (Clothing1M and Webvision) demonstrate that our approach outperforms state-of-the-art methods and yields more robust learned representations.	翻訳日:2024-06-26 19:10:10 公開日:2024-06-22
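As a rough illustration of the mixing idea in the abstract above (aggregating each sample with its top-K nearest neighbours), here is a minimal NumPy sketch. It is not the authors' implementation: the function name `mix_with_neighbours`, the use of Euclidean distance in feature space, and the assumption that per-sample weights are already given (the paper derives them from a mixture model fitted to the per-sample loss distribution) are all illustrative assumptions.

```python
import numpy as np

def mix_with_neighbours(features, inputs, soft_labels, weights, k=3):
    """Build one synthetic sample per original sample (hypothetical helper).

    features    : (n, d_f) representations used to find neighbours
    inputs      : (n, d_x) flattened inputs to be mixed
    soft_labels : (n, c)   soft targets, each row summing to 1
    weights     : (n,)     positive per-sample weights (assumed given here)
    """
    # Pairwise squared Euclidean distances in feature space.
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Each sample plus its k nearest neighbours (distance to itself is 0).
    idx = np.argsort(dist, axis=1)[:, : k + 1]
    # Normalise the weights of the contributors so they sum to 1 per row.
    w = weights[idx]
    w = w / w.sum(axis=1, keepdims=True)
    # Convex combinations of inputs and of soft labels.
    mixed_x = (w[:, :, None] * inputs[idx]).sum(axis=1)
    mixed_y = (w[:, :, None] * soft_labels[idx]).sum(axis=1)
    return mixed_x, mixed_y
```

Because each synthetic label is a convex combination of soft labels, it remains a valid probability vector; a mislabeled sample surrounded by correctly labeled neighbours is pulled toward the neighbourhood's label.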
# 抑うつ認識のためのマルチスケールコントラストを用いたマルチモーダル生理信号表現学習 Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition ( http://arxiv.org/abs/2406.16968v1 ) ライセンス: Link先を確認	Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen,	(参考訳) 機能的近赤外分光法(fNIRS)や脳波法(EEG)などの生理的信号に基づく抑うつ認識は大きな進歩を遂げている。しかし、既存のほとんどの研究は、複雑な時空間パターンにおける同じ刺激課題の下での多モード生理的信号の相補性と意味的一貫性を無視している。本稿では,抑うつ認識のためのマルチスケールコントラストを用いたシームズアーキテクチャを用いたマルチモーダル生理学的信号表現学習フレームワークを提案する。まず、fNIRSとEEGは、時間領域データ拡張戦略に基づいて異なるが相関したデータに変換される。そして,重み共有型マルチスケール時空間畳み込みにより,fNIRSとEEGの表現を学習する時空間コントラストモジュールを設計する。さらに、刺激タスクに関連する意味表現の学習を強化するために、fNIRSとEEGの意味的類似性を最大化することを目的とした意味一貫性コントラストモジュールを提案する。公開および自己収集された多モード生理信号データセットに関する大規模な実験は、MRLMCが最先端のモデルよりも優れていることを示している。さらに,提案するフレームワークは,下流タスクをマルチモーダル時系列に転送することができる。 Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. 
Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.	翻訳日:2024-06-26 19:10:10 公開日:2024-06-22
# 流れの正規化のためのフレキシブルテール Flexible Tails for Normalizing Flows ( http://arxiv.org/abs/2406.16971v1 ) ライセンス: Link先を確認	Tennessee Hickling, Dennis Prangle,	(参考訳) 正規化フローは、単純な基底分布の変換として表される確率分布の柔軟なクラスである。標準正規化フローの制限は、密度推定と変分推論の両方に適用される重い尾を持つ分布を表す。この問題に対する一般的な解決策は、重い尾のベース分布を使用することである。例えば、Laszkiewicz et al (2022)のTAF法がある。これは、重み付き入力下でのフローの正規化などのニューラルネットワークの最適化が困難であるため、パフォーマンスが低下する可能性がある、と我々は主張する。この問題は我々の論文で実証されている。ガウス基底分布と重みを生成可能な最終変換層を用いる方法を提案する。我々はこのアプローチをテールトランスフォーメーションフロー (TTF) と呼ぶ。実験により,本手法は,特にターゲット分布が大きな寸法あるいはテールウェイトである場合,現在の手法よりも優れることが示された。 Normalizing flows are a flexible class of probability distributions, expressed as transformations of a simple base distribution. A limitation of standard normalizing flows is representing distributions with heavy tails, which arise in applications to both density estimation and variational inference. A popular current solution to this problem is to use a heavy tailed base distribution. Examples include the tail adaptive flow (TAF) methods of Laszkiewicz et al (2022). We argue this can lead to poor performance due to the difficulty of optimising neural networks, such as normalizing flows, under heavy tailed input. This problem is demonstrated in our paper. We propose an alternative: use a Gaussian base distribution and a final transformation layer which can produce heavy tails. We call this approach tail transform flow (TTF). Experimental results show this approach outperforms current methods, especially when the target distribution has large dimension or tail weight.	翻訳日:2024-06-26 19:10:10 公開日:2024-06-22
# 効率的なNASに基づく不均衡データセット処理手法 An Efficient NAS-based Approach for Handling Imbalanced Datasets ( http://arxiv.org/abs/2406.16972v1 ) ライセンス: Link先を確認	Zhiwei Yao,	(参考訳) クラス不均衡は実世界のデータ配信において一般的な問題であり、正確な分類器の訓練に悪影響を及ぼす。この問題を緩和する伝統的なアプローチは、クラス再バランス、情報伝達、表現学習の3つの主要なカテゴリに分類される。本稿では,ニューラルアーキテクチャサーチ(NAS)によるバックボーンアーキテクチャの最適化により,長い尾を持つデータセットの性能を向上させる新しい手法を提案する。我々の研究は、バランスの取れたデータセット上でのアーキテクチャの精度が、バランスの取れていないデータセットのパフォーマンスを確実に予測できないことを示している。これにより、計算コストのかかる長いデータセット上で完全なNASを実行する必要がある。 IMB-NASは、バランスの取れたソースデータセットでトレーニングされたNASスーパーネットワークを、不均衡なターゲットデータセットに効率的に適応することを提案する。本報告では, IMB-NASの基本技術について, NASやアーキテクチャ転送など, 詳細な解説を行う。様々な適応戦略の中で、最も効果的なアプローチは、バランスの取れたソースデータセットでトレーニングされたバックボーンNASスーパーネットワークを凍結させながら、線形分類ヘッドを再重み付き損失で再訓練することである。最後に,不均衡なCIFARデータセットを用いて評価実験を行った。結論はIMB-NAS論文で提案されているものと同じである。 Class imbalance is a common issue in real-world data distributions, negatively impacting the training of accurate classifiers. Traditional approaches to mitigate this problem fall into three main categories: class re-balancing, information transfer, and representation learning. This paper introduces a novel approach to enhance performance on long-tailed datasets by optimizing the backbone architecture through neural architecture search (NAS). Our research shows that an architecture's accuracy on a balanced dataset does not reliably predict its performance on imbalanced datasets. This necessitates a complete NAS run on long-tailed datasets, which can be computationally expensive. To address this computational challenge, we focus on existing work, called IMB-NAS, which proposes efficiently adapting a NAS super-network trained on a balanced source dataset to an imbalanced target dataset. A detailed description of the fundamental techniques for IMB-NAS is provided in this paper, including NAS and architecture transfer. 
Among various adaptation strategies, we find that the most effective approach is to retrain the linear classification head with reweighted loss while keeping the backbone NAS super-network trained on the balanced source dataset frozen. Finally, we conducted a series of experiments on the imbalanced CIFAR dataset for performance evaluation. Our conclusions are the same as those proposed in the IMB-NAS paper.	翻訳日:2024-06-26 19:00:25 公開日:2024-06-22
# 循環行列のMDS特性に関する一考察 A note on MDS Property of Circulant Matrices ( http://arxiv.org/abs/2406.16973v1 ) ライセンス: Link先を確認	Tapas Chatterjee, Ayantika Laha,	(参考訳) 2014年、Gupta と Ray は有限体 $\mathbb{F}_{2^m}$ 上の循環不変行列が最大距離分離(MDS)できないことを証明した。この非存在は、標数 2^d \times 2^d$ の巡回直交行列(英語版)(circulant orthogonal matrices of order $2^d \times 2^d$ over finite field of characteristic $2$)にまで拡張する。これらの知見は, 実用化を念頭に, 軽量MDS行列を構築するための循環特性を一般化するきっかけとなった。最近、2022年、チャタジーとラハは半インボリュート性および半直交性を考慮した循環行列の研究を開始した。彼らの研究を拡張して、この記事は有限体 $\mathbb{F}_{2^m} 上のこれらの特性を持つ循環行列に展開する。例えば、関連する対角行列のトレースと行列のMDS特性の相関関係を確立する。この相関は、任意の順序の半直交行列とすべての順序の半インボリュートリー行列に対して成り立つことを証明している。さらに、標数 2$ の有限体上の奇数順序の循環的半直交行列に対して、関連する対角行列のトレースはゼロでない値を持つことができる。 In $2014$, Gupta and Ray proved that the circulant involutory matrices over the finite field $\mathbb{F}_{2^m}$ can not be maximum distance separable (MDS). This non-existence also extends to circulant orthogonal matrices of order $2^d \times 2^d$ over finite fields of characteristic $2$. These findings inspired many authors to generalize the circulant property for constructing lightweight MDS matrices with practical applications in mind. Recently, in $2022,$ Chatterjee and Laha initiated a study of circulant matrices by considering semi-involutory and semi-orthogonal properties. Expanding on their work, this article delves into circulant matrices possessing these characteristics over the finite field $\mathbb{F}_{2^m}.$ Notably, we establish a correlation between the trace of associated diagonal matrices and the MDS property of the matrix. We prove that this correlation holds true for even order semi-orthogonal matrices and semi-involutory matrices of all orders. Additionally, we provide examples that for circulant, semi-orthogonal matrices of odd orders over a finite field with characteristic $2$, the trace of associated diagonal matrices may possess non-zero values.	翻訳日:2024-06-26 19:00:25 公開日:2024-06-22
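To make the MDS property discussed above concrete, the following sketch brute-forces the standard criterion that a square matrix is MDS exactly when every square submatrix is nonsingular, over a field of characteristic 2. It is an illustrative check only, not a construction from the paper; the helper names, the default field GF(2^4) with modulus x^4 + x + 1, and the exhaustive search (practical only for small orders) are assumptions.

```python
from itertools import combinations

def gf_mul(a, b, m=4, poly=0b10011):
    """Multiply a and b in GF(2^m) defined by the modulus `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:          # degree reached m: reduce by the modulus
            a ^= poly
    return result

def gf_inv(a, m=4, poly=0b10011):
    """Inverse of nonzero a, computed as a^(2^m - 2)."""
    result = 1
    for _ in range(2 ** m - 2):
        result = gf_mul(result, a, m, poly)
    return result

def gf_det(mat, m=4, poly=0b10011):
    """Determinant by Gaussian elimination (row swaps need no sign in char 2)."""
    mat = [row[:] for row in mat]
    n = len(mat)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if mat[r][c]), None)
        if pivot is None:
            return 0
        mat[c], mat[pivot] = mat[pivot], mat[c]
        det = gf_mul(det, mat[c][c], m, poly)
        inv = gf_inv(mat[c][c], m, poly)
        for r in range(c + 1, n):
            factor = gf_mul(mat[r][c], inv, m, poly)
            if factor:
                for j in range(c, n):
                    mat[r][j] ^= gf_mul(factor, mat[c][j], m, poly)
    return det

def circulant(first_row):
    """Circulant matrix whose i-th row is the first row shifted right i times."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def is_mds(mat, m=4, poly=0b10011):
    """True if every square submatrix of `mat` is nonsingular."""
    n = len(mat)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[mat[r][c] for c in cols] for r in rows]
                if gf_det(sub, m, poly) == 0:
                    return False
    return True
```

For example, `is_mds(circulant([2, 3, 1, 1]), m=8, poly=0x11B)` checks the circulant matrix used by the AES MixColumns step over GF(2^8), a well-known MDS matrix.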
# SHDB-AF : 心房細動のホルター心電図データベース SHDB-AF: a Japanese Holter ECG database of atrial fibrillation ( http://arxiv.org/abs/2406.16974v1 ) ライセンス: Link先を確認	Kenta Tsutsui, Shany Biton Brimer, Noam Ben-Moshe, Jean Marc Sellal, Julien Oster, Hitoshi Mori, Yoshifumi Ikeda, Takahide Arai, Shintaro Nakano, Ritsushi Kato, Joachim A. Behar,	(参考訳) 心房細動 (AF) は、生活の質を損なう一般的な心房不整脈であり、塞栓性脳卒中、心不全、その他の合併症を引き起こす。機械学習(ML)とディープラーニング(DL)の最近の進歩は、診断精度を高める可能性を示している。 DLモデルは、民族、年齢、性別、その他の要因において堅牢で一般化可能であることが不可欠である。多くのECGデータベースが研究コミュニティで利用可能になっているが、日本の人口サンプルは含まれていない。 Saitama Heart Database Atrial Fibrillation (SHDB-AF)は、日本発の新規オープンソースホルター心電図データベースである。 SHDB-AFのそれぞれのレコードは24時間の長さで200Hzでサンプリングされ、合計で2400万秒のECGデータである。 Atrial fibrillation (AF) is a common atrial arrhythmia that impairs quality of life and causes embolic stroke, heart failure and other complications. Recent advancements in machine learning (ML) and deep learning (DL) have shown potential for enhancing diagnostic accuracy. It is essential for DL models to be robust and generalizable across variations in ethnicity, age, sex, and other factors. Although a number of ECG databases have been made available to the research community, none includes a Japanese population sample. Saitama Heart Database Atrial Fibrillation (SHDB-AF) is a novel open-sourced Holter ECG database from Japan, containing data from 100 unique patients with paroxysmal AF. Each record in SHDB-AF is 24 hours long and sampled at 200 Hz, totaling 24 million seconds of ECG data.	翻訳日:2024-06-26 19:00:25 公開日:2024-06-22
# 多出力回帰のためのエクササイズと近似等式推論 Exact and Approximate Conformal Inference for Multi-Output Regression ( http://arxiv.org/abs/2210.17405v2 ) ライセンス: Link先を確認	Chancellor Johnstone, Eugene Ndiaye,	(参考訳) 機械学習では、与えられた共変量情報$x$のレスポンスを$y$と見積もるのが一般的である。しかし、これらの予測だけでは、これらの予測に関連する不確実性は定量化されない。この欠損を克服する1つの方法は、所定の確率で観測されない応答$y$を含む集合を構成する共形推論法である。残念なことに、1次元の応答であっても、最近の奨励的な進歩にもかかわらず、共形推論は計算に高価である。本稿では、予測モデルが$y$の線形関数として記述できる場合に、共形推論の正確な導出を$p$-値で提供する多出力回帰について検討する。さらに, 線形および非線形の両方の多出力予測器に対して, 共形予測領域を効率よく近似し, 計算上の優位性を保ちながら, より効率的な方法として, textt{unionCP} と多変量拡張を提案する。また、実世界とシミュレーションデータの両方を用いて、これらの手法の有効性に関する理論的および実証的な証拠を提供する。 It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these predictions alone do not quantify any uncertainty associated with said predictions. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference is computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference $p$-values when the predictive model can be described as a linear function of $y$. Additionally, we propose \texttt{unionCP} and a multivariate extension of \texttt{rootCP} as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We also provide both theoretical and empirical evidence of the effectiveness of these methods using both real-world and simulated data.	翻訳日:2024-06-26 05:28:15 公開日:2024-06-22
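The abstract above concerns exact (full) conformal inference and the approximations `unionCP` and `rootCP`. Those are not reproduced here. As a simpler relative that shows what "a set containing the unobserved response with a prescribed probability" means for a multi-output response, the sketch below uses split conformal prediction with the Euclidean norm of the residual as the conformity score, which yields a ball-shaped region around the point prediction. The function names and the choice of score are assumptions for illustration.

```python
import numpy as np

def conformal_radius(calib_pred, calib_true, alpha):
    """Radius of a ball-shaped split-conformal region (hypothetical helper).

    calib_pred, calib_true : (n, q) predictions and responses on a held-out
                             calibration set not used to fit the model
    alpha                  : target miscoverage level, e.g. 0.1
    """
    scores = np.linalg.norm(calib_true - calib_pred, axis=1)
    n = scores.shape[0]
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the region is unbounded.
        return float("inf")
    return float(np.sort(scores)[k - 1])

def in_region(y_candidate, y_pred, radius):
    """True if the candidate response lies inside the prediction ball."""
    gap = np.asarray(y_candidate, dtype=float) - np.asarray(y_pred, dtype=float)
    return bool(np.linalg.norm(gap) <= radius)
```

Under exchangeability of calibration and test points, a region built this way covers the true response with probability at least 1 - alpha. Full conformal methods, as studied in the paper, avoid the data split at a higher computational cost.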
# 視野を超えて:Clip-recurrent Transformerによるシーンの可視性と知覚を高める Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer ( http://arxiv.org/abs/2211.11293v3 ) ライセンス: Link先を確認	Hao Shi, Qi Jiang, Kailun Yang, Xiaoting Yin, Ze Wang, Kaiwei Wang,	(参考訳) 視覚センサーは車両、ロボット、道路インフラストラクチャーに広く応用されている。しかし、ハードウェアコストとシステムサイズに制限があるため、FoV(Field-of-View)はしばしば制限され、十分なカバレッジを提供することができない。しかし、時空間的観点からは、過去のビデオストリームからカメラの物理的FoV以外の情報を得ることができる。本稿では,車両の視野を拡大し,シーンの可視性,知覚性,システム安全性を向上するオンラインビデオインペインティングの概念を提案する。これを実現するために、光学フローを明示的に用い、特徴伝搬のための新規なクリップリカレント変換器を暗黙的に組み込んだFlowLensアーキテクチャを導入する。 FlowLensには2つの重要な機能がある。 1) FlowLensは、3Dデカップリングされたクロスアテンション(DDCA)を備えた新たに設計されたClip-Recurrent Hubを含み、時間とともに蓄積されたグローバル情報を段階的に処理する。 2)MixF3N(MixF3N)とMixF3N(MixF3N)を統合し,局所的な特徴の正確な空間フローを向上させる。トレーニングと評価を容易にするため,様々なFoVマスクを用いたKITTI360データセットを作成した。また,FoV以上のセマンティクスの定量的評価と定性比較と,FoV以外のオブジェクト検出を異なるモデルで行う。本研究では,FlowLensを用いて見えないシーンを再構成することで,信頼性の高いセマンティックコンテキストを提供することで,視野内での認識を向上することを示す。オフラインおよびオンラインビデオのインペイントを含む大規模な実験とユーザスタディ、さらにはFoV以外の知覚タスクは、FlowLensが最先端のパフォーマンスを達成することを実証している。ソースコードとデータセットはhttps://github.com/MasterHow/FlowLensで公開されている。 Vision sensors are widely applied in vehicles, robots, and roadside infrastructure. However, due to limitations in hardware cost and system size, camera Field-of-View (FoV) is often restricted and may not provide sufficient coverage. Nevertheless, from a spatiotemporal perspective, it is possible to obtain information beyond the camera's physical FoV from past video streams. In this paper, we propose the concept of online video inpainting for autonomous vehicles to expand the field of view, thereby enhancing scene visibility, perception, and system safety. To achieve this, we introduce the FlowLens architecture, which explicitly employs optical flow and implicitly incorporates a novel clip-recurrent transformer for feature propagation. 
FlowLens offers two key features: 1) FlowLens includes a newly designed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated over time. 2) It integrates a multi-branch Mix Fusion Feed Forward Network (MixF3N) to enhance the precise spatial flow of local features. To facilitate training and evaluation, we derive the KITTI360 dataset with various FoV masks, which cover both outer- and inner-FoV expansion scenarios. We also conduct both quantitative assessments and qualitative comparisons of beyond-FoV semantics and beyond-FoV object detection across different models. We illustrate that employing FlowLens to reconstruct unseen scenes even enhances perception within the field of view by providing reliable semantic context. Extensive experiments and user studies involving offline and online video inpainting, as well as beyond-FoV perception tasks, demonstrate that FlowLens achieves state-of-the-art performance. The source code and dataset are made publicly available at https://github.com/MasterHow/FlowLens.	翻訳日:2024-06-26 05:28:15 公開日:2024-06-22
# 確率分布の近接性と$k$-wise一様性に対する付帯量子テスター Succinct quantum testers for closeness and $k$-wise uniformity of probability distributions ( http://arxiv.org/abs/2304.12916v3 ) ライセンス: Link先を確認	Jingquan Luo, Qisheng Wang, Lvzhou Li,	(参考訳) 確率分布の近さ特性と$k$-wise均一性をテストする基本的な問題に対する潜在的な量子スピードアップについて検討する。 Closeness testing は、2つの$n$次元分布が同一であるか、少なくとも$\varepsilon$-far in $\ell^1$- または $\ell^2$-distance を区別する問題である。我々は、$\ell^1$-および$\ell^2$-closeness testの量子クエリ複雑性が、それぞれ$O(\sqrt{n}/\varepsilon)$と$O(1/\varepsilon)$であり、それぞれ$\varepsilon$への最適依存を達成し、 GilyénとLi (2019) の事前の最適結果を改善することを示す。 $k$-wise uniformity testing は、$\{0, 1\}^n$ 上の分布が任意の$k$座標に制限された場合、またはそのような分布から$\varepsilon$-far に制限された場合、その分布が一様であるかどうかを区別する問題である。我々は、クエリ複雑性$O(\sqrt{n^k}/\varepsilon)$で、サンプル複雑性$O(n^k/\varepsilon^2)$で最先端の古典的アルゴリズムを2次高速化する。さらに、$k = 2$のとき、我々の量子アルゴリズムは古典的下界$\Omega(n/\varepsilon^2)$のために古典的よりも優れる。我々の量子アルゴリズムは、振幅推定のような基本的な量子サブルーチンのみを用いて、かなり単純で時間効率が高い。 We explore potential quantum speedups for the fundamental problem of testing the properties of closeness and $k$-wise uniformity of probability distributions. Closeness testing is the problem of distinguishing whether two $n$-dimensional distributions are identical or at least $\varepsilon$-far in $\ell^1$- or $\ell^2$-distance. We show that the quantum query complexities for $\ell^1$- and $\ell^2$-closeness testing are $O(\sqrt{n}/\varepsilon)$ and $O(1/\varepsilon)$, respectively, both of which achieve optimal dependence on $\varepsilon$, improving the prior best results of Gilyén and Li (2019). $k$-wise uniformity testing is the problem of distinguishing whether a distribution over $\{0, 1\}^n$ is uniform when restricted to any $k$ coordinates or $\varepsilon$-far from any such distributions. 
We propose the first quantum algorithm for this problem with query complexity $O(\sqrt{n^k}/\varepsilon)$, achieving a quadratic speedup over the state-of-the-art classical algorithm with sample complexity $O(n^k/\varepsilon^2)$ by O'Donnell and Zhao (2018). Moreover, when $k = 2$ our quantum algorithm outperforms any classical one because of the classical lower bound $\Omega(n/\varepsilon^2)$. All our quantum algorithms are fairly simple and time-efficient, using only basic quantum subroutines such as amplitude estimation.	翻訳日:2024-06-26 05:18:24 公開日:2024-06-22
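The "quadratic speedup" stated in the abstract above can be read off by taking the square root of the classical sample complexity. The display below only restates the bounds quoted in the abstract; note that it compares quantum queries with classical samples, which are different resource measures.

```latex
\[
\underbrace{O\!\left(\frac{n^{k}}{\varepsilon^{2}}\right)}_{\text{classical samples}}
\;\longrightarrow\;
O\!\left(\sqrt{\frac{n^{k}}{\varepsilon^{2}}}\right)
= \underbrace{O\!\left(\frac{\sqrt{n^{k}}}{\varepsilon}\right)}_{\text{quantum queries}},
\qquad
k = 2:\quad
O\!\left(\frac{n}{\varepsilon}\right)\ \text{(quantum)}
\quad\text{vs.}\quad
\Omega\!\left(\frac{n}{\varepsilon^{2}}\right)\ \text{(classical lower bound)}.
\]
```

For $k = 2$ the quantum upper bound is smaller than the classical lower bound by a factor of $1/\varepsilon$, which is why the abstract can claim an advantage over any classical tester in that case.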
# CLImage: 補完的なラベル学習のための人間アノテーションデータセット CLImage: Human-Annotated Datasets for Complementary-Label Learning ( http://arxiv.org/abs/2305.08295v3 ) ライセンス: Link先を確認	Hsiu-Hsuan Wang, Tan-Ha Mai, Nai-Xuan Ye, Wei-I Lin, Hsuan-Tien Lin,	(参考訳) 補完ラベル学習(英:complementary-label learning, CLL)は、補完ラベルのみを用いて多クラス分類器を訓練することを目的とした、弱い教師付き学習パラダイムである。多くのアルゴリズムによるCLLの提案にもかかわらず、その実用性は2つの理由により検証されていない。第一に、これらのアルゴリズムはしばしば相補的なラベルの生成に関する仮定に依存しており、仮定が現実からどこまで遠いかは定かではない。第二に、それらの評価は合成データセットに限られている。 CLLアルゴリズムの実際の性能に関する知見を得るため,人間のアノテータから補完ラベルを収集するプロトコルを開発した。 CLCIFAR10, CLCIFAR20, CLMicroImageNet10, CLMicroImageNet20の4つのデータセットを作成した。これらのデータセットは、最初の現実世界のCLLデータセットを表している。大規模なベンチマーク実験により、合成データセットから実世界のデータセットに移行する際の顕著な性能低下が判明した。本研究は, データセットレベルのアブレーション研究により, 減少に寄与する重要な要因について検討した。本分析では, 実世界のデータセットにおいて, アノテーションノイズが最も影響のある要因として強調する。さらに,人間の注釈付き補完ラベルの偏りと,補完ラベルのみによる検証の難しさが,実用的CLLの2つの際立った障壁であることが判明した。これらの結果から,CLLアルゴリズムの開発や,雑音に頑健で相補的ラベル分布に偏った検証手法の開発に,コミュニティがより多くの研究を注いでいることが示唆された。 Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. Firstly, these algorithms often rely on assumptions about the generation of complementary labels, and it is not clear how far the assumptions are from reality. Secondly, their evaluation has been limited to synthetic datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in the creation of four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets represent the very first real-world CLL datasets. 
Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetic datasets to real-world datasets. We investigated the key factors contributing to the decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we find that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community focus more research efforts on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.	翻訳日:2024-06-26 05:18:24 公開日:2024-06-22
# ナノスケールにおける効率的な量子ワーク貯留層 Efficient Quantum Work Reservoirs at the Nanoscale ( http://arxiv.org/abs/2305.17815v4 ) ライセンス: Link先を確認	Jinghao Lyu, Alexander B. Boyd, James P. Crutchfield,	(参考訳) 資源理論として再編成された場合、熱力学は単発状態における系の挙動を分析することができる。この場合、状態遷移を実装するのに必要な作業は {\alpha}-Renyi の発散によって制限されるため、確率的熱力学と比較して効率的な演算の特定が異なる。したがって、確率と資源理論の熱力学の違いを詳細に理解する必要がある。この目的のために, 単発式作業貯水池の可逆性について検討し, 多段式作業貯水池に使用される2段式作業貯水池を一般化した。これにより、単発体制におけるあらゆる遷移において可逆性が達成される。そこで我々は, 触媒を使わずとも非散逸状態の多層作業貯水池を体系的に開発する。資源理論的な結果から、ランダウアーの制約下にある2段階の作業貯水池は、計算中のエネルギー散逸を誤解を招くことを示している。対照的に、多層作業貯水池はランダウアーの境界を任意に低エントロピーを発生させながら達成できることを実証する。 When reformulated as a resource theory, thermodynamics can analyze system behaviors in the single-shot regime. In this, the work required to implement state transitions is bounded by {\alpha}-Renyi divergences and so differs in identifying efficient operations compared to stochastic thermodynamics. Thus, a detailed understanding of the difference between stochastic and resource-theoretic thermodynamics is needed. To this end, we explore reversibility in the single-shot regime, generalizing the two-level work reservoirs used there to multi-level work reservoirs. This achieves reversibility in any transition in the single-shot regime. Building on this, we systematically develop multi-level work reservoirs in the nondissipation regime with and without catalysts. The resource-theoretic results show that two-level work reservoirs undershoot Landauer's bound, misleadingly implying energy dissipation during computation. In contrast, we demonstrate that multilevel work reservoirs achieve Landauer's bound while producing arbitrarily low entropy.	翻訳日:2024-06-26 05:18:24 公開日:2024-06-22
# 安定拡散噴流のエンベディングの操作 Manipulating Embeddings of Stable Diffusion Prompts ( http://arxiv.org/abs/2308.12059v2 ) ライセンス: Link先を確認	Niklas Deckers, Julia Peters, Martin Potthast,	(参考訳) プロンプトエンジニアリングは、生成した画像をターゲットとして操作する生成テキスト・画像モデルのユーザにとって、依然として主要な方法である。モデルを連続関数として扱い,画像空間と即時埋め込み空間の勾配を渡すことにより,プロンプトテキストの代わりにプロンプトの埋め込みを直接操作する新しい手法を提案し,解析する。次に,画像生成を支援するための3つの実践的インタラクションツールを導出する。(1)画像空間で定義されたメトリックの最適化。 2) ユーザを創造的なタスクで支援し, 画像空間内を「近づいた」プロンプト埋め込みの選択方向に沿って移動させることにより, ユーザを創造的タスクで支援する。 (3) ユーザが特定のシードで見た情報を含むようにプロンプトの埋め込みを変更するが、プロンプトでの記述が困難になる。プロンプトエンジニアリングと比較して、ユーザ主導のプロンプト埋め込み操作は、ユーザの意図を統合する、よりきめ細かなターゲット制御を可能にする。ユーザスタディでは,我々の手法は退屈さを減らし,結果のイメージが好まれることが示されている。 Prompt engineering is still the primary way for users of generative text-to-image models to manipulate generated images in a targeted way. Based on treating the model as a continuous function and by passing gradients between the image space and the prompt embedding space, we propose and analyze a new method to directly manipulate the embedding of a prompt instead of the prompt text. We then derive three practical interaction tools to support users with image generation: (1) Optimization of a metric defined in the image space that measures, for example, the image style. (2) Supporting a user in creative tasks by allowing them to navigate in the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that a user has seen in a particular seed but has difficulty describing in the prompt. Compared to prompt engineering, user-driven prompt embedding manipulation enables a more fine-grained, targeted control that integrates a user's intentions. Our user study shows that our methods are considered less tedious and that the resulting images are often preferred.	翻訳日:2024-06-26 04:58:37 公開日:2024-06-22
# ロングビデオにおける時間的文字グループ化のための統一および動的グラフ Unified and Dynamic Graph for Temporal Character Grouping in Long Videos ( http://arxiv.org/abs/2308.14105v3 ) ライセンス: Link先を確認	Xiujun Shu, Wei Wen, Liangsheng Xu, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun,	(参考訳) ビデオ時間的キャラクタグループ化は、ビデオ内の主要なキャラクタの出現モーメントを、そのアイデンティティに応じて特定する。この目的のために、最近の研究は教師なしクラスタリングからグラフベースのクラスタリングへと進化してきた。しかし、グラフ法は固定親和性グラフの前提の上に構築されており、多くの不正確な接続をもたらす。さらに、デプロイに不都合なモデルでマルチモーダルな特徴を抽出する。本稿では,時間的文字グループ化のための統一動的グラフ(UniDG)フレームワークを提案する。これはまず、同じ空間内で複数のモジュラリティの表現を学習し、同時にモダリティの特異性を保存する統一表現ネットワークによって達成される。第2に,各ノードごとに異なる量の近傍を循環マッチング戦略により動的に構築し,より信頼性の高い親和性グラフを生成する動的グラフクラスタリングを提案する。第3に、異なるモダリティ間の空間的・時間的文脈を活用するためのプログレッシブアソシエーション手法を導入し、マルチモーダルクラスタリング結果をうまく融合させる。現在のデータセットは事前抽出された特徴しか提供しないため、各文字の顔と体と発声音声トラックの出現クリップを含むMTCGと呼ばれる収集データセット上で、UniDG法の評価を行う。また、既存のクラスタリングおよび検索データセットの重要なコンポーネントを評価し、一般化能力を検証する。実験結果から,提案手法は有望な結果を達成し,いくつかの最先端手法より優れることが示された。 Video temporal character grouping locates appearing moments of major characters within a video according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built upon the premise of fixed affinity graphs, bringing many inexact connections. Besides, they extract multi-modal features with kinds of models, which are unfriendly to deployment. In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping. This is accomplished firstly by a unified representation network that learns representations of multiple modalities within the same space and still preserves the modality's uniqueness simultaneously. Secondly, we present a dynamic graph clustering where the neighbors of different quantities are dynamically constructed for each node via a cyclic matching strategy, leading to a more reliable affinity graph. 
Thirdly, a progressive association method is introduced to exploit spatial and temporal contexts among different modalities, allowing multi-modal clustering results to be well fused. As current datasets only provide pre-extracted features, we evaluate our UniDG method on a collected dataset named MTCG, which contains clips of each character's face and body appearances together with their speaking voice tracks. We also evaluate our key components on existing clustering and retrieval datasets to verify the generalization ability. Experimental results show that our method achieves promising results and outperforms several state-of-the-art approaches.	翻訳日:2024-06-26 04:58:37 公開日:2024-06-22
# 認証ロバスト性向上のためのレシピ A Recipe for Improved Certifiable Robustness ( http://arxiv.org/abs/2310.02513v2 ) ライセンス: Link先を確認	Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson,	(参考訳) 近年の研究は、リプシッツをベースとした、敵の攻撃に対して確実に堅牢なニューラルネットワークを訓練する手法の可能性を強調している。理論的にも経験的にも支持される重要な課題は、ロバストネスが通常のトレーニングよりもネットワーク容量とデータ量を必要とすることだ。しかし、厳密なリプシッツ制約の下で効果的にキャパシティを追加することは、おそらくより難しいことが証明されている。さらに,リプシッツをベースとしたアプローチの設計空間を慎重に探索する能力が欠如していることから,性能向上の可能性が示唆された。本研究では,リプシッツに基づく認証手法の可能性を明らかにするため,より包括的な評価を行う。新たな手法,設計最適化,先行作業の合成を組み合わせることで,さまざまなベンチマークデータセットに対する決定論的証明と,さまざまな摂動サイズに対して,最先端のVRAを大幅に改善することができる。特に,既存技術であるリプシッツ制御ResNetアーキテクチャの終端に「Cholesky-orthogonalized residual dense」層を追加することは,ネットワーク容量と性能の向上に特に有効である。フィルタリング生成データ拡張と組み合わせて、最終結果は、最先端の決定論的VRAを最大8.5ポイント向上させる。 Recent studies have highlighted the potential of Lipschitz-based methods for training certifiably robust neural networks against adversarial attacks. A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training. However, effectively adding capacity under stringent Lipschitz constraints has proven more difficult than it may seem, as evidenced by the fact that state-of-the-art approaches tend more towards underfitting than overfitting. Moreover, we posit that a lack of careful exploration of the design space for Lipschitz-based approaches has left potential performance gains on the table. In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and synthesis of prior work, we are able to significantly improve the state-of-the-art VRA for deterministic certification on a variety of benchmark datasets, and over a range of perturbation sizes. 
Of particular note, we discover that the addition of large "Cholesky-orthogonalized residual dense" layers to the end of existing state-of-the-art Lipschitz-controlled ResNet architectures is especially effective for increasing network capacity and performance. Combined with filtered generative data augmentation, our final results advance the state-of-the-art deterministic VRA by up to 8.5 percentage points. Code is available at https://github.com/hukkai/liresnet.	翻訳日:2024-06-26 04:48:52 公開日:2024-06-22
# 報酬バックプロパゲーションを用いたテキスト・画像拡散モデルの調整 Aligning Text-to-Image Diffusion Models with Reward Backpropagation ( http://arxiv.org/abs/2310.03739v2 ) ライセンス: Link先を確認	Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki,	(参考訳) テキスト・ツー・イメージの拡散モデルは、画像生成の最前線に登場し、非常に大規模な教師なしまたは弱い教師付きテキスト・ツー・イメージのトレーニングデータセットによって実現されている。教師なしの訓練のため、人間の知覚された画像品質、画像テキストアライメント、倫理的画像生成などの下流作業における行動を制御することは困難である。近年のバニラ強化学習による下流の報酬関数への拡散モデルの研究は、勾配推定器の高分散で有名である。本稿では,拡散モデルと下流の報酬関数を協調する手法であるAlignPropを提案する。このようなバックプロパゲーションの実装は、現代のテキスト・ツー・イメージモデルの部分的なデリバティブを格納するために禁止的なメモリリソースを必要とするが、AlignPropは低ランクのアダプタ重みモジュールを微調整し、グラデーション・チェックポインティングを使用してメモリ使用量を実用的な範囲に抑える。画像テキストのセマンティックアライメント,美学,オブジェクト数の圧縮性と制御性,およびそれらの組み合わせなど,さまざまな目的に対する微調整拡散モデルでAlignPropをテストする。また,AlignPropは,学習段階を減らしてより高い報酬を得られるが,概念的にはシンプルであり,興味のある報酬関数に対する拡散モデルを最適化するための簡単な選択であることを示す。コードと視覚化結果はhttps://align-prop.github.io/で公開されている。 Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing to render its memory usage viable. 
We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show that AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and visualization results are available at https://align-prop.github.io/.	翻訳日:2024-06-26 04:48:52 公開日:2024-06-22
# 常識推論のための忠実な知識グラフ記述法 Faithful Knowledge Graph Explanations for Commonsense Reasoning ( http://arxiv.org/abs/2310.04910v4 ) ライセンス: Link先を確認	Weihe Zhai, Arkaitz Zubiaga,	(参考訳) 言語モデル(LM)と知識グラフ(KG)の融合は、常識的な質問応答において広く用いられているが、忠実な説明を生み出すことは依然として難しい。現在の手法は、しばしば忠実さをデコードするパスを見落とし、グラフエンコーダ出力とモデル予測の間にばらつきをもたらす。本研究は,LM-KGのミスアライメントと相反する効果を,突発的な説明を引き起こす重要な要因として同定する。そこで本研究では,KG表現の信頼性を評価するためのLM-KGフィデリティ尺度を導入し,説明忠実度を改善するためのLM-KG分布認識アライメント(\textit{LKDA})アルゴリズムを提案する。基礎的な事実を欠くことなく,提案したF-Sparsity Trade-off Curveを用いてKGの説明を評価する。 CommonsenseQAとOpenBookQAの実験では、LKDAは説明の忠実度とモデル性能を著しく向上させ、信頼性の高いCommonsense推論のための分散的不整合に対処する必要性を強調している。 The fusion of language models (LMs) and knowledge graphs (KGs) is widely used in commonsense question answering, but generating faithful explanations remains challenging. Current methods often overlook path decoding faithfulness, leading to divergence between graph encoder outputs and model predictions. We identify confounding effects and LM-KG misalignment as key factors causing spurious explanations. To address this, we introduce the LM-KG Fidelity metric to assess KG representation reliability and propose the LM-KG Distribution-aware Alignment (\textit{LKDA}) algorithm to improve explanation faithfulness. Without ground truth, we evaluate KG explanations using the proposed Fidelity-Sparsity Trade-off Curve. Experiments on CommonsenseQA and OpenBookQA show that LKDA significantly enhances explanation fidelity and model performance, highlighting the need to address distributional misalignment for reliable commonsense reasoning.	翻訳日:2024-06-26 04:48:52 公開日:2024-06-22
# ニューラルコード生成のための関数オーバーラップリグレード Functional Overlap Reranking for Neural Code Generation ( http://arxiv.org/abs/2311.03366v3 ) ライセンス: Link先を確認	Hung Quoc To, Minh Huynh Nguyen, Nghi D. Q. Bui,	(参考訳) Code Large Language Models (CodeLLMs) は、コード生成の進歩の新たな時代を支えている。しかし、可能なすべてのCodeLLM出力から最高のコードソリューションを選択することは、依然として困難である。それまでの手法では、複雑な機能的類似性やソリューションクラスタ間の相互作用を見落としていた。 SRankは、ソリューションのクラスタ間の関係をモデル化することに焦点を当てた、コード生成から最良のソリューションを選択するための、新しい優先順位付け戦略である。ソリューションクラスタ間の機能の重複を定量化することにより、私たちのアプローチは、コードソリューションのより良いランキング戦略を提供します。実験結果から,pass@1のスコアで顕著な結果が得られることがわかった。例えば、Human-Evalベンチマークでは、Codex002で69.66%、WizardCoderで75.31%、StarCoderで53.99%、CodeGenで60.55%、同じCodeLLMでCodeTやCoder-Reviewerのような最先端のコード生成メソッドをかなり上回っている(平均で約6.1%改善)。サンプル化されたソリューションやテストケースが限られているシナリオであっても、私たちのアプローチは堅牢性と優位性を示し、コード生成の新たなベンチマークを再評価します。私たちの実装はhttps://github.com/FSoft-AI4Code/SRank-CodeRankerで確認できます。 Code Large Language Models (CodeLLMs) have ushered in a new era in code generation advancements. However, selecting the best code solutions from all possible CodeLLM outputs remains a challenge. Previous methods often overlooked the intricate functional similarities and interactions between solution clusters. We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation, focusing on modeling the relationships between clusters of solutions. By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions. Empirical results show that our method achieves remarkable results on the pass@1 score. For instance, on the Human-Eval benchmark, we achieve 69.66% in pass@1 with Codex002, 75.31% with WizardCoder, 53.99% with StarCoder, and 60.55% with CodeGen, surpassing state-of-the-art code generation reranking methods such as CodeT and Coder-Reviewer on the same CodeLLM by a significant margin (approximately 6.1% improvement on average). 
Even in scenarios with a limited number of sampled solutions and test cases, our approach demonstrates robustness and superiority, marking a new benchmark in code generation reranking. Our implementation can be found at https://github.com/FSoft-AI4Code/SRank-CodeRanker.	翻訳日:2024-06-26 04:39:08 公開日:2024-06-22
# iACOS:InformativeおよびAdaptive Negative例を用いた暗黙的感情抽出の改善 iACOS: Advancing Implicit Sentiment Extraction with Informative and Adaptive Negative Examples ( http://arxiv.org/abs/2311.03896v2 ) ライセンス: Link先を確認	Xiancai Xu, Jia-Dong Zhang, Lei Xiong, Zhishang Liu,	(参考訳) アスペクトベース感情分析(ABSA)は広く研究されているが,4つの基本要素(アスペクト,カテゴリ,意見,感情)から構成される四つ組の抽出,特に暗黙的なアスペクトと意見を伴う場合にはほとんど光が当たっていない。本稿では,カテゴリ付きの暗黙的アスペクトと感情付きの意見を抽出する新しい手法iACOSを提案する。まず、iACOSはテキストの最後に2つの暗黙のトークンを付加し、暗黙のアスペクトや意見を含むすべてのトークンのコンテキスト認識表現をキャプチャする。第2に、iACOSは、明示的で暗黙的な側面と意見の共抽出のために、コンテキスト認識トークン表現上のシーケンスラベリングモデルを開発する。第3に、iACOSはアスペクトオピニオン対を発見し、カテゴリと感情を同時に予測する、特別なマルチヘッドアテンションを持つマルチラベル分類器を考案した。第4に、iACOSは情報的かつ適応的な負の例を利用して、マルチタスク学習によって、カテゴリと感情に関するマルチラベル分類器と他の2つの分類器を共同で訓練する。最後に、実験結果から、iACOSは2つの公開ベンチマークデータセットのF1スコアに従って、他の四つ組抽出ベースラインを著しく上回っていることが示された。 Aspect-based sentiment analysis (ABSA) has been extensively studied, but little light has been shed on the quadruple extraction consisting of four fundamental elements: aspects, categories, opinions and sentiments, especially with implicit aspects and opinions. In this paper, we propose a new method iACOS for extracting Implicit Aspects with Categories and Opinions with Sentiments. First, iACOS appends two implicit tokens at the end of a text to capture the context-aware representation of all tokens including implicit aspects and opinions. Second, iACOS develops a sequence labeling model over the context-aware token representation to co-extract explicit and implicit aspects and opinions. Third, iACOS devises a multi-label classifier with a specialized multi-head attention for discovering aspect-opinion pairs and predicting their categories and sentiments simultaneously. Fourth, iACOS leverages informative and adaptive negative examples to jointly train the multi-label classifier and the other two classifiers on categories and sentiments by multi-task learning. 
Finally, the experimental results show that iACOS significantly outperforms other quadruple extraction baselines according to the F1 score on two public benchmark datasets.	翻訳日:2024-06-26 04:39:08 公開日:2024-06-22
# コンセンサスによる高次元自由エネルギー表面の構築 Consensus-based construction of high-dimensional free energy surface ( http://arxiv.org/abs/2311.05009v3 ) ライセンス: Link先を確認	Liyao Lyu, Huan Lei,	(参考訳) 分子系の集合的挙動の定量化における重要な問題は、自由エネルギー表面(FES)の正確な構築にある。主な課題は、エネルギー障壁の出現と高次元性から生じる。既存のアプローチは、フルフェーズ空間の効率的な探索を確立するための洗練されたサンプリング手法に基づいていることが多い。一方、FESの数値近似のための最適なサンプル点の収集は、多くの集合変数 (CV) を持つシステムでは、離散化誤差が支配的になりうるため、ほとんど未探索のままである。本稿では,関数表現とトレーニングセットを同時に最適化するミニマックス問題として構成を再構成し,コンセンサスサンプリングに基づくアプローチを提案する。特に、最大化ステップは、現在損失関数のラプラス近似の活用と未チャート位相空間の探索を調節し、最大残留状態の適応サンプリングを達成する確率的相互作用粒子系を確立し、最小化ステップは新しいトレーニングセットでFES近似を更新する。本手法は,ミニマックス問題を反復的に解くことにより,位相空間探索と後部誤差強調サンプリングの両面において,FESの対角学習を実現する。分子系のFESを最大30個までのCVで構築し,本手法を実証した。 One essential problem in quantifying the collective behaviors of molecular systems lies in the accurate construction of free energy surfaces (FESs). The main challenges arise from the prevalence of energy barriers and the high dimensionality. Existing approaches are often based on sophisticated enhanced sampling methods to establish efficient exploration of the full-phase space. On the other hand, the collection of optimal sample points for the numerical approximation of FESs remains largely under-explored, where the discretization error could become dominant for systems with a large number of collective variables (CVs). We propose a consensus sampling-based approach by reformulating the construction as a minimax problem which simultaneously optimizes the function representation and the training set. In particular, the maximization step establishes a stochastic interacting particle system to achieve the adaptive sampling of the max-residue regime by modulating the exploitation of the Laplace approximation of the current loss function and the exploration of the uncharted phase space; the minimization step updates the FES approximation with the new training set. 
By iteratively solving the minimax problem, the present method essentially achieves an adversarial learning of the FESs with unified tasks for both phase space exploration and posterior error-enhanced sampling. We demonstrate the method by constructing the FESs of molecular systems with a number of CVs up to 30.	翻訳日:2024-06-26 04:39:08 公開日:2024-06-22
# グラフ上の弱いエキスパートと強いエキスパートの混合 Mixture of Weak & Strong Experts on Graphs ( http://arxiv.org/abs/2311.05185v2 ) ライセンス: Link先を確認	Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo,	(参考訳) 現実的なグラフは(1) ノードの豊富な自己特徴と(2) 近隣の情報構造の両方を含み、典型的な設定ではグラフニューラルネットワーク(GNN)が共同で処理する。本稿では,弱いエキスパートを軽量な多層パーセプトロン (MLP),強いエキスパートを既製のGNNとする,弱いエキスパートと強いエキスパートの混合 (Mowst) により2つのモードを分離することを提案する。専門家の協力関係を異なる目標ノードに適応させるために,弱い専門家の予測ロジットの分散に基づく「自信」機構を提案する。強い専門家は、ノードの分類が近隣情報に依存するか、弱い専門家がモデル品質の低い場合、低信頼領域で条件的に活性化される。我々は,信頼度関数が損失に与える影響を分析することによって,興味深いトレーニングダイナミクスを明らかにする。さらに、我々の"自信"設計は、GNNのより良い一般化能力の恩恵を受けるために、強力な専門家に対して望ましいバイアスを課します。 Mowstは最適化が容易で、単一のGNNに匹敵する計算コストで強力な表現力を実現する。 4つのバックボーンGNNアーキテクチャ上のMowstは、ホモフィルグラフとヘテロフィルグラフ(https://github.com/facebookresearch/mowst-gnn)を含む6つの標準ノード分類ベンチマークにおいて、大幅な精度向上を示している。 Realistic graphs contain both (1) rich self-features of nodes and (2) informative structures of neighborhoods, jointly handled by a Graph Neural Network (GNN) in the typical setup. We propose to decouple the two modalities by Mixture of weak and strong experts (Mowst), where the weak expert is a light-weight Multi-layer Perceptron (MLP), and the strong expert is an off-the-shelf GNN. To adapt the experts' collaboration to different target nodes, we propose a "confidence" mechanism based on the dispersion of the weak expert's prediction logits. The strong expert is conditionally activated in the low-confidence region when either the node's classification relies on neighborhood information, or the weak expert has low model quality. We reveal interesting training dynamics by analyzing the influence of the confidence function on loss: our training algorithm encourages the specialization of each expert by effectively generating soft splitting of the graph. In addition, our "confidence" design imposes a desirable bias toward the strong expert to benefit from GNN's better generalization capability. Mowst is easy to optimize and achieves strong expressive power, with a computation cost comparable to a single GNN. 
Empirically, Mowst on 4 backbone GNN architectures shows significant accuracy improvement on 6 standard node classification benchmarks, including both homophilous and heterophilous graphs (https://github.com/facebookresearch/mowst-gnn).	翻訳日:2024-06-26 04:39:08 公開日:2024-06-22
# Social Bias Probing: 言語モデルのフェアネスベンチマーク Social Bias Probing: Fairness Benchmarking for Language Models ( http://arxiv.org/abs/2311.09090v3 ) ライセンス: Link先を確認	Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein,	(参考訳) 言語モデルにおける社会的バイアスの影響は認識されているが、偏見評価の先行手法は、小さなデータセット上でのバイナリアソシエーションテストに限られており、偏見の複雑さに対する理解が制限されている。本稿では,社会的偏見を考慮した言語モデル構築のための新しい枠組みを提案する。既存のフェアネスコレクションの制限に対処するために設計された大規模なベンチマークであるSOFAをキュレートする。 SOFAは、ステレオタイプとアンチステレオタイプIDのバイナリ比較を超えて分析を拡張し、多様なアイデンティティとステレオタイプを含む。提案手法を既存のベンチマークと比較したところ,言語モデル内のバイアスは認識されるよりもニュアンスが高いことが判明した。 SOFA上でのLMのベンチマークにより、異なる宗教を表現しているアイデンティティが、すべてのモデルにおいて最も顕著な異なる治療につながることを明らかにした。最後に,女性や障害者などの多様な集団が直面する現実の逆境が,これらのモデルの行動に反映されていることを示す。 While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SOFA, a large-scale benchmark designed to address the limitations of existing fairness collections. SOFA expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SOFA, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.	翻訳日:2024-06-26 04:39:08 公開日:2024-06-22
# 大規模言語モデルにおけるマルチステップ推論のためのグラフ励振 Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models ( http://arxiv.org/abs/2311.09762v2 ) ライセンス: Link先を確認	Jinyoung Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim,	(参考訳) CoT(Chain-of-Thought)はサブクエスト生成と応答を促進させ、LLM(Large Language Models)の多段階推論機能を強化した。しかし、LCMが直接サブクエストを生成するように促すことは、しばしば冗長または無関係な質問を生成するため、最適ではない。そこで我々はGE-Reasoning法を提案する。GE-Reasoning法はLLMに対して適切なサブクエストとそれに対応する回答を生成するよう指示する。具体的には,まず LLM に知識三重項を生成するように促し,質問のグラフ表現を形成する。従来の知識三重項とは異なり、本手法は変数を頭や尾の実体として許容し、質問を知識三重項として効果的に表現する。第2に、各三重項に対して、LLMは対応するサブクエストを生成し、知識検索とともに回答する。予測信頼度がしきい値を超えると、サブクエストと予測がその後の処理のプロンプトに組み込まれる。このアプローチは、サブクエストが抽出された知識三重項に根ざされていることを奨励し、冗長性と無関係性を減少させる。実験により,提案手法は,マルチホップ質問応答ベンチマークデータセットにおいて,従来のCoTプロンプト手法とその変種よりも優れていることが示された。 Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To deal with them, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answer along with using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages that sub-questions are grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. 
Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# RecExplainer: 推奨モデルを説明するための大規模言語モデルの調整 RecExplainer: Aligning Large Language Models for Explaining Recommendation Models ( http://arxiv.org/abs/2311.10947v2 ) ライセンス: Link先を確認	Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, Xing Xie,	(参考訳) リコメンダシステムはオンラインサービスで広く使われており、埋め込みベースのモデルは複雑な信号を表現することの表現力から特に人気がある。しかしながら、これらのモデルはブラックボックスとして機能するため、ユーザと開発者にとって透明性が低く、信頼性も低い。近年,大規模言語モデル (LLM) は理解,推論,指導において顕著な知性を示している。本稿では, ブラックボックスレコメンデータモデルを説明するために, LLM を代理モデルとして利用することについて検討する。第一の概念は、ターゲットレコメンデータモデルの振る舞いを理解し、エミュレートするためにLLMを訓練することである。 LLMの広い世界知識と多段階の推論能力を活用することで、これらのLLMは高度なサロゲートとして機能し、観測について推論することができる。さらに、自然言語をインターフェースとして使用することで、個々のユーザの好みに合わせてカスタマイズ可能な説明を作成することができる。効果的なアライメントを容易にするために,行動アライメント,意図アライメント,ハイブリッドアライメントという3つの手法を導入する。行動アライメントは言語空間で動作し、ユーザの好みとアイテム情報をテキストとして表現して対象モデルの振る舞いを模倣する。意図アライメントは推薦モデルの潜在空間で動作し、ユーザとアイテムの表現を用いてモデルの振る舞いを理解する。ハイブリッドアライメントは言語空間と潜在空間の両方を組み合わせる。 3つの公開データセットで実施された総合的な実験により、我々のアプローチは、ターゲットモデルを理解し、模倣し、高品質で、高忠実で、独特な説明をもたらす有望な結果をもたらすことが示された。私たちのコードはhttps://github.com/microsoft/RecAIで公開されています。 Recommender systems are widely used in online services, with embedding-based models being particularly popular due to their expressiveness in representing complex signals. However, these models often function as a black box, making them less transparent and reliable for both users and developers. Recently, large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following. This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models. The primary concept involves training LLMs to comprehend and emulate the behavior of target recommender models. By leveraging LLMs' own extensive world knowledge and multi-step reasoning abilities, these aligned LLMs can serve as advanced surrogates, capable of reasoning about observations. Moreover, employing natural language as an interface allows for the creation of customizable explanations that can be adapted to individual user preferences. 
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to mimic the target model's behavior; intention alignment works in the latent space of the recommendation model, using user and item representations to understand the model's behavior; hybrid alignment combines both language and latent spaces. Comprehensive experiments conducted on three public datasets show that our approach yields promising results in understanding and mimicking target models, producing high-quality, high-fidelity, and distinct explanations. Our code is available at https://github.com/microsoft/RecAI.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# 身体運動のアーカイブ:中国書道の集合的生成 Archiving Body Movements: Collective Generation of Chinese Calligraphy ( http://arxiv.org/abs/2311.13770v3 ) ライセンス: Link先を確認	Aven Le Zhou, Jiayi Ye, Tianchen Liu, Kang Zhang,	(参考訳) コミュニケーションチャネルとして、身体運動は行動研究やキネシクスで広く研究されている。演技と視覚芸術は同じ関心を共有しているが、ダンス表記や視覚作品の作成など、人間の身体運動の文書化と表現に重点を置いている。本稿では, 東洋書道における身体運動と書道の原理を適用し, 身体運動を刺激し, アーカイブする方法について検討する。著者らは、アートワーク(武州)を通じて、読者の身体参加と身体運動のアーカイブ化を対話的で創造的なアプローチで試みた。読者は書き手と読み手の両方の役割を前提としており、生成した書道(読み手)は、この無限の「書」の中で循環的な過程となり、漢字や書道に関するさらなる注意と議論の動機となる。 As a communication channel, body movements have been widely explored in behavioral studies and kinesics. Performing and visual arts share the same interests but focus on documenting and representing human body movements, such as for dance notation and visual work creation. This paper investigates body movements in oriental calligraphy and how to apply calligraphy principles to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach to engage the audience's bodily participation and archive the body movements as a compendium of generated calligraphy. The audience assumes the role of both writers and readers; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention and discussions concerning Chinese characters and calligraphy.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# 高等教育におけるChatGPTの社会的バイアスの可能性:スコーピング・レビュー Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review ( http://arxiv.org/abs/2311.14381v2 ) ライセンス: Link先を確認	Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng,	(参考訳) 目的:ChatGPTのような生成人工知能(Generative Artificial Intelligence, GAI)モデルは、広範囲なデータセットのトレーニングによって社会的バイアスを継承または増幅することができる。高等教育機関(HEIs)における学生、教員、職員のGAI利用の増加に伴い、これらの技術に関連する倫理的問題や潜在的なバイアスについて検討することが急務である。デザイン/アプリケーション/メソッド:このスコーピングレビューは、近年の学術論文で、AIに関するバイアスがどのように研究され議論されているかを明らかにすることを目的としている。我々は、高等教育分野において、GAIが引き起こす可能性のある社会的バイアスを分類した。本レビューでは,4つの主要データベースにまたがる英語,中国語,日本語の記事を取り上げ,高等教育におけるGAI活用と偏見に着目した。我々の発見は、AI分野におけるLSMに関するバイアスと差別に関する有意義な学術的な議論がある一方で、高等教育のアプローチに関するほとんどの記事が表面上問題にアプローチしていることを示している。異なる状況下で特定の種類の偏見を識別する記事はほとんどなく、実証研究の欠如が顕著である。概説では、主に医学・工学に関する教育・研究分野に焦点をあてており、一部は英語教育について論じている。しかし、人文科学や社会科学に関する議論はほとんどない。さらに、現在の談話の大部分は英語で書かれており、主に英語の文脈を扱う。原性/価値:私たちの知識を最大限に活用するために、私たちの研究は、高等教育における潜在的な社会的バイアスを初めて要約したものです。このレビューは、GAIが教育環境で導入または増幅する可能性のある特定のバイアスを理解するために、より深い研究と実証的な研究の必要性を強調し、高等教育におけるより倫理的なAIアプリケーションの開発を導く。 Purpose:Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods:This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. 
Findings:Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts. Originality/Value:To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# 多体散逸量子物質における創発的トポロジー Emergent Topology in Many-Body Dissipative Quantum Matter ( http://arxiv.org/abs/2311.14640v3 ) ライセンス: Link先を確認	Antonio M. García-García, Lucas Sá, Jacobus J. M. Verbaarschot, Can Yin,	(参考訳) トポロジカル特徴の同定、記述、分類は、物理学のいくつかの分野における発見と革新のエンジンである。この研究は、凝縮物質中の整数および分数チャーン絶縁体から光学における複雑なフォトニック格子における保護状態、QCD真空の構造まで幅広いシステムを含む。ここでは、擬エルミート多体量子系の散逸ダイナミクスという、トポロジーの別の遊び場を紹介する。そこで我々は,散逸型Sachdev-Ye-Kitaev(SYK)モデルと量子カオスデファスリングスピン鎖の2つの異なるシステムについて検討した。 2つの異なる多体モデルに対して、それらが普遍的であることを示す幅広いパラメータの位相的特徴を求める。 SYKモデルでは、フェルミオン交換を行うユニタリ作用素の異常なトレースの存在に直接関係するベクトル化されたリウビリアンの長方形ブロック表現によって特徴づけられる擬ハーミティティーに関連する4つの普遍性クラスを同定する。この長方形化の結果、対称性にのみ依存する位相指標 $\nu$ が特定される。矩形化のもう1つの顕著な結果として、入浴へのカップリングについて、リウヴィリアの純粋に実位相モードの観測がある。これらの実モードのレベル統計は対応するランダム行列のアンサンブルと一致し、4つの位相対称性クラスを特徴づけるために用いられる。浴への弱いカップリングの限界において、トポロジカルモードは平衡へのアプローチを制御し、散逸性多体量子カオス系におけるトポロジの実験的な確認を可能にする。 The identification, description, and classification of topological features is an engine of discovery and innovation in several fields of physics. This research encompasses a broad variety of systems, from the integer and fractional Chern insulators in condensed matter, to protected states in complex photonic lattices in optics, and the structure of the QCD vacuum. Here, we introduce another playground for topology: the dissipative dynamics of pseudo-Hermitian many-body quantum systems. For that purpose, we study two different systems, the dissipative Sachdev-Ye-Kitaev (SYK) model, and a quantum chaotic dephasing spin chain. For the two different many-body models, we find the same topological features for a wide range of parameters suggesting that they are universal. In the SYK model, we identify four universality classes, related to pseudo-Hermiticity, characterized by a rectangular block representation of the vectorized Liouvillian that is directly related to the existence of an anomalous trace of the unitary operator implementing fermionic exchange. 
As a consequence of this rectangularization, we identify a topological index $\nu$ that only depends on symmetry. Another distinct consequence of the rectangularization is the observation, for any coupling to the bath, of purely real topological modes in the Liouvillian. The level statistics of these real modes agree with that of the corresponding random matrix ensemble and therefore can be employed to characterize the four topological symmetry classes. In the limit of weak coupling to the bath, topological modes govern the approach to equilibrium, which may enable a direct path for experimental confirmation of topology in dissipative many-body quantum chaotic systems.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# Feature Speed Formula: ディープニューラルネットワークのハイパーパラメータ拡張のためのフレキシブルアプローチ The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks ( http://arxiv.org/abs/2311.18718v3 ) ライセンス: Link先を確認	Lénaïc Chizat, Praneeth Netrapalli,	(参考訳) ディープラーニングは階層的な特徴学習によって成功するが、初期化スケールや学習率などのハイパーパラメータ(HP)のチューニングは、この振る舞いを間接的に制御するだけである。本稿では,特徴学習を予測し制御するための重要な概念として,特徴更新と後方パスの間の角度$\theta_\ell$(層インデックス$\ell$)を紹介する。この角度$\theta_\ell$, 損失減衰, 後方通過の大きさから, 任意のトレーニング時間において, 1回のGDステップ後の特徴更新の大きさを, 単純かつ一般の \emph{feature speed formula} で表すことができることを示す。この角 $\theta_\ell$ は層から層へのヤコビアンの条件付けによって制御され、ランダム初期化時にはあるカーネルのスペクトルによって決定される。 $\theta_\ell$が与えられたとき、特徴速度公式はHP(スケールと学習率)を調整し、特徴学習や損失減衰といった特定の力学特性を満たすためのルールを提供する。本研究では,幅を大きくしてから深さを大きくする極限におけるReLU MLPとResNetに対して,このアプローチが持つ含意について検討する。先行研究に基づき、 iid 初期化を伴う ReLU MLP において、角度は $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$ で縮退することを示す。対照的に、ブランチスケール $O(1/\sqrt{\text{depth}})$ の ResNets は非退化角 $\cos(\theta_\ell)=\Theta(1)$ を維持する。我々はこれらの知見を用いて、既知のHPスケーリングの重要な特性を復元し、また、理論的性質が好ましい大深度ReLU MLPのための新しいHPスケーリングを導入する。 Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HP) such as initialization scales, learning rates etc., only gives indirect control over this behavior. In this paper, we introduce a key notion to predict and control feature learning: the angle $\theta_\ell$ between the feature updates and the backward pass (at layer index $\ell$). We show that the magnitude of feature updates after one GD step, at any training time, can be expressed via a simple and general \emph{feature speed formula} in terms of this angle $\theta_\ell$, the loss decay, and the magnitude of the backward pass. This angle $\theta_\ell$ is controlled by the conditioning of the layer-to-layer Jacobians and at random initialization, it is determined by the spectrum of a certain kernel, which coincides with the Neural Tangent Kernel when $\ell=\text{depth}$. 
Given $\theta_\ell$, the feature speed formula provides us with rules to adjust HPs (scales and learning rates) so as to satisfy certain dynamical properties, such as feature learning and loss decay. We investigate the implications of our approach for ReLU MLPs and ResNets in the large width-then-depth limit. Relying on prior work, we show that in ReLU MLPs with iid initialization, the angle degenerates with depth as $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$. In contrast, ResNets with branch scale $O(1/\sqrt{\text{depth}})$ maintain a non-degenerate angle $\cos(\theta_\ell)=\Theta(1)$. We use these insights to recover key properties of known HP scalings and also to introduce a new HP scaling for large depth ReLU MLPs with favorable theoretical properties.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# 対話型インテクスト学習によるプロンプト最適化 Prompt Optimization via Adversarial In-Context Learning ( http://arxiv.org/abs/2312.02614v3 ) ライセンス: Link先を確認	Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He,	(参考訳) 本稿では,1つの LLM をジェネレータとして,もう1つは識別器として,もう1つはプロンプト修飾器として,さらに1つはプロンプト修飾器として用いることで,文脈内学習(ICL)のプロンプトを最適化する新しい手法であるadv-ICLを提案する。従来の逆数学習と同様に、adv-ICLはジェネレータと判別器の間で2人プレイヤゲームとして実装され、ジェネレータは判別器を騙すのに十分な出力を生成しようとする。各ラウンドにおいて、タスク命令といくつかの例によってプレフィックスされた入力が与えられたとき、ジェネレータは出力を生成する。次に、判別器は、ジェネレータの入出力ペアをモデル生成または実データとして分類する。判別器損失に基づいて、プロンプト修飾器は生成器への編集が可能であり、識別器のプロンプトが提案され、最も良くなる編集が選択される。本稿では,Adv-ICLにより,11 世代におけるオープンソースモデルとクローズドソースモデルの最適化手法と,要約,算術的推論,機械翻訳,データ-テキスト生成,MMLU およびBig-bench ハードベンチマークなどの分類タスクが大幅に改善されることを示す。さらに,本手法では事前学習モデルを用いて,モデルパラメータではなくプロンプトのみを更新するので,計算効率が良く,どのLLMやタスクにも容易に拡張でき,低リソース設定でも有効である。 We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompt for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough output to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. 
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and big-bench hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# コラボレーション・コーポレート・キャプチャー : NLPの産業アーチファクト・コントリビューションへの信頼の定量化 Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions ( http://arxiv.org/abs/2312.03912v2 ) ライセンス: Link先を確認	Will Aitken, Mohamed Abdalla, Karen Rudie, Catherine Stinson,	(参考訳) 事前訓練されたモデルの印象的なパフォーマンスは大衆の注目を集め、近年ニュースの見出しを飾っている。ほぼ常に、これらのモデルは産業と共同で生産される。それらの使用は自然言語処理(NLP)ベンチマークと競合し、NLP研究に関連性を維持するために重要である。我々は,EMNLP 2022で公表された100の論文を調査し,研究者が産業モデルや他のアーティファクトにどの程度依存しているか,そしてNLPの権威ある会場で出版する貢献が期待するよりも少なくとも3倍大きいことを確かめた。私たちの研究は、将来の研究者がより正確に解決できる足場として役立ちます。 1 産業との連携は、選択肢がない状態での連携である。 2 民間企業のモチベーション及び研究の方向性により、NLP調査が達成された場合。 Impressive performance of pre-trained models has garnered public attention and made news headlines in recent years. Almost always, these models are produced by or in collaboration with industry. Using them is critical for competing on natural language processing (NLP) benchmarks and correspondingly to stay relevant in NLP research. We surveyed 100 papers published at EMNLP 2022 to determine the degree to which researchers rely on industry models, other artifacts, and contributions to publish in prestigious NLP venues and found that the ratio of their citation is at least three times greater than what would be expected. Our work serves as a scaffold to enable future researchers to more accurately address whether: 1) Collaboration with industry is still collaboration in the absence of an alternative or 2) if NLP inquiry has been captured by the motivations and research direction of private corporations.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# 地球観測のためのデータ中心機械学習 Better, Not Just More: Data-Centric Machine Learning for Earth Observation ( http://arxiv.org/abs/2312.05327v2 ) ライセンス: Link先を確認	Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl, Devis Tuia,	(参考訳) 現代の機械学習における最近の発展と研究は、地理空間分野の大幅な改善につながっている。多くのディープラーニングアーキテクチャとモデルが提案されているが、その大半は、強力な現実世界の関連性を欠いたベンチマークデータセット上でのみ開発されている。さらに、これらのデータセットには、すでに多くのメソッドのパフォーマンスが飽和している。モデル中心の視点から補完的なデータ中心の視点へのシフトは、より正確性、一般化能力、そしてエンドユーザーアプリケーションへの影響を高めるために必要である。さらに、問題定義からモデルデプロイメントまでのマシンラーニングサイクル全体を考慮すれば、予期せぬ状況で信頼性のあるマシンラーニングモデルを強化する上で、極めて重要です。本研究は、地理空間データに対する自動データ中心学習手法の正確な分類と概要と、その定義を提示する。これは、より大きな機械学習デプロイメントサイクルにおけるモデル中心の学習に対するデータ中心学習の補完的な役割を強調している。地理空間領域全体にわたる論文をレビューし、それらを異なるグループに分類する。代表的な実験のセットは具体的な実装例を示している。これらの例は、データ中心の機械学習アプローチで地理空間データに作用する具体的なステップを提供する。 Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle - from problem definition to model deployment with feedback - is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. 
We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.	翻訳日:2024-06-26 02:42:42 公開日:2024-06-22
# マルチモーダル・インフォメーション・ボトルネック属性による画像テキスト表現の視覚的説明 Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution ( http://arxiv.org/abs/2312.17174v2 ) ライセンス: Link先を確認	Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson,	(参考訳) 視覚言語で事前訓練されたモデルは目覚ましい成功を収めてきたが、安全クリティカルな設定への適用は、解釈可能性の欠如によって制限されている。 CLIPのような視覚言語モデルの解釈性を改善するために,関連性のある視覚的特徴とテキスト的特徴を保存しつつ,無関係な情報を圧縮する潜時表現を学習するマルチモーダル情報ボトルネック(M2IB)アプローチを提案する。本稿では,M2IBを視覚言語事前学習モデルの帰属分析に適用し,帰属精度を高め,医療などの安全クリティカル領域に適用した場合の解釈可能性を向上させる方法を示す。重要な点として、一般的に使われるユニモーダル属性法とは異なり、M2IBは基礎的な真理ラベルを必要としないため、複数のモダリティがあるが、基礎的真実データがない場合に、視覚言語事前訓練されたモデルの表現を監査することができる。 CLIPを例として、M2IB属性の有効性を示し、勾配に基づく、摂動に基づく、注意に基づく属性法を質的かつ定量的に上回ることを示す。 Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively.	翻訳日:2024-06-26 02:32:50 公開日:2024-06-22
# 異常検出のための拡散モデルにおける雑音の動的付加 Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection ( http://arxiv.org/abs/2401.04463v2 ) ライセンス: Link先を確認	Justin Tebbe, Jawad Tayyub,	(参考訳) 拡散モデルは、名目データ分布を捕捉し、再構成を通して異常を識別することで、異常検出に有用な応用を見出した。それらの利点にもかかわらず、彼らは様々なスケールの異常、特に欠落した成分全体のような大きな異常をローカライズするのに苦労している。そこで我々は,従来の暗黙的条件付け手法であるMeng et al(2022)を3つの重要な方法で拡張することにより,拡散モデルの能力を高める新しい枠組みを提案する。まず,初期異常予測によって導かれるフォワードプロセスにおける可変ノイズ発生ステップを動的ステップサイズ計算に組み込む。第二に、ノイズが加わらずにのみスケールした入力をデノナイズすることが従来のデノナイズ処理より優れていることを示す。第三に、我々は、大きな欠落したコンポーネントの再構築を妨害する細部を抽象化するために、潜伏した空間に画像を投影する。さらに,対象領域のニュアンスを効果的に把握するための微調整機構を提案する。本手法は,VisA,BTAD,MVTecなどの異常検出データセットの厳密な評価を行い,高い性能を示した。重要な点として,本フレームワークは,拡散に基づく異常検出における重要な進歩を示すため,スケールに関わらず,効果的に異常の局所化を行う。 Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies such as entire missing components. Addressing this, we present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach of Meng et al. (2022) in three significant ways. First, we incorporate a dynamic step size computation that allows for variable noising steps in the forward process guided by an initial anomaly prediction. Second, we demonstrate that denoising an input that is only scaled, without any added noise, outperforms the conventional denoising process. Third, we project images in a latent space to abstract away from fine details that interfere with reconstruction of large missing components. Additionally, we propose a fine-tuning mechanism that helps the model effectively grasp the nuances of the target domain. Our method is rigorously evaluated on the prominent anomaly detection datasets VisA, BTAD and MVTec, yielding strong performance. 
Importantly, our framework effectively localizes anomalies regardless of their scale, marking a pivotal advancement in diffusion-based anomaly detection.	翻訳日:2024-06-26 02:32:50 公開日:2024-06-22
# 推論ステップ長が大規模言語モデルに及ぼす影響 The Impact of Reasoning Step Length on Large Language Models ( http://arxiv.org/abs/2401.04925v4 ) ライセンス: Link先を確認	Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du,	(参考訳) 思考の連鎖(CoT)は、大きな言語モデル(LLM)の推論能力を改善する上で重要である。しかし, プロンプトにおけるCoTの有効性と推論ステップの長さの相関はよく分かっていない。これを明らかにするために,我々はいくつかの実験を行い,その関係について検討した。具体的には、CoTの実証実験において、他のすべての要因を一定に保ちながら、合理的推論ステップを拡張して圧縮する実験を設計する。主な発見は以下の通りである。まず、プロンプトに新たな情報を加えることなく、プロンプトにおける推論ステップを延長することで、複数のデータセットにまたがるLLMの推論能力が大幅に向上することを示す。あるいは、キー情報を保存しながらも推論ステップを短縮することは、モデルの推論能力を著しく低下させる。この発見は、CoTプロンプトにおけるステップ数の重要性を強調し、複雑な問題解決シナリオにおけるLLMのポテンシャルをよりよく活用するための実践的なガイダンスを提供する。次に,CoTの性能と実演における有理性との関係について検討した。驚くべきことに、たとえ誤った有理数であっても、推論の必要な長さを維持すれば、有利な結果が得られることが示される。第三に、より単純なタスクはより少ないステップを必要とするのに対して、複雑なタスクはより長い推論シーケンスから著しく向上する。コードは https://github.com/MingyuJ666/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models で公開されている。 Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of reasoning steps in prompts remains largely unknown. To shed light on this, we have conducted several empirical experiments to explore the relations. Specifically, we design experiments that expand and compress the rationale reasoning steps within CoT demonstrations while keeping all other factors constant. We have the following key findings. First, the results indicate that lengthening the reasoning steps in prompts, even without adding new information into the prompt, considerably enhances LLMs' reasoning abilities across multiple datasets. Alternatively, shortening the reasoning steps, even while preserving the key information, significantly diminishes the reasoning abilities of models. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance to make better use of LLMs' potential in complex problem-solving scenarios. 
Second, we also investigated the relationship between the performance of CoT and the rationales used in demonstrations. Surprisingly, the result shows that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences. The code is available at https://github.com/MingyuJ666/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models	翻訳日:2024-06-26 02:32:50 公開日:2024-06-22
# シルエット・アグリゲーションの再検討 Revisiting Silhouette Aggregation ( http://arxiv.org/abs/2401.05831v3 ) ライセンス: Link先を確認	John Pavlopoulos, Georgios Vardakas, Aristidis Likas,	(参考訳) シルエット係数(Silhouette coefficient)は、クラスタリングの割り当ての品質を評価し、データポイント当たりのスコアを生成する確立された内部クラスタリング評価尺度である。データセット全体のクラスタリングの品質を評価するために、データセットのすべてのポイントのスコアは通常、(マイクロ)1つの値に平均化されます。しかし、滅多に採用されない代替のパスは、まずクラスタレベルで平均化し、次に(マクロ)クラスタ全体で平均となることである。この研究を合成例で示すように、典型的なマイクロデバッグ戦略はクラスタ不均衡に敏感であり、見過ごされたマクロデバッグ戦略ははるかに堅牢である。マクロシルエットをさらに調査することで、既存の図書館で唯一利用可能な戦略である統一サブサンプリングが、不均衡に対する尺度の頑健さを損なうことが判明した。クラスタごとのサンプリング手法を提案することでこの問題に対処する。 8つの実世界のデータセットに関する実験的研究は、2つのクラスタリングタスクにおいて両方の係数を分析するために使用される。 Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are typically (micro) averaged into a single value. An alternative path, however, that is rarely employed, is to average first at the cluster level and then (macro) average across clusters. As we illustrate in this work with a synthetic example, the typical micro-averaging strategy is sensitive to cluster imbalance while the overlooked macro-averaging strategy is far more robust. By investigating macro-Silhouette further, we find that uniform sub-sampling, the only available strategy in existing libraries, harms the measure's robustness against imbalance. We address this issue by proposing a per-cluster sampling method. An experimental study on eight real-world datasets is then used to analyse both coefficients in two clustering tasks.	翻訳日:2024-06-26 02:22:43 公開日:2024-06-22
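The difference between the micro- and macro-averaged Silhouette described in the abstract above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the Euclidean distance, and the toy data are all assumptions made for this example, and the per-cluster sampling method proposed in the paper is not reproduced here.

```python
import math
from collections import defaultdict


def silhouette_samples(points, labels):
    """Per-point Silhouette s(i) = (b - a) / max(a, b), Euclidean distance (assumed)."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: score is conventionally 0
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def micro_silhouette(points, labels):
    """Micro-average: mean over all points, so large clusters dominate."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


def macro_silhouette(points, labels):
    """Macro-average: mean per cluster first, then mean across clusters."""
    per_cluster = defaultdict(list)
    for score, label in zip(silhouette_samples(points, labels), labels):
        per_cluster[label].append(score)
    return sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)


# Hypothetical imbalanced toy data: a tight cluster of six points and a
# loose cluster of two points, one of which sits close to the big cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (3, 0), (9, 0)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
micro = micro_silhouette(points, labels)
macro = macro_silhouette(points, labels)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

On this toy data the poorly formed small cluster barely moves the micro-average, while the macro-average gives it the same weight as the large cluster and therefore comes out lower.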
# 適応ファイナフィルタによる基底状態生成のための一元行列の量子固有値変換のスケーラビリティ向上 Enhancing Scalability of Quantum Eigenvalue Transformation of Unitary Matrices for Ground State Preparation through Adaptive Finer Filtering ( http://arxiv.org/abs/2401.09091v3 ) ライセンス: Link先を確認	Erenay Karacan, Yanbin Chen, Christian B. Mendl,	(参考訳) ハミルトニアンシミュレーション(英: Hamiltonian Simulation)は、量子コンピュータが、その固有の量子的振る舞いのために古典的な計算能力を上回る能力を持つ領域である。このような量子アルゴリズムの主な課題の1つは、意味のある量子優位性を達成するために必要とされるシステムサイズをアップスケーリングすることである。本研究では,与えられたハミルトニアンの基底状態作成のための固有空間フィルタリングのスケーラビリティ向上のためのアプローチを提案する。本手法は,低エネルギー状態の小さなスペクトルギャップと高縮退によって生じる制約に対処することを目的としている。単位行列の量子固有値変換(QETU)とスペクトルプロファイリングによる固有空間フィルタリングの適応配列に基づく。提案アルゴリズムと最先端位相推定法を組み合わせることで,地上状態エネルギーと局所2量子ゲート脱分極確率を最大10^{-4}$で近似した。本研究の重要な成果を示すために,Qiskit を用いた古典計算機上での逆場イジングモデルを用いてシミュレーションを行った。提案手法とQETUの静的実装を比較し,絶対誤差率の3～4桁の改善を連続的に達成可能であることを示す。 Hamiltonian simulation is a domain where quantum computers have the potential to outperform their classical counterparts due to their inherent quantum behavior. One of the main challenges of such quantum algorithms is up-scaling the system size, which is necessary to achieve meaningful quantum advantage. In this work, we present an approach to improve the scalability of eigenspace filtering for the ground state preparation of a given Hamiltonian. Our method aims to tackle limitations introduced by a small spectral gap and high degeneracy of low energy states. It is based on an adaptive sequence of eigenspace filtering through Quantum Eigenvalue Transformation of Unitary Matrices (QETU) followed by spectrum profiling. By combining our proposed algorithm with state-of-the-art phase estimation methods, we achieved good approximations for the ground state energy with local, two-qubit gate depolarizing probability up to $10^{-4}$. To demonstrate the key results in this work, we ran simulations with the transverse-field Ising Model on classical computers using Qiskit. 
We compare the performance of our approach with the static implementation of QETU and show that we can consistently achieve three to four orders of magnitude improvement in the absolute error rate.	翻訳日:2024-06-26 02:22:43 公開日:2024-06-22
# ブリッジング進化アルゴリズムと強化学習:ハイブリッドアルゴリズムに関する総合的な調査 Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms ( http://arxiv.org/abs/2401.11963v4 ) ライセンス: Link先を確認	Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang,	(参考訳) 進化的アルゴリズム(EA)と強化学習(RL)を統合した進化的強化学習(ERL)は、目覚ましい性能向上を示した。両方のアプローチを融合させることで、ERLは有望な研究方向として浮上した。本調査では,ERLの多様な研究分野について概観する。具体的には, 関連アルゴリズムの最近の進歩を体系的に要約し, EA支援によるRL最適化, RL支援によるEA最適化, EAとRLの相乗的最適化の3つの研究方向を特定する。その後、各研究の方向性を詳細に分析し、複数の研究部門を編成する。それぞれのブランチが取り組もうとしている問題と、EAとRLの統合がこれらの課題にどのように対処するかを明らかにする。結論として,様々な研究方向性にまたがる潜在的な課題と今後の研究方向性について議論する。研究者によるERLの探究を容易にするため, https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learningに関するアルゴリズムとコードを整理した。 Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in related algorithms and identify three primary research directions: EA-assisted Optimization of RL, RL-assisted Optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EAs and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.	翻訳日:2024-06-26 02:22:43 公開日:2024-06-22
# 強化学習エージェントにおける創発的支配階層 Emergent Dominance Hierarchies in Reinforcement Learning Agents ( http://arxiv.org/abs/2401.12258v7 ) ライセンス: Link先を確認	Ram Rachum, Yonatan Nakar, Bill Tomlinson, Nitay Alon, Reuth Mirsky,	(参考訳) 現代の強化学習(RL)アルゴリズムは、様々なタスクにおいて人間より優れている。マルチエージェント強化学習(MARL)の設定には新たな課題があり、エージェントの混合モチベーションにおける協調の成功は、個人とグループ間の微妙なバランスをとる行為に依存する。社会慣習や規範は、しばしば人間の制度にインスパイアされ、このバランスを打つための道具として使用される。本稿では,動物社会と人間社会の連携の基盤となる,基礎的でよく研究された社会慣行,支配階層について考察する。我々は、支配階層の倫理理論を人工エージェントに適用し、確立された用語と定義を可能な限り少ない修正で借用する。明示的なプログラミングや本質的な報酬なしに活動するRLエージェントの集団は、新しい集団に支配階層を発明し、学習し、強制し、伝達することができることを実証する。支配的な階層構造は、鶏、マウス、魚、その他の種で研究されるものと類似した構造を持つ。 Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.	翻訳日:2024-06-26 02:22:43 公開日:2024-06-22
# OMPGPT: OpenMPのための生成事前学習型トランスモデル OMPGPT: A Generative Pre-trained Transformer Model for OpenMP ( http://arxiv.org/abs/2401.16445v3 ) ライセンス: Link先を確認	Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo, Ali Jannesari,	(参考訳) ChatGPTのような大規模言語モデル(LLM)は自然言語処理(NLP)の分野を大きく進歩させた。この傾向は、StarCoder、WizardCoder、CodeLlamaといったコードベースの大規模言語モデルの開発につながった。これらのコードの汎用的な能力は、コード生成のようなタスクにおいて多くのプログラマにとって有用であるが、ハイパフォーマンスコンピューティング(HPC)の領域は、より小さく、よりドメイン固有のモデルをよりスマートな選択にするための、より狭い要求セットを持っている。本稿では,OpenMPプラグマ生成のための言語モデル固有の強みを巧みに活用したドメイン固有モデルであるOMPGPTを提案する。さらに、我々は、NLPドメインからの迅速なエンジニアリング技術を活用して、OMPGPTの有効性を高めるために設計された革新的な戦略であるChain-of-OMPを作成する。 OMPGPTはOpenMPタスクに特化している既存の大規模言語モデルよりも優れており、HPC環境の典型的なハードウェア制約とより密に一致している。我々は、言語モデルの利点とHPCタスクの特定の要求を結びつけるために、我々の貢献を重要な橋と考えます。 Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are trained extensively on vast repositories of code and programming languages. While the generic abilities of these code LLMs are useful for many programmers in tasks like code generation, the area of high-performance computing (HPC) has a narrower set of requirements that make a smaller and more domain-specific model a smarter choice. This paper presents OMPGPT, a novel domain-specific model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation. Furthermore, we leverage prompt engineering techniques from the NLP domain to create Chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive evaluations demonstrate that OMPGPT outperforms existing large language models specialized in OpenMP tasks and maintains a notably smaller size, aligning it more closely with the typical hardware constraints of HPC environments. 
We consider our contribution as a pivotal bridge, connecting the advantage of language models with the specific demands of HPC tasks.	翻訳日:2024-06-26 02:22:43 公開日:2024-06-22
# スキルセット最適化:トランスファー可能なスキルによる言語モデル行動の強化 Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills ( http://arxiv.org/abs/2402.03244v2 ) ライセンス: Link先を確認	Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox,	(参考訳) 大規模言語モデル(LLM)は、インタラクティブ環境でのシーケンシャルな意思決定に最近使用されている。しかし,環境報酬信号の連続的LLMアクター改善への活用は容易ではない。トランスファー可能なスキルセットの構築と精細化を通じて,LLMアクターのパフォーマンスを向上させるためのスキルセット最適化(SSO)を提案する。 SSOは、報酬の高い共通のサブトラジェクトリを抽出し、各スキルを表すサブゴールと命令を生成することで、スキルを構築する。これらのスキルは、高い報酬で行動を強化するために、LLMアクターにコンテキストで提供される。そして、SSOは、高い報酬を得られない技術を切り刻むことによって設定されたスキルをさらに洗練する。我々は,従来のビデオゲームNetHackとテキスト環境ScienceWorldで,SSOのスキルセットを最適化し,コンテキスト内ポリシーの改善を行う能力を実証するために,本手法を評価した。 SSOは当社のカスタムNetHackタスクのベースラインを40%上回り、ScienceWorldの最先端を35%上回ります。 Large language models (LLMs) have recently been used for sequential decision making in interactive environments. However, leveraging environment reward signals for continual LLM actor improvement is not straightforward. We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. SSO constructs skills by extracting common subtrajectories with high rewards and generating subgoals and instructions to represent each skill. These skills are provided to the LLM actor in-context to reinforce behaviors with high rewards. Then, SSO further refines the skill set by pruning skills that do not continue to result in high rewards. We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement. SSO outperforms baselines by 40% in our custom NetHack task and outperforms the previous state-of-the-art in ScienceWorld by 35%.	翻訳日:2024-06-26 02:11:02 公開日:2024-06-22
# ダンス生成のための双方向自己回帰拡散モデル Bidirectional Autoregressive Diffusion Model for Dance Generation ( http://arxiv.org/abs/2402.04356v4 ) ライセンス: Link先を確認	Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang,	(参考訳) ダンスは人間の感情を表現するための強力な媒体として機能するが、人生のようなダンスの生成は依然としてかなりの課題である。近年、拡散モデルは様々な領域で顕著な生成能力を示した。彼らは、適応可能な多対多の性質のために、人間のモーションジェネレーションを約束します。それにもかかわらず、現在の拡散に基づく運動生成モデルは、局所的および双方向的な拡張による動きに焦点を絞らず、直接かつ一方向の運動列を直接生成することが多い。高品質な舞踊の動きを振る舞う際には、音楽的文脈だけでなく、近隣の音楽的な舞踊の動きも考慮する必要がある。本研究では,音楽間距離生成のための双方向自己回帰拡散モデル (BADM) を提案する。生成したダンス動作をよりスムーズにするため、局所運動強調のための局所情報デコーダを構築する。提案フレームワークは入力条件と近傍の動作に基づいて新しい動きを生成することができ、個々の動きスライスを反復的に予測し、全ての予測を統合する。生成されたダンスとビートとの同期性を更に向上させるため、ビート情報を入力として組み込んで、より優れた音楽整列ダンス動作を生成する。実験結果から,提案モデルが既存の一方向アプローチと比較して最先端性能を達成できることが示唆された。 Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. 
The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.	翻訳日:2024-06-26 02:11:02 公開日:2024-06-22
# 確率微分方程式によるスコアベース拡散モデル-技術チュートリアル Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial ( http://arxiv.org/abs/2402.07487v2 ) ライセンス: Link先を確認	Wenpin Tang, Hanyang Zhao,	(参考訳) 以下は、スコアベース拡散モデルに関する解説記事であり、特に確率微分方程式(SDE)による定式化に焦点を当てている。本稿では,SDE/ODEサンプリング,スコアマッチング効率,一貫性モデル,強化学習を含む,拡散モデルにおける2つの柱について考察する。提案された結果の主案を説明するための短い証明が与えられる。この記事は、主にこの分野の技術的な紹介であり、実践者は、新しいモデルやアルゴリズムを設計するのに有用な分析を見出すかもしれない。 This is an expository article on the score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars in the diffusion modeling -- sampling and score matching, which encompass the SDE/ODE sampling, score matching efficiency, the consistency models, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily a technical introduction to the field, and practitioners may also find some analysis useful in designing new models or algorithms.	翻訳日:2024-06-26 02:01:18 公開日:2024-06-22
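For orientation, the SDE formulation that the tutorial above refers to is usually written as follows. This is a sketch in commonly used notation, not taken from the article itself: the symbols $f$, $g$, $p_t$, $s_\theta$, and the weighting $\lambda(t)$ are assumptions made here for illustration.

```latex
% Forward (noising) SDE, started from the data distribution:
\begin{align}
  dX_t &= f(t, X_t)\,dt + g(t)\,dW_t, \qquad X_0 \sim p_{\mathrm{data}}, \\
% Reverse-time SDE used for sampling (time runs from T down to 0),
% where p_t denotes the marginal density of X_t:
  dX_t &= \bigl[f(t, X_t) - g(t)^2\,\nabla_x \log p_t(X_t)\bigr]\,dt + g(t)\,d\bar{W}_t, \\
% Probability flow ODE with the same marginals p_t:
  \frac{dX_t}{dt} &= f(t, X_t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(X_t), \\
% Score matching: fit s_\theta to the unknown score \nabla_x \log p_t:
  \min_\theta \; & \mathbb{E}_{t}\Bigl[\lambda(t)\,\mathbb{E}_{X_t \sim p_t}
      \bigl\| s_\theta(t, X_t) - \nabla_x \log p_t(X_t) \bigr\|^2\Bigr].
\end{align}
```

Sampling replaces the unknown score $\nabla_x \log p_t$ with the learned $s_\theta$ in either the reverse-time SDE or the ODE, which is why sampling and score matching are described as the two pillars.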
# All in One and One for All: クロスドメイングラフ事前トレーニングのためのシンプルで効果的な方法 All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining ( http://arxiv.org/abs/2402.09834v2 ) ライセンス: Link先を確認	Haihong Zhao, Aochuan Chen, Xiangguo Sun, Hong Cheng, Jia Li,	(参考訳) 大規模言語モデル (LLM) はコンピュータビジョン (CV) と自然言語処理 (NLP) の分野に革命をもたらした。 LLMの最も注目すべき進歩の1つは、単一のモデルが、複数のドメインにまたがる広範囲で多様なデータセット("All in One"と呼ばれるパラダイム)でトレーニングされていることだ。この方法論は、非常に一般化された能力を持つLLMに権限を与え、さまざまなデータ分散の包括的理解を促進する。これらの機能を活用することで、単一のLLMは、さまざまなドメインにまたがる顕著な汎用性を実証する。"One for All"というパラダイムは、我々が"One for All"と呼ぶパラダイムだ。しかし、このアイデアをグラフ場に適用することは、ドメイン間の事前学習がしばしば負の移動をもたらすため、依然として非常に難しい課題である。この問題は、トレーニングデータの質が外部知識源の組み入れを必要とする、数ショットの学習シナリオにおいて特に重要である。この課題に対応するために,多種多様なグラフデータセット間の共通性を生かしたグラフコーディネータ(GCOPE)を提案する。我々の新しい手法は、事前学習期間中に異なるグラフデータセットをアマルガメートして、目的のタスクに有意義な知識を蒸留し、伝達する統合フレームワークを包含する。複数のグラフデータセットにまたがる大規模な実験は、我々のアプローチの優れた効果を示す。複数のグラフデータセットの相乗的ポテンシャルを事前学習に活用することにより、我々の研究はグラフ基礎モデルの領域への先駆的な貢献として立証される。 Large Language Models (LLMs) have revolutionized the fields of computer vision (CV) and natural language processing (NLP). One of the most notable advancements of LLMs is that a single model is trained on vast and diverse datasets spanning multiple domains -- a paradigm we term `All in One'. This methodology empowers LLMs with super generalization capabilities, facilitating an encompassing comprehension of varied data distributions. Leveraging these capabilities, a single LLM demonstrates remarkable versatility across a variety of domains -- a paradigm we term `One for All'. However, applying this idea to the graph field remains a formidable challenge, with cross-domain pretraining often resulting in negative transfer. This issue is particularly important in few-shot learning scenarios, where the paucity of training data necessitates the incorporation of external knowledge sources. 
In response to this challenge, we propose a novel approach called Graph COordinators for PrEtraining (GCOPE) that harnesses the underlying commonalities across diverse graph datasets to enhance few-shot learning. Our novel methodology involves a unification framework that amalgamates disparate graph datasets during the pretraining phase to distill and transfer meaningful knowledge to target tasks. Extensive experiments across multiple graph datasets demonstrate the superior efficacy of our approach. By successfully leveraging the synergistic potential of multiple graph datasets for pretraining, our work stands as a pioneering contribution to the realm of graph foundation models.	翻訳日:2024-06-26 02:01:18 公開日:2024-06-22
# 異種情報ネットワークにおける大規模言語モデル駆動型メタ構造発見 Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network ( http://arxiv.org/abs/2402.11518v2 ) ライセンス: Link先を確認	Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui,	(参考訳) 異種情報ネットワーク(HIN)は近年,多様なノード間の複雑な関係を捉えることで人気が高まっている。メタ構造は、HINの重要なパターンを特定するのに有用なツールとして提案されているが、手作りのメタ構造はスケールアップに重大な課題をもたらし、自動検索アルゴリズムの開発に広範囲の研究が注がれている。それまでの取り組みは主に、人間の理解性と一般化性の重要性を見越して、経験的性能のよいメタ構造を探すことに焦点を当てていた。この課題に対処するため,大規模言語モデル(LLM)の創発的推論能力から着想を得た。本稿では,LLM推論を進化過程に統合するメタ構造探索フレームワークReStructを提案する。 ReStructは文法トランスレータを使用して、メタ構造を自然言語文にエンコードし、LLMの推論能力を活用して、それらの意味的な実現可能性を評価する。さらに、ReStructはパフォーマンス指向の進化操作も採用している。これら2つの競合する力により、ReStructはメタ構造のセマンティックな説明可能性と経験的なパフォーマンスを共同で最適化することができる。さらに、ReStructは、検索履歴を解析することで、発見されたメタ構造の自然言語説明を生成し、洗練するための微分LDM説明器を含んでいる。 8つの代表的HINデータセットの実験は、ReStructが推奨タスクとノード分類タスクの両方で最先端のパフォーマンスを達成することを示した。さらに、73人の大学院生を対象にした調査の結果、ReStructによるメタ構造と生成した説明は、かなり理解しやすいことがわかった。コードとアンケートは https://github.com/LinChen-65/ReStruct で公開されている。 Heterogeneous information networks (HIN) have gained increasing popularity in recent years for capturing complex relations between diverse types of nodes. Meta-structures are proposed as a useful tool to identify the important patterns in HINs, but hand-crafted meta-structures pose significant challenges for scaling up, drawing wide research attention towards developing automatic search algorithms. Previous efforts primarily focused on searching for meta-structures with good empirical performance, overlooking the importance of human comprehensibility and generalizability. To address this challenge, we draw inspiration from the emergent reasoning abilities of large language models (LLMs). We propose ReStruct, a meta-structure search framework that integrates LLM reasoning into the evolutionary procedure. ReStruct uses a grammar translator to encode the meta-structures into natural language sentences, and leverages the reasoning power of LLMs to evaluate their semantic feasibility. 
Besides, ReStruct also employs performance-oriented evolutionary operations. These two competing forces allow ReStruct to jointly optimize the semantic explainability and empirical performance of meta-structures. Furthermore, ReStruct contains a differential LLM explainer to generate and refine natural language explanations for the discovered meta-structures by reasoning through the search history. Experiments on eight representative HIN datasets demonstrate that ReStruct achieves state-of-the-art performance in both recommendation and node classification tasks. Moreover, a survey study involving 73 graduate students shows that the discovered meta-structures and generated explanations by ReStruct are substantially more comprehensible. Our code and questionnaire are available at https://github.com/LinChen-65/ReStruct.	翻訳日:2024-06-26 01:51:30 公開日:2024-06-22
# 両世界の多くの人々のベスト:未知の領域モデルに基づく予測付きオンラインリソース割り当て Best of Many in Both Worlds: Online Resource Allocation with Predictions under Unknown Arrival Model ( http://arxiv.org/abs/2402.13530v2 ) ライセンス: Link先を確認	Lin An, Andrew A. Li, Benjamin Moseley, Gabriel Visotsky,	(参考訳) オンライン意思決定者は、到着、要求、在庫など、将来の変数に関する予測を得ることが多い。これらの予測は、単変量時系列の単純な予測アルゴリズムから、複数の時系列と追加の機能情報を活用する最先端の機械学習モデルまで、すべて生成することができる。しかし、事前判断者にとって予測精度は未知であるため、予測に盲目的に従うことは有害である可能性がある。本稿では,未知の予測精度に頑健な予測アルゴリズムを開発することにより,この問題に対処する。本稿では,オンライン意思決定の汎用モデルであるオンラインリソース割当問題について考察する。先行研究は、到着が確率的に(つまり)あるいは完全に逆向きに生成されるとき、最も達成可能なパフォーマンスを特徴付けており、基礎となるモデル「知識」を使わずに、両方の到着モデルの下でこれらの境界に一致するアルゴリズムが存在することを示した。この背景として,資源の種類ごとに影価格の形で予測を導入する。予測精度は、予測と実際の影価格の間の距離として自然に定義される。我々は、任意のアルゴリズムが予測を最適に活用できる範囲(正確には「follow」、不正確な場合は「ignore」、不正確な場合は「ignore」)を、予測精度や下層の到着モデルを知ることなく、形式的な下限によって強く特徴づける。我々の主な貢献は、この下限を達成するアルゴリズムである。最後に,小売業者のH&Mによる実データに対する大規模な実験により,我々のアルゴリズムを実証的に検証した。 Online decision-makers often obtain predictions on future variables, such as arrivals, demands, inventories, and so on. These predictions can be generated from simple forecasting algorithms for univariate time-series, all the way to state-of-the-art machine learning models that leverage multiple time-series and additional feature information. However, the prediction accuracy is unknown to decision-makers a priori, hence blindly following the predictions can be harmful. In this paper, we address this problem by developing algorithms that utilize predictions in a manner that is robust to the unknown prediction accuracy. We consider the Online Resource Allocation Problem, a generic model for online decision-making, in which a limited amount of resources may be used to satisfy a sequence of arriving requests. Prior work has characterized the best achievable performances when the arrivals are either generated stochastically (i.i.d.) or completely adversarially, and shown that algorithms exist which match these bounds under both arrival models, without ``knowing'' the underlying model. 
Against this backdrop, we introduce predictions in the form of shadow prices on each type of resource. Prediction accuracy is naturally defined to be the distance between the predictions and the actual shadow prices. We tightly characterize, via a formal lower bound, the extent to which any algorithm can optimally leverage predictions (that is, to ``follow'' the predictions when accurate, and ``ignore'' them when inaccurate) without knowing the prediction accuracy or the underlying arrival model. Our main contribution is then an algorithm which achieves this lower bound. Finally, we empirically validate our algorithm with a large-scale experiment on real data from the retailer H&M.	翻訳日:2024-06-26 01:51:30 公開日:2024-06-22
# 言語モデルを用いた統計モデルの自動発見 Automated Statistical Model Discovery with Language Models ( http://arxiv.org/abs/2402.17879v2 ) ライセンス: Link先を確認	Michael Y. Li, Emily B. Fox, Noah D. Goodman,	(参考訳) 統計的モデル発見は、ドメイン固有の制約を受ける広大なモデルの空間を探索する難題である。この領域を効果的に探索するには、モデリングと問題領域の専門知識が必要である。大規模言語モデル(LM)のドメイン知識とプログラミング能力に動機付けられ,言語モデルによる自動統計モデル発見のための手法を提案する。 LMは確率的プログラムとして表される統計モデルを提案し、モデラーとして機能し、ドメインエキスパートとして機能し、それらのモデルを批判する。 LMを利用することで、モデルのドメイン固有言語を定義したり、手作りの検索手順を設計したりする必要がなくなる。確率的モデリングでは,制約されたモデルの空間内を探索し,オープンな空間を探索し,自然言語制約下での専門家モデルを改善する(例えば,このモデルは生態学者に解釈できる)。提案手法は,人間の専門家が設計したモデルと同等のモデルを特定し,解釈可能な方法で古典モデルを拡張する。その結果,LM駆動型モデル発見の可能性を浮き彫りにした。 Statistical model discovery is a challenging search over a vast space of models subject to domain-specific constraints. Efficiently searching over this space requires expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We cast our automated procedure within the principled framework of Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. By leveraging LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure, which are key restrictions of previous systems. We evaluate our method in three settings in probabilistic modeling: searching within a restricted space of models, searching over an open-ended space, and improving expert models under natural language constraints (e.g., this model should be interpretable to an ecologist). Our method identifies models on par with human expert designed models and extends classic models in interpretable ways. Our results highlight the promise of LM-driven model discovery.	翻訳日:2024-06-26 01:41:44 公開日:2024-06-22
# 高分子太陽電池の材料発見の加速:自然言語処理によるデータ駆動的洞察 Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing ( http://arxiv.org/abs/2402.19462v2 ) ライセンス: Link先を確認	Pranav Shetty, Aishat Adeboye, Sonakshi Gupta, Chao Zhang, Rampi Ramprasad,	(参考訳) 本稿では, 自然言語処理パイプラインを用いて20年間にわたる文献から抽出したデータを用いて, 高分子太陽電池ドナー/アクセプターペアの発見のための各種能動的学習手法のシミュレーションを行う。データ駆動法はエジソンの試行錯誤法よりも早く新しい物質を発見するために確立されているが、その利点は何十年もかかる物質発見問題に対して定量化されていない。提案手法は, 材料革新の15年間の加速に相当し, 発見時間を約75%短縮する可能性を示した。私たちのパイプラインでは、3300以上の論文からデータを抽出することができます。また、電力変換効率を予測するために機械学習モデルをトレーニングし、我々のモデルを使用して、まだ報告されていない有望なドナー/アクセプタの組み合わせを特定しました。そこで我々は,論文から抽出した資料データへのパイプラインを実証し,そのパイプラインがデータ駆動の洞察を得るために使用されることを示した。私たちの洞察には、物質特性の強い予測モデルをトレーニングしたり、使用した初期材料システムに対して堅牢であるような、アクティブな学習戦略が含まれています。この研究は、材料科学におけるデータ駆動研究のための貴重なフレームワークを提供する。 We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs using data extracted from the literature spanning $\sim$20 years by a natural language processing pipeline. While data-driven methods have been well established to discover novel materials faster than Edisonian trial-and-error approaches, their benefits have not been quantified for material discovery problems that can take decades. Our approach demonstrates a potential reduction in discovery time by approximately 75 %, equivalent to a 15 year acceleration in material innovation. Our pipeline enables us to extract data from greater than 3300 papers which is $\sim$5 times larger and therefore more diverse than similar data sets reported by others. We also trained machine learning models to predict the power conversion efficiency and used our model to identify promising donor-acceptor combinations that are as yet unreported. We thus demonstrate a pipeline that goes from published literature to extracted material property data which in turn is used to obtain data-driven insights. 
Our insights include active learning strategies that can be used to train strong predictive models of material properties or be robust to the initial material system used. This work provides a valuable framework for data-driven research in materials science.	翻訳日:2024-06-26 01:41:44 公開日:2024-06-22
# 協調型対話型エージェントによるツールの活用 Learning to Use Tools via Cooperative and Interactive Agents ( http://arxiv.org/abs/2403.03031v4 ) ライセンス: Link先を確認	Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren,	(参考訳) ツール学習は、外部ツールを使用してユーティリティを拡張するエージェントとして、大きな言語モデル(LLM)を促進する。既存のメソッドでは、1つのLCMベースのエージェントを使用してツールを反復的に選択し、実行結果を次のアクション予測に組み込む。これらの手法は, それらの進歩にもかかわらず, 1) 誤った動作を校正する柔軟性に制限された事前定義されたパイプライン, (2) 汎用LLMエージェントを適応して, 様々な特殊動作を実行することによる, 実用上の課題に対処する際の性能劣化に悩まされる。ツール選択,ツール実行,アクションキャリブレーションの3つの特殊エージェントを個別にコーディネートする,協調型対話型エージェントフレームワークであるConAgentsを提案する。 ConAgentsはエージェントの柔軟な協調を可能にする2つの通信プロトコルを導入した。また,ConAgentsをオープンソースモデルに効果的に一般化するために,特別なアクション蒸留を提案し,フレームワーク内での特別なアクションの実行能力を向上する。 3つのデータセットに関する広範な実験により、LLMは、ConAgentsを装備した場合、かなりの改善(最大14%の成功率)でベースラインを上回ります。 Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility. Existing methods employ one single LLM-based agent to iteratively select and execute tools, thereafter incorporating execution results into the next action prediction. Despite their progress, these methods suffer from performance degradation when addressing practical tasks due to: (1) the pre-defined pipeline with restricted flexibility to calibrate incorrect actions, and (2) the struggle to adapt a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately. ConAgents introduces two communication protocols to enable the flexible cooperation of agents. To effectively generalize the ConAgents into open-source models, we also propose specialized action distillation, enhancing their ability to perform specialized actions in our framework. 
Our extensive experiments on three datasets show that the LLMs, when equipped with the ConAgents, outperform baselines with substantial improvement (i.e., up to 14% higher success rate).	翻訳日:2024-06-26 01:41:44 公開日:2024-06-22
# GPTopic:動的かつインタラクティブなトピック表現 GPTopic: Dynamic and Interactive Topic Representations ( http://arxiv.org/abs/2403.03628v2 ) ライセンス: Link先を確認	Arik Reuter, Anton Thielmann, Christoph Weisser, Sebastian Fischer, Benjamin Säfken,	(参考訳) トピックモデリングは、大規模なテキストコーパス内のトピックを表す上位語(トップワード)のリストを生成することとほぼ同義であるように見える。しかし、そのような個々の語のリストからトピックを導き出すには相当な専門知識と経験が必要になりうるため、トップワード解釈の特殊性や落とし穴に不慣れな人々にとって、トピックモデリングは利用しにくいものになっている。また、トップワードに限定されたトピック表現では、トピックが持ちうる様々な側面、ファセット、ニュアンスを包括的かつ分かりやすく特徴づけるには不十分な場合がある。これらの課題に対処するため、大規模言語モデル(LLM)を利用して動的で対話的なトピック表現を生成するソフトウェアパッケージGPTopicを紹介する。 GPTopicは、ユーザがトピックを対話的に探索、分析、洗練するための直感的なチャットインターフェースを提供し、トピックモデリングをより利用しやすく包括的なものにする。対応するコードは https://github.com/ArikReuter/TopicGPT で入手できる。 Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such a list of individual terms can require substantial expertise and experience, making topic modelling less accessible to people unfamiliar with the particularities and pitfalls of top-word interpretation. A topic representation limited to top-words might further fall short of offering a comprehensive and easily accessible characterization of the various aspects, facets and nuances a topic might have. To address these challenges, we introduce GPTopic, a software package that leverages Large Language Models (LLMs) to create dynamic, interactive topic representations. GPTopic provides an intuitive chat interface for users to explore, analyze, and refine topics interactively, making topic modeling more accessible and comprehensive. The corresponding code is available here: https://github.com/ArikReuter/TopicGPT.	翻訳日:2024-06-26 01:41:44 公開日:2024-06-22
# 解凍トークン化:テキスト圧縮の評価とモデル性能との関係 Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance ( http://arxiv.org/abs/2403.06265v2 ) ライセンス: Link先を確認	Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty,	(参考訳) 最も一般的なトークン化アルゴリズムであるBPEの基盤であるにもかかわらず、トークン化プロセスにおける圧縮の重要性はいまだ不明である。本稿では,全てのトークンに等しい確率が割り当てられる0-gram言語モデリングとして,圧縮の理論的重要性を論じる。また,事前学習した言語モデルの下流における圧縮の重要性を実証的に示す。トレーニング中に利用可能な文書の量を100万文書から、トレーニングデータに匹敵する文字ベースのトークン化器まで変更することにより、複数のBPEトークン化器の圧縮能力を制御する。次に、それらのトークン化子に基づいて英語モデルを事前訓練し、いくつかのタスクでそれらを微調整します。本稿では, トークン化器の圧縮性能とモデル下流性能との間に相関関係があることを示し, 圧縮がトークン化品質の信頼性の高い本質的な指標であることを示唆する。これらの相関関係は、生成タスク(分類よりも)やより小さなモデル(大きなものよりも)に対してより顕著である。我々はトルコ語に関する実験の代表的な部分を再現し、同様の結果を得た。より優れた圧縮トークン化器の構築は、さらなる研究と全体的なモデル性能向上のための実りある道であると結論付けている。 Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear. In this paper, we argue for the theoretical importance of compression, that can be viewed as 0-gram language modeling where equal probability is assigned to all tokens. We also demonstrate the empirical importance of compression for downstream success of pre-trained language models. We control the compression ability of several BPE tokenizers by varying the amount of documents available during their training: from 1 million documents to a character-based tokenizer equivalent to no training data at all. We then pre-train English language models based on those tokenizers and fine-tune them over several tasks. We show that there is a correlation between tokenizers' compression and models' downstream performance, suggesting that compression is a reliable intrinsic indicator of tokenization quality. These correlations are more pronounced for generation tasks (over classification) or for smaller models (over large ones). 
We replicated a representative part of our experiments on Turkish and found similar results, confirming that our results hold for languages with typological characteristics dissimilar to English. We conclude that building better compressing tokenizers is a fruitful avenue for further research and for improving overall model performance.	翻訳日:2024-06-26 01:31:59 公開日:2024-06-22
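As a rough illustration of the compression measure discussed in the entry above (Unpacking Tokenization), the following Python sketch compares two toy tokenizers. Everything here is an assumption made for illustration: the character and whitespace tokenizers merely stand in for BPE tokenizers trained on less or more data, and the names `compression_ratio` and `zero_gram_nll` are hypothetical, not taken from the paper. The second function reflects the 0-gram view mentioned in the abstract: if every token has probability 1/V, a text of N tokens costs N·log V, so at a fixed vocabulary size fewer tokens means a higher likelihood.

```python
import math


def char_tokenize(text):
    """Character-level tokenizer: a stand-in for a tokenizer trained on no data."""
    return list(text)


def word_tokenize(text):
    """Whitespace tokenizer: a toy stand-in for a better-compressing tokenizer."""
    return text.split()


def compression_ratio(text, tokenize):
    """Characters per token; a higher value means stronger compression."""
    tokens = tokenize(text)
    return len(text) / len(tokens)


def zero_gram_nll(num_tokens, vocab_size):
    """Negative log-likelihood (in nats) when each token has probability 1/vocab_size."""
    return num_tokens * math.log(vocab_size)


text = "compression is an intrinsic indicator of tokenization quality"
char_ratio = compression_ratio(text, char_tokenize)
word_ratio = compression_ratio(text, word_tokenize)
```

With the vocabulary size held fixed, the tokenizer that yields fewer tokens for the same text gets the lower `zero_gram_nll`, which is the sense in which compression and 0-gram likelihood coincide in this sketch.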
# コモンセンス知識グラフ上の論理クエリに対する複雑な推論 Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs ( http://arxiv.org/abs/2403.07398v2 ) ライセンス: Link先を確認	Tianqing Fang, Zeming Chen, Yangqiu Song, Antoine Bosselut,	(参考訳) イベントに関する常識推論には、イベント間の関係を推論する能力と、その関係の根底にある暗黙の文脈を推測する能力が必要である。しかし、データ不足のため、複雑なイベント間の相互作用を含む文脈や質問に対して、言語モデルが常識的な推論を生成できるように学習することは難しい。この要求に対処するために、既存のコモンセンス知識グラフ(CSKG)からマルチホップの論理クエリ(例えば、イベントAとBの両方に共通する結果や原因、あるいはイベントCの結果のさらなる結果)をサンプリングし、手作りのルールと大規模言語モデルを用いて多肢選択式およびテキスト生成式の質問へと言語化して作成した新しいデータセット、COM2(COMplex COMmonsense)を提示する。実験の結果、COM2で訓練した言語モデルは複雑な推論能力が大幅に向上し、高価な人手によるアノテーションなしに、質問応答と生成的常識推論のドメイン内タスクおよびドメイン外タスクの両方でゼロショット性能が向上することが示された。コードとデータは https://github.com/tqfang/complex-commonsense-reasoning で公開されている。 Event commonsense reasoning requires the ability to reason about the relationship between events, as well as infer implicit context underlying that relationship. However, data scarcity makes it challenging for language models to learn to generate commonsense inferences for contexts and questions involving interactions between complex events. To address this demand, we present COM2 (COMplex COMmonsense), a new dataset created by sampling multi-hop logical queries (e.g., the joint effect or cause of both event A and B, or the effect of the effect of event C) from an existing commonsense knowledge graph (CSKG), and verbalizing them using handcrafted rules and large language models into multiple-choice and text generation questions. Our experiments show that language models trained on COM2 exhibit significant improvements in complex reasoning ability, resulting in enhanced zero-shot performance in both in-domain and out-of-domain tasks for question answering and generative commonsense reasoning, without expensive human annotations. Code and data are available at https://github.com/tqfang/complex-commonsense-reasoning.	翻訳日:2024-06-26 01:31:59 公開日:2024-06-22
# ガウス局所線型写像を用いた高速で高精度で軽量な逐次シミュレーションに基づく推論 Fast, accurate and lightweight sequential simulation-based inference using Gaussian locally linear mappings ( http://arxiv.org/abs/2403.07454v3 ) ライセンス: Link先を確認	Henrik Häggström, Pedro L. C. Rodrigues, Geoffroy Oudoumanessah, Florence Forbes, Umberto Picchini,	(参考訳) 難易度の高い複素モデルに対するベイズ推論は、計算機シミュレータへの多くの呼び出しを実行するアルゴリズムを用いて取り組むことができる。これらの手法を総合的に「シミュレーションベース推論(SBI)」と呼ぶ。近年のSBI法では、ニューラルネットワーク(NN)を用いて、不可能な可能性関数と後部分布の近似的かつ表現的な構造を提供している。しかし、精度と計算需要のトレードオフは、改善の余地を多く残している。本研究では,確率分布の構造的混合を用いて,確率分布と後部分布の両方を近似する手法を提案する。提案手法は,マルチモーダル後部であっても,最先端のNNベースのSBI法と比較して,計算フットプリントがはるかに小さく,正確な後部推測を導出する。本研究は,SBI文献から得られたいくつかのベンチマークモデルと,mRNAトランスフェクション後の翻訳動態の生物学的モデルについて述べる。 Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much space for improvement. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.	翻訳日:2024-06-26 01:31:59 公開日:2024-06-22
# LLMにおける知識の衝突:サーベイ Knowledge Conflicts for LLMs: A Survey ( http://arxiv.org/abs/2403.08319v2 ) ライセンス: Link先を確認	Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu,	(参考訳) 本サーベイは、大規模言語モデル(LLM)における知識の衝突を詳細に分析し、文脈的知識とパラメトリック知識を混ぜ合わせる際にLLMが直面する複雑な課題を明らかにする。本サーベイでは、コンテキストとメモリ間(context-memory)、コンテキスト間(inter-context)、メモリ内(intra-memory)の3種類の知識衝突に焦点を当てる。これらの衝突は、特にノイズや誤情報が一般的な現実世界のアプリケーションにおいて、LLMの信頼性と性能に大きな影響を与えうる。これらの衝突を分類し、その原因を探り、衝突下でのLLMの振る舞いを調べ、利用可能な解決策を概観することで、本サーベイはLLMの堅牢性を高める戦略に光を当て、発展を続けるこの分野の研究を進めるための有用な資料となることを目指す。 This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area.	翻訳日:2024-06-26 01:31:59 公開日:2024-06-22
# Ctrl123: クローズドループ転写による新規なビュー合成 Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription ( http://arxiv.org/abs/2403.10953v2 ) ライセンス: Link先を確認	Hongxiang Zhao, Xili Dai, Jianan Wang, Shengbang Tong, Jingyuan Zhang, Weida Wang, Lei Zhang, Yi Ma,	(参考訳) 大規模な画像拡散モデルは、新規ビュー合成(NVS)においてゼロショット機能を示した。しかし、既存の拡散に基づくNVS法は、トレーニングセット上でも対応する真実のポーズや外観と正確に一致した新しいビューを生成するのに苦労している。これにより、イメージ・ツー・マルチビュー生成や3D再構成といった下流タスクのパフォーマンスが制限される。このような矛盾は主に、Zero123のような既存の手法で行われているように、拡散訓練において、正確なポーズと外観アライメントを直接強制することが困難であるという事実から生じている。この問題を解決するために、我々はCtrl123を提案する。Ctrl123は、ポーズに敏感な特徴空間において、生成されたビューと地上の真実との間のアライメントを強制する、クローズドループ転写に基づくNVS拡散法である。我々は,Ctrl123がNVSおよび3次元再構成のタスクに与える影響を実証し,既存の手法よりも多視点整合性とポーズ整合性の両方において顕著な改善を実現した。 Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground truth poses and appearances, even on the training set. This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D reconstruction. We realize that such inconsistency is largely due to the fact that it is difficult to enforce accurate pose and appearance alignment directly in the diffusion training, as mostly done by existing methods such as Zero123. To remedy this problem, we propose Ctrl123, a closed-loop transcription-based NVS diffusion method that enforces alignment between the generated view and ground truth in a pose-sensitive feature space. Our extensive experiments demonstrate the effectiveness of Ctrl123 on the tasks of NVS and 3D reconstruction, achieving significant improvements in both multiview-consistency and pose-consistency over existing methods.	翻訳日:2024-06-26 01:22:15 公開日:2024-06-22
# 対話における言語モデル:人間とAIの対話のための会話の格率 Language Models in Dialogue: Conversational Maxims for Human-AI Interactions ( http://arxiv.org/abs/2403.15115v2 ) ライセンス: Link先を確認	Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards,	(参考訳) 現代の言語モデルは洗練されているものの、特に会話の場面において固有の欠点を示す。観察される欠点の多くは、1つ以上の会話の原則への違反に起因すると我々は主張する。社会科学とAIの両コミュニティにおける広範な研究に基づき、効果的な人間とAIの会話を記述するための格率(maxim)として、量、質、関連性、様態、善意、透明性を提案する。まず、人間とAIの相互作用の文脈において、(Griceに由来する)最初の4つの格率が適用可能であることを正当化する。次に、現代の人間とAIの相互作用に特有の振る舞いに対処するには、2つの新たな格率、すなわち善意(有害なコンテンツの生成やそれへの関与に関するもの)と透明性(自らの知識の境界、運用上の制約、意図の認識に関するもの)が必要であると論じる。様々な言語モデルがこれらの格率をどの程度理解できるかを評価した結果、モデルは原則に対する内的な優先順位付けを持ち、それが格率を正確に解釈する能力に大きく影響しうることがわかった。 Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims -- quantity, quality, relevance, manner, benevolence, and transparency -- for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice) in the context of human-AI interactions. We then argue that two new maxims, benevolence (concerning the generation of, and engagement with, harmful content) and transparency (concerning recognition of one's knowledge boundaries, operational constraints, and intents), are necessary for addressing behavior unique to modern human-AI interactions. We evaluate the degree to which various language models are able to understand these maxims and find that models possess an internal prioritization of principles that can significantly impact their ability to interpret the maxims accurately.	翻訳日:2024-06-26 01:22:15 公開日:2024-06-22
# Qibo: 伝統中国医学のための大規模言語モデル Qibo: A Large Language Model for Traditional Chinese Medicine ( http://arxiv.org/abs/2403.16056v3 ) ライセンス: Link先を確認	Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo,	(参考訳) 大規模言語モデル(LLM)は、医学、法律、金融など多くの専門分野で大きな進歩を遂げている。しかし、伝統中国医学(TCM)においては、その理論と近代医学との本質的な違い、専門的なコーパス資源の不足、教師付き微調整のみに依存すると過信した予測につながりうること、といった課題がある。これらの課題に対処するため、継続的事前学習と教師付き微調整を組み合わせた2段階の訓練手法を提案する。本研究の特筆すべき貢献は、TCM専用の2GBコーパスを処理し、TCM向けの事前学習データセットと命令微調整データセットをそれぞれ構築したことである。さらに、主観評価、客観評価、3つのTCM NLPタスクを含む複数の側面からTCMにおけるLLMの性能を評価するツールであるQibo-Benchmarkを開発した。 我々のパイプラインで訓練した医療用LLMである$\textbf{Qibo}$は、大幅な性能向上を示す。ベースラインと比較して、平均主観的勝率は63%、平均客観精度は23%から58%向上し、3つのTCM NLPタスクのRouge-Lスコアは0.72、0.61、0.55である。最後に、QiboをTCMの診療相談に適用するためのパイプラインを提案し、ケーススタディを通じてモデルの性能を示す。 Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2GB corpus dedicated to TCM, constructing pre-training and instruction fine-tuning datasets for TCM, respectively. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLM in the TCM on multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named $\textbf{Qibo}$, exhibits significant performance boosts. 
Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improved by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline to apply Qibo to TCM consultation and demonstrate the model performance through the case study.	翻訳日:2024-06-26 01:22:15 公開日:2024-06-22
# データ正則化自己プレイ強化学習による人間と協調可能な運転パートナー Human-compatible driving partners through data-regularized self-play reinforcement learning ( http://arxiv.org/abs/2403.19648v2 ) ライセンス: Link先を確認	Daphne Cornelisse, Eugene Vinitsky,	(参考訳) 自動運転車における中心的な課題は、人間と協調することである。したがって、シミュレーションにおける自動運転システムのスケーラブルな訓練と評価には、現実的な人間エージェントを組み込むことが不可欠である。シミュレーションエージェントは通常、人間の運転に関する大規模で高品質なデータセットを模倣することで開発される。しかし、純粋な模倣学習エージェントは、マルチエージェントの閉ループ設定で実行すると、経験的に衝突率が高い。閉ループ設定において現実的かつ効果的なエージェントを構築するために、人間の参照ポリシーからの逸脱に小さなペナルティを課しながら自己プレイによってエージェントを訓練するマルチエージェントアルゴリズム、Human-Regularized PPO(HR-PPO)を提案する。従来の研究とは対照的に、我々のアプローチはRLを主軸とし、不完全な人間のデモンストレーションを30分しか使用しない。多数のマルチエージェント交通シーンでエージェントを評価する。その結果、HR-PPOエージェントは成功率93%、オフロード率3.5%、衝突率3%と、目標達成において非常に有効であることが示された。同時に、既存の人間の運転ログとの類似性で測ると、エージェントは人間らしく運転する。また、HR-PPOエージェントは、特に相互作用の多いシナリオにおいて、人間の運転との協調に関する代理指標を大きく改善することがわかった。コードと訓練済みエージェントは https://github.com/Emerge-Lab/nocturne_lab でオープンソース化し、エージェントの挙動のデモは https://sites.google.com/view/driving-partners で提供する。 A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. 
At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.	翻訳日:2024-06-26 01:12:30 公開日:2024-06-22
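The entry above describes HR-PPO as self-play PPO with a small penalty for deviating from a human reference policy. A minimal sketch of what such a regularized per-sample loss could look like follows. It rests on stated assumptions: the abstract does not give the form of the penalty, so a KL divergence from the agent's action distribution to the reference distribution is assumed here, and the names `regularized_loss`, `lam`, and `eps` are hypothetical rather than taken from the authors' code.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (a quantity to maximize)."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)


def regularized_loss(ratio, advantage, agent_probs, human_probs, lam=0.05):
    """Loss to minimize: negative PPO surrogate plus a weighted deviation penalty.

    The KL form and its direction (agent relative to reference) are assumptions.
    """
    penalty = kl_divergence(agent_probs, human_probs)
    return -clipped_surrogate(ratio, advantage) + lam * penalty
```

When the agent's action distribution matches the reference, the penalty vanishes and the loss reduces to plain PPO; a small `lam` keeps the reinforcement learning signal dominant, in line with the "RL-first" description in the abstract.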
# 一貫性モデルのためのRL:より高速な報酬ガイド付きテキスト画像生成 RL for Consistency Models: Faster Reward Guided Text-to-Image Generation ( http://arxiv.org/abs/2404.03673v2 ) ライセンス: Link先を確認	Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun,	(参考訳) 強化学習(RL)は、画像品質、美しさ、指示追従能力を捉える報酬を直接最適化することで、拡散モデルによるガイド付き画像生成を改善してきた。しかし、結果として得られる生成ポリシーは、生成を遅くする拡散モデルの反復サンプリング過程をそのまま受け継ぐ。この制限を克服するために、一貫性モデル(consistency model)は、ノイズを直接データに写像する新しいクラスの生成モデルを学習することを提案しており、最小で1回のサンプリング反復で画像を生成できる。本研究では、タスク固有の報酬に対してテキストから画像への生成モデルを最適化し、高速な訓練と推論を可能にするために、RLによって一貫性モデルを微調整するフレームワークを提案する。 RLCM(Reinforcement Learning for Consistency Model)と呼ぶこのフレームワークは、一貫性モデルの反復推論過程をRLの手続きとして定式化する。 RLで微調整した拡散モデルと比較して、RLCMは訓練が大幅に高速で、報酬目的の下で測定した生成品質を向上させ、わずか2ステップの推論で高品質な画像を生成することで推論を高速化する。実験により、RLCMは、画像の圧縮性のようにプロンプトでは表現しにくい目標や、美的品質のように人間のフィードバックから導かれる目標に、テキストから画像への一貫性モデルを適応させられることを示す。コードは https://rlcm.owenoertell.com で公開されている。 Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. 
Comparing to RL finetuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high quality images with as few as two inference steps. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Our code is available at https://rlcm.owenoertell.com.	翻訳日:2024-06-26 01:12:30 公開日:2024-06-22
# 脱分極ダイナミクスに対する超作用素マスター方程式 Superoperator master equations for depolarizing dynamics ( http://arxiv.org/abs/2404.06595v2 ) ライセンス: Link先を確認	A. E. Teretenkov,	(参考訳) 本研究は超作用素マスター方程式を扱う。具体的には、ユニタリ群全体に関するトワリング(twirling)超射影子(hyperprojector)の場合の超作用素マスター方程式を導出する。このような超射影子と整合させるため、自由ダイナミクスは脱分極的であると仮定し、これに任意のGorini-Kossakowski-Sudarshan-Lindblad生成子による摂動を加える。この場合の2次マスター方程式の明示的な形を示す。 The work is devoted to superoperator master equations. Namely, the superoperator master equations in the case of the twirling hyperprojector with respect to the whole unitary group are derived. To be consistent with such a hyperprojector the free dynamics is assumed to be depolarizing. And it is perturbed by the arbitrary Gorini--Kossakowski--Sudarshan--Lindblad generator. The explicit form of the second order master equations is presented in this case.	翻訳日:2024-06-26 01:12:30 公開日:2024-06-22
# データ不足地域におけるPM2.5推定のための空間転移学習 Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions ( http://arxiv.org/abs/2404.07308v2 ) ライセンス: Link先を確認	Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu,	(参考訳) 大気汚染、特に微小粒子状物質(PM2.5)は公衆衛生上の差し迫った懸念であるが、地上センサーが不足している発展途上国(データの乏しい地域)では推定が難しい。転移学習モデルは、代替のデータソース(すなわちデータの豊富な地域のデータ)から知識を獲得するため、この問題の解決に活用できる。しかし、現在の転移学習手法は、ソースドメインとターゲットドメインの間の依存関係を考慮していない。我々はこの転移問題を空間転移学習として捉え、両ドメインの空間的および意味的な依存関係を捉えて各ドメインの特徴空間に追加する、Latent Dependency Factor (LDF) という新しい特徴量を提案する。LDFは、類似したソースドメインとターゲットドメインのデータからなるクラスタから学習する、新しい2段階オートエンコーダモデルを用いて生成する。実験の結果、LDFを用いた転移学習モデルはベースラインに対して19.34%の改善を示した。さらに、定性的な知見によって実験結果を裏付ける。 Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF have a 19.34% improvement over the baselines. We additionally support our experiments with qualitative findings.	翻訳日:2024-06-26 01:02:45 公開日:2024-06-22
# 音声匿名化が病因とその限界に及ぼす影響 The Impact of Speech Anonymization on Pathology and Its Limits ( http://arxiv.org/abs/2404.08064v2 ) ライセンス: Link先を確認	Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang,	(参考訳) 医療へのスピーチの統合は、個々の生体情報を含む非侵襲的なバイオマーカーとしての可能性から、プライバシー上の懸念を強めている。これに対し、話者匿名化は、重要な言語内容を保持しながら個人識別可能な情報を隠蔽することを目的としている。しかし,プライバシが特に重要である重要な領域である病的音声への匿名化手法の適用については,広く検討されていない。本研究では,ドイツの複数の機関の2,700人以上の話者を対象に,匿名化が病的スピーチに与える影響について検討した。深層学習と信号処理を併用した匿名化手法について検討し,同程度のエラー率で推定される障害間のプライバシー改善を,実用性に最小限の影響を伴って,1933%まで向上することを示す。 Dysarthria, Dysphonia, Cleft Lip and Palateなどの特定の疾患は最小限の効用変化を経験し, Dysglossiaはわずかに改善した。以上より, 匿名化の影響は疾患によって大きく異なることが示唆された。これは、プライバシーと診断ユーティリティの最適なバランスをとるために、障害特異的匿名化戦略を必要とする。さらに, フェアネス分析の結果, 多くの人口層で一貫した匿名化効果が認められた。本研究は,病的音声の匿名化によるプライバシー向上効果を実証するとともに,逆攻撃を考慮に入れたカスタマイズおよび障害特異的アプローチの重要性を強調した。 Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods, and document substantial privacy improvements across disorders-evidenced by equal error rate increases up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. 
Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.	翻訳日:2024-06-26 01:02:45 公開日:2024-06-22
# AIライフサイクルに沿ったフェアネスのための説明可能な人工知能(XAI)の可能性のマッピング Mapping the Potential of Explainable Artificial Intelligence (XAI) for Fairness Along the AI Lifecycle ( http://arxiv.org/abs/2404.18736v3 ) ライセンス: Link先を確認	Luca Deck, Astrid Schomäcker, Timo Speith, Jakob Schöffer, Lena Kästner, Niklas Kühl,	(参考訳) さまざまな領域で人工知能(AI)システムが広く使われるようになると、アルゴリズムの公正性、特に高い評価のシナリオに関する問題がますます強調されている。したがって、AIシステムの公正性がどのように改善されるのか、このプロセスを支援するためにどのような手段が利用できるのか、という批判的な考察が過度に進んでいる。多くの研究者や政策立案者は、AIシステムの公正性を高めるための有望な方法として説明可能なAI(XAI)を考えている。しかし、異なるデシダラタを表すXAIの方法やフェアネスの概念は様々であり、XAIとフェアネスの正確な関係はいまだに不明瞭である。さらに、アルゴリズムの公正性を高めるためのさまざまな手段が、AIシステムのライフサイクルを通して異なるポイントに適用できる可能性がある。しかし、AIライフサイクルに沿って、現在フェアネスデシダータのコヒーレントなマッピングはありません。我々は8つの公正なデシダータを蒸留し、AIライフサイクルに沿ってそれらをマップし、XAIがそれぞれにどのように対処できるかを議論する。我々は,これらのフェアネス・デシダータに特化して,実践的応用のためのオリエンテーションを提供し,XAI研究のインスピレーションを期待する。 The widespread use of artificial intelligence (AI) systems across various domains is increasingly highlighting issues related to algorithmic fairness, especially in high-stakes scenarios. Thus, critical considerations of how fairness in AI systems might be improved, and what measures are available to aid this process, are overdue. Many researchers and policymakers see explainable AI (XAI) as a promising way to increase fairness in AI systems. However, there is a wide variety of XAI methods and fairness conceptions expressing different desiderata, and the precise connections between XAI and fairness remain largely nebulous. Besides, different measures to increase algorithmic fairness might be applicable at different points throughout an AI system's lifecycle. Yet, there currently is no coherent mapping of fairness desiderata along the AI lifecycle. In this paper, we set out to bridge both these gaps: We distill eight fairness desiderata, map them along the AI lifecycle, and discuss how XAI could help address each of them. 
We hope to provide orientation for practical applications and to inspire XAI research specifically focused on these fairness desiderata.	翻訳日:2024-06-26 01:02:45 公開日:2024-06-22
# DiffMatch:ビジュアルランゲージガイダンスは、半教師付き変更検出器を改善 DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector ( http://arxiv.org/abs/2405.04788v2 ) ライセンス: Link先を確認	Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang,	(参考訳) Change Detection (CD) は、画像間のセマンティックな変化でピクセルを識別することを目的としている。しかし、大量のピクセルレベルの画像に注釈を付けることは、特に人間の専門家によるピクセルレベルの比較を必要とするマルチテンポラリ画像に対して、労働集約的でコストがかかる。ゼロショットやオープンボキャブラリなどにおける視覚言語モデル(VLM)の性能を即時推論で向上させることを考えると,VLMを利用してラベル付きデータでより良いCDを作成することが期待できる。本稿では,VLM誘導に基づく半教師付きCD手法,すなわちDiffMatchを提案する。 DiffMatchの洞察は、VLMを使用して自由な変更ラベルを合成し、ラベルなしデータに対するさらなる監視信号を提供することである。しかしながら、現在のほとんどのVLMは単一時間画像用に設計されており、バイ時間画像や複数時間画像に直接適用することはできない。そこで我々はまず,VLMに基づく混合変化イベント生成(CEG)戦略を提案し,ラベルなしCDデータに擬似ラベルを付与する。これらのVLM駆動型擬似ラベルによって提供される追加の教師付き信号は、整合正則化パラダイム(例えば FixMatch)の擬似ラベルと矛盾する可能性があるため、異なる信号源を分離するための二重投影ヘッドを提案する。さらに、VLMによってガイドされる2つの補助セグメント化デコーダを通して、両時間画像の意味表現を明示的に分離する。最後に、モデルが変化表現をより適切にキャプチャするために、補助枝における特徴レベルのコントラスト損失によるメトリクス認識の監視を導入する。大規模な実験はDiffMatchの利点を示している。例えば、DiffMatchはFixMatchベースラインをWHU-CDで+5.3 IoU、LEVIR-CDで+2.4 IoUで5%改善している。さらに、当社のCEG戦略は、教師なしの方法で、最先端の教師なしCD手法よりもはるかに優れた性能を達成することができる。 Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) for zero-shot, open-vocabulary, etc. with prompt-based reasoning, it is promising to utilize VLMs to make better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely DiffMatch. The insight of DiffMatch is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. 
Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervised signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g. FixMatch), we propose the dual projection head for de-entangling different signal sources. Further, we explicitly decouple the bi-temporal images semantic representation through two auxiliary segmentation decoders, which are also guided by VLM. Finally, to make the model more adequately capture change representations, we introduce metric-aware supervision by feature-level contrastive loss in auxiliary branches. Extensive experiments show the advantage of DiffMatch. For instance, DiffMatch improves the FixMatch baseline by +5.3 IoU on WHU-CD and by +2.4 IoU on LEVIR-CD with 5% labels. In addition, our CEG strategy, in an un-supervised manner, can achieve performance far superior to state-of-the-art un-supervised CD methods.	翻訳日:2024-06-26 00:53:00 公開日:2024-06-22
# 誤指定設定下でのユニバーサルバッチ学習 Universal Batch Learning Under The Misspecification Setting ( http://arxiv.org/abs/2405.07252v2 ) ライセンス: Link先を確認	Shlomi Vituri, Meir Feder,	(参考訳) 本稿では、対数損失を用いた誤指定(misspecification)設定におけるユニバーサルバッチ学習の問題を考える。この設定では、仮説クラスはモデルの集合 $\Theta$ である。しかし、データを生成する未知の分布はこの集合に属するとは限らず、より大きなモデル集合 $\Phi \supset \Theta$ に属する。訓練サンプルが与えられると、ユニバーサル学習者は次の結果に対する確率分布を予測するよう求められ、対数損失を被る。ユニバーサル学習者の性能は、$\Theta$ の中からデータに最もよく適合する仮説に対するリグレットによって測定される。ミニマックス定理と情報理論的な道具を用いて、データ生成分布の集合上の混合である最適なユニバーサル学習者を導出し、min-maxリグレットの閉形式表現を得る。このリグレットは、データとその生成分布集合との間の条件付き容量の制約付き版とみなせることを示す。さらに、このmin-maxリグレットに対するタイトな限界を示す。これは、問題の複雑さが仮説モデル $\Theta$ の豊かさによって支配され、データ生成分布の集合 $\Phi$ によっては支配されないことを意味する。リグレットとその容量達成事前分布を数値的に評価するために、Arimoto-Blahutアルゴリズムの拡張を開発する。観測が $K$ パラメータの多項分布から得られ、仮説クラス $\Theta$ がこの分布族の部分集合にすぎない場合について結果を示す。 In this paper we consider the problem of universal batch learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypotheses models $\Theta$ and not by the data generating distributions set $\Phi$. 
We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameters multinomial distributions while the hypothesis class $\Theta$ is only a subset of this family of distributions.	翻訳日:2024-06-26 00:53:00 公開日:2024-06-22
# Human-AIの安全性: 生成AIと制御システムの安全性の子孫 Human-AI Safety: A Descendant of Generative AI and Control Systems Safety ( http://arxiv.org/abs/2405.09794v2 ) ライセンス: Link先を確認	Andrea Bajcsy, Jaime F. Fisac,	(参考訳) 人工知能(AI)は前例のない規模で人々と対話し、大きなポジティブな影響をもたらす新たな道を提供する一方で、個人や社会的な害の可能性を広く懸念している。今日、人間-AI安全のための主要なパラダイムは、人が提供する例やフィードバックによりよく一致するように生成モデルの出力を微調整することに焦点を当てている。しかし、実際には、AIモデルのアウトプットの結果は独立して決定することはできない。本稿では,AIの安全性と制御システムの安全性から重要な補完的教訓を抽出し,オープンな課題と両分野間の重要なシナジーを強調した。そして、高度なAI技術に対する有意義な安全保証には、AI出力と人間の振る舞いによって形成されるフィードバックループが、どのようにして異なる結果に向かって相互作用を駆動するかについての推論が必要である、と論じる。この目的のために、動的で安全クリティカルな人間-AIインタラクションをキャプチャするための統一的なフォーマリズムを導入し、次世代の人間中心AI安全性に向けた具体的な技術的なロードマップを提案する。 Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human--AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human--AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.	翻訳日:2024-06-26 00:53:00 公開日:2024-06-22
# 因果発見のための適応型オンライン実験設計 Adaptive Online Experimental Design for Causal Discovery ( http://arxiv.org/abs/2405.11548v3 ) ライセンス: Link先を確認	Muhammad Qasim Elahi, Lai Wei, Murat Kocaoglu, Mahsa Ghasemi,	(参考訳) 因果発見は、観察データ、介入データ、またはそれらの組み合わせを利用して因果グラフに符号化された因果関係を明らかにすることを目的としている。既存の因果発見法の大部分は、無限の介入データを想定して開発されている。我々は、データ介入効率に重点を置き、オンライン学習の観点から因果発見を形式化し、バンドイット問題における純粋な探索から着想を得た。グラフのすべてのエッジを少なくとも一度は切断する介入からなるグラフ分離システムは、最悪の場合であっても無限の介入データが利用できる場合に因果グラフを学習するのに十分である。本稿では,グラフ分離システムからの介入をアロケーションマッチングにより適応的に選択し,サンプリング履歴に基づいて因果グラフを学習するトラック・アンド・ストップ因果探索アルゴリズムを提案する。任意の信頼度が与えられた場合、アルゴリズムは終了条件を決定し、それを満たすまで実行させる。本稿では,提案アルゴリズムを解析し,必要な介入サンプルの期待数に基づいて問題依存上界を確立する。提案アルゴリズムは,様々なランダムに生成した因果グラフのシミュレーションにおいて,既存の手法よりも優れている。学習した因果グラフと地上の真理の間の構造的ハミング距離(SHD)によって測定され、試料は著しく少ない。 Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs by leveraging observational, interventional data, or their combination. The majority of existing causal discovery methods are developed assuming infinite interventional data. We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning, inspired by pure exploration in bandit problems. A graph separating system, consisting of interventions that cut every edge of the graph at least once, is sufficient for learning causal graphs when infinite interventional data is available, even in the worst case. We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system via allocation matching and learns the causal graph based on sampling history. Given any desired confidence value, the algorithm determines a termination condition and runs until it is met. We analyze the algorithm to establish a problem-dependent upper bound on the expected number of required interventional samples. Our proposed algorithm outperforms existing methods in simulations across various randomly generated causal graphs. 
It achieves higher accuracy, measured by the structural Hamming distance (SHD) between the learned causal graph and the ground truth, with significantly fewer samples.	翻訳日:2024-06-26 00:43:06 公開日:2024-06-22
# 階層的セマンティックグラフを用いた3次元復元におけるガウス制御 Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery ( http://arxiv.org/abs/2405.12477v3 ) ライセンス: Link先を確認	Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Jing Li, Zhanyun Tang, Shengyu Zhang, Fei Wu, Feng Lin,	(参考訳) 3D Gaussian Splatting (3DGS)は、最近3Dの人間の再構築に進歩を遂げているが、主に2Dピクセルレベルの監視に依存しており、異なる部位の幾何学的複雑さとトポロジ的関係を見越している。このギャップに対処するために,高忠実度3次元再構成を実現するための階層型人ガウス制御(HUGS)フレームワークを導入する。我々のアプローチは、幾何学的トポロジーの整合性を確保するために、身体部分の明確な意味的先行を活用することにより、身体部分間の複雑な幾何学的およびトポロジ的関連の捕捉を可能にする。さらに,大域的な人体の特徴から高周波の特徴を引き離し,表面の細部を洗練させる。広範囲な実験により,本手法は人体再建において優れた性能を示し,特に表面の細部の改善と体部接合部の精密再構築に有効であることが示された。コードはhttps://wanghongsheng01.github.io/HUGS/で公開されている。 Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicitly semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.	翻訳日:2024-06-26 00:43:06 公開日:2024-06-22
# MOSS:単眼ビデオからのモーションベース3D着衣人体合成 MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video ( http://arxiv.org/abs/2405.12806v3 ) ライセンス: Link先を確認	Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu,	(参考訳) 単一視点の着衣人体再構成は、仮想現実の応用、特に複雑な人間の動きを含む文脈において中心的な位置を占める。現実的な衣服の変形を達成する上で、これは顕著な課題を伴う。現在の手法は、運動が表面の変形に与える影響をしばしば見落とし、その結果、表面は大域的な動きによって課される制約を欠いている。これらの限界を克服するために,運動情報を利用して人体表面上で動作を考慮したガウス分割を実現する革新的なフレームワーク,Motion-Based 3D Clothed Humans Synthesis (MOSS) を導入する。本フレームワークは,Kinematic Gaussian Locating Splatting (KGAS) とSurface Deformation Detector (UID) の2つのモジュールから構成される。 KGASは、体表面を横切る大域的な運動を伝播するためにmatrix-Fisher分布を組み込む。この分布の密度と回転係数はガウスを明示的に制御し、再構成された表面の現実性を高める。さらに,単一視点での局所閉塞に対処するため,UIDはKGASに基づいて重要な表面を同定し,これらの変形を補うために幾何的再構成を行う。実験により,MOSSは単眼ビデオからの3次元着衣人体合成において,最先端の視覚的品質を実現することが示された。特に,LPIPSにおいてHuman NeRFとGaussian Splattingをそれぞれ33.94%,16.75%改善した。コードはhttps://wanghongsheng01.github.io/MOSS/で公開されている。 Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. 
Additionally, to address local occlusions in the single-view setting, UID builds on KGAS to identify significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve over Human NeRF and Gaussian Splatting by 33.94% and 16.75% in LPIPS, respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.	翻訳日:2024-06-26 00:43:06 公開日:2024-06-22
# ロボット言語接地に関する調査:シンボルと埋め込みのトレードオフ A Survey of Robotic Language Grounding: Tradeoffs between Symbols and Embeddings ( http://arxiv.org/abs/2405.13245v2 ) ライセンス: Link先を確認	Vanya Cohen, Jason Xinyu Liu, Raymond Mooney, Stefanie Tellex, David Watkins,	(参考訳) 大きな言語モデルでは、ロボットは言語をより柔軟に理解し、これまで以上に能力を高めることができる。この調査は、最近の文献を2つの極を持つスペクトルにレビューし、整理する。 1)言語といくつかの手作業で定義された意味の形式表現のマッピング 2)低レベルロボットポリシーに直接変換する言語と高次元ベクトル空間のマッピング。形式表現を使用することで、言語の意味を正確に表現することができ、学習の問題のサイズを制限し、解釈可能性と形式的安全性を保証するためのフレームワークにつながる。言語や知覚データを高次元空間に埋め込む手法は、手動で指定した記号構造を回避し、十分なデータを供給するとより一般的な可能性を持つが、訓練により多くのデータや計算を必要とする。我々は、それぞれのアプローチの利点とトレードオフについて議論し、両方の世界のベストを達成するための今後の仕事の方向性を提供することで、仕上げる。 With large language models, robots can understand language more flexibly and more capably than ever before. This survey reviews and situates recent literature into a spectrum with two poles: 1) mapping between language and some manually defined formal representation of meaning, and 2) mapping between language and high-dimensional vector spaces that translate directly to low-level robot policy. Using a formal representation allows the meaning of the language to be precisely represented, limits the size of the learning problem, and leads to a framework for interpretability and formal safety guarantees. Methods that embed language and perceptual data into high-dimensional spaces avoid this manually specified symbolic structure and thus have the potential to be more general when fed enough data but require more data and computing to train. We discuss the benefits and tradeoffs of each approach and finish by providing directions for future work that achieves the best of both worlds.	翻訳日:2024-06-26 00:43:06 公開日:2024-06-22
# SketchQLデモ - Sketchesによるゼロショットビデオモーメントクエリ SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches ( http://arxiv.org/abs/2405.18334v2 ) ライセンス: Link先を確認	Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong,	(参考訳) 本稿では、スケッチベースのクエリインタフェースでビデオモーメントを検索するビデオデータベース管理システム(VDBMS)であるSketchQLについて述べる。このインターフェースでは、単純なマウスドラッグアンドドロップ操作でオブジェクトのトラジェクトリイベントを指定できる。複雑なイベントを構成するために、単一のオブジェクトのトラジェクトリをビルディングブロックとして使用することができる。トラジェクトリ類似性を符号化した事前トレーニングモデルを使用して、SketchQLは、ビデオ上で類似性検索を実行してゼロショットビデオモーメント検索を実現し、ビジュアルクエリに最も近いクリップを識別する。このデモでは、SketchQLのグラフィックユーザインタフェースを導入し、その機能とインタラクションメカニズムを詳述する。また,クエリ合成からリアルタイムシナリオを用いたビデオモーメント検索まで,SketchQLのエンドツーエンド使用例を示す。 In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moments retrieval by performing similarity searches over the video to identify clips that are the most similar to the visual query. In this demonstration, we introduce the graphic user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL from query composition to video moments retrieval using real-world scenarios.	翻訳日:2024-06-26 00:33:22 公開日:2024-06-22
# マンハッタン世界仮説を用いた構造ガウスSLAM Structure Gaussian SLAM with Manhattan World Hypothesis ( http://arxiv.org/abs/2405.20031v2 ) ライセンス: Link先を確認	Shuhong Liu, Heng Zhou, Liuzhuozheng Li, Yun Liu, Tianchen Deng, Yiming Zhou, Mingrui Li,	(参考訳) ガウスのSLAMシステムは、リアルタイム再構築の効率性と忠実性を向上させるために大きな進歩を遂げた。しかし、これらのシステムは複雑な屋内環境において、障害物や限られた視角によって引き起こされる未観測の幾何学により、実質的な穴を特徴とする不完全な再構成に遭遇することが多い。この課題に対処するために,マンハッタンワールド仮説を利用して幾何学的精度と完全性を高めるRGB-DシステムであるManhattan Gaussian SLAM(MG-SLAM)を提案する。 MG-SLAMは、構造化されたシーンから導かれた融合した線分をシームレスに統合することにより、テクスチャレス屋内領域におけるロバストな追跡を確実にする。さらに、抽出された線と平面面仮定により、欠測した幾何学領域における新しいガウスの戦略的補間が可能となり、効率的なシーン補完が可能となった。合成シーンと実世界のシーンの両方で行われた大規模な実験により、これらの手法が最先端の性能を実現し、ガウスSLAMシステムの能力を大幅に向上することを示す。 Gaussian SLAM systems have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM (MG-SLAM), an RGB-D system that leverages the Manhattan World hypothesis to enhance geometric accuracy and completeness. By seamlessly integrating fused line segments derived from structured scenes, MG-SLAM ensures robust tracking in textureless indoor areas. Moreover, the extracted lines and planar surface assumption allow strategic interpolation of new Gaussians in regions of missing geometry, enabling efficient scene completion. Extensive experiments conducted on both synthetic and real-world scenes demonstrate that these advancements enable our method to achieve state-of-the-art performance, marking a substantial improvement in the capabilities of Gaussian SLAM systems.	翻訳日:2024-06-26 00:33:22 公開日:2024-06-22
# ワンステップテキスト・ツー・イメージ生成のためのスコアアイデンティティ蒸留における長短誘導 Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation ( http://arxiv.org/abs/2406.01561v2 ) ライセンス: Link先を確認	Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang,	(参考訳) 広範テキストイメージペアで訓練された拡散ベースのテキスト画像生成モデルは、テキスト記述と整合したフォトリアリスティック画像を生成する能力を示している。しかし、これらのモデルの顕著な制限は、その遅いサンプル生成であり、同じネットワークを通して反復的な改善を必要とする。本稿では,長短の分類器フリーガイダンス(LSG)を開発することでScore identity Distillation (SiD) を強化し,実際の訓練データを用いずに事前学習済みのStable Diffusionモデルを効率的に蒸留する。 SiD はモデルに基づく明示的なスコアマッチング損失を最適化することを目的としており、実際の計算のために提案したLSG と並行してスコア同一性に基づく近似を用いている。一段生成器で合成された偽画像のみをトレーニングすることにより、LSGを備えたSiDは、FIDとCLIPのスコアを急速に改善し、競争力のあるCLIPスコアを維持しながら最先端のFIDのパフォーマンスを達成する。具体的には、Stable Diffusion 1.5のデータフリー蒸留は、COCO-2014検証セットにおいて、LSGスケール1.5でCLIPスコア0.304とともに8.15という過去最低のFIDを、LSGスケール2でCLIPスコア0.313とともに9.56のFIDを達成している。我々のSiD-LSGコードと蒸留したワンステップのテキスト・ツー・イメージ・ジェネレータはhttps://github.com/mingyuanzhou/SiD-LSGで入手できる。 Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. 
Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and an FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. Our SiD-LSG code and distilled one-step text-to-image generators are available at https://github.com/mingyuanzhou/SiD-LSG	翻訳日:2024-06-26 00:33:22 公開日:2024-06-22
# 生成モデルにおける世界モデル含意の評価 Evaluating the World Model Implicit in a Generative Model ( http://arxiv.org/abs/2406.03689v2 ) ライセンス: Link先を確認	Keyon Vafa, Justin Y. Chen, Jon Kleinberg, Sendhil Mullainathan, Ashesh Rambachan,	(参考訳) 最近の研究は、大きな言語モデルが暗黙的に世界モデルを学ぶことを示唆している。この可能性をどのように評価するか。この問題は、基礎となる現実が決定論的有限オートマトンによって支配されている場合に公式化する。これには、単純な論理的推論、地理的ナビゲーション、ゲームプレイング、化学といった問題が含まれる。我々は,古典的なマイヒル・ネローデ定理に触発された世界モデル回復のための新しい評価指標を提案する。ゲームプレイ,ロジックパズル,ナビゲーションの3つの領域でそれらの実用性を解説する。すべての領域において、我々が検討する生成モデルは、世界モデルを評価するための既存の診断に優れているが、我々の評価指標は、世界モデルが現れるよりもはるかに一貫性が低いことを示している。生成モデルを使って、関連するが微妙に異なるタスクを解くと、それがひどく失敗する。モデルの基礎となるロジックを有意義に捉えた生成モデルを構築することは、非常に価値があるでしょう。 Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton. This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead it to fail badly. Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.	翻訳日:2024-06-26 00:23:38 公開日:2024-06-22
# 効率性を超えて: 持続可能なAIのスケーリング Beyond Efficiency: Scaling AI Sustainably ( http://arxiv.org/abs/2406.05303v2 ) ライセンス: Link先を確認	Carole-Jean Wu, Bilge Acun, Ramya Raghavendra, Kim Hazelwood,	(参考訳) バローゾのエネルギーに比例した倉庫規模のコンピューティングへの献身的な貢献は、現代のデータセンターがこれまで以上にエネルギー効率とコスト効率を高めてきた時代を幕開けた。同時に、現代のAIアプリケーションは、ディープラーニングモデル開発サイクル全体にわたって効率を最適化することの重要性を強調しながら、コンピューティングにおける需要を継続的に増加させてきた。本稿では、トレーニングと推論からの運転中の二酸化炭素排出量と、データセンターの構築とハードウェア製造から排出した炭素排出量の両方を含む、AIのカーボンインパクトを特徴付ける。我々は、ディープラーニングレコメンデーションモデルからマルチモーダル生成AIタスクまで、最先端AI技術における主要な効率最適化機会を強調します。 AIを継続的にスケールアップするには、ハードウェア製造からデータセンタ運用、ハードウェアの終末処理に至るまで、コンピュータインフラストラクチャのライフサイクル全体にわたって、効率性を超えて最適化しなければなりません。 Barroso's seminal contributions in energy-proportional warehouse-scale computing launched an era where modern datacenters have become more energy efficient and cost effective than ever before. At the same time, modern AI applications have driven ever-increasing demands in computing, highlighting the importance of optimizing efficiency across the entire deep learning model development cycle. This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from datacenter construction and hardware manufacturing. We highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operations and end-of-life processing for the hardware.	翻訳日:2024-06-26 00:23:38 公開日:2024-06-22
# Ctrl-V:バウンディングボックス制御オブジェクトモーションによる高忠実度映像生成 Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion ( http://arxiv.org/abs/2406.05630v2 ) ライセンス: Link先を確認	Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal,	(参考訳) 近年の映像予測の進歩により、制御可能な映像生成が注目されている。単純でフレキシブルな条件付けによる高忠実度ビデオの生成は特に興味深い。そこで本研究では,2次元または3次元境界ボックスの画素レベルのレンダリングを条件付けとして,制御可能な映像生成モデルを提案する。さらに,初期フレームと終端フレームのバウンディングボックスを考慮すれば,フレーム毎に最大15個のバウンディングボックスを25フレームクリップで予測できるバウンディングボックス予測器も作成した。私たちは、KITTI、Virtual-KITTI 2、BDD100kという3つの有名なAVビデオデータセットで実験を行います。 With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor that, given the initial and ending frames' bounding boxes, can predict up to 15 bounding boxes per frame for all the frames in a 25-frame clip. We perform experiments across 3 well-known AV video datasets: KITTI, Virtual-KITTI 2 and BDD100k.	翻訳日:2024-06-26 00:23:38 公開日:2024-06-22
# fNIRSによる視覚イメージの復号化に向けた進展 Progress Towards Decoding Visual Imagery via fNIRS ( http://arxiv.org/abs/2406.07662v3 ) ライセンス: Link先を確認	Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu,	(参考訳) 我々は,fNIRS脳活動からのイメージ再構成の可能性を示し,必要な仕様に適合するプロトタイプの構築に着手する。ダウンサンプリングしたfMRIデータを用いて画像再構成モデルを訓練することにより,cmスケールの空間分解能は画像生成に十分であることがわかった。その結果, 1cm分解能では71%の検索精度が得られ, フル解像度fMRIでは93%, 2cm分解能では20%であった。シミュレーションと高密度トモグラフィにより,連続波fNIRSの分解能が2cmであるのに対し,時間領域fNIRSは1cmの分解能を達成できることがわかった。最後に,レーザードライバ,単一光子検出器,時間デジタル変換器システムからなるプロトタイプの時間領域fNIRSデバイスの設計を共有する。 We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.	翻訳日:2024-06-26 00:13:51 公開日:2024-06-22
# 高等教育におけるGenAIを用いたボットプープとの闘い--検索拡張生成チャットボットが学習に及ぼす影響の検討 Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning ( http://arxiv.org/abs/2406.07796v2 ) ライセンス: Link先を確認	Maung Thway, Jose Recatala-Gomez, Fun Siong Lim, Kedar Hippalgaonkar, Leonard W. T. Ng,	(参考訳) ジェネレーティブ・人工知能(GenAI)と大規模言語モデル(LLM)は、人間の学習を強化する新たな道を開くと同時に、学生の回答における低品質な情報(ボットプープと呼ばれる)の蔓延を招いた。この研究は、ボットプープを減らしながら教育を強化するために設計された、カスタムビルドされたシングリッシュを話すRetrieval Augmented Generation (RAG)チャットボットであるLeodar教授を紹介する。シンガポールの南洋理工大学で展開されたLeodar教授は、AI支援学習の未来を垣間見せるとともに、パーソナライズされたガイダンス、24時間365日の可用性、コンテキストに関連する情報を提供している。混合手法を用いて,Leodar教授が学習,エンゲージメント,試験準備に及ぼす影響について検討し,97.1%の参加者が肯定的な経験を報告した。これらの発見は、教育におけるAIの役割を定義し、カスタムなGenAIチャットボットの可能性を強調するのに役立つ。チャットボットの開発、クラス内展開、成果調査の組み合わせは、GenAI教育ツールのベンチマークを提供し、AIと人間の学習の相互作用を再定義するための一歩です。 Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student responses, termed Botpoop. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance education while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.	翻訳日:2024-06-26 00:13:51 公開日:2024-06-22
# 蛍光分光法による物理化学プロセスの深層学習領域適応:オリーブオイルの老化への応用 Deep Learning Domain Adaptation to Understand Physico-Chemical Processes from Fluorescence Spectroscopy Small Datasets: Application to Ageing of Olive Oil ( http://arxiv.org/abs/2406.10031v2 ) ライセンス: Link先を確認	Umberto Michelucci, Francesca Venturini,	(参考訳) 蛍光分光法は生命科学や化学の基本的な道具であり、環境モニタリング、食品品質管理、生物医学診断などの応用に広く用いられている。しかし、深層学習による分光データの解析、特に蛍光励起放出行列 (EEMs) は、典型的には小さくスパースなデータセットが利用可能であるため、大きな課題を呈している。さらに, EEMは高次元でスペクトル特性が重なり合うため, その解析は困難である。本研究では、これらの課題に対処する新しい解釈可能性アルゴリズムとともに、事前学習された視覚モデルによるドメイン適応を利用する新しいアプローチを提案する。この研究で説明されているニューラルネットワークの機能エンジニアリングのおかげで、データの基礎となる物理化学的プロセスについて、より深い洞察を得られるようになりました。提案手法は, 経時変化(熟成)中のエクストラバージンオリーブオイル (EVOO) の酸化過程を解析し, 品質指標の予測と, スペクトル帯ひいてはプロセスに関与する分子の同定に有効であることを示す。この研究は、深層学習を用いた分光学の極めて革新的なアプローチを記述し、それをブラックボックスから複雑な生物学的および化学的プロセスを理解するためのツールに変換する。 Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges due to the typically small and sparse datasets available. Furthermore, the analysis of EEMs is difficult due to their high dimensionality and overlapping spectral features. This study proposes a new approach that exploits domain adaptation with pretrained vision models, alongside a novel interpretability algorithm to address these challenges. Thanks to specialised feature engineering of the neural networks described in this work, we are now able to provide deeper insights into the physico-chemical processes underlying the data. The proposed approach is demonstrated through the analysis of the oxidation process in extra virgin olive oil (EVOO) during ageing, showing its effectiveness in predicting quality indicators and identifying the spectral bands, and thus the molecules involved in the process. 
This work describes a significantly innovative approach in the use of deep learning for spectroscopy, transforming it from a black box into a tool for understanding complex biological and chemical processes.	翻訳日:2024-06-26 00:13:51 公開日:2024-06-22
# マルチモーダルクエリによるビデオ内のイベントのローカライズ Localizing Events in Videos with Multimodal Queries ( http://arxiv.org/abs/2406.10079v2 ) ライセンス: Link先を確認	Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, Yansong Tang, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu,	(参考訳) ビデオ理解はデジタル時代において重要な課題であるが、ビデオのダイナミックで多面的な性質は、労働集約的で、処理を計算的に要求する。このように、セマンティッククエリが与えられた特定のイベントのローカライズは、ビデオ検索のようなユーザ指向アプリケーションと、ビデオ基盤モデルに関する学術研究の両方において重要である。現在の研究における重要な制限は、セマンティッククエリが典型的には、対象イベントのセマンティックスを記述する自然言語にあることである。この設定は、画像とテキストからなるマルチモーダルなセマンティッククエリの可能性を見落としている。このギャップに対処するため、マルチモーダルクエリによるビデオ内のイベントのローカライズのための新しいベンチマークICQと、新しい評価データセットICQ-Highlightを導入する。我々の新しいベンチマークは、参照画像からなるマルチモーダルなセマンティッククエリと、画像のセマンティクスを調整するための洗練されたテキストを与えられたイベントを、モデルがいかにうまくローカライズできるかを評価することを目的としている。モデル性能を体系的にベンチマークするために、参照画像の4つのスタイルと5つのタイプの改善テキストを含む。我々は,既存のモデルを新しい設定に適合させる3つの適応法を提案し,特殊モデルから大規模基礎モデルまで10のSOTAモデルを評価した。このベンチマークは、ビデオイベントのローカライゼーションにおいて、マルチモーダルクエリを調査するための最初のステップであると考えています。 Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the images' semantics. 
To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.	翻訳日:2024-06-26 00:04:06 公開日:2024-06-22
# 転送可能性モデリングによるグラフ上のマルチソース非教師付きドメイン適応 Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling ( http://arxiv.org/abs/2406.10425v2 ) ライセンス: Link先を確認	Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, Suhang Wang,	(参考訳) 本稿では、グラフに対するマルチソース教師なしドメイン適応 (MSUDA) という新しい問題に取り組む。そこでは、アノテーション付きソースドメインで訓練されたモデルを、ノード分類のために教師なしターゲットグラフに転送する必要がある。ドメイン間の分布の相違により、重要な課題は、どのように優れたソースインスタンスを選択し、モデルを適応させるかである。様々なグラフ構造がこの問題をさらに複雑にし、以前の MSUDA のアプローチはより効果的でない。本稿では、グラフモデリングに基づくドメインセレクタ、サブグラフノードセレクタ、および適応のための2レベルのアライメント目的を備えたSelective Multi-source Adaptation for Graph ({\method})を提案する。具体的には、有益なソースデータの識別を容易にするため、グラフ間の類似性は、グラフモデリングタスクセットの転送可能性によって切り離され、測定され、ソースドメイン選択の証拠として使用される。ノードセレクタは、同じソースドメイン内のノードの転送可能性の変化をキャプチャするために、さらに組み込まれている。適応のための不変な特徴を学習するために、埋め込み空間では最適輸送距離を最小化することで、分類レベルではラベル関数を蒸留することで、選択したソースデータにターゲット領域を合わせる。モジュールは、有益なソースデータを選択し、メタ学習戦略で仮想トレーニングスプリットのアライメントを実行するように明示的に学習される。 5つのグラフデータセットに対する実験結果から,提案手法の有効性が示された。 In this paper, we tackle a new problem of multi-source unsupervised domain adaptation (MSUDA) for graphs, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph ({\method}), with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. 
A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to selected source data both at the embedding space by minimizing the optimal transport distance and at the classification level by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method.	翻訳日:2024-06-26 00:04:06 公開日:2024-06-22
# ドットの接続:New York Times Connections Word Gameを用いたLLMの抽象推論能力の評価 Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game ( http://arxiv.org/abs/2406.11012v3 ) ライセンス: Link先を確認	Prisha Samadarshi, Mariam Mustafa, Anushka Kulkarni, Raven Rothkopf, Tuhin Chakrabarty, Smaranda Muresan,	(参考訳) New York Times Connectionsゲームは、ワードパズル愛好家の間で人気があり挑戦的な娯楽として登場した。我々は200のConnectionsゲームを収集し、最先端の大規模言語モデル(LLM)の性能を熟練者や初心者の人間プレイヤーと比較して評価する。以上の結果から,多種多様なベンチマークで顕著な推論能力を示してきた最高性能のLLMであるGPT-4oでも,ゲームの8%しか完全には解けないことがわかった。 GPT-4oと比較すると、初心者と熟練者のプレイヤーはいずれも成績が良く、特に熟練者はGPT-4oを大きく上回った。我々の理解を深めるために、Connectionsゲームにおける単語の分類に成功するために必要な知識タイプの分類体系を作成し、LLMが連想的、百科事典的、言語的知識に苦しむことを明らかにした。我々の発見は、New York Times Connectionsゲームを、人間とAIシステムの抽象的推論能力を評価するための挑戦的なベンチマークとして確立する。 The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games. Compared to GPT-4o, novice and expert players perform better, with expert human players significantly outperforming GPT-4o. To deepen our understanding we create a taxonomy of the knowledge types required to successfully categorize words in the Connections game, revealing that LLMs struggle with associative, encyclopedic, and linguistic knowledge. Our findings establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems.	翻訳日:2024-06-26 00:04:06 公開日:2024-06-22
# シンプルだが効率的なFG-SBIR : 統一されたサンプル特徴アライメントによる自己監督型FG-SBIRの実現 Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment ( http://arxiv.org/abs/2406.11551v2 ) ライセンス: Link先を確認	Jianan Jiang, Di Wu, Zhilin Jiang, Weiren Yu,	(参考訳) Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) は、埋め込み空間におけるスケッチと対応する画像との距離を最小化することを目的としている。しかし、主にきめ細かいスケッチの抽象的な性質のために解法が複雑化しており、スケーラビリティが妨げられている。本稿では,2つのモダリティ間のギャップを狭めるための,シンプルで効率的な手法を提案する。本手法は、モダリティ間の単一の特徴アライメント問題として扱うのではなく、サンプル内およびサンプル間の双方で統一的な相互情報共有を促進する。具体的には、我々のアプローチは以下を含む。 (1)二重重み共有ネットワークを用いてスケッチ領域と画像領域内のアライメントを最適化し、モデル学習の飽和問題も効果的に軽減する。 (2)コントラスト損失に基づく目的最適化関数を導入し,モデルがサンプル内およびサンプル間の特徴を整列する能力を高める。 (3)自己注意と相互注意を組み合わせた学習可能なTRSMを提示してトークン間の特徴表現を促進し、埋め込み空間におけるサンプルアライメントをさらに強化する。このフレームワークは,CNNおよびViTベースのバックボーンにおいて優れた結果を達成する。大規模な実験により、既存の手法に対する優位性が示された。また、最初のプロ向けファッションスケッチ・画像データセットであるCloths-V1を導入し、本手法の検証に利用した。これは他の応用にも有益である。 Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose a simple yet efficient approach to narrow the gap between the two modes. It mainly facilitates unified mutual information sharing both intra- and inter-samples, rather than treating them as a single feature alignment problem between modalities. Specifically, our approach includes: (i) Employing dual weight-sharing networks to optimize alignment within sketch and image domain, which also effectively mitigates model learning saturation issues. (ii) Introducing an objective optimization function based on contrastive loss to enhance the model's ability to align features intra- and inter-samples. (iii) Presenting a learnable TRSM combined of self-attention and cross-attention to promote feature representations among tokens, further enhancing sample alignment in the embedding space. 
Our framework achieves excellent results on CNN- and ViT-based backbones. Extensive experiments demonstrate its superiority over existing methods. We also introduce Cloths-V1, the first professional fashion sketches and images dataset, utilized to validate our method and will be beneficial for other applications.	翻訳日:2024-06-26 00:04:06 公開日:2024-06-22
# RetinaGS: 10億規模の3Dガウシアンによる高密度シーンレンダリングのためのスケーラブルなトレーニング RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians ( http://arxiv.org/abs/2406.11836v2 ) ライセンス: Link先を確認	Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong,	(参考訳) 本研究では,大規模・高解像度データセット上での高パラメータ3次元ガウススプラッティング(3DGS)モデルのトレーニングの可能性について検討する。我々は、適切なレンダリング方程式を用い、任意のシーンおよびガウスプリミティブの任意の分布に適用可能な、3DGSのための汎用的なモデル並列トレーニング手法RetinaGSを設計する。これにより、これまで調査が困難であったプリミティブ数とトレーニング解像度に関する3DGSのスケーリング挙動を調べることができ、従来の最先端の再構成品質を上回ることができる。我々の手法でプリミティブ数を増やすと、視覚的品質が向上する明確な正の傾向が観察される。また、完全なMatrixCityデータセット上で10億以上のプリミティブを持つ3DGSモデルをトレーニングする最初の試みを示し、有望な視覚的品質を達成する。 In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS in terms of primitive numbers and training resolutions that were difficult to explore before and surpass previous state-of-the-art reconstruction quality. We observe a clear positive trend of increasing visual quality when increasing primitive numbers with our method. We also demonstrate the first attempt at training a 3DGS model with more than one billion primitives on the full MatrixCity dataset that attains a promising visual quality.	翻訳日:2024-06-25 23:54:21 公開日:2024-06-22
# 細胞内における分子表現の学習 Learning Molecular Representation in a Cell ( http://arxiv.org/abs/2406.12056v2 ) ライセンス: Link先を確認	Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh,	(参考訳) 薬物の有効性と安全性をin vivoで予測するには、低分子による摂動に対する生物学的応答(細胞形態、遺伝子発現など)に関する情報が必要である。しかしながら、現在の分子表現学習法は、これらの摂動下での細胞状態の包括的な見方を提供せず、ノイズの除去にも苦労しており、モデルの汎化を妨げている。本稿では,細胞における情報ボトルネック法を通じて分子表現を学習する情報アライメント(InfoAlign)手法を提案する。我々は、分子と細胞応答データをノードとしてコンテキストグラフに統合し、化学的、生物学的、計算的な基準に基づく重み付きエッジで接続する。トレーニングバッチの各分子に対して、InfoAlignは最小性目的(minimality objective)によりエンコーダの潜在表現を最適化し、冗長な構造情報を破棄する。十分性目的(sufficiency objective)は、コンテキストグラフ内の分子の近傍から得られる異なる特徴空間と整合するように表現をデコードする。提案するアライメントのための十分性目的が、既存のエンコーダベースのコントラスト法よりもタイトであることを示す。経験的には、InfoAlignの表現を2つの下流タスクで検証した。すなわち、4つのデータセットにわたる最大19のベースライン手法に対する分子特性予測と、ゼロショットの分子-形態マッチングである。 Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. 
Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.	翻訳日:2024-06-25 23:54:21 公開日:2024-06-22
# DASSF: 空中物体検出のためのダイナミックアテンションスケールシーケンスフュージョン DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection ( http://arxiv.org/abs/2406.12285v2 ) ライセンス: Link先を確認	Haodong Li, Haicheng Qu,	(参考訳) 空中画像における小さな物体の検出は、コンピュータビジョンの分野における基本的な課題である。空中撮影における移動物体には、形状や大きさの違い、密な重なり、背景による遮蔽、物体のぼやけなどの問題があるが、元のYOLOアルゴリズムは、異なるスケールの目標を知覚する能力が弱いため、全体的な検出精度が低い。そこで本研究では,密に重なり合う小型目標やぼやけた目標の検出精度を向上させるために,空中画像における小型目標検出のためのダイナミックアテンションスケールシーケンス融合アルゴリズム(DASSF)を提案する。まず、アップサンプリング機構を改善し計算負荷を低減する動的スケールシーケンス特徴融合(DSSFF)モジュールを提案する。第2に、小型目標の検出能力を高めるために、極小物体用の検出ヘッドを特別に追加する。最後に、異なる種類やサイズの目標の表現能力を向上させるために、動的ヘッド(DyHead)を使用する。提案モデルは、空中画像における小型目標検出の問題に対処し,YOLOアルゴリズムの複数の異なるバージョンに適用できる汎用性を持つ。実験の結果, DASSFをYOLOv8に適用すると、YOLOv8nと比較して, 平均適合率の平均 (mAP) がVisDrone-2019データセットで9.2%, DIORデータセットで2.4%向上し, 現在の主流手法を上回った。 The detection of small objects in aerial images is a fundamental task in the field of computer vision. Moving objects in aerial photography have problems such as different shapes and sizes, dense overlap, occlusion by the background, and object blur; however, the original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales. In order to improve the detection accuracy of densely overlapping small targets and fuzzy targets, this paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images. First, we propose a dynamic scale sequence feature fusion (DSSFF) module that improves the up-sampling mechanism and reduces computational load. Secondly, an x-small object detection head is specially added to enhance the detection capability of small targets. Finally, in order to improve the expressive ability of targets of different types and sizes, we use the dynamic head (DyHead). The model we proposed solves the problem of small target detection in aerial images and can be applied to multiple different versions of the YOLO algorithm, which is universal. 
Experimental results show that when the DASSF method is applied to YOLOv8, compared to YOLOv8n, on the VisDrone-2019 and DIOR datasets, the model shows an increase of 9.2% and 2.4% in the mean average precision (mAP), respectively, and outperforms the current mainstream methods.	翻訳日:2024-06-25 23:54:21 公開日:2024-06-22
# BIOSCAN-5M:昆虫の生物多様性のためのマルチモーダルデータセット BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity ( http://arxiv.org/abs/2406.12723v2 ) ライセンス: Link先を確認	Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo A. Millan, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang,	(参考訳) 本稿では,昆虫の生物多様性を理解・監視するための国際的な取り組みの一環として,BIOSCAN-5M Insectデータセットを機械学習コミュニティに提示し,いくつかのベンチマークタスクを確立する。 BIOSCAN-5Mは500万以上の昆虫標本のマルチモーダル情報を含む包括的データセットであり、分類学的ラベル、生のヌクレオチドバーコード配列、割り当てられたバーコードインデックス番号、地理的情報を含めることで、既存の画像ベースの生物学的データセットを大幅に拡張する。マルチモーダルデータ型が分類とクラスタリングの精度に与える影響を示すための3つのベンチマーク実験を提案する。まず,BIOSCAN-5MデータセットのDNAバーコード配列でマスク付き言語モデルを事前学習し,この大規模な参照ライブラリの利用が種および属レベルの分類性能に与える影響を実証する。次に、自己教師付き学習から得られた特徴埋め込みをクラスタリングするために、画像とDNAバーコードに適用するゼロショット転移学習タスクを提案し、これらの表現埋め込みから有意義なクラスタを導出できるかどうかを検討する。第3に、DNAバーコード、画像データ、分類情報に対してコントラスト学習を行うことにより、マルチモダリティをベンチマークする。これにより、複数の種類の情報とモダリティを用いた分類学的分類を可能にする汎用的な共有埋め込み空間が得られる。 BIOSCAN-5M Insectデータセットのコードリポジトリは https://github.com/zahrag/BIOSCAN-5M で公開されている。 As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, and geographical information. We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on the classification and clustering accuracy. First, we pretrain a masked language model on the DNA barcode sequences of the BIOSCAN-5M dataset, and demonstrate the impact of using this large reference library on species- and genus-level classification performance. 
Second, we propose a zero-shot transfer learning task applied to images and DNA barcodes to cluster feature embeddings obtained from self-supervised learning, to investigate whether meaningful clusters can be derived from these representation embeddings. Third, we benchmark multi-modality by performing contrastive learning on DNA barcodes, image data, and taxonomic information. This yields a general shared embedding space enabling taxonomic classification using multiple types of information and modalities. The code repository of the BIOSCAN-5M Insect dataset is available at https://github.com/zahrag/BIOSCAN-5M	翻訳日:2024-06-25 23:54:21 公開日:2024-06-22
# ユニバーサルリモートセンシング変更検出のための単一時間教師付き学習 Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection ( http://arxiv.org/abs/2406.15694v1 ) ライセンス: Link先を確認	Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang,	(参考訳) 特に高空間分解能(HSR)リモートセンシング画像において、バイテンポラル教師付き学習パラダイムは、多くのラベル付きバイテンポラルイメージペアを用いたリモートセンシング変化検出を常に支配している。しかし、大規模なバイテンポラルHSRリモートセンシング画像対における変化領域のラベル付けは非常に高価で労力がかかる。本稿では,非ペア画像間の変化を監視信号として活用する新しい視点から,リモートセンシングの普遍的変化検出のための単一時間教師付き学習(STAR)を提案する。 STARにより、未ペアラベル付き画像のみを用いて高精度な変化検出装置を訓練し、実世界のバイテンポラル画像ペアに一般化することができる。そこで本研究では,STARの柔軟性とスケーラビリティを実証するため,バイナリ変更検出,オブジェクト変更検出,セマンティック変更検出をひとつのアーキテクチャで処理可能な,シンプルで統一的な変更検出器であるChangeStar2を設計した。 ChangeStar2は、8つのパブリックリモートセンシング変更検出データセットの最先端のパフォーマンスを実現し、2つの教師付き設定、複数の変更タイプ、複数のシナリオをカバーする。コードはhttps://github.com/Z-Zheng/pytorch-change-modelsで入手できる。 Bitemporal supervised learning paradigm always dominates remote sensing change detection using numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (STAR) for universal remote sensing change detection from a new perspective of exploiting changes between unpaired images as supervisory signals. STAR enables us to train a high-accuracy change detector only using unpaired labeled images and can generalize to real-world bitemporal image pairs. To demonstrate the flexibility and scalability of STAR, we design a simple yet unified change detector, termed ChangeStar2, capable of addressing binary change detection, object change detection, and semantic change detection in one architecture. ChangeStar2 achieves state-of-the-art performances on eight public remote sensing change detection datasets, covering above two supervised settings, multiple change types, multiple scenarios. 
The code is available at https://github.com/Z-Zheng/pytorch-change-models.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# SS-Bench: ソーシャルストーリーの生成と評価のためのベンチマーク SS-Bench: A Benchmark for Social Story Generation and Evaluation ( http://arxiv.org/abs/2406.15695v1 ) ライセンス: Link先を確認	Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu,	(参考訳) 自閉症スペクトラム障害(ASD)を持つ子供たちは、しばしば社会的状況を誤解し、日々のルーチンに参加するのに苦労する。心理学の専門家は、こうした場面での能力を高めるために、構造的明瞭さ、記述的指向、状況的安全性という厳格な制約の下でソーシャルストーリーを書く。しかし、ソーシャルストーリーは作成に費用がかかり、多様性や適時性に限界があることが多い。大規模言語モデル(LLM)がますます強力になるにつれて、より自動化され、手頃で、利用しやすい方法で、幅広い範囲のソーシャルストーリーをリアルタイムに生成する手法の必要性が高まっている。ソーシャルストーリーの独特で厳格な制約を満たすようにLLMを適応させることは、難しい課題である。この目的のために,ソーシャルストーリーの生成と評価のためのベンチマークであるSS-Bench(Social Story Benchmark)を提案する。具体的には,LLMに階層的にプロンプトを与えてソーシャルストーリーを生成しベンチマークを構築する,制約駆動型戦略StarSowを開発した。このベンチマークは、適格なソーシャルストーリーを生成する小規模モデルの微調整実験を通じて検証されている。また、人間とGPTによる評価に用いる品質評価基準(Quality Assessment Criteria)を導入し、生成したストーリーの有効性を検証する。我々は、この研究が自閉症コミュニティに恩恵を与え、特定のグループに焦点を当てた将来の研究を促進することを願っている。 Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timeliness. As Large Language Models (LLMs) become increasingly powerful, there is a growing need for more automated, affordable, and accessible methods to generate Social Stories in real-time with broad coverage. Adapting LLMs to meet the unique and strict constraints of Social Stories is a challenging issue. To this end, we propose SS-Bench, a Social Story Benchmark for generating and evaluating Social Stories. Specifically, we develop a constraint-driven strategy named StarSow to hierarchically prompt LLMs to generate Social Stories and build a benchmark, which has been validated through experiments to fine-tune smaller models for generating qualified Social Stories. 
Additionally, we introduce Quality Assessment Criteria, employed in human and GPT evaluations, to verify the effectiveness of the generated stories. We hope this work benefits the autism community and catalyzes future research focusing on particular groups.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# 医用画像セグメンテーションのための自己教師付きアライメント学習 Self-Supervised Alignment Learning for Medical Image Segmentation ( http://arxiv.org/abs/2406.15699v1 ) ライセンス: Link先を確認	Haofeng Li, Yiming Ouyang, Xiang Wan,	(参考訳) 近年,2次元画像と3次元画像のセグメンテーションモデルの事前学習に,自己教師付き学習(SSL)法が用いられている。これらの手法のほとんどは、再構成、コントラスト学習、一貫性正規化に基づいている。しかし,3次元医用画像からの2次元スライスの空間的対応は十分に活用されていない。本稿では,医療画像セグメンテーションのためのニューラルネットワークを事前学習するための,自己教師付きアライメント学習フレームワークを提案する。提案するフレームワークは,新たな局所的なアライメント損失とグローバルな位置損失から構成される。同じ3Dスキャンでは、2つの近接2Dスライスは通常、同様の解剖学的構造を含む。そこで, 一致した構造の画素レベルの特徴を互いに近接させるために, 局所アライメント損失を提案する。実験結果から,提案したアライメント学習は,限定アノテーションの設定の下で,既存のCTおよびMRIデータセットに対する自己教師付き事前学習手法と競合することが示された。 Recently, self-supervised learning (SSL) methods have been used in pre-training the segmentation models for 2D and 3D medical images. Most of these methods are based on reconstruction, contrastive learning and consistency regularization. However, the spatial correspondence of 2D slices from a 3D medical image has not been fully exploited. In this paper, we propose a novel self-supervised alignment learning framework to pre-train the neural network for medical image segmentation. The proposed framework consists of a new local alignment loss and a global positional loss. We observe that in the same 3D scan, two close 2D slices usually contain similar anatomic structures. Thus, the local alignment loss is proposed to make the pixel-level features of matched structures close to each other. Experimental results show that the proposed alignment learning is competitive with existing self-supervised pre-training approaches on CT and MRI datasets, under the setting of limited annotations.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# 高効率低雑音オプトメカニカル結晶光子-フォノン変換器 High-Efficiency Low-Noise Optomechanical Crystal Photon-Phonon Transducers ( http://arxiv.org/abs/2406.15701v1 ) ライセンス: Link先を確認	Sameer Sonar, Utku Hatipoglu, Srujan Meesala, David Lake, Hengjiang Ren, Oskar Painter,	(参考訳) オプトメカニカル結晶(OMC)は、光学光子とマイクロ波音響フォノンのコヒーレントな相互作用を可能にし、マイクロ波と光信号間の量子トランスダクションを実装するためのプラットフォームである。極低温(ミリケルビン)における光吸収誘起熱雑音は、OMCベースの量子トランスデューサの性能に対する主要な制限の一つである。ここでは,機械的に分離された光導波路に側面結合した2次元シリコンOMC共振器を用いてこの課題に対処し、高いオプトメカニカルバックアクションとミリケルビンのベース温度の領域で動作させながら、従来の最先端と比較して音響共振器の加熱速度を6分の1に低減した。この加熱の低減により、付加雑音0.25 $\pm$ 0.01量子においてフォノン-光子変換効率93.1 $\pm$ 0.8%が実証され、量子限界のマイクロ波-光周波数変換および光制御された量子音響メモリに向けた大きな進歩を示す。 Optomechanical crystals (OMCs) enable coherent interactions between optical photons and microwave acoustic phonons, and represent a platform for implementing quantum transduction between microwave and optical signals. Optical absorption-induced thermal noise at cryogenic (millikelvin) temperatures is one of the primary limitations of performance for OMC-based quantum transducers. Here, we address this challenge with a two-dimensional silicon OMC resonator that is side-coupled to a mechanically detached optical waveguide, realizing a six-fold reduction in the heating rate of the acoustic resonator compared to prior state-of-the-art, while operating in a regime of high optomechanical-backaction and millikelvin base temperature. This reduced heating translates into a demonstrated phonon-to-photon conversion efficiency of 93.1 $\pm$ 0.8% at an added noise of 0.25 $\pm$ 0.01 quanta, representing a significant advance toward quantum-limited microwave-optical frequency conversion and optically-controlled quantum acoustic memories.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# video-SALMONN:音声強調型音声視覚大言語モデル video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models ( http://arxiv.org/abs/2406.15704v1 ) ライセンス: Link先を確認	Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang,	(参考訳) 音声視覚大規模言語モデル(av-LLM)を用いたより汎用的なビデオ理解の一要素としての音声(スピーチ)理解は、重要でありながら十分に研究されていない側面である。本稿では,ビデオ処理のための単一のエンドツーエンドav-LLMであるvideo-SALMONNを提案する。このモデルは、視覚フレーム列、音響イベント、音楽だけでなく、音声も理解できる。他のビデオ要素に対する効率を保ちながら、音声理解に必要な細粒度の時間情報を得るために、事前学習した音声視覚エンコーダとバックボーンの大規模言語モデルとを接続する、新しいマルチレゾリューション因果Q-Former(MRC Q-Former)構造を提案する。さらに、フレームやモダリティの支配を避けるため、多様性損失(diversity loss)や非ペアの音声視覚混合学習方式を含む専用の学習手法を提案する。 導入した音声・音響・視覚評価ベンチマークにおいて、video-SALMONNはビデオQAタスクで25%以上、人間の音声を含む音声視覚QAタスクで30%以上の絶対精度向上を達成した。さらに、video-SALMONNは、他のav-LLMでは前例のないタスクにおいて、優れたビデオ理解および推論能力を示す。トレーニングコードとモデルチェックポイントは、https://github.com/bytedance/SALMONN/ で利用可能である。 Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required by speech understanding, while remaining efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model. Moreover, dedicated training approaches including the diversity loss and the unpaired audio-visual mixed training scheme are proposed to avoid frames or modality dominance. On the introduced speech-audio-visual evaluation benchmark, video-SALMONN achieves more than 25% absolute accuracy improvements on the video-QA task and over 30% absolute accuracy improvements on audio-visual QA tasks with human speech. 
In addition, video-SALMONN demonstrates remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other av-LLMs. Our training code and model checkpoints are available at https://github.com/bytedance/SALMONN/.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# psPRF:衛星画像からの汎化可能な3次元再構成のためのパンシャープン平面ニューラル放射場 psPRF:Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery ( http://arxiv.org/abs/2406.15707v1 ) ライセンス: Link先を確認	Tongtong Zhang, Yuanxiang Li,	(参考訳) 現在の衛星向けNeRF変種のほとんどは特定の1シーン向けに設計されており、新しい幾何への汎化には至っていない。さらに、RGB画像には独立した前処理ステップとしてパンシャープンが必要である。本稿では,Rational Polynomial Cameras(RPC)を持つ衛星センサから得られる、低分解能RGB(LR-RGB)画像と高分解能パンクロマティック(HR-PAN)画像のペア向けに設計された平面ニューラル放射場であるpsPRFを紹介する。 LR-RGBとHR-PANの両画像からクロスモーダルな事前知識を捉えるため、Unet型アーキテクチャのエンコーダに明示的なスペクトル-空間畳み込み(SSConv)を適用し、マルチモーダル表現能力を向上させる。シーン間におけるpsPRFの汎化能力を支援するため、投影損失を採用し、強力な幾何学的自己教師を確保する。提案手法は,マルチシーンのWorldView-3 LR-RGBとHR-PANのペアを用いて評価され,最先端の性能を達成する。 Most current NeRF variants for satellites are designed for one specific scene and fall short of generalization to new geometry. Additionally, the RGB images require pan-sharpening as an independent preprocessing step. This paper introduces psPRF, a Planar Neural Radiance Field designed for paired low-resolution RGB (LR-RGB) and high-resolution panchromatic (HR-PAN) images from satellite sensors with Rational Polynomial Cameras (RPC). To capture the cross-modal prior from both of the LR-RGB and HR-PAN images, for the Unet-shaped architecture, we adapt the encoder with explicit spectral-to-spatial convolution (SSConv) to enhance the multimodal representation ability. To support the generalization ability of psPRF across scenes, we adopt projection loss to ensure strong geometry self-supervision. The proposed method is evaluated with the multi-scene WorldView-3 LR-RGB and HR-PAN pairs, and achieves state-of-the-art performance.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# より良く教えるか、より賢く示すか? : 自動プロンプト最適化における命令と例示について Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization ( http://arxiv.org/abs/2406.15708v1 ) ライセンス: Link先を確認	Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik,	(参考訳) 大規模言語モデルは目覚ましい能力を示してきたが、その性能は効果的なプロンプトエンジニアリングに大きく依存している。自動プロンプト最適化(APO)手法はこれを自動化するために設計されており、命令を対象とするもの(命令最適化、IO)と例示を対象とするもの(例示選択、ES)に大別できる。共通の目的を持つにもかかわらず、両者は比較的独立に発展しており、最近はIOの方がより多くの研究上の注目を集めている。本稿では,このギャップを埋めるために,多様で挑戦的なタスク群において、代表的なIO手法とES手法の性能を、単独および組み合わせの両方で包括的に比較する。その結果、検証セット上でのプロンプト評価から得られるモデル生成の入出力ペアを例示として賢く再利用すると、IO手法よりも一貫して性能が向上するが、これは現在十分に研究されていないことがわかった。また、最近はIOに焦点が当てられているにもかかわらず、例示の選び方は命令の最適化の仕方を上回る影響を持ちうること、すなわち、最適化を一切行わないシード命令とランダム探索のような単純なES戦略の組み合わせが最先端のIO手法を上回ることがわかった。さらに,ESとIOの間に相乗効果が観察され、最適な組み合わせは個々の寄与を上回る。我々は、単独の手法としての例示選択、および命令最適化との最適な組み合わせの研究は、APOの重要な側面であり続け、高度に有能な命令追従モデルの時代においても、将来の研究でより大きな考慮に値すると結論付ける。 Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. 
We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.	翻訳日:2024-06-25 21:04:37 公開日:2024-06-22
# 10以上のDeFi詐欺を体験した:DeFiユーザーによるセキュリティ侵害の認識と対策 I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures ( http://arxiv.org/abs/2406.15709v1 ) ライセンス: Link先を確認	Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim,	(参考訳) 分散型金融(Decentralized Finance, DeFi)は、まったく新しい投資体験を提供し、中央集権型金融(CeFi)に代わる魅力的な選択肢として急速に台頭した。しかし、市場規模とアクティブユーザー数の急速な成長により、DeFiは詐欺やハッキングの格好の標的にもなり、2023年には19.5億USドルが失われた。残念ながら、DeFiユーザーのセキュリティリスク認識レベルとリスク軽減戦略の妥当性を徹底的に調査した先行研究はない。本稿では、半構造化インタビュー調査 (N = 14) とフォローアップ調査 (N = 493) に基づいて,DeFiユーザーのセキュリティ認識と一般的に採用されている慣行、および過去の詐欺やハッキングの被害者(DeFi被害者)がどのように対応し,損失を回復しようとするかを調査する。分析の結果,ユーザーは分散性と高い収益性のために、CeFiよりもDeFiを好むことが多いことがわかった。 DeFiはCeFiに比べてより深刻な攻撃を受けやすいと認識しているにもかかわらず、ユーザーは新たな投資機会を探るためにそうしたリスクを進んで受け入れている。懸念すべきことに、ほとんどの被害者は過去の経験から学んでいない。従来のシステムで調査された被害者とは異なり、DeFiの被害者は、損失を素早く取り戻すために、セキュリティの慣行を見直すことなく新しいサービスを探す傾向がある。さまざまなDeFiサービスや機会が豊富にあることで、被害者は新しい金融機会を継続的に探索でき、この現実が彼らのセキュリティ上の優先順位を曇らせているようである。実際、我々の結果は、ギャンブル依存者と同様に、DeFiユーザーの強い経済的動機がセキュリティ上の懸念を上回っていることを示している。事後の被害者の行動に関する我々の観察は、DeFiユーザーを将来の侵害から守るためには、業界規制の形でのより強力な統制が必要であることを示唆している。 Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness levels and the adequacy of their risk mitigation strategies. Based on a semi-structured interview study (N = 14) and a follow-up survey (N = 493), this paper investigates DeFi users' security perceptions and commonly adopted practices, and how those affected by previous scams or hacks (DeFi victims) respond and try to recover their losses. Our analysis shows that users often prefer DeFi over CeFi due to their decentralized nature and strong profitability. 
Despite being aware that DeFi, compared to CeFi, is prone to more severe attacks, users are willing to take those risks to explore new investment opportunities. Worryingly, most victims do not learn from previous experiences; unlike victims studied through traditional systems, DeFi victims tend to find new services, without revising their security practices, to recover their losses quickly. The abundance of various DeFi services and opportunities allows victims to continuously explore new financial opportunities, and this reality seems to cloud their security priorities. Indeed, our results indicate that DeFi users' strong financial motivations outweigh their security concerns - much like those who are addicted to gambling. Our observations about victims' post-incident behaviors suggest that stronger control in the form of industry regulations would be necessary to protect DeFi users from future breaches.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# 超放射駆動型フォトニック量子エンジン A photonic quantum engine driven by superradiance ( http://arxiv.org/abs/2406.15710v1 ) ライセンス: Link先を確認	Jinuk Kim, Seung-hoon Oh, Daeho Yang, Junki Kim, Moonjoo Lee, Kyungwon An,	(参考訳) ナノおよびマイクロスケールの熱機関の性能は、量子力学的現象の助けを借りて改善できる。近年, 単一の熱浴であってもカルノー限界を超えてエンジン性能を向上させるために, 量子コヒーレンスを持つ熱浴が提案されている。しかし、これまでのところ物理的な実現は達成されていない。本稿では、原子とフォトニック真空からなる単一の熱浴を用いた、超放射駆動型フォトニック量子エンジンの原理実証実験を初めて報告する。量子コヒーレントな重ね合わせ状態に準備された熱浴の原子は、共振器を通過する間に超放射を起こした。これにより実効的なエンジン温度は約40倍に上昇し、1に近いエンジン効率が得られた。さらに, 観測されたエンジン出力は, 原子注入速度に対して2乗で増大した。我々の研究は、量子力学的熱伝達やエンジン出力の向上に利用でき、熱浴に埋め込まれた量子コヒーレンスで動作するフォトメカニカルデバイスの開発への道を開く。 Performance of nano- and micro-scale heat engines can be improved with help from quantum mechanical phenomena. Recently, heat reservoirs with quantum coherence have been proposed to enhance engine performance beyond the Carnot limit even with a single reservoir. However, no physical realizations have been achieved so far. Here, we report the first proof-of-principle experimental demonstration of a photonic quantum engine driven by superradiance employing a single heat reservoir composed of atoms and photonic vacuum. Reservoir atoms prepared in a quantum coherent superposition state underwent superradiance while traversing the cavity. This led to about 40-fold increase of the effective engine temperature, resulting in a near-unity engine efficiency. Moreover, the observed engine output power grew quadratically with respect to the atomic injection rate. Our work can be utilized in quantum mechanical heat transfer as well as in boosting engine powers, opening a pathway to development of photomechanical devices that run on quantum coherence embedded in heat baths.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# 加速反復再重み付け核ノルム最小化による効率的な低ランク同定 Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization ( http://arxiv.org/abs/2406.15713v1 ) ライセンス: Link先を確認	Hao Wang, Ye Wang, Xiangyu Yang,	(参考訳) 本稿では、滑らかな関数と行列のSchatten-$p$ノルムの和を最小化する問題を考察する。我々の貢献は、非凸な低ランク最小化問題を解くために設計された、加速反復再重み付け核ノルム法を提案することである。 我々のアプローチは2つの主要な新規性によって特徴づけられる。第一に、提案手法はランク同定性を持ち、有限回の反復で停留点の「正しい」ランクを証明可能な形で同定できる。第二に,平滑化パラメータの適応的更新戦略を導入する。この戦略は、「正しい」ランクを検出すると、ゼロ特異値に関連するパラメータを自動的に定数として固定し、残りのパラメータを素早く0に近づける。この適応的な振る舞いにより、アルゴリズムは数回の反復の後に滑らかな問題を効果的に解くものへと変わり、この点が低ランク最適化のための既存の反復再重み付け法と我々の研究を区別する。提案アルゴリズムの大域収束を証明し、反復列のすべての極限点が臨界点であることを保証する。さらに、Kurdyka-Łojasiewicz性質の下で局所収束速度の解析を与える。合成データと実データの両方を用いて数値実験を行い、既存手法に対する提案アルゴリズムの効率と優位性を示す。 This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of the matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enabling the provable identification of the "correct" rank of the stationary point within a finite number of iterations. Secondly, we introduce an adaptive updating strategy for smoothing parameters. This strategy automatically fixes parameters associated with zero singular values as constants upon detecting the "correct" rank while quickly driving the remaining parameters to zero. This adaptive behavior transforms the algorithm into one that effectively solves smooth problems after a few iterations, setting our work apart from existing iteratively reweighted methods for low-rank optimization. We prove the global convergence of the proposed algorithm, guaranteeing that every limit point of the iterates is a critical point. Furthermore, a local convergence rate analysis is provided under the Kurdyka-Łojasiewicz property. 
We conduct numerical experiments using both synthetic and real data to showcase our algorithm's efficiency and superiority over existing methods.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# Light My Cellsチャレンジにおけるpix2pixと適応損失を用いたラベルフリー顕微鏡画像の蛍光ラベル予測 Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge ( http://arxiv.org/abs/2406.15716v1 ) ライセンス: Link先を確認	Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz,	(参考訳) 蛍光標識は、顕微鏡画像において細胞構造やその他の細胞内構成要素を可視化するための標準的なアプローチである。しかし、この侵襲的な処置は細胞を乱したり死滅させたりする可能性があり、手順自体も非常に時間がかかり複雑である。近年,in silicoラベリングが有望な代替手段として登場しており、機械学習モデルを用いてラベルフリー顕微鏡画像から蛍光標識画像を直接予測することを目指している。本稿では,Light My Cellsチャレンジのための深層学習に基づくin silicoラベリング手法を提案する。提案手法はpix2pixに基づいており、適応損失を用いて部分的にラベル付けされたデータセットで学習できる。さらに、異なる入力モダリティを扱うための、一括学習や個別学習といった複数の学習戦略の有効性を検討する。その結果,本手法はin silicoラベリングにおいて有望な性能を達成することが示された。コードは https://github.com/MedICL-VU/LightMyCells で利用可能である。 Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the fluorescently labeled images from label-free microscopy. In this paper, we propose a deep learning-based in silico labeling method for the Light My Cells challenge. Built upon pix2pix, our proposed method can be trained using the partially labeled datasets with an adaptive loss. Moreover, we explore the effectiveness of several training strategies to handle different input modalities, such as training them together or separately. The results show that our method achieves promising performance for in silico labeling. Our code is available at https://github.com/MedICL-VU/LightMyCells.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# ターンベースゲームを超えて - デュプレックスモデルによるリアルタイム会話の実現 Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models ( http://arxiv.org/abs/2406.15718v1 ) ライセンス: Link先を確認	Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu,	(参考訳) 大規模言語モデル(LLM)が日々の生活に浸透するにつれて、人間の会話を反映するリアルタイムインタラクションへの需要が高まっている。 LLMによって駆動される従来のターンベースのチャットシステムは、システムが応答を生成している間、ユーザがシステムと音声で対話することを妨げる。これらの制限を克服するため,既存のLLMをデュプレックスモデル(duplex model)に適応させ,出力を生成しながらユーザの発話を聞き取り,自らを動的に調整してユーザに即時のフィードバックを提供できるようにする。具体的には、会話のクエリとレスポンスを複数のタイムスライスに分割し、それらを擬似同時的に処理するために時間分割多重化(TDM)符号化・復号戦略を採用する。さらに,LLMをリアルタイムな会話を処理できるほど高度にするために,クエリとレスポンスのタイムスライスが交互に並び,即時的なインタラクションにおける典型的なフィードバックタイプを網羅する微調整データセットを構築した。実験の結果,会話のクエリと応答は処理のための不完全なスライスに分割されているものの,LLMは標準ベンチマークで元の性能を保ちながら,データセットに微調整を施すことができることがわかった。自動的および人的評価は、ユーザとAIのインタラクションをより自然で人間的なものにし、バニラLLMに比べてユーザ満足度を大幅に向上させることを示している。我々のデュプレックスモデルとデータセットはリリースされます。 As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen for users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions. 
Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluation indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# より深く学ぶには?ハイパースペクトル画像分類のためのコルモゴロフ・アルノルドネットワークの探索 How to Learn More? Exploring Kolmogorov-Arnold Networks for Hyperspectral Image Classification ( http://arxiv.org/abs/2406.15719v1 ) ライセンス: Link先を確認	Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Bing Lu, Pedram Ghamisi,	(参考訳) 畳み込みニューラルネットワーク(CNN)と視覚変換器(ViT)は、複雑なハイパースペクトル画像(HSI)分類において優れた能力を示している。しかし、これらのモデルは大量のトレーニングデータを必要とし、計算資源である。一方、現代のマルチ層パーセプトロン(MLP)は、優れた分類能力を示している。現代のMLPベースのモデルでは、CNNやViTに比べてトレーニングデータが少ないため、最先端の分類精度が向上する。近年,MLPの代替としてKAN(Kolmogorov-Arnold Networks)が提案されている。スプラインと内部的類似性やMDPと外部的類似性により、KANSAは学習した特徴を顕著な精度で最適化し、新しい特徴を学習することができる。そこで本研究では,複雑なHSIデータ分類におけるkanの有効性を評価する。さらに,KANSAが取得したHSI分類精度を向上させるために,1D,2D,3Dkanを用いたハイブリッドアーキテクチャを開発し,提案する。提案アーキテクチャの有効性を実証するため,新たに作成された3つのHSIベンチマークデータセット,QUH-Pingan,QUH-Tangdaowan,QUH-Qingyunについて広範な実験を行った。結果は、これらのベンチマークデータセットを1D-CNN、2DCNN、3D CNN、VGG-16、ResNet-50、EfficientNet、RNN、ViTなど、他のCNNおよびViTベースのアルゴリズムに比較して、開発されたハイブリッドkanベースのモデルの競争力またはより良い能力を強調した。コードはhttps://github.com/aj1365/HSIConvKAN)で公開されている。 Convolutional Neural Networks (CNNs) and vision transformers (ViTs) have shown excellent capability in complex hyperspectral image (HSI) classification. However, these models require a significant number of training data and are computational resources. On the other hand, modern Multi-Layer Perceptrons (MLPs) have demonstrated great classification capability. These modern MLP-based models require significantly less training data compared to CNNs and ViTs, achieving the state-of-the-art classification accuracy. Recently, Kolmogorov-Arnold Networks (KANs) were proposed as viable alternatives for MLPs. Because of their internal similarity to splines and their external similarity to MLPs, KANs are able to optimize learned features with remarkable accuracy in addition to being able to learn new features. Thus, in this study, we assess the effectiveness of KANs for complex HSI data classification. 
Moreover, to enhance the HSI classification accuracy obtained by the KANs, we develop and propose a hybrid architecture utilizing 1D, 2D, and 3D KANs. To demonstrate the effectiveness of the proposed KAN architecture, we conducted extensive experiments on three newly created HSI benchmark datasets: QUH-Pingan, QUH-Tangdaowan, and QUH-Qingyun. The results underscored the competitive or better capability of the developed hybrid KAN-based model across these benchmark datasets over several other CNN- and ViT-based algorithms, including 1D-CNN, 2D-CNN, 3D-CNN, VGG-16, ResNet-50, EfficientNet, RNN, and ViT. The code is publicly available at https://github.com/aj1365/HSIConvKAN.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# 大規模言語モデルのファクト記憶のためのスケーリング法則 Scaling Laws for Fact Memorization of Large Language Models ( http://arxiv.org/abs/2406.15720v1 ) ライセンス: Link先を確認	Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu,	(参考訳) 事実的かつ信頼性の高い応答を生成するために,Large Language Models (LLM) には,ファクト知識の記憶が不可欠である。しかし, LLM事実記憶の挙動は未解明のままである。本稿では,LLMの事実知識のスケーリング法則と,異なる種類の事実を記憶するLLMの挙動を解析する。 LLMの事実知識能力は,それぞれモデルサイズとトレーニングエポックスとの線形および負の指数法則関係を持つことがわかった。 Wikidataの事実全体を記憶するためには、100のエポックで1000Bの非埋め込みパラメータを持つLSMをトレーニングする必要がある。一方,LLMは未知の事実知識に基づいて一般化することができ,そのスケーリング法則は一般事前学習と類似している。さらに,LLMの事実記憶の互換性と嗜好について分析する。互換性のために、LLMは冗長な事実を統一的に記憶するのに苦労している。相関事実が同じ方向と構造を持つ場合のみ、LLMはそれらを相互に記憶することができる。このことは、冗長な事実に対するLLM記憶の非効率性を示している。優先的に、LLMはより頻繁で困難な事実を記憶することにより多くの注意を払っており、その後の事実は過去の事実の記憶を上書きし、低頻度の事実の記憶を著しく妨げている。本研究は,LLMのファクト・ナレッジ・ナレッジ・ラーニングの能力と特徴を明らかにし,LLMのファクト・ナレッジ・アジュメンテーションの方向性を示した。 Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure, the LLM can compatibly memorize them. 
This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders low-frequency facts memorization. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# Clapton: 変分量子アルゴリズムにおける誤り除去のためのクリフォード支援問題変換 Clapton: Clifford-Assisted Problem Transformation for Error Mitigation in Variational Quantum Algorithms ( http://arxiv.org/abs/2406.15721v1 ) ライセンス: Link先を確認	Lennart Maximilian Seifert, Siddharth Dangwal, Frederic T. Chong, Gokul Subramanian Ravi,	(参考訳) 変分量子アルゴリズム(VQA)は、量子コンピューティングの短期において量子優位性を示すが、NISQデバイスの現在の能力を超える精度のレベルを要求する。量子デバイスエラーがVQAに与える影響を系統的に緩和するために,変分量子アルゴリズムにおける誤り除去のためのクラプトン:クリフォード支援問題変換を提案する。クラプトンは、与えられたVQA問題、古典的なデバイスノイズのシミュレーション可能なモデル、およびVQAの変動原理に対して古典的に推定された良い量子状態を利用する。これは、モデル化されたデバイスノイズの存在下で、既知の良いVQA状態のエネルギー推定を低くするために、VQA問題のハミルトニアンに変換を適用する。クラプトン仮説は、VQA問題の既知良状態が問題の理想的な基底状態に近く、デバイスノイズモデリングが合理的に正確である限り(どちらも概ね正しい)、クラプトン変換はVQA問題の基底状態に対するデバイスノイズの影響を著しく減少させ、VQA解の精度を増大させる。 Claptonはエンド・ツー・エンドのアプリケーション・ツー・デバイス・フレームワークとして構築され、1.7xから3.7xまでの平均VQA初期化改善を最大13.3xまで達成している。 Variational quantum algorithms (VQAs) show potential for quantum advantage in the near term of quantum computing, but demand a level of accuracy that surpasses the current capabilities of NISQ devices. To systematically mitigate the impact of quantum device error on VQAs, we propose Clapton: Clifford-Assisted Problem Transformation for Error Mitigation in Variational Quantum Algorithms. Clapton leverages classically estimated good quantum states for a given VQA problem, classical simulable models of device noise, and the variational principle for VQAs. It applies transformations on the VQA problem's Hamiltonian to lower the energy estimates of known good VQA states in the presence of the modeled device noise. The Clapton hypothesis is that as long as the known good states of the VQA problem are close to the problem's ideal ground state and the device noise modeling is reasonably accurate (both of which are generally true), then the Clapton transformation substantially decreases the impact of device noise on the ground state of the VQA problem, thereby increasing the accuracy of the VQA solution. 
Clapton is built as an end-to-end application-to-device framework and achieves mean VQA initialization improvements of 1.7x to 3.7x, and up to a maximum of 13.3x, over the state-of-the-art baseline when evaluated for a variety of scientific applications from physics and chemistry on noise models and real quantum devices.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# バランスの取れた複数アスペクトの発音評価のための音響的特徴の混合 Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment ( http://arxiv.org/abs/2406.15723v1 ) ライセンス: Link先を確認	Heejin Do, Wonjun Lee, Gary Geunbae Lee,	(参考訳) 自動発音評価において、近年の重点は、豊富なフィードバックを提供するために複数の側面を評価することにある。しかし、非母国語学習者の発話に対するマルチアスペクトスコアラベル付きデータを取得することは、しばしばスコアバランスの取れない分布につながる。本稿では,データ不足とスコア・ラベルの不均衡に対処するため,2つの音響的特徴混合手法を提案する。主に発音の良さを音響的特徴として用いて,発音評価に適した混合設計を調整した。さらに,音声認識結果と元の応答音素を比較し,誤発音のヒントを与えることによって,高精度な誤り率特徴を統合する。音響特性を効果的に混合することにより,音声オクタン762データセットの総合的なスコアリング性能が向上し,詳細な解析により未知の歪みを予測する可能性が示された。 In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly interpolating with the in-batch averaged feature, to address data scarcity and score-label imbalances. Primarily using goodness-of-pronunciation as an acoustic feature, we tailor mixup designs to suit pronunciation assessment. Further, we integrate fine-grained error-rate features by comparing speech recognition results with the original answer phonemes, giving direct hints for mispronunciation. Effective mixing of the acoustic features notably enhances overall scoring performances on the speechocean762 dataset, and detailed analysis highlights our potential to predict unseen distortions.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
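A minimal sketch of the linear variant of the mixup described above, interpolating each sample with the in-batch averaged feature. The function name `mixup_with_batch_mean`, the array shapes, and the choice to mix the score labels with the same weight are assumptions for illustration; the abstract does not specify how labels are handled, and the non-linear variant is not shown.

```python
import numpy as np

def mixup_with_batch_mean(features, scores, lam):
    """Linearly interpolate each sample with the in-batch average.

    features: (batch, dim) acoustic features, e.g. goodness-of-pronunciation vectors.
    scores:   (batch, aspects) multi-aspect score labels.
    lam:      interpolation weight in [0, 1]; lam = 1 leaves the batch unchanged.
    Mixing the scores with the same weight is an assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    mean_feat = features.mean(axis=0, keepdims=True)
    mean_score = scores.mean(axis=0, keepdims=True)
    mixed_feat = lam * features + (1.0 - lam) * mean_feat
    mixed_score = lam * scores + (1.0 - lam) * mean_score
    return mixed_feat, mixed_score

# Hypothetical two-sample batch with 2-dimensional features and one score aspect.
mixed_feat, mixed_score = mixup_with_batch_mean(
    [[0.0, 2.0], [4.0, 6.0]], [[1.0], [3.0]], lam=0.5
)
```

Because every sample is pulled toward the same batch mean, the mixed batch keeps that mean while its spread shrinks, which is one way such a scheme could soften imbalanced score distributions.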
# 多重蛍光画像における細胞の特徴抽出のための半教師付き変異オートエンコーダ Semi-supervised variational autoencoder for cell feature extraction in multiplexed immunofluorescence images ( http://arxiv.org/abs/2406.15727v1 ) ライセンス: Link先を確認	Piumi Sandarenu, Julia Chen, Iveta Slapetova, Lois Browne, Peter H. Graham, Alexander Swarbrick, Ewan K. A. Millar, Yang Song, Erik Meijering,	(参考訳) デジタルイメージング技術の進歩は、細胞レベルでの腫瘍ミクロ環境と特定の免疫フェノタイプ間の相互作用を可視化し識別するために、多重免疫蛍光(mIF)画像を使うことへの関心を高めている。現在最先端の多重蛍光画像解析パイプラインは、単純な統計的および機械学習ベースのツールを用いて生成された形態的および染色強度に基づくメトリクスによって特徴づけられる細胞の特徴表現に依存している。しかし、これらの方法は細胞の複雑な表現を生成できない。我々は,mIF画像中のセルの特徴を抽出するために,潜伏部分空間を用いた教師付き変分オートエンコーダを用いた深層学習に基づくセル特徴抽出モデルを提案する。乳がん患者の1,093個の組織マイクロアレイコアから抽出した44,000個以上の多重多重蛍光細胞像のコホートを用いて細胞表現型分類を行い,本モデルの有効性を実証した。 Advancements in digital imaging technologies have sparked increased interest in using multiplexed immunofluorescence (mIF) images to visualise and identify the interactions between specific immunophenotypes with the tumour microenvironment at the cellular level. Current state-of-the-art multiplexed immunofluorescence image analysis pipelines depend on cell feature representations characterised by morphological and stain intensity-based metrics generated using simple statistical and machine learning-based tools. However, these methods are not capable of generating complex representations of cells. We propose a deep learning-based cell feature extraction model using a variational autoencoder with supervision using a latent subspace to extract cell features in mIF images. We perform cell phenotype classification using a cohort of more than 44,000 multiplexed immunofluorescence cell image patches extracted across 1,093 tissue microarray cores of breast cancer patients, to demonstrate the success of our model against current and alternative methods.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# セキュアアグリゲーションを破る：フェデレーテッドラーニングにおける集約勾配からのラベル漏洩 Breaking Secure Aggregation: Label Leakage from Aggregated Gradients in Federated Learning ( http://arxiv.org/abs/2406.15731v1 ) ライセンス: Link先を確認	Zhibo Wang, Zhiwei Chang, Jiahui Hu, Xiaoyi Pang, Jiacheng Du, Yongle Chen, Kui Ren,	(参考訳) Federated Learning (FL)は、個々の勾配から個人情報を抽出できる勾配反転攻撃(GIA)の下でプライバシー上の脆弱性を示す。プライバシーを強化するため、FLはセキュアアグリゲーション(SA)を導入し、サーバが個別の勾配を得るのを防ぐことで、GIAに効果的に対抗する。本稿では,SAをバイパスし,個々のクライアントのプライベートラベルを復元するために,ステルスなラベル推論攻撃を提案する。具体的には、SAの実装後にのみ得られる集約勾配からラベル推論を理論的に解析する。その結果、最終完全連結層(FCL)の入力(埋め込み)と出力(ロジット)が勾配分解とラベル復元に寄与していることが判明した。 FCLの埋め込みとロジットをプリセットするために、元のモデルで単一のバッチ正規化(BN)層のパラメータを単に修正してフィッシングモデルを構築する。クライアント固有のフィッシングモデルを配布することで、サーバは、期待される埋め込みと集約された勾配を係数とする線形システムを解くことにより、FCLのバイアスに関する個々の勾配を導出することができる。すると、各クライアントのラベルは、予め設定されたロジットとFCLのバイアスの勾配に基づいて正確に計算できる。大規模な実験により,様々なデータセットやモデルアーキテクチャ上で,100%の精度で大規模ラベル回復を実現できることが示された。 Federated Learning (FL) exhibits privacy vulnerabilities under gradient inversion attacks (GIAs), which can extract private information from individual gradients. To enhance privacy, FL incorporates Secure Aggregation (SA) to prevent the server from obtaining individual gradients, thus effectively resisting GIAs. In this paper, we propose a stealthy label inference attack to bypass SA and recover individual clients' private labels. Specifically, we conduct a theoretical analysis of label inference from the aggregated gradients that are exclusively obtained after implementing SA. The analysis results reveal that the inputs (embeddings) and outputs (logits) of the final fully connected layer (FCL) contribute to gradient disaggregation and label restoration. To preset the embeddings and logits of FCL, we craft a fishing model by solely modifying the parameters of a single batch normalization (BN) layer in the original model. 
By distributing client-specific fishing models, the server can derive the individual gradients regarding the bias of FCL by resolving a linear system with expected embeddings and the aggregated gradients as coefficients. Then the labels of each client can be precisely computed based on preset logits and gradients of FCL's bias. Extensive experiments show that our attack achieves large-scale label recovery with 100% accuracy on various datasets and model architectures.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
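The last step mentioned above, computing labels from preset logits and the gradient of the final layer's bias, rests on a standard identity for softmax cross-entropy: the bias gradient equals the sum over samples of (softmax probabilities minus one-hot label). The sketch below shows only that step under strong assumptions: a sum-reduced loss, one client's bias gradient already isolated, and identical known logits for every sample. The function names are hypothetical, and the paper's fishing-model construction and linear-system disaggregation are not reproduced.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bias_gradient(logits, labels):
    """Gradient of the summed cross-entropy loss w.r.t. the last-layer bias."""
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[labels]
    return (probs - onehot).sum(axis=0)

def recover_label_counts(grad_bias, preset_logits, batch_size):
    """Per-class label counts, assuming every sample yields the same known logits."""
    probs = softmax(preset_logits)
    return np.rint(batch_size * probs - grad_bias).astype(int)

# Hypothetical client batch: five samples, three classes, identical preset logits.
labels = np.array([0, 2, 2, 1, 2])
preset = np.array([0.3, -1.0, 0.5])
logits = np.tile(preset, (len(labels), 1))
grad = bias_gradient(logits, labels)
counts = recover_label_counts(grad, preset, batch_size=len(labels))
```

Here `counts` gives how many samples of each class the batch contained; with a mean-reduced loss the gradient would first need to be multiplied by the batch size.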
# 消費電力を最適化するためのAI駆動アプローチ:総合的な調査 AI-Driven Approaches for Optimizing Power Consumption: A Comprehensive Survey ( http://arxiv.org/abs/2406.15732v1 ) ライセンス: Link先を確認	Parag Biswas, Abdur Rashid, Angona Biswas, Md Abdullah Al Nasim, Kishor Datta Gupta, Roy George,	(参考訳) 電力最適化が重要である主な理由は、環境効果の低減、運転コストの低減、そして、現在の世代と将来の世代に対する安定的で持続可能なエネルギー供給である。電力最適化は、エネルギーをより効果的に利用し、廃棄物を削減し、資源の利用を最適化する。今日の世界では、電力最適化と人工知能(AI)の統合は、エネルギーの生成、使用、分散の方法を変えるために不可欠である。 AI駆動のアルゴリズムと予測分析により、電力使用傾向のリアルタイム監視と分析が可能となり、動的修正によって需要を効果的に満たすことができる。インテリジェントシステムの使用により、電力消費が異なるセクターで最適化されると、効率性と持続可能性が向上する。本研究は、電力最適化に使用されるいくつかのAI技術と、電力消費の異なる分野にわたる様々なインテリジェントシステム応用ドメインの研究のための文献の方法論的分析を網羅し、それらを評価することにより17種類の研究手法の性能と成果を特定し、その長所と限界に関する貴重な知見を抽出することを目的とする。さらに、電力消費最適化のためのAIの統合における今後の方向性について概説する。 Reduced environmental impact, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization ensures that energy is used more effectively, cutting down on waste and optimizing the utilization of resources. In today's world, power optimization and artificial intelligence (AI) integration are essential to changing the way energy is produced, used, and distributed. Real-time monitoring and analysis of power usage trends is made possible by AI-driven algorithms and predictive analytics, which enable dynamic modifications to effectively satisfy demand. Efficiency and sustainability are increased when power consumption is optimized in different sectors thanks to the use of intelligent systems. 
This survey paper comprises an extensive review of the several AI techniques used for power optimization as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of power consumption. This literature review identifies the performance and outcomes of 17 different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. Furthermore, this article outlines future directions in the integration of AI for power consumption optimization.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# RankAdaptor: 構造的プルーニングを施したLLMのための階層的動的低ランク適応 RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs ( http://arxiv.org/abs/2406.15734v1 ) ライセンス: Link先を確認	Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng Jin,	(参考訳) 大規模言語モデル(LLM)の効率的な圧縮は、ますます人気が高まっている。しかし, 圧縮LLMの精度の回復は依然として大きな課題である。標準低ランク適応 (LoRA) を用いた構造的プルーニングは、現在のLLM圧縮において一般的な手法である。構造的なプルーニングでは、モデルアーキテクチャは不均一に修正されるため、固定ランクの標準のLoRAでは、様々な下流タスクにおいて最適でない性能しか得られない。この問題に対処するために, 階層的動的階数スケジューリングを用いた効率的な微調整手法である RankAdaptor を導入する。軽量な性能モデルを用いて、微調整時に異なるランクを決定するエンド・ツー・エンドの自動最適化フローを開発した。一般的なベンチマークに関する総合的な実験によると、RankAdaptorは、異なるプルーニング設定において、構造的プルーニングと組み合わせた標準のLoRAより一貫して優れている。トレーニング可能なパラメータを増やさなくても、RankAdaptorは、標準的なLoRAと比較して、回復後のプルーニング済みモデルとオリジナルのモデルとの間の精度ギャップをさらに減らすことができる。 The efficient compression of large language models (LLMs) is becoming increasingly popular. However, recovering the accuracy of compressed LLMs is still a major challenge. Structural pruning with standard Low-Rank Adaptation (LoRA) is a common technique in current LLM compression. In structural pruning, the model architecture is modified unevenly, resulting in suboptimal performance in various downstream tasks via standard LoRA with fixed rank. To address this problem, we introduce RankAdaptor, an efficient fine-tuning method with hierarchical dynamic rank scheduling for pruned LLMs. An end-to-end automatic optimization flow is developed that utilizes a lightweight performance model to determine the different ranks during fine-tuning. Comprehensive experiments on popular benchmarks show that RankAdaptor consistently outperforms standard LoRA with structural pruning over different pruning settings. Without increasing the trainable parameters, RankAdaptor further reduces the accuracy gap between the recovered pruned model and the original model compared to standard LoRA.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
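To make the contrast between fixed-rank and per-layer-rank LoRA concrete, here is a minimal sketch of a low-rank adapter whose rank can differ from layer to layer. The class name `LoRALinear`, the layer names, and the hard-coded ranks in `layer_ranks` are hypothetical; in the paper the ranks are chosen by a learned performance model, which is not reproduced here.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                  # frozen base weight
        self.a = rng.standard_normal((rank, d_in)) * 0.01     # trainable factor A
        self.b = np.zeros((d_out, rank))                      # trainable factor B, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def num_trainable(self):
        return self.a.size + self.b.size

# Hypothetical per-layer ranks; a fixed-rank baseline would use one value everywhere.
layer_ranks = {"layer0": 8, "layer1": 4, "layer2": 2}
rng = np.random.default_rng(1)
layers = {
    name: LoRALinear(rng.standard_normal((16, 32)), rank=r)
    for name, r in layer_ranks.items()
}
x = rng.standard_normal((3, 32))
outputs = {name: layer(x) for name, layer in layers.items()}
```

Since B starts at zero, each adapted layer initially reproduces its frozen base layer, and the trainable parameter count of a layer is `rank * (d_in + d_out)`, so ranks can be redistributed across layers without changing the total budget.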
# 画像-映像拡散モデルにおける条件付き画像漏洩の同定と解法 Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model ( http://arxiv.org/abs/2406.15735v1 ) ライセンス: Link先を確認	Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu,	(参考訳) 拡散モデルは、画像間(I2V)生成においてかなり進歩している。しかし、そのようなモデルは完全には理解されていない。本稿では,I2V拡散モデル(I2V-DMs)における重要な問題,すなわち条件付き画像リークについて報告する。 I2V-DMは、ノイズの多い入力からクリーンなビデオを予測する重要なタスクを無視し、大きなステップで条件付き画像を過度に頼りにしがちである。さらに,プラグイン・アンド・プレイ戦略を提示することで,推論とトレーニングの両面からこの課題に対処する。まず、I2V-DMの信頼性の低い遅延時間ステップを回避するために、早い段階から生成プロセスを開始するトレーニングフリー推論戦略を導入し、トレーニング-推論ギャップを効果的に橋渡しするために、KLの分散を最小化することにより、最適な解析式(Analytic-Init)による初期ノイズ分布を導出する。第2に,条件画像リークを軽減するため,条件画像の時間依存性雑音分布を設計し,条件画像に十分干渉するために,大規模ステップでの高雑音レベルを優先する。収集したオープンドメイン画像ベンチマークとUCF101データセットを用いて,様々なI2V-DM上でこれらの戦略を検証する。画像のアライメントや時間的一貫性を損なうことなく、よりダイナミックで自然な動画を制作することで、本手法がベースラインより優れていることを示す。プロジェクトページ: \url{https://cond-image-leak.github.io/}。 Diffusion models have obtained substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean video from noisy inputs, which results in videos lacking dynamic and vivid motion. We further address this challenge from both inference and training aspects by presenting plug-and-play strategies accordingly. First, we introduce a training-free inference strategy that starts the generation process from an earlier time step to avoid the unreliable late-time steps of I2V-DMs, as well as an initial noise distribution with optimal analytic expressions (Analytic-Init) by minimizing the KL divergence between it and the actual marginal distribution to effectively bridge the training-inference gap. 
Second, to mitigate conditional image leakage during training, we design a time-dependent noise distribution for the conditional image, which favors high noise levels at large time steps to sufficiently interfere with the conditional image. We validate these strategies on various I2V-DMs using our collected open-domain image benchmark and the UCF101 dataset. Extensive results demonstrate that our methods outperform baselines by producing videos with more dynamic and natural motion without compromising image alignment and temporal consistency. The project page: \url{https://cond-image-leak.github.io/}.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# 子どもの数学オリンピックにおける視覚・言語モデルの評価 Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads ( http://arxiv.org/abs/2406.15736v1 ) ライセンス: Link先を確認	Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum,	(参考訳) 近年,ChatGPTやGeminiなど,大規模視覚・言語モデル(LVLM)の汎用的問題解決能力が著しく進歩している。これらのブレークスルーのいくつかは、高次の認知スキルを必要とするさまざまなタスクにおいて、AIモデルが人間の能力を上回っているようにさえ見えます。現在の大きなAIモデルは、人間のように一般化された問題解決が可能か? しかし、ジョイントビジョンとテキスト推論のためのAI能力の体系的な分析は、現在の科学文献に欠けている。本稿では, このギャップを埋めるために, 子どものオリンピアードのビジュオ言語問題を用いて, 数学的, アルゴリズム的推論能力について, 最先端のLVLMを評価した。具体的には,1～12年生の子どもを対象とする国際コンペである数学カンガルー(MK)オリンピアード(Olympiad)の問題について考察する。 MKのパズルを用いて、2020-2024年の840個の問題からなるSMART-840というデータセットを作成しました。我々のデータセットを用いて,LVLMのパワーを数学的推論に基づいて分析し,パズルに対する反応は,子供のそれと直接比較する方法を提供する。以上の結果から,近代のLVLMは,高学年の問題解決において,より強力な推論能力を示す一方で,幼児向けの問題に正しく答える基盤が欠如していることが示唆された。さらに分析したところ、AIモデルの推論能力と幼児の推論能力の間に有意な相関は見られず、それらの能力は、子どもの数学や論理のスキルの根底にある累積的知識とは異なるタイプの推論に基づいているようである。 Recent years have seen a significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as humans do? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing in the current scientific literature. In this paper, we make an effort towards filling this gap, by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, which is a popular international competition targeted at children from grades 1-12, that tests children's deeper mathematical abilities using puzzles that are appropriately gauged to their age and skills. 
Using the puzzles from MK, we created a dataset, dubbed SMART-840, consisting of 840 problems from years 2020-2024. With our dataset, we analyze LVLMs power on mathematical reasoning; their responses on our puzzles offer a direct way to compare against that of children. Our results show that modern LVLMs do demonstrate increasingly powerful reasoning skills in solving problems for higher grades, but lack the foundations to correctly answer problems designed for younger children. Further analysis shows that there is no significant correlation between the reasoning capabilities of AI models and that of young children, and their capabilities appear to be based on a different type of reasoning than the cumulative knowledge that underlies children's mathematics and logic skills.	翻訳日:2024-06-25 20:54:52 公開日:2024-06-22
# Ladder: LLMベースの機械翻訳を次のレベルに上げるモデルに依存しないフレームワーク Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level ( http://arxiv.org/abs/2406.15741v1 ) ライセンス: Link先を確認	Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu,	(参考訳) GPT-4のような汎用大規模言語モデル(LLM)は、広範囲なウェブコンテンツを活用することで機械翻訳(MT)において顕著な進歩を遂げた。一方、翻訳特化LDMは、ドメイン固有の単言語コーパスを事前学習し、人手による翻訳データによる微調整によって構築される。優れた性能にもかかわらず、これらの手法は前例のない規模の計算とデータを必要とするか、実際の人間の編集とアノテーションの努力を必要とする。本稿では,MT 用汎用 LLM の性能を向上する新しいモデル非依存・コスト効率ツールである Ladder を開発した。トレーニング中、我々は容易にハードなスキーマで階層的な微調整戦略を提案し、ラダーの精錬性能を徐々に改善した。トレーニングされたLadderは、任意の汎用LLMとシームレスに統合され、翻訳性能が向上する。 Gemma-2B/7B をバックボーンとして使用することにより、Ladder-2B は最上位のオープンソースモデル(例えば、BigTranslate-13B を +6.91 BLEU と +3.52 COMET for XX-En で精製)に生の翻訳を高め、Ladder-7B は最先端の GPT-4 と同等のモデル性能をさらに向上させることができる。広範囲にわたるアブレーションと分析は、様々な環境でラダーの有効性を裏付ける。私たちのコードはhttps://github.com/fzp0424/Ladderで利用可能です。 General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. On the other hand, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite the superior performance, these methods either demand an unprecedented scale of computing and data or substantial human editing and annotation efforts. In this paper, we develop Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT. Ladder is trained on pseudo-refinement triplets which can be easily obtained from existing LLMs without additional human cost. During training, we propose a hierarchical fine-tuning strategy with an easy-to-hard schema, improving Ladder's refining performance progressively. The trained Ladder can be seamlessly integrated with any general-purpose LLMs to boost their translation performance. 
By utilizing Gemma-2B/7B as the backbone, Ladder-2B can elevate raw translations to the level of top-tier open-source models (e.g., refining BigTranslate-13B with +6.91 BLEU and +3.52 COMET for XX-En), and Ladder-7B can further enhance model performance to be on par with the state-of-the-art GPT-4. Extensive ablation and analysis corroborate the effectiveness of Ladder in diverse settings. Our code is available at https://github.com/fzp0424/Ladder	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# プログラム可能な変分推論による確率計画法 Probabilistic Programming with Programmable Variational Inference ( http://arxiv.org/abs/2406.15742v1 ) ライセンス: Link先を確認	McCoy R. Becker, Alexander K. Lew, Xiaoyan Wang, Matin Ghavami, Mathieu Huot, Martin C. Rinard, Vikash K. Mansinghka,	(参考訳) 現代の確率的プログラミング言語 (PPL) でサポートされている高度なモンテカルロ法と比較すると、PPLは変分推論 (VI) をサポートしていない: ユーザーは通常、PPLバックエンドでモノリシックに実装される変分目的と勾配推定器の事前定義された選択に制限される。本稿では,構成プログラム変換に基づくPPLの変分推論を支援するための,よりモジュラーなアプローチを提案する。提案手法では,変動目的をプログラムとして表現し,ユーザが定義したモデルと変分族の下での期待値の密度の計算に一級構成を用いる。次に、これらのプログラムを体系的に非バイアス勾配推定器に変換し、それらが定義する目的を最適化する。我々の設計は、自動微分、密度蓄積、トレーシング、非バイアス勾配推定戦略の適用など、多くの相互作用する関心事に関するモジュラー推論を可能にする。さらに,PPL における VI の既存サポートと比較して,その設計は3つの軸に沿った表現性の向上を図っている。(1) オプションの固定メニューではなく,ユーザ定義の変動目標のオープンなセットのサポート,(2) 現在の PPL では自動化されていない勾配推定戦略の組合せ空間のサポート,(3) 近似境界化と正規化(モンテカルロ推論のみに導入)のための構成をサポートするため,より広範なモデルと変動家族のクラスをサポートする。我々は、Gen確率型プログラミングシステムの拡張(JAXで実装されたgenjax.vi)にアプローチを実装し、いくつかの深い生成モデリングタスクを評価し、手書き実装と比較してパフォーマンスのオーバーヘッドが最小限であり、オープンソースのPPLと競合する性能を示す。 Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs, that may employ first-class constructs for computing densities of and expected values under user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. 
Our design enables modular reasoning about many interacting concerns, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today's PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced only for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (genjax.vi, implemented in JAX), and evaluate on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# CasModaTest: 単体テスト生成のためのケースドとモデルに依存しないセルフダイレクトフレームワーク CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation ( http://arxiv.org/abs/2406.15743v1 ) ライセンス: Link先を確認	Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang,	(参考訳) 多くの機械学習(ML)ベースのユニットテスト生成アプローチが提案され、実際に顕著なパフォーマンスを達成したが、有効性や実用性にはいくつかの制限がある。より正確には、(1) 既存のMLベースのアプローチは、主にテストオラクル生成に焦点を当てた単体テストの部分的内容を生成し、(2) テストプレフィックスをテストオラクルと意味的にミスマッチさせ、(3) は、クローズドソースモデルに強く結びついており、最終的にはデータセキュリティを損なう。本稿では,CasModaTestを提案する。CasModaTest,CasModaTest,CasModaTest,CasModaTest,CasModaTest,CasModaTest,CasModaTest。そして、手動で大規模なデモプールを構築し、CasModaTestに高品質なテストプレフィックスとテストオラクルの例を提供します。最後に、CasModaTestは生成されたテストプレフィックスとテストオラクルを自動的に組み立て、それらの有効性をチェックするためにコンパイルまたは実行します。 CasModaTestの有効性を評価するために、広く使われているデータセット(Defects4J)上で大規模な実験を行い、2つのパフォーマンス対策を考慮し、4つの最先端(SOTA)アプローチと比較する。実験の結果、CasModaTestは全てのSOTAをかなり改善した(精度は60.62%-352.55%、焦点法は2.83%-87.27%)。また、異なるオープンソース LLM 上で CasModaTest を実験した結果、CasModaTest は SOTA (39.82%-293.96% と 9.25%-98.95% ) に対して、エンドツーエンドの単体テスト生成において大幅な改善が達成できることがわかった。 Though many machine learning (ML)-based unit testing generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle semantically; and (3) are highly bound with the close-sourced model, eventually damaging data security. We propose CasModaTest, a cascaded, model-agnostic, and end-to-end unit test generation framework, to alleviate the above limitations with two cascaded stages: test prefix generation and test oracle generation. Then, we manually build large-scale demo pools to provide CasModaTest with high-quality test prefixes and test oracles examples. 
Finally, CasModaTest automatically assembles the generated test prefixes and test oracles and compiles or executes them to check their effectiveness, optionally appending with several attempts to fix the errors occurring in compiling and executing phases. To evaluate the effectiveness of CasModaTest, we conduct large-scale experiments on a widely used dataset (Defects4J) and compare it with four state-of-the-art (SOTA) approaches by considering two performance measures. The experimental results indicate that CasModaTest outperforms all SOTAs with a substantial improvement (i.e., 60.62%-352.55% in terms of accuracy, 2.83%-87.27% in terms of focal method coverage). Besides, we also conduct experiments of CasModaTest on different open-source LLMs and find that CasModaTest can also achieve significant improvements over SOTAs (39.82%-293.96% and 9.25%-98.95% in terms of accuracy and focal method coverage, respectively) in end-to-end unit test generation.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 外部励起を考慮した未知確率力学系のモデル化 Modeling Unknown Stochastic Dynamical System Subject to External Excitation ( http://arxiv.org/abs/2406.15747v1 ) ライセンス: Link先を確認	Yuan Chen, Dongbin Xiu,	(参考訳) 本稿では,時間依存励起や制御信号を受ける確率系の未知の確率力学系を学習するための数値的手法を提案する。我々の基本的な前提は、確率系の支配方程式は利用できないということである。しかし、ある既知の励起信号と対応するシステム応答からなる入出力(I/O)データの短いバーストが利用可能である。十分な量のI/Oデータが得られると、未知のダイナミクスを学習し、トレーニングデータにない任意の励起信号を受けるシステムの確率応答の正確な予測モデルを生成することができる。本手法は,(1)学習をパラメータ化形式に変換するためのトレーニングI/Oデータの局所近似,(2)未知の確率フローマップを分布に近似する生成モデル,の2つの重要な要素を有する。提案手法を詳細に提示した後, 提案手法の性能, 特に長期システム予測について, 総合的な数値例を提示する。 We present a numerical method for learning unknown nonautonomous stochastic dynamical system, i.e., stochastic system subject to time dependent excitation or control signals. Our basic assumption is that the governing equations for the stochastic system are unavailable. However, short bursts of input/output (I/O) data consisting of certain known excitation signals and their corresponding system responses are available. When a sufficient amount of such I/O data are available, our method is capable of learning the unknown dynamics and producing an accurate predictive model for the stochastic responses of the system subject to arbitrary excitation signals not in the training data. Our method has two key components: (1) a local approximation of the training I/O data to transfer the learning into a parameterized form; and (2) a generative model to approximate the underlying unknown stochastic flow map in distribution. After presenting the method in detail, we present a comprehensive set of numerical examples to demonstrate the performance of the proposed method, especially for long-term system predictions.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# TacoLM: ゲート付きアテンションを備えたコーデック言語モデルは効率的なゼロショット音声合成器である TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers ( http://arxiv.org/abs/2406.15752v1 ) ライセンス: Link先を確認	Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen,	(参考訳) ニューラルコーデック言語モデル(LM)は、ゼロショットテキスト音声合成(TTS)において強力な能力を示している。しかし、コーデックLMは、自己回帰的な性質とテキストとオーディオ間の暗黙のアライメントのため、推論速度と安定性の制限に悩まされることが多い。本研究では、これらの課題に対処するために、ニューラルコーデックLMの新しい変種であるTacoLMを導入する。具体的には、TacoLMは、トレーニングと推論の効率を改善し、モデルサイズを小さくするゲート付きアテンション機構を導入している。さらに、デコーダ層毎にゲート付きクロスアテンション層が追加されており、合成音声の効率性と内容精度を向上させる。 Librispeechコーパスでの評価において、提案するTacoLMは、VALL-Eと比較してパラメータ数が90%少なく、5.2倍高速でありながら、より優れた単語誤り率、話者類似度、平均オピニオンスコアを達成する。デモとコードはhttps://ereboas.github.io/TacoLM/で公開されている。 Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, TacoLM introduces a gated attention mechanism to improve the training and inference efficiency and reduce the model size. Meanwhile, an additional gated cross-attention layer is included for each decoder layer, which improves the efficiency and content accuracy of the synthesized speech. In the evaluation of the Librispeech corpus, the proposed TacoLM achieves a better word error rate, speaker similarity, and mean opinion score, with 90% fewer parameters and 5.2 times speed up, compared with VALL-E. Demo and code is available at https://ereboas.github.io/TacoLM/.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 学習した報酬関数を最適化する危険性:低いトレーニング誤差は低いリグレットを保証しない The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret ( http://arxiv.org/abs/2406.15753v1 ) ライセンス: Link先を確認	Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse,	(参考訳) 強化学習では、意図したタスクを捉える報酬関数を指定することが非常に難しい。報酬学習は、報酬関数を学習することでこの問題に対処することを目的としている。しかし、学習した報酬モデルは、トレーニング分布上の誤差が低くても、その後に大きなリグレットを伴うポリシーを生み出すことがある。このような報酬モデルにはエラー-リグレットミスマッチがあると呼ぶ。エラー-リグレットミスマッチの主な原因は、ポリシー最適化中に一般的に発生する分布シフトである。本稿では、報酬モデルの期待テスト誤差が十分に低ければ最悪ケースのリグレットも低いことが保証される一方で、任意の固定された期待テスト誤差に対して、エラー-リグレットミスマッチを許容する現実的なデータ分布が存在することを数学的に示す。次に、RLHFのような手法でよく用いられるポリシー正則化手法を用いても、同様の問題が持続することを示す。我々の理論的結果は、学習した報酬モデルの品質を測定する新しい方法を開発することの重要性を強調している。 In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of developing new ways to measure the quality of learned reward models.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 声道モデルのためのマルチモーダルセグメンテーション Multimodal Segmentation for Vocal Tract Modeling ( http://arxiv.org/abs/2406.15754v1 ) ライセンス: Link先を確認	Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli,	(参考訳) 解釈可能な音声処理と言語学のための調音表現を構築するためには,声道の正確なモデリングが必要である。しかし, 声道モデリングは, 内部調音器の多くが外的モーションキャプチャー技術から除外されているため, 困難である。リアルタイム磁気共鳴イメージング(RT-MRI)は、音声中の内音節の正確な動きを計測するが、MRIの注釈付きデータセットのサイズは、時間的・計算的に高価なラベル付け法によって制限される。まず、視覚のみのセグメンテーション手法を用いて、RT-MRIビデオにディープラベリング戦略を提案する。次に、音声を用いたマルチモーダルアルゴリズムを導入し、発声器のセグメンテーションを改善する。今回我々は,MRIビデオセグメンテーションにおける声道モデリングのための新しいベンチマークを作成し,75話者RT-MRIデータセットのラベルをリリースし,声道の公的なRT-MRIデータのラベルを9倍に増やした。コードとデータセットのラベルは \url{rishiraij.github.io/multimodal-mri-avatar/} にある。 Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at \url{rishiraij.github.io/multimodal-mri-avatar/}.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 弱教師付きセマンティックセグメンテーションのためのきめ細かい背景表現 Fine-grained Background Representation for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2406.15755v1 ) ライセンス: Link先を確認	Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon,	(参考訳) 画像レベルのラベルから信頼できる疑似マスクを生成することは、空間情報の欠如により、弱教師付きセマンティックセグメンテーション(WSSS)タスクにおいて困難である。広く使われているクラスアクティベーションマップ(CAM)ベースの手法は、前景(FG)オブジェクトを紛らわしい背景(BG)画素(いわゆる共起)から識別し、物体領域全体を学習することに課題を抱えている。本稿では、多様なBGセマンティクスを発見・表現し、共起問題に対処する、シンプルなきめ細かい背景表現(FBR)法を提案する。 BG表現のためにクラスプロトタイプやピクセルレベルの特徴を用いることはしない。代わりに、きめ細かいBGセマンティック情報を捉え、ピクセル対NROIのコントラストを行って紛らわしいBGピクセルを区別するために、新しいプリミティブである負の関心領域(NROI)を開発する。また、FGのネガティブをオンザフライでマイニングするアクティブサンプリング戦略を提案し、前景内のピクセル間コントラスト学習を効率的に行って物体領域全体を活性化させる。設計の単純さと使い勝手の良さにより、提案手法は様々なモデルにシームレスに組み込むことができ、複数のベンチマークにわたる様々なWSSS設定の下で新たな最先端結果が得られる。画像レベル(I)のラベルのみを教師信号として用いて、Pascal VocとMS COCOのテストセットでそれぞれ73.2 mIoUと45.6 mIoUのセグメンテーション結果を得た。さらに、サリエンシマップを追加の教師信号(I+S)として組み込むことで、Pascal Vocテストセット上で74.9 mIoUを得た。同時に、我々のFBRアプローチは、弱教師付きインスタンスセグメンテーション(WSIS)タスクにおいても有意義な性能向上を示し、その堅牢性と多様なドメインにわたる強力な一般化能力を示している。 Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse BG semantics and address the co-occurring problems. We abandon using the class prototype or pixel-level features for BG representation. Instead, we develop a novel primitive, negative region of interest (NROI), to capture the fine-grained BG semantic information and conduct the pixel-to-NROI contrast to distinguish the confusing BG pixels. 
We also present an active sampling strategy to mine the FG negatives on-the-fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning to activate the entire object region. Thanks to the simplicity of design and convenience in use, our proposed method can be seamlessly plugged into various models, yielding new state-of-the-art results under various WSSS settings across benchmarks. Leveraging solely image-level (I) labels as supervision, our method achieves 73.2 mIoU and 45.6 mIoU segmentation results on Pascal Voc and MS COCO test sets, respectively. Furthermore, by incorporating saliency maps as an additional supervision signal (I+S), we attain 74.9 mIoU on Pascal Voc test set. Concurrently, our FBR approach demonstrates meaningful performance gains in weakly-supervised instance segmentation (WSIS) tasks, showcasing its robustness and strong generalization capabilities across diverse domains.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 量子符号の摂動安定性と誤り訂正しきい値 Perturbative stability and error correction thresholds of quantum codes ( http://arxiv.org/abs/2406.15757v1 ) ライセンス: Link先を確認	Yaodong Li, Nicholas O'Dea, Vedika Khemani,	(参考訳) トポロジカル秩序相は局所摂動に対して安定であり、トポロジカル量子誤り訂正符号は局所誤差に対するしきい値を持つ。我々は、一般のCSS符号と古典線形符号を復号するための古典統計力学モデルを構築することにより、これら2つの安定性の概念を結びつける。この構成は、無相関なビット反転および位相反転誤差の下での訂正成功確率を符号化し、同時に、クエンチされた乱れを伴う一般化Z2格子ゲージ理論を記述する。後者のクリーン極限は、誤差を摂動的なXまたはZ磁場に置き換えたとき、対応する量子符号ハミルトニアンの離散化された虚時間経路積分にちょうど一致する。誤り訂正の考察に動機づけられて、そのような一般化Z2格子ゲージ理論すべてに対する一般的な秩序パラメータを定義し、それらが一般に誤り訂正の成功確率によって下から抑えられることを示す。 LDPC条件を満たし、十分に大きな符号距離を持つCSS符号に対して、対応する格子ゲージ理論の低温秩序相の存在を証明する。これは特に、ユークリッド的な空間的局所性を欠く場合や、符号化率がゼロでない場合にも成り立つ。さらに、これらの結果は、連続虚時間の極限で得られる、対応する摂動を受けた量子ハミルトニアンにおける安定相の証拠を与えると主張する。そのために、格子ゲージ理論における空間的欠陥と時間的欠陥を区別する。空間的欠陥の高い自由エネルギーコストは「メモリ実験」の成功に対応し、基底状態間のエネルギー分裂を抑制する。一方、時間的欠陥の高い自由エネルギーコストは「安定性実験」の成功に対応し、局所励起に対するギャップがゼロでないことを示唆する。 Topologically-ordered phases are stable to local perturbations, and topological quantum error-correcting codes enjoy thresholds to local errors. We connect the two notions of stability by constructing classical statistical mechanics models for decoding general CSS codes and classical linear codes. Our construction encodes correction success probabilities under uncorrelated bit-flip and phase-flip errors, and simultaneously describes a generalized Z2 lattice gauge theory with quenched disorder. We observe that the clean limit of the latter is precisely the discretized imaginary time path integral of the corresponding quantum code Hamiltonian when the errors are turned into a perturbative X or Z magnetic field. Motivated by error correction considerations, we define general order parameters for all such generalized Z2 lattice gauge theories, and show that they are generally lower bounded by success probabilities of error correction. 
For CSS codes satisfying the LDPC condition and with a sufficiently large code distance, we prove the existence of a low temperature ordered phase of the corresponding lattice gauge theories, particularly for those lacking Euclidean spatial locality and/or when there is a nonzero code rate. We further argue that these results provide evidence to stable phases in the corresponding perturbed quantum Hamiltonians, obtained in the limit of continuous imaginary time. To do so, we distinguish space- and time-like defects in the lattice gauge theory. A high free-energy cost of space-like defects corresponds to a successful "memory experiment" and suppresses the energy splitting among the ground states, while a high free-energy cost of time-like defects corresponds to a successful "stability experiment" and points to a nonzero gap to local excitations.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# EDGE-LLM:Layerwise Unified CompressionとAdaptive Layer Tuning and Votingによるエッジデバイス上での効率的な大言語モデル適応の実現 EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting ( http://arxiv.org/abs/2406.15758v1 ) ライセンス: Link先を確認	Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin,	(参考訳) エッジデバイスへの大規模言語モデル(LLM)の効率的な適応は、継続的かつプライバシ保護の適応と推論を必要とするアプリケーションにとって不可欠である。しかし、既存のチューニング技術は高い計算とメモリオーバーヘッドのために不足している。そこで我々はEdge-LLMと呼ばれる計算・メモリ効率の高いLLMチューニングフレームワークを導入する。具体的には,レイヤワイド統一圧縮 (LUC) 技術を用いて,レイヤワイドプルーニング空間と量子化ビット幅ポリシーの生成による計算オーバーヘッドの低減,(2)バックプロパゲーション深さの低減によるメモリオーバーヘッドの低減のための適応層チューニングと投票方式,(3)LUCが導入した不規則な計算パターンと適応層チューニングを補完するハードウェアスケジューリング戦略により,効率的な計算とデータ移動を実現する。大規模な実験では、Edge-LLMは2.92倍のスピードアップと4倍のメモリオーバーヘッド削減を実現している。私たちのコードはhttps://github.com/GATECH-EIC/Edge-LLMで利用可能です。 Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices. Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning, thereby achieving efficient computation and data movements. 
Extensive experiments demonstrate that Edge-LLM achieves a 2.92x speed up and a 4x memory overhead reduction as compared to vanilla tuning methods with comparable task accuracy. Our code is available at https://github.com/GATECH-EIC/Edge-LLM	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# コンセプトドリフトのための新しいベッティング関数を備えたICMアンサンブル ICM Ensemble with Novel Betting Functions for Concept Drift ( http://arxiv.org/abs/2406.15760v1 ) ライセンス: Link先を確認	Charalambos Eliades, Harris Papadopoulos,	(参考訳) 本研究は、我々のこれまでの研究に基づき、コンセプトドリフト(CD)に対処するための改良されたInductive Conformal Martingale (ICM) アプローチを導入する。具体的には、以前に提案したCAUTIOUSベッティング関数を拡張し、複数の密度推定器を組み込んで検出能力を向上させる。また、このベッティング関数を、これまでICMフレームワークで利用されていなかった2つのベース推定器(補間ヒストグラム密度推定器と最近傍密度推定器)と組み合わせる。単一のICMとICMのアンサンブルの両方を用いて、これらの拡張を評価する。後者については、アンサンブルサイズが予測精度および利用可能な予測数に与える影響を包括的に実験で調査する。4つのベンチマークデータセットでの実験の結果、提案手法は性能面で従来手法を上回り、現代の3つの最先端技術と同等か、多くの場合それを上回ることがわかった。 This study builds upon our previous work by introducing a refined Inductive Conformal Martingale (ICM) approach for addressing Concept Drift (CD). Specifically, we enhance our previously proposed CAUTIOUS betting function to incorporate multiple density estimators for improving detection ability. We also combine this betting function with two base estimators that have not been previously utilized within the ICM framework: the Interpolated Histogram and Nearest Neighbor Density Estimators. We assess these extensions using both a single ICM and an ensemble of ICMs. For the latter, we conduct a comprehensive experimental investigation into the influence of the ensemble size on prediction accuracy and the number of available predictions. Our experimental results on four benchmark datasets demonstrate that the proposed approach surpasses our previous methodology in terms of performance while matching or in many cases exceeding that of three contemporary state-of-the-art techniques.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# ワッサーシュタイン勾配流の観点からの数値表形式データ補完のための拡散モデルの再考 Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow ( http://arxiv.org/abs/2406.15762v1 ) ライセンス: Link先を確認	Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang,	(参考訳) 拡散モデル(DM)は欠損データ補完(MDI)において注目を集めているが、長く見過ごされてきた2つの問題が残っている。(1) 不正確な補完: サンプルの多様化を本質的に追求するDMの生成過程に起因する。(2) 学習の困難さ: モデル学習段階でマスク行列に求められる複雑な設計に起因する。数値表形式データセットの領域でこれらの懸念に対処するため、KnewImp(Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation)と呼ばれる、原理に基づく新しいアプローチを導入する。具体的には、ワッサーシュタイン勾配流(WGF)の枠組みに基づいて、問題(1)は、DMベースのMDIで暗黙的に最大化されるコスト汎関数が、MDIの目的関数に多様化を促す非負の項を加えたものと等価であることに起因することをまず証明する。これに基づき、多様化を抑制する負のエントロピーを持つ新しいコスト汎関数を設計し、WGFの枠組みと再生核ヒルベルト空間の中でKnewImpアプローチを導出する。その後、KnewImpの補完手順が同時分布に関する別のコスト汎関数から導出できることを証明し、マスク行列の必要性を排除して、問題(2)に自然に対処する。広範な実験により、提案するKnewImpアプローチが既存の最先端手法を大幅に上回ることが示された。 Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within the realm of numerical tabular datasets, we introduce a novel principled approach termed Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation (KnewImp). Specifically, based on Wasserstein gradient flow (WGF) framework, we first prove that issue (1) stems from the cost functionals implicitly maximized in DM-based MDI are equivalent to the MDI's objective plus diversification-promoting non-negative terms. Based on this, we then design a novel cost functional with diversification-discouraging negative entropy and derive our KnewImp approach within WGF framework and reproducing kernel Hilbert space. 
After that, we prove that the imputation procedure of KnewImp can be derived from another cost functional related to the joint distribution, eliminating the need for the mask matrix and hence naturally addressing issue (2). Extensive experiments demonstrate that our proposed KnewImp approach significantly outperforms existing state-of-the-art methods.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# AllMatch: 半教師付き学習のためにラベルなしデータをすべて活用する AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning ( http://arxiv.org/abs/2406.15763v1 ) ライセンス: Link先を確認	Zhiyu Wu, Jinshi Cui,	(参考訳) 既存の半教師付き学習アルゴリズムは、擬似ラベル付けと一貫性正則化の技術を用いて、ラベルなしサンプルに教師信号を導入する。しきい値に基づく擬似ラベル付けに固有の限界を克服するために、従来の研究では、ラベルなしデータに対する予測から推定されるモデルの学習状態の変化に信頼度しきい値を合わせることが試みられてきた。本稿ではさらに、分類器の重みがカテゴリごとに異なる学習状態を反映しうることを明らかにし、クラス固有の適応しきい値機構を提案する。加えて、最適なしきい値スキームであってもラベルなしサンプルが破棄される問題は解決できないことを考慮し、すべてのラベルなしサンプルについて候補クラスを否定的な選択肢から区別する二値分類の一貫性正則化手法を設計する。以上の戦略を組み合わせることで、擬似ラベル精度の向上とラベルなしデータの100%の利用率を実現する、AllMatchという新しいSSLアルゴリズムを提案する。我々は、バランスのとれた設定と不均衡な設定の両方を含む複数のベンチマークで、このアプローチを広範囲に評価した。その結果、AllMatchは既存の最先端手法を一貫して上回ることが示された。 Existing semi-supervised learning algorithms adopt pseudo-labeling and consistency regulation techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated through the predictions made on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories and consequently propose a class-specific adaptive threshold mechanism. Additionally, considering that even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, a binary classification consistency regulation approach is designed to distinguish candidate classes from negative options for all unlabeled samples. By combining the above strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# TP-DRSeg:Explicit Text-Prompts Assisted SAMによる糖尿病網膜症病変分画の改善 TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM ( http://arxiv.org/abs/2406.15764v1 ) ライセンス: Link先を確認	Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge,	(参考訳) SAM(Segment Anything Model)のような大規模基盤モデルの最近の進歩は、様々なタスクにおいて大きな可能性を証明している。それらの進歩にもかかわらず、これらのモデルは、特に糖尿病網膜症(DR)病変の微妙な相違を認識する際に、専門的な医用画像解析における課題に直面している。本稿では,テキストプロンプされたDR病変のセグメンテーションのためにSAMをカスタマイズする新しいフレームワーク,TP-DRSegを提案する。我々の中核となる考え方は、医学的な事前知識を視覚のみのセグメンテーションネットワークに注入するために言語手がかりを活用することであり、それによって異なる基礎モデルの利点を組み合わせ、セグメンテーションの信頼性を高めることである。具体的には、医用概念認識における視覚言語モデルの可能性を解き明かすために、暗黙の医学的概念を明示的な事前知識に伝達する明示的な事前エンコーダを提案し、病変に関連する低レベル特徴を発掘するための説明可能な手がかりを提供する。さらに,マルチモーダルな特徴間の知識共有を容易にし,パラメータ効率のよい手法でフレームワークを訓練できるように,セグメンテーションプロセスに明示的な事前を注入するための事前整合型インジェクタを設計する。実験により、従来のモデルや基礎モデルよりもフレームワークの方が優れていることが示された。 Recent advances in large foundation models, such as the Segment Anything Model (SAM), have demonstrated considerable promise across various tasks. Despite their progress, these models still encounter challenges in specialized medical image analysis, especially in recognizing subtle inter-class differences in Diabetic Retinopathy (DR) lesion segmentation. In this paper, we propose a novel framework that customizes SAM for text-prompted DR lesion segmentation, termed TP-DRSeg. Our core idea involves exploiting language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. Specifically, to unleash the potential of vision-language models in the recognition of medical concepts, we propose an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. 
Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion. Experimental results demonstrate the superiority of our framework over other traditional models and foundation model variants.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 隠れた注意シンクの解明と活用: 注意校正によるトレーニング不要の大規模言語モデル強化 Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration ( http://arxiv.org/abs/2406.15765v1 ) ライセンス: Link先を確認	Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin,	(参考訳) 注意機構は、大規模言語モデル(LLM)の顕著な成果を支える基本的な構成要素である。しかし、注意機構、特に注意分布がどのように形成されるかについての現在の理解は限られている。本研究は、意味的な重要性を欠くにもかかわらず不釣り合いに大きな注意スコアを受け取る最初のトークンにおける注意シンクの存在を調べた最近の研究に着想を得て、この現象をさらに深く掘り下げる。本研究の目的は、LLM内の注意シンクの存在をより深く理解し、重みの微調整を必要とせずに注意分布を直接最適化することで、LLMの達成可能な精度を高める方法を明らかにすることである。具体的には、様々な入力やタスクに対する推論中のLLMの注意分布を包括的に可視化することから始める。これらの可視化に基づき、我々の知る限り初めて、(1) 注意シンクはシーケンスの先頭だけでなく入力の後続トークン内でも発生すること、(2) すべての注意シンクがLLMの達成可能な精度に良い影響を与えるわけではないことを発見した。この知見に基づき、推論中に入力適応的な方法で注意分布をオンザフライで自動的に最適化する、トレーニング不要の注意校正手法(ACT)を提案する。広範な実験により、ACTは様々な用途にわたって各種LLMの精度を一貫して向上させることが確認された。具体的には、ACTをLlama-30Bに適用した場合、異なるデータセット全体で平均して最大7.30%の精度向上を達成する。私たちのコードはhttps://github.com/GATECH-EIC/ACTで公開されています。 Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of attention sink in the initial token, which receives disproportionately large attention scores despite their lack of semantic importance, this work delves deeper into this phenomenon. We aim to provide a more profound understanding of the existence of attention sinks within LLMs and to uncover ways to enhance the achievable accuracy of LLMs by directly optimizing the attention distributions, without the need for weight finetuning. Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks. 
Based on these visualizations, to the best of our knowledge, we are the first to discover that (1) attention sinks occur not only at the start of sequences but also within later tokens of the input, and (2) not all attention sinks have a positive impact on the achievable accuracy of LLMs. Building upon our findings, we propose a training-free Attention Calibration Technique (ACT) that automatically optimizes the attention distributions on the fly during inference in an input-adaptive manner. Extensive experiments validate that ACT consistently enhances the accuracy of various LLMs across different applications. Specifically, ACT achieves an average improvement of up to 7.30% in accuracy across different datasets when applied to Llama-30B. Our code is available at https://github.com/GATECH-EIC/ACT.	翻訳日:2024-06-25 20:45:08 公開日:2024-06-22
# 産業ストリーミングデータに対する拡散型生成再生による連続学習 Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data ( http://arxiv.org/abs/2406.15766v1 ) ライセンス: Link先を確認	Jiayi He, Jiao Chen, Qianmiao Liu, Suyan Dai, Jianhua Tang, Dongpo Liu,	(参考訳) 産業用モノのインターネット(Industrial Internet of Things, IIoT)は、相互接続されたセンサーとデバイスを統合して産業アプリケーションをサポートするが、その動的環境はデータドリフトに関連する課題を引き起こす。本稿では,資源が限られており,新たなデータ配信にモデルを効果的に適用する必要があることを考慮し,新たな生成再生機構を通じて産業ストリーミングデータによってもたらされる課題に対処する継続学習(CL)アプローチ(Distillation-based Self-Guidance:DSG)を提案する。 DSGは、知識蒸留を利用して、前回の拡散ベースジェネレータから更新したジェネレータへの知識伝達を行い、ジェネレータの安定性と再生データの品質の両方を改善し、破滅的な忘れの軽減を図る。 CWRU, DSA, WISDMデータセットの実験結果からDSGの有効性が示された。 DSGは最先端のベースラインを精度で上回り、主要なデータセットの2.9%から5.0%の改善を示す。 The Industrial Internet of Things (IIoT) integrates interconnected sensors and devices to support industrial applications, but its dynamic environments pose challenges related to data drift. Considering the limited resources and the need to effectively adapt models to new data distributions, this paper introduces a Continual Learning (CL) approach, i.e., Distillation-based Self-Guidance (DSG), to address challenges presented by industrial streaming data via a novel generative replay mechanism. DSG utilizes knowledge distillation to transfer knowledge from the previous diffusion-based generator to the updated one, improving both the stability of the generator and the quality of reproduced data, thereby enhancing the mitigation of catastrophic forgetting. Experimental results on CWRU, DSA, and WISDM datasets demonstrate the effectiveness of DSG. DSG outperforms the state-of-the-art baseline in accuracy, demonstrating improvements ranging from 2.9% to 5.0% on key datasets, showcasing its potential for practical industrial applications.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# MR-MLLM:マルチモーダル理解と視覚知覚の相互強化 MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception ( http://arxiv.org/abs/2406.15768v1 ) ライセンス: Link先を確認	Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang,	(参考訳) 近年、マルチモーダル大規模言語モデル(MLLM)は視覚的質問応答や常識推論といったタスクにおいて顕著な能力を示し、視覚知覚モデルは検出やセグメンテーションといった知覚タスクにおいて大きな進歩を遂げている。しかし、MLLMは主に高レベルの画像とテキストの解釈に重点を置いており、きめ細かな視覚的理解に苦慮している。一方、視覚知覚モデルは、モデル容量が限られているために、オープンワールドの分布シフトの影響を受けやすい。これらの課題を克服するために、視覚知覚とマルチモーダル理解を相乗的に強化する新しいフレームワークであるMutually Reinforced Multimodal Large Language Model (MR-MLLM)を提案する。まず、視覚モデルからの詳細な視覚入力と言語モデルの言語的な深さを調和させ、マルチモーダル理解と視覚知覚を相乗的に強化する共有クエリ融合機構を提案する。第2に、物体検出のバウンディングボックスなど、視覚知覚の出力から得られる新たなモダリティを取り入れて微妙な視覚的要素を捉え、視覚データとテキストデータの双方の理解を深める、知覚強化型クロスモーダル統合手法を提案する。さらに、知覚情報を言語モデルのプロンプトに埋め込み、応答を文脈的かつ知覚的に整合させることで、より正確なマルチモーダル解釈を可能にする、知覚埋め込み型プロンプト生成機構を提案する。広範な実験により、MR-MLLMは様々なマルチモーダル理解および視覚知覚タスク、特にコーナーケースの視覚知覚ときめ細かな言語理解を必要とするタスクにおいて、優れた性能を示すことが確認された。 In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation. However, MLLMs mainly focus on high-level image-text interpretations and struggle with fine-grained visual understanding, and vision perception models usually suffer from open-world distribution shifts due to their limited model capacity. To overcome these challenges, we propose the Mutually Reinforced Multimodal Large Language Model (MR-MLLM), a novel framework that synergistically enhances visual perception and multimodal comprehension. First, a shared query fusion mechanism is proposed to harmonize detailed visual inputs from vision models with the linguistic depth of language models, enhancing multimodal comprehension and vision perception synergistically. 
Second, we propose the perception-enhanced cross-modal integration method, incorporating novel modalities from vision perception outputs, like object detection bounding boxes, to capture subtle visual elements, thus enriching the understanding of both visual and textual data. In addition, an innovative perception-embedded prompt generation mechanism is proposed to embed perceptual information into the language model's prompts, aligning the responses contextually and perceptually for a more accurate multimodal interpretation. Extensive experiments demonstrate MR-MLLM's superior performance in various multimodal comprehension and vision perception tasks, particularly those requiring corner case vision perception and fine-grained language comprehension.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# HCQA @ Ego4D EgoSchema Challenge 2024 HCQA @ Ego4D EgoSchema Challenge 2024 ( http://arxiv.org/abs/2406.15771v1 ) ライセンス: Link先を確認	Haoyu Zhang, Yuquan Xie, Yisen Feng, Zaijing Li, Meng Liu, Liqiang Nie,	(参考訳) 本稿では、CVPR 2024におけるEgo4D EgoSchema Challengeの優勝ソリューションについて述べる。強力な一人称視点キャプションモデルと質問推論モデルを深く統合するために、HCQAという、一人称視点ビデオ質問応答のための新しい階層的理解スキームを提案する。これは、きめ細かいキャプション生成、コンテキスト駆動の要約、推論誘導型回答の3段階で構成されている。 HCQAは、長尺ビデオが与えられると、きめ細かいキャプション生成によって局所的な詳細視覚情報を、コンテキスト駆動の要約によって大域的に要約された視覚情報を、それぞれ捉える。次に、推論誘導型回答において、HCQAはこの階層的な情報を用いて、与えられた質問について推論し回答する。 EgoSchemaのブラインドテストセットでは、HCQAは人手でキュレーションされた5,000問以上の多肢選択問題に対して75%の精度を達成する。私たちのコードはhttps://github.com/Hyu-Zhang/HCQAでリリースされます。 In this report, we present our champion solution for Ego4D EgoSchema Challenge in CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guided Answering. Given a long-form video, HCQA captures local detailed visual information and global summarised visual information via Fine-grained Caption Generation and Context-driven Summarization, respectively. Then in Inference-guided Answering, HCQA utilizes this hierarchical information to reason and answer given question. On the EgoSchema blind test set, HCQA achieves 75% accuracy in answering over 5,000 human curated multiple-choice questions. Our code will be released at https://github.com/Hyu-Zhang/HCQA.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# ISS-Scenario: CARLAにおけるシナリオベースのテスト ISS-Scenario: Scenario-based Testing in CARLA ( http://arxiv.org/abs/2406.15777v1 ) ライセンス: Link先を確認	Renjue Li, Tianhang Qin, Cas Widdershoven,	(参考訳) 自律運転システム(ADS)の急速な発展は、将来性に満ちている。しかし、これらの約束を満たすためには、ADSはあらゆる状況において安全である必要がある。本稿では,シナリオベーステストのパラダイムにおける自律走行テストフレームワークであるISS-Scenarioを紹介する。 ISS-Scenarioは、バッチテスト、テストケース(潜在的に危険なシナリオ)の探索、自動運転車(AV)の性能評価のために設計されている。 ISS-Scenarioには、パラメタライズドデザインを備えた多様なシミュレーションシナリオライブラリが含まれている。さらにISS-Scenarioは、ランダムサンプリングと遺伝的アルゴリズムによる最適化検索という、2つのテスト手法をこのフレームワークに統合している。最後に、ISS-Scenarioは、アクシデントリプレイ機能を提供し、各テストケースのログファイルを保存することで、ADSが問題のある振る舞いを示したシナリオを再生および無効化することができる。 The rapidly evolving field of autonomous driving systems (ADSs) is full of promise. However, in order to fulfil these promises, ADSs need to be safe in all circumstances. This paper introduces ISS-Scenario, an autonomous driving testing framework in the paradigm of scenario-based testing. ISS-Scenario is designed for batch testing, exploration of test cases (e.g., potentially dangerous scenarios), and performance evaluation of autonomous vehicles (AVs). ISS-Scenario includes a diverse simulation scenario library with parametrized design. Furthermore, ISS-Scenario integrates two testing methods within the framework: random sampling and optimized search by means of a genetic algorithm. Finally, ISS-Scenario provides an accident replay feature, saving a log file for each test case which allows developers to replay and dissect scenarios where the ADS showed problematic behavior.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 ( http://arxiv.org/abs/2406.15778v1 ) ライセンス: Link先を確認	Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie,	(参考訳) 本稿では,CVPR 2024におけるEgo4D Episodic Memory Benchmarkの自然言語クエリトラックとゴールステップトラックに対する我々のアプローチについて述べる。どちらの課題も、テキストクエリを使って長いビデオシーケンス内のアクションをローカライズする必要がある。ローカライゼーションの精度を高めるため,ビデオの時間的情報を処理するだけでなく,フレーム内の微細な物体を空間的に識別する。この目的のために,オブジェクトブランチを組み込んだ新しいアプローチであるObjectNLQを導入し,映像表現を詳細なオブジェクト情報で拡張し,グラウンド化効率を向上する。 ObjectNLQは23.15の平均R@1を達成し、自然言語クエリチャレンジでは2位、R@1, IoU=0.3で33.00を獲得し、ゴールステップチャレンジでは3位となった。私たちのコードは https://github.com/Yisen-Feng/ObjectNLQ でリリースされます。 In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatially within the frames. To this end, we introduce a novel approach, termed ObjectNLQ, which incorporates an object branch to augment the video representation with detailed object information, thereby improving grounding efficiency. ObjectNLQ achieves a mean R@1 of 23.15, ranking 2nd in the Natural Language Queries Challenge, and gains 33.00 in terms of the metric R@1, IoU=0.3, ranking 3rd in the Goal Step Challenge. Our code will be released at https://github.com/Yisen-Feng/ObjectNLQ.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# DABL:大規模言語モデルを用いたビジネスプロセスにおける意味異常の検出 DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models ( http://arxiv.org/abs/2406.15781v1 ) ライセンス: Link先を確認	Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian,	(参考訳) ビジネスプロセスの異常を検出することは、運用の成功を保証するために重要です。多くの既存手法は異常を検出するために統計周波数に依存していますが、頻繁な振る舞いが必ずしも望ましくないというわけではないことに注意する必要があります。この課題に対処するために、意味論的視点から異常を検出することは、より効果的なアプローチであることが証明されている。しかし、現在のセマンティックな異常検出方法は、トレース(すなわちプロセスインスタンス)を複数のイベントペアとして扱い、長距離依存関係を乱す。本稿では,大規模言語モデル(LLM)を用いたビジネスプロセスにおける意味異常の検出手法であるDABLを紹介する。さまざまなドメインから143,137の現実世界のプロセスモデルを収集します。これらのプロセスモデルのプレイアウトから通常のトレースを生成し、順序付けと排他的異常の両方をシミュレートすることで、Llama 2を微調整する。実験により,DABLは与えられたプロセスの一般化能力と学習能力の両方の観点から,既存の最先端のセマンティックな異常検出手法を超越していることが実証された。ユーザーはDABLを直接適用して、追加のトレーニングを必要とせずに、自身のデータセットのセマンティックな異常を検出することができる。さらに、DABLは自然言語の異常の原因を解釈する能力を提供し、検出された異常について貴重な洞察を提供する。 Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anomaly detection methods treat a trace (i.e., process instance) as multiple event pairs, disrupting long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. By generating normal traces through the playout of these process models and simulating both ordering and exclusion anomalies, we fine-tune Llama 2 using the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. 
Users can directly apply DABL to detect semantic anomalies in their own datasets without the need for additional training. Furthermore, DABL offers the capability to interpret the causes of anomalies in natural language, providing valuable insights into the detected anomalies.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# 時系列解析のためのフィードバック駆動型量子貯水池計算 Feedback-driven quantum reservoir computing for time-series analysis ( http://arxiv.org/abs/2406.15783v1 ) ライセンス: Link先を確認	Kaito Kobayashi, Keisuke Fujii, Naoki Yamamoto,	(参考訳) 量子貯水池コンピューティング(QRC)は、非線形情報処理のための計算資源として量子システムを利用する、非常に有望な計算パラダイムである。時系列解析へのその応用は期待されているが、一般的なアプローチは測定時の量子状態の崩壊に悩まされ、時間的入力メモリが消去される。繰り返しの初期化も弱測定も根本的な解決にはならず、前者は時間複雑性を増大させ、後者はヒルベルト空間からの情報抽出を制限する。この問題に対処するため,フィードバック駆動型QRCフレームワークを提案する。この手法では、量子状態への無制限アクセスのために全ての量子ビットの射影測定を用い、測定結果はその後貯水池に送り返され、以前の入力の記憶を復元する。我々は,QRCが時系列処理において重要な要素であるフェードメモリ特性を,フィードバックを通じて獲得することに成功したことを実証した。特に、測定軌跡の分析では、フィードバック強度に応じて3つの異なる位相が示され、メモリ性能はカオスの端で最大化される。また、QRCの予測能力を評価し、量子スピン系から発する信号の予測に適していることを示す。 Quantum reservoir computing (QRC) is a highly promising computational paradigm that leverages quantum systems as a computational resource for nonlinear information processing. While its application to time-series analysis is eagerly anticipated, prevailing approaches suffer from the collapse of the quantum state upon measurement, resulting in the erasure of temporal input memories. Neither repeated initializations nor weak measurements offer a fundamental solution, as the former escalates time complexity while the latter restricts the information extraction from the Hilbert space. To address this issue, we propose a feedback-driven QRC framework. This methodology employs projective measurements on all qubits for unrestricted access to the quantum state, with the measurement outcomes subsequently fed back into the reservoir to restore the memory of prior inputs. We demonstrate that our QRC successfully acquires the fading-memory property through the feedback, a critical element in time-series processing. Notably, analysis of measurement trajectories reveals three distinct phases depending on the feedback strength, with the memory performance maximized at the edge of chaos. We also evaluate the predictive capabilities of our QRC, demonstrating its suitability for forecasting signals originating from quantum spin systems.	
翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# 産業用AIシステムにおけるデータ問題:メタレビューと研究戦略 Data Issues in Industrial AI System: A Meta-Review and Research Strategy ( http://arxiv.org/abs/2406.15784v1 ) ライセンス: Link先を確認	Xuejiao Li, Cheng Yang, Charles Møller, Jay Lee,	(参考訳) 産業4.0の時代には、人工知能(AI)は産業システムにおいてますます重要な役割を担っている。近年、さまざまな業界でAIを採用する傾向にあるが、実際のAIの採用は認識されるほど発展していない。この遅れに寄与する重要な要因は、AI実装におけるデータ問題である。これらのデータ問題にどのように対処するかは、業界と学術の両方に直面する重要な懸念事項である。データ問題に対処する最初のステップは、これらの問題をマッピングすることです。そこで本研究では,産業用AIの実装におけるデータ問題と手法のメタレビューを行う。データソースとコレクション、データアクセスとストレージ、データ統合と相互運用、データ前処理、データ処理、データセキュリティとプライバシ、AIテクノロジの採用などだ。その後、さまざまなAIアルゴリズムのデータ要求を分析する。上記の分析に基づいて、データライフサイクルのすべての段階で、データの問題を体系的に解決する方法について、データ管理フレームワークを提案する。最後に、この研究は今後の研究の方向性を強調している。そこで本研究では、既存の知識体系を充実させ、産業用AIにおけるデータの使いやすさと有用性を達成するための複雑な景観をナビゲートする専門家のためのガイドラインを提供する。 In the era of Industry 4.0, artificial intelligence (AI) is assuming an increasingly pivotal role within industrial systems. Despite the recent trend within various industries to adopt AI, the actual adoption of AI is not as developed as perceived. A significant factor contributing to this lag is the data issues in AI implementation. How to address these data issues stands as a significant concern confronting both industry and academia. To address data issues, the first step involves mapping out these issues. Therefore, this study conducts a meta-review to explore data issues and methods within the implementation of industrial AI. Seventy-two data issues are identified and categorized into various stages of the data lifecycle, including data source and collection, data access and storage, data integration and interoperation, data pre-processing, data processing, data security and privacy, and AI technology adoption. Subsequently, the study analyzes the data requirements of various AI algorithms. Building on the aforementioned analyses, it proposes a data management framework, addressing how data issues can be systematically resolved at every stage of the data lifecycle. 
Finally, the study highlights future research directions. In doing so, this study enriches the existing body of knowledge and provides guidelines for professionals navigating the complex landscape of achieving data usability and usefulness in industrial AI.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# トランスフォーマーには何があるのか? すべての注意が必要なわけではない What Matters in Transformers? Not All Attention is Needed ( http://arxiv.org/abs/2406.15786v1 ) ライセンス: Link先を確認	Shwai He, Guoheng Sun, Zheyu Shen, Ang Li,	(参考訳) Transformerベースの大規模言語モデル(LLM)のスケーリングは、様々なタスクで有望なパフォーマンスを示している。しかし、このスケーリングには冗長な構造も導入されており、現実のデプロイメントには課題がある。 LLMの冗長性はある程度認識されているが、MLPやアテンション層といった異なる構造における冗長性の多様性は未解明である。本研究では、類似度に基づくメトリクスを用いて、ブロック、MLP、アテンション層を含むトランスフォーマー内の異なるモジュール間の異なる冗長性について検討する。この計量は、冗長構造が入力と非常によく似た出力を生成するという前提で機能する。驚いたことに、アテンション層は他の主流アーキテクチャと区別するためにはアテンション層が不可欠であるが、多くのアテンション層が過剰に高い類似性を示し、性能を劣化させることなく安全に切断できることが判明し、メモリと計算コストの削減につながった。さらに,アテンション層とMLP層を共同でドロップする手法を提案し,性能向上と低下率の向上を実現した。 Llama-3-70Bは注目層の半分を刈っても同等の性能を維持している。我々の発見は将来のネットワークアーキテクチャ設計に貴重な洞察を与えてくれる。コードは: \url{https://github.com/Shwai-He/LLM-Drop} でリリースされる。 Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, this scaling also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different structures, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. 
Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers. Our findings provide valuable insights for future network architecture design. The code will be released at: https://github.com/Shwai-He/LLM-Drop.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
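The abstract above says only that its metric treats a structure as redundant when its output is highly similar to its input. It does not say how similarity is measured. The following is a minimal sketch of that premise, assuming cosine similarity averaged over tokens; the function names (`cosine_similarity`, `layer_importance`, `layers_to_drop`) and the toy hidden states are hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def layer_importance(inputs, outputs):
    """Assumed score: 1 - mean per-token cosine similarity between a layer's input and output.

    A score near 0 means the layer barely changes its input (a pruning candidate).
    """
    sims = [cosine_similarity(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_drop(importances, num_drop):
    """Indices of the num_drop layers with the lowest importance, in ascending index order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i])
    return sorted(ranked[:num_drop])


# Toy hidden states for two tokens (hypothetical values, for illustration only).
tokens = [[1.0, 0.0], [0.0, 2.0]]
identity_like = [[1.0, 0.0], [0.0, 2.0]]  # output equals input
rotating = [[0.0, 1.0], [3.0, 0.0]]       # output orthogonal to input

scores = [
    layer_importance(tokens, rotating),       # layer 0
    layer_importance(tokens, identity_like),  # layer 1
]
print(layers_to_drop(scores, 1))
```

In this toy setup the identity-like layer gets the lowest score and is the one selected for dropping. A real evaluation would compute the scores from hidden states collected on calibration data, which this sketch does not attempt.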
# 強双対下におけるロバスト制約強化学習 Distributionally Robust Constrained Reinforcement Learning under Strong Duality ( http://arxiv.org/abs/2406.15788v1 ) ライセンス: Link先を確認	Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue,	(参考訳) 本研究では, 環境分布の変動や制約にともなう期待報酬を最大化することを目的として, 分散ロバスト制約RL (DRC-RL) の問題について検討する。この設定は、トレーニングとテスト環境が異なる状況を捉え、安全や限られた予算によって動機付けられた制約を満たす必要がある。分散ロバストなRLと制約付きRLの分離問題に対するアルゴリズム設計への大きな進歩にもかかわらず、DRC-RLのエンドツーエンド収束保証付きアルゴリズムは存在しない。我々は,環境不確実性のクラスにおいて,最初の効率的かつ証明可能な解を可能にする,強い双対性に基づくアルゴリズム的枠組みを開発する。さらに,本フレームワークは,分散ロバストなRLと制約付きRLのそれぞれに対して適用可能であるにもかかわらず,分散ロバストなRLと制約の組合せから生じるDRC-RL固有の構造を明らかにする。最後に,提案アルゴリズムの有効性を評価するために,カーレースベンチマーク実験を行った。 We study the problem of Distributionally Robust Constrained RL (DRC-RL), where the goal is to maximize the expected reward subject to environmental distribution shifts and constraints. This setting captures situations where training and testing environments differ, and policies must satisfy constraints motivated by safety or limited budgets. Despite significant progress toward algorithm design for the separate problems of distributionally robust RL and constrained RL, there do not yet exist algorithms with end-to-end convergence guarantees for DRC-RL. We develop an algorithmic framework based on strong duality that enables the first efficient and provable solution in a class of environmental uncertainties. Further, our framework exposes an inherent structure of DRC-RL that arises from the combination of distributional robustness and constraints, which prevents a popular class of iterative methods from tractably solving DRC-RL, despite such frameworks being applicable for each of distributionally robust RL and constrained RL individually. Finally, we conduct experiments on a car racing benchmark to evaluate the effectiveness of the proposed algorithm.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# データ駆動システムにおける説明可能なAIのプライバシ含意 Privacy Implications of Explainable AI in Data-Driven Systems ( http://arxiv.org/abs/2406.15789v1 ) ライセンス: Link先を確認	Fatima Ezzeddine,	(参考訳) 機械学習(ML)モデルは、明らかに強力であり、解釈可能性の欠如に悩まされている。透明性の欠如は、しばしばMLモデルのブラックボックスの性質と呼ばれ、信頼を損ね、その説明可能性を高める努力の必要性を喚起する。説明可能なAI(XAI)技術は、これらの複雑なモデルの内部決定プロセスを説明するためのフレームワークと方法を提供することで、この問題に対処する。対実的説明(CF)や特徴の重要性といったテクニックは、この目標を達成する上で重要な役割を担います。さらに、高品質で多様なデータが、堅牢で信頼性の高いMLアプリケーションの基礎的な要素として残っています。多くのアプリケーションにおいて、MLとXAIの説明器のトレーニングに使用されるデータは機密情報を含んでいる。このコンテキストでは、差分プライバシーなど、データ内の機密情報を保護するために、多数のプライバシ保存技術を使用することができる。その後、XAIとプライバシソリューションの対立は、その反対の目標のために現れます。 XAI技術はモデル動作の推論を提供するため、決定境界や特徴値、説明が第3のエンティティに露出した場合のディープラーニングモデルの勾配といったMLモデルに関する情報を明らかにする。攻撃者はこれらの説明を使ってプライバシー侵害攻撃を開始し、モデル抽出、推論、およびメンバーシップ攻撃を行うことができる。このジレンマは、ML意思決定の理解とプライバシ保護の間の適切な均衡を見つけるという課題を浮き彫りにしている。 Machine learning (ML) models, demonstrably powerful, suffer from a lack of interpretability. The absence of transparency, often referred to as the black box nature of ML models, undermines trust and urges the need for efforts to enhance their explainability. Explainable AI (XAI) techniques address this challenge by providing frameworks and methods to explain the internal decision-making processes of these complex models. Techniques like Counterfactual Explanations (CF) and Feature Importance play a crucial role in achieving this goal. Furthermore, high-quality and diverse data remains the foundational element for robust and trustworthy ML applications. In many applications, the data used to train ML and XAI explainers contain sensitive information. In this context, numerous privacy-preserving techniques can be employed to safeguard sensitive information in the data, such as differential privacy. Subsequently, a conflict between XAI and privacy solutions emerges due to their opposing goals. 
Since XAI techniques provide reasoning for the model behavior, they reveal information relative to ML models, such as their decision boundaries, the values of features, or the gradients of deep learning models when explanations are exposed to a third entity. Attackers can initiate privacy breaching attacks using these explanations, to perform model extraction, inference, and membership attacks. This dilemma underscores the challenge of finding the right equilibrium between understanding ML decision-making and safeguarding privacy.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# 量子囚人ジレンマにおけるリスク支配平衡 Risk-Dominant Equilibrium in Quantum Prisoner's Dilemma ( http://arxiv.org/abs/2406.15795v1 ) ライセンス: Link先を確認	Ahmed S. Elgazzar,	(参考訳) ユニークなナッシュ均衡(NE)の選択は理論的な古典ゲームや量子ゲームにおいて重要である。 Eisert-Wilkens-Lewenstein量子化スキームは、高い絡み合いの場合にのみ囚人のジレンマを解決する。中程度の絡み合いでは複数のNEが存在する。量子囚人のジレンマにおけるユニークなNEの選択について,ジレンマ強度パラメータを変化させて検討を行った。リスク支配基準が使用される。ジレンマ強度パラメータと絡み合いの影響を強調した。絡み合いがリスク支配均衡を完全にコントロールしていることがわかった。絡み合いはリスク支配平衡における量子協調を促進し、その結果を改善する。 The choice of a unique Nash equilibrium (NE) is crucial in theoretical classical and quantum games. The Eisert-Wilkens-Lewenstein quantization scheme solves the prisoner's dilemma only for high entanglement. At medium entanglement, there are multiple NEs. We investigate the selection of a unique NE in the quantum prisoner's dilemma with variable dilemma strength parameters. The risk-dominance criterion is used. The influence of the dilemma strength parameters and entanglement is emphasized. We found that entanglement completely controls the risk-dominant equilibrium. Entanglement promotes quantum-cooperation in the risk-dominant equilibrium and thus improves its outcome.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# 大規模言語モデルのためのエンティティレベルの未学習の再考 Rethinking Entity-level Unlearning for Large Language Models ( http://arxiv.org/abs/2406.15796v1 ) ライセンス: Link先を確認	Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin,	(参考訳) 大規模言語モデルのアンラーニングは、セキュリティとプライバシの懸念を軽減する可能性があるため、注目を集めている。現在の研究は、主にインスタンスレベルのアンラーニングに焦点を当てており、特に機密コンテンツの予め定義されたインスタンスを忘れることを目的としている。しかし、著作権保護など多くの現実のシナリオにおいて重要な、完全なエンティティ関連情報の削除を探求する上で、注目すべきギャップがまだ残っている。そこで本研究では,対象モデル内のエンティティ関連知識を完全に消去する,エンティティレベルのアンラーニングという新しいタスクを提案する。モデル内のすべてのエンティティ関連知識に実際にアクセスすることの難しさを考えると、擬似エンティティを導入するための微調整モデルを通じて、エンティティレベルの未学習シナリオをシミュレートすることから始める。次に,非学習手法のトレンドにインスパイアされたベースライン手法を開発し,その効果を詳細に比較する。大規模な実験により、現在のアンラーニングアルゴリズムは、効果的なエンティティレベルのアンラーニングを達成するのに苦労していることが明らかになった。さらに,本研究では,未学習時の事前学習において,微調整によって注入される実体関連知識が本来の実体よりも受容されやすいことを示し,事前学習された知識に近づけるために,より徹底的な擬似性注入法の必要性を強調した。 Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many real-world scenarios, such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, where the entity-related knowledge within the target model is supposed to be entirely erased. Given the challenge of practically accessing all entity-related knowledge within a model, we begin by simulating entity-level unlearning scenarios through fine-tuning models to introduce pseudo entities. Following this, we develop baseline methods inspired by trending unlearning techniques and conduct a detailed comparison of their effectiveness in this task. Extensive experiments reveal that current unlearning algorithms struggle to achieve effective entity-level unlearning. 
Additionally, our analyses further indicate that entity-related knowledge injected through fine-tuning is more susceptible than original entities from pre-training during unlearning, highlighting the necessity for more thorough pseudo-entity injection methods to make them closer to pre-trained knowledge.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# シナジスティックディープグラフクラスタリングネットワーク Synergistic Deep Graph Clustering Network ( http://arxiv.org/abs/2406.15797v1 ) ライセンス: Link先を確認	Benyu Wu, Shifei Ding, Xiao Xu, Lili Guo, Ling Ding, Xindong Wu,	(参考訳) グラフニューラルネットワーク(GNN)を用いて、クラスタリングのための凝集性および識別ノード表現を学習することは、ディープグラフクラスタリングにおいて有望な結果を示している。しかし,既存の手法では,表現学習と構造強化の相互関係は無視されている。本研究は,GNNが深層グラフクラスタリングの可能性を解き放つためには,埋め込みと構造を相乗的に拡張することが重要であることを示唆する。信頼性の高い構造はより凝集性の高いノード表現の獲得を促進する一方、高品質なノード表現は構造の増大を導くことができ、見返りに構造的信頼性を高めることができる。さらに、既存のGNNベースのモデルの一般化能力は比較的貧弱である。それらは高い等質性を持つグラフではうまく機能するが、低い等質性を持つグラフでは不十分に機能する。そこで我々はSynC(Syngistic Deep Graph Clustering Network)というグラフクラスタリングフレームワークを提案する。本稿では,構造拡張を導くための高品質な埋め込みを実現するために,TIGAE (Transform Input Graph Auto-Encoder) を設計する。次に、拡張グラフ上の近傍表現を再取得し、クラスタリングに親しみやすい埋め込みを取得し、自己教師付きクラスタリングを行う。特に、表現学習と構造増強は重みを共有し、モデルパラメータの数を著しく減少させる。さらに、モデルの一般化を改善するための構造微調整戦略を導入する。ベンチマークデータセットの大規模な実験により,本手法の優位性と有効性を示す。コードはGitHubとCode Oceanでリリースされている。 Employing graph neural networks (GNNs) to learn cohesive and discriminative node representations for clustering has shown promising results in deep graph clustering. However, existing methods disregard the reciprocal relationship between representation learning and structure augmentation. This study suggests that enhancing embedding and structure synergistically becomes imperative for GNNs to unleash their potential in deep graph clustering. A reliable structure promotes obtaining more cohesive node representations, while high-quality node representations can guide the augmentation of the structure, enhancing structural reliability in return. Moreover, the generalization ability of existing GNNs-based models is relatively poor. While they perform well on graphs with high homogeneity, they perform poorly on graphs with low homogeneity. To this end, we propose a graph clustering framework named Synergistic Deep Graph Clustering Network (SynC). 
In our approach, we design a Transform Input Graph Auto-Encoder (TIGAE) to obtain high-quality embeddings for guiding structure augmentation. Then, we re-capture neighborhood representations on the augmented graph to obtain clustering-friendly embeddings and conduct self-supervised clustering. Notably, representation learning and structure augmentation share weights, significantly reducing the number of model parameters. Additionally, we introduce a structure fine-tuning strategy to improve the model's generalization. Extensive experiments on benchmark datasets demonstrate the superiority and effectiveness of our method. The code is released on GitHub and Code Ocean.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# スマートな機能とは何か Smart Feature is What You Need ( http://arxiv.org/abs/2406.15805v1 ) ライセンス: Link先を確認	Zhaoxin Hu, Keyan Ren,	(参考訳) 弱ラベル情報不足による形状誘導の欠如とラベルジッタは、3次元弱教師対象検出の主な問題である。現在の弱教師付きモデルは、弱教師付きおよび完全教師付き手法の本質的な手がかりを生かさずに、弱ラベルから情報を引き出すためのヒューリスティックや仮定手法を用いることが多いため、データ利用効率とモデル精度を組み合わせた手法を探求することは困難である。これらの問題に対処するために,Multiscale Mixed Attention (MMA)と呼ばれる新しいプラグイン・アンド・イン・ポイント・クラウド特徴表現ネットワークを提案する。 MMAは、近傍の隣接注意と異なる密度スケールにおける不均一注意を利用して特徴表現ネットワークを構築する。 MMAから得られるスマート特徴表現は、形状傾向とオブジェクト存在領域推定を有し、検出ボックスの領域を制約し、弱いラベルのデフォルト情報に起因する問題を緩和することができる。室内の弱いラベルのシナリオでは、完全教師付きネットワークは、MMAによる点特徴の改善によってのみ弱教師付きネットワークに近い性能を発揮する。同時に、MMAは廃棄物を宝にし、もともとデータ強化の源となる弱教師付き検出に干渉したラベルジッタ問題を逆転させ、既存の弱監督検出手法の性能を高める。私たちのコードはhttps://github.com/hzx-9894/MMAで公開されています。 Lack of shape guidance and label jitter caused by information deficiency of weak label are the main problems in 3D weakly-supervised object detection. Current weakly-supervised models often use heuristics or assumptions methods to infer information from weak labels without taking advantage of the inherent clues of weakly-supervised and fully-supervised methods, thus it is difficult to explore a method that combines data utilization efficiency and model accuracy. In an attempt to address these issues, we propose a novel plug-and-in point cloud feature representation network called Multi-scale Mixed Attention (MMA). MMA utilizes adjacency attention within neighborhoods and disparity attention at different density scales to build a feature representation network. The smart feature representation obtained from MMA has shape tendency and object existence area inference, which can constrain the region of the detection boxes, thereby alleviating the problems caused by the information default of weak labels. Extensive experiments show that in indoor weak label scenarios, the fully-supervised network can perform close to that of the weakly-supervised network merely through the improvement of point feature by MMA. 
At the same time, MMA can turn waste into treasure, reversing the label jitter problem that originally interfered with weakly-supervised detection into the source of data enhancement, strengthening the performance of existing weak supervision detection methods. Our code is available at https://github.com/hzx-9894/MMA.	翻訳日:2024-06-25 20:35:12 公開日:2024-06-22
# 評価とフィードバックにおけるAI活用の学生とアカデミックスタッフの理解 Understanding Student and Academic Staff Perceptions of AI Use in Assessment and Feedback ( http://arxiv.org/abs/2406.15808v1 ) ライセンス: Link先を確認	Jasper Roe, Mike Perkins, Daniel Ruelle,	(参考訳) 高等教育における人工知能(AI)と生成人工知能(GenAI)の台頭は、評価改革を必要としている。この研究は、AIとGenAIツールを用いた学生や学術スタッフの経験を探求し、学習と評価における現在および将来の潜在的な応用に対する親しみと快適さに焦点を当てることで、重要なギャップに対処する。オンライン調査では、ベトナムの2つの大学とシンガポールの1つの大学にまたがる35人の学術スタッフと282人の学生のデータを収集し、GenAI習熟度、評価マーキングとフィードバックにおけるその使用に対する認識、知識チェックと参加、GenAIテキスト検出の経験を調べた。記述的統計値と反射的主題分析の結果,両群ともGenAIとの親和性は概して低かった。 GenAIのフィードバックは否定的な評価を受けたが、インストラクターのフィードバックと組み合わせると、より肯定的な評価が得られた。学術スタッフは, 学生と比較して, GenAIテキスト検出ツールや, 検出結果に基づく成績調整をより受け入れていた。質的分析では、テキスト検出ツールの不明確な理解、GenAI検出器の経験の多様性、教育評価におけるGenAIの将来的な影響に関する混合感情の3つのテーマを特定した。これらの知見は、高等教育におけるGenAI対応評価とフィードバックのための政策と実践の発達に大きな影響を及ぼす。 The rise of Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) in higher education necessitates assessment reform. This study addresses a critical gap by exploring student and academic staff experiences with AI and GenAI tools, focusing on their familiarity and comfort with current and potential future applications in learning and assessment. An online survey collected data from 35 academic staff and 282 students across two universities in Vietnam and one in Singapore, examining GenAI familiarity, perceptions of its use in assessment marking and feedback, knowledge checking and participation, and experiences of GenAI text detection. Descriptive statistics and reflexive thematic analysis revealed a generally low familiarity with GenAI among both groups. GenAI feedback was viewed negatively; however, it was viewed more positively when combined with instructor feedback. Academic staff were more accepting of GenAI text detection tools and grade adjustments based on detection results compared to students. 
Qualitative analysis identified three themes: unclear understanding of text detection tools, variability in experiences with GenAI detectors, and mixed feelings about GenAI's future impact on educational assessment. These findings have major implications regarding the development of policies and practices for GenAI-enabled assessment and feedback in higher education.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# LaMSUM: LLMを用いたユーザ生成コンテンツの抽出要約のための新しいフレームワーク LaMSUM: A Novel Framework for Extractive Summarization of User Generated Content using LLMs ( http://arxiv.org/abs/2406.15809v1 ) ライセンス: Link先を確認	Garima Chhikara, Anurag Sharma, V. Gurucharan, Kripabandhu Ghosh, Abhijnan Chakraborty,	(参考訳) 大規模言語モデル(LLM)は、要約を含む幅広いNLPタスクにおいて、印象的なパフォーマンスを示している。本質的にLLMは抽象的な要約を生成するが、LLMを通して抽出的な要約を達成するという課題はいまだに未解明のままである。本研究では,このギャップを埋めるために,投票アルゴリズムを利用して、大規模なユーザ生成テキストに対してLLMを用いて抽出要約を生成する新しいフレームワークであるLaMSUMを提案する。 Llama 3 と Mixtral と Gemini の3つのオープンソース LLM について評価した結果,LaMSUM は最先端の抽出要約法より優れていることがわかった。さらに,LLMが生成したアウトプット・サマリーの背景にある理論的根拠の提示を試みる。全体として、これはLLMを利用して大きなユーザ生成テキストを抽出的に要約する初期の試みの1つであり、コミュニティにさらなる関心を喚起する可能性が高い。 Large Language Models (LLMs) have demonstrated impressive performance across a wide range of NLP tasks, including summarization. Inherently, LLMs produce abstractive summaries, and the task of achieving extractive summaries through LLMs remains largely unexplored. To bridge this gap, in this work, we propose a novel framework, LaMSUM, to generate extractive summaries through LLMs for large user-generated text by leveraging voting algorithms. Our evaluation on three popular open-source LLMs (Llama 3, Mixtral and Gemini) reveals that LaMSUM outperforms state-of-the-art extractive summarization methods. We further attempt to provide the rationale behind the output summary produced by LLMs. Overall, this is one of the early attempts to achieve extractive summarization for large user-generated text by utilizing LLMs, and is likely to generate further interest in the community.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# PointDreamer: 2Dインパインティングによる色付き点雲からのゼロショット3Dテクスチャメッシュ再構成 PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting ( http://arxiv.org/abs/2406.15811v1 ) ライセンス: Link先を確認	Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen,	(参考訳) 色のついたポイントクラウドからテクスチャ化されたメッシュを再構築することは、3Dグラフィックスとビジョンにおいて重要だが困難な課題である。既存のほとんどの手法は、3DまたはUV空間における暗黙の関数として色を予測するため、ぼやけたテクスチャや一般化能力の欠如に苦しむ。そこで我々は,色付き点雲からテクスチャ化されたメッシュを再構築するための新しいフレームワークであるPointDreamerを提案する。成熟した技術と2Dビジョンの膨大なデータを活用することで、2Dイメージのインペイントにより、忠実さと明瞭さを向上したメッシュを生成する。具体的には、まず入力点雲を2次元空間に投影し、スパースなマルチビュー画像を生成し、事前訓練された2次元拡散モデルを用いて空のピクセルを塗布する。次に,塗布された濃密な画像の色を3次元空間に戻して最終テクスチャメッシュを得る,新しい非境界ファースト戦略を設計する。このように、PointDreamerはゼロショットで動作し、追加のトレーニングは不要です。各種合成および実スキャンデータセットでの大規模な定性的・定量的実験により、PointDreamerはLPIPSスコアを30%改善(0.118から0.068)してベースライン法を大きく上回り、SoTA性能を示す。コードは https://github.com/YuQiao0303/PointDreamer で公開されている。 Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or the lack of generalization capability. Addressing this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point clouds. It produces meshes with enhanced fidelity and clarity by 2D image inpainting, taking advantage of the mature techniques and massive data of 2D vision. Specifically, we first project the input point cloud into 2D space to generate sparse multi-view images, and then inpaint empty pixels utilizing a pre-trained 2D diffusion model. Next, we design a novel Non-Border-First strategy to unproject the colors of the inpainted dense images back to 3D space, thus obtaining the final textured mesh. In this way, our PointDreamer works in a zero-shot manner, requiring no extra training. 
Extensive qualitative and quantitative experiments on various synthetic and real-scanned datasets show the SoTA performance of PointDreamer, by significantly outperforming baseline methods with 30% improvement in LPIPS score (from 0.118 to 0.068). Code at: https://github.com/YuQiao0303/PointDreamer.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 固有次元相関:マルチモーダル表現における非線形接続の発見 Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations ( http://arxiv.org/abs/2406.15812v1 ) ライセンス: Link先を確認	Lorenzo Basile, Santiago Acevedo, Luca Bortolussi, Fabio Anselmi, Alex Rodriguez,	(参考訳) 機械学習手法の背後にあるメカニズムを理解するためには、データポイントを記述する機能間の接続を確立することが不可欠である。しかし、これらの相関はしばしば高次元かつ強い非線形性を示すため、標準手法による検出は困難である。本稿では、内在次元と相関の絡み合いを利用して、高次元多様体間の(潜在的に非線形な)相関を定量化する計量を提案する。まず,制御環境における合成データの検証を行い,その利点と欠点を既存手法と比較した。その後、ニューラルネットワーク表現における大規模アプリケーションに分析を拡張します。具体的には,マルチモーダルデータの潜在表現に着目し,ペアの視覚とテキストの埋め込みの間に明確な相関関係を明らかにする。その結果, 潜在多様体間の高非線形相関パターンの存在が示唆された。 To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 読むことは信じること:画像分類のための言語ボトルネックモデルの再検討 Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification ( http://arxiv.org/abs/2406.15816v1 ) ライセンス: Link先を確認	Honori Udo, Takafumi Koshinaka,	(参考訳) 我々は、画像分類のためのディープラーニングモデルの説明可能性を保証するアプローチとして、言語ボトルネックモデルを再考する。画像が言語に変換される過程で必然的に発生する情報損失のため、言語ボトルネックモデルの精度は標準のブラックボックスモデルよりも劣ると考えられている。しかし、視覚と言語の大規模基盤モデルに基づく近年の画像キャプショナは、これまで現実的には不可能と考えられていた程度まで、画像を言葉で詳細かつ正確に記述する能力を有している。災害画像分類のタスクにおいて、現代の画像キャプショナと事前学習された言語モデルを組み合わせた言語ボトルネックモデルが、ブラックボックスモデルを上回る画像分類精度を達成できることを実験的に示す。また、言語ボトルネックモデルとブラックボックスモデルは画像から異なる特徴を抽出していると考えられ、両者を融合させることで相乗効果が得られ、さらに高い分類精度が得られることを示す。 We revisit language bottleneck models as an approach to ensuring the explainability of deep learning models for image classification. Because of inevitable information loss incurred in the step of converting images into language, the accuracy of language bottleneck models is considered to be inferior to that of standard black-box models. Recent image captioners based on large-scale foundation models of Vision and Language, however, have the ability to accurately describe images in verbal detail to a degree that was previously believed to not be realistically possible. In a task of disaster image classification, we experimentally show that a language bottleneck model that combines a modern image captioner with a pre-trained language model can achieve image classification accuracy that exceeds that of black-box models. We also demonstrate that a language bottleneck model and a black-box model may be thought to extract different features from images and that fusing the two can create a synergistic effect, resulting in even higher classification accuracy.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# ワイヤレスシステムのためのAIモデル自動選択:デジタルツインニングによるオンライン学習 Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning ( http://arxiv.org/abs/2406.15819v1 ) ライセンス: Link先を確認	Qiushuo Hou, Matteo Zecchin, Sangwoo Park, Yunlong Cai, Guanding Yu, Kaushik Chowdhury, Osvaldo Simeone,	(参考訳) O-RANのような現代の無線ネットワークアーキテクチャでは、人工知能(AI)ベースのアプリケーションは、スケジューリングや電力制御などの機能を実行するためにインテリジェントコントローラにデプロイされる。 AI"アプリ"は、ネットワーク条件、トポロジ、トラフィック統計、設計目標などのコンテキスト情報に基づいて選択される。コンテキストとAIモデルパラメータのマッピングは、現在のデータを必要としないコンテキスト情報のみを活用する自動モデル選択(AMS)マッピングを通じて、ゼロショットで理想的に行われる。本稿では,AMSマッピングのオンライン最適化のための一般的な手法を紹介する。 AMSマッピングの最適化は、さまざまなコンテキストから収集されたデータを公開する必要があるため、難しい。したがって、オンラインに実行された場合、この初期最適化フェーズは非常に時間がかかります。可能な解決策は、物理システムのデジタルツインを利用して、複数のシミュレートされたコンテキストから合成データを生成することである。しかし、デジタルツインのシミュレータが不完全なことを考えると、AMSマッピングの最適化にシミュレーションデータを直接使用すると、実際のシステムでのテストでは性能が低下する。本稿では,物理システムから収集した限られた実データを用いて,シミュレータのバイアスを補正するAMSマッピングのオンライン最適化手法を提案する。グラフニューラルネットワークに基づく電力制御アプリの実験結果から,提案手法の利点が示された。 In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameters is ideally done in a zero-shot fashion via an automatic model selection (AMS) mapping that leverages only contextual information without requiring any current data. This paper introduces a general methodology for the online optimization of AMS mappings. Optimizing an AMS mapping is challenging, as it requires exposure to data collected from many different contexts. Therefore, if carried out online, this initial optimization phase would be extremely time consuming. A possible solution is to leverage a digital twin of the physical system to generate synthetic data from multiple simulated contexts. 
However, given that the simulator at the digital twin is imperfect, a direct use of simulated data for the optimization of the AMS mapping would yield poor performance when tested in the real system. This paper proposes a novel method for the online optimization of AMS mapping that corrects for the bias of the simulator by means of limited real data collected from the physical system. Experimental results for a graph neural network-based power control app demonstrate the significant advantages of the proposed approach.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 非線形偏微分方程式の量子シミュレーションの一般的な枠組み A general frame of quantum simulation for nonlinear partial differential equations ( http://arxiv.org/abs/2406.15821v1 ) ライセンス: Link先を確認	Shijun Liao,	(参考訳) 最近、Jinらは任意の線形PDEに対する量子シミュレーション手法であるSchr\"{o}dingerisation [1-3] を提案しており、これは多くの非ハミルトン線形PDEの求解にうまく適用されている。本稿では、Schr\"{o}dingerisation法を、任意の非線形PDEを級数解の収束が保証された一連の線形PDEに変換できるホモトピー解析法(HAM) [4-6] と組み合わせることにより、量子シミュレーションのSchr\"{o}dingerisation法を任意の非線形PDEに拡張する。このようにして、非線形PDEは、今後開発される量子コンピュータを用いた量子シミュレーションによって解くことができる。簡単のため、これを「HAM-Schr\"{o}dingerisation量子アルゴリズム」と呼ぶ。 Recently, Jin et al. proposed a quantum simulation technique for any linear PDE, called Schr\"{o}dingerisation [1-3], which has been successfully applied to solve many non-Hamiltonian linear PDEs. In this paper, the Schr\"{o}dingerisation technique of quantum simulation is extended to any nonlinear PDE by combining it with the homotopy analysis method (HAM) [4-6], which can transform any nonlinear PDE into a series of linear PDEs with a convergence guarantee for the series solution. In this way, a nonlinear PDE can be solved by quantum simulation using a quantum computer, which is yet to be developed. For simplicity, we call it ``the HAM-Schr\"{o}dingerisation quantum algorithm''.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# CaT-BENCH:計画における因果依存性と時間依存性の言語モデルによる理解のベンチマーク CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans ( http://arxiv.org/abs/2406.15823v1 ) ライセンス: Link先を確認	Yash Kumar Lal, Vanya Cohen, Nathanael Chambers, Niranjan Balasubramanian, Raymond Mooney,	(参考訳) 指示文やレシピなどの自然言語の計画についてLLMが推論する能力を理解することは、意思決定システムでLLMを確実に活用するうえで重要である。計画の基本的な側面は、ステップを実行すべき時間的順序であり、これはステップ間の因果依存性を反映している。本稿では、調理レシピの計画において、あるステップが別のステップの前または後に必ず行われなければならないかを問う、ステップ順序予測のベンチマークであるCaT-Benchを紹介する。我々はこれを用いて、フロンティアのLLMが因果的および時間的依存性をどの程度理解しているかを評価する。その結果、SOTAのLLMは期待外れであり(ゼロショットの最高値はF1でわずか0.59)、依存性があると予測しすぎる偏りがあることがわかった。これは、ステップの時間的順序をヒューリスティックとして頼っているためかもしれない。説明を求めるプロンプトや少数ショット例の使用により性能は向上するが、最高のF1でも0.73にとどまる。さらに、説明と回答の正しさに対する人間による評価から、平均的に人間はモデルの推論に同意しないことが示された。驚くべきことに、回答後に説明させる方が通常のchain-of-thoughtプロンプトよりも性能が高く、また同じステップ対に関する質問間でLLMの回答が一貫していないこともわかった。全体として、LLMがステップ間の依存性を検出する能力には大きな改善の余地があることが示された。 Understanding the abilities of LLMs to reason about natural language plans, such as instructional text and recipes, is critical to reliably using them in decision-making systems. A fundamental aspect of plans is the temporal order in which their steps need to be executed, which reflects the underlying causal dependencies between them. We introduce CaT-Bench, a benchmark of Step Order Prediction questions, which test whether a step must necessarily occur before or after another in cooking recipe plans. We use this to evaluate how well frontier LLMs understand causal and temporal dependencies. We find that SOTA LLMs are underwhelming (best zero-shot is only 0.59 in F1), and are biased towards predicting dependence more often, perhaps relying on temporal order of steps as a heuristic. While prompting for explanations and using few-shot examples improve performance, the best F1 result is only 0.73. Further, human evaluation of explanations along with answer correctness shows that, on average, humans do not agree with model reasoning. 
Surprisingly, we also find that explaining after answering leads to better performance than normal chain-of-thought prompting, and LLM answers are not consistent across questions about the same step pairs. Overall, results show that LLMs' ability to detect dependence between steps has significant room for improvement.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# MVOC:拡散モデルを用いたトレーニング不要なマルチビデオオブジェクト合成法 MVOC: a training-free multiple video object composition method with diffusion models ( http://arxiv.org/abs/2406.15829v1 ) ライセンス: Link先を確認	Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang,	(参考訳) ビデオ合成は、ビデオ編集のコアタスクである。拡散モデルに基づく画像合成は非常に成功しているが、その成果をビデオオブジェクト合成タスクに拡張することは容易ではない。このタスクでは、対応する相互作用効果を示すだけでなく、合成されたビデオ内のオブジェクトが動きとアイデンティティの一貫性を維持することも求められ、これは物理的に調和したビデオを合成するために必要である。この課題に対処するため、拡散モデルに基づくMultiple Video Object Composition(MVOC)法を提案する。具体的には、まず各ビデオオブジェクトに対してDDIMインバージョンを行い、対応するノイズ特徴を得る。次に、画像編集手法で各オブジェクトを合成・編集し、合成ビデオの最初のフレームを得る。最後に、画像からビデオへの生成モデルを用い、Video Object Dependence Moduleにおいて特徴と注意を注入してビデオを合成する。このモジュールはビデオ生成のための訓練不要な条件付きガイダンス操作であり、合成ビデオ内で互いに独立でない可能性のある様々なオブジェクト間で、特徴と注意マップを協調させることができる。最終的な生成モデルは、生成されたビデオ内のオブジェクトを元のオブジェクトの動きとアイデンティティに一致するよう制約するだけでなく、オブジェクト間の相互作用効果も導入する。大規模な実験により、提案手法は既存の最先端手法よりも優れていることが示された。プロジェクトページ: https://sobeymil.github.io/mvoc.com Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, it is not straightforward to extend the achievement to video object composition tasks, which not only exhibit corresponding interaction effects but also ensure that the objects in the composited video maintain motion and identity consistency, which is necessary to composite a physical harmony video. To address this challenge, we propose a Multiple Video Object Composition (MVOC) method based on diffusion models. Specifically, we first perform DDIM inversion on each video object to obtain the corresponding noise features. Secondly, we combine and edit each object by image editing methods to obtain the first frame of the composited video. 
Finally, we use the image-to-video generation model to composite the video with feature and attention injections in the Video Object Dependence Module, which is a training-free conditional guidance operation for video generation, and enables the coordination of features and attention maps between various objects that can be non-independent in the composited video. The final generative model not only constrains the objects in the generated video to be consistent with the original object motion and identity, but also introduces interaction effects between objects. Extensive experiments have demonstrated that the proposed method outperforms existing state-of-the-art approaches. Project page: https://sobeymil.github.io/mvoc.com.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# Shape2.5D: 深度と法線推定のためのテクスチャレス表面のデータセット Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation ( http://arxiv.org/abs/2406.15831v1 ) ライセンス: Link先を確認	Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker,	(参考訳) テクスチャのない表面の再構成は、コンピュータビジョンにおいて特有の課題である。その主な原因は、テクスチャ情報がない状況での深度と法線の推定という細かな要求に応える専用のデータセットが不足していることにある。このギャップに対処するために設計された、新しい大規模データセットであるShape2.5Dを紹介する。 2635の3Dモデルと48のユニークなオブジェクトにわたる364kフレームで構成されたこのデータセットは、テクスチャレスオブジェクト再構成のための深度マップと表面法線マップを提供する。提案するデータセットは、様々な照明条件や視点をシミュレートするために3Dモデリングソフトウェアでレンダリングされた合成画像を含む。また、深度カメラで撮影された4672フレームからなる実世界のサブセットも含まれている。修正したエンコーダ・デコーダネットワークを用いた包括的なベンチマークにより、RGB画像から深度と法線を頑健に推定するアルゴリズムの開発をこのデータセットが支援できることを示す。オープンソースのデータ生成パイプラインにより、データセットを拡張し、将来の研究に適応させることができる。データセットは https://github.com/saifkhichi96/Shape25D で公開されている。 Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4672 frames captured with a depth camera. Our comprehensive benchmarks, performed using a modified encoder-decoder network, showcase the dataset's capability to support the development of algorithms that robustly estimate depth and normals from RGB images. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 集中集約型分散型変圧器は, サンプル効率の良い多エージェント世界モデルである Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models ( http://arxiv.org/abs/2406.15836v1 ) ライセンス: Link先を確認	Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li,	(参考訳) モデルフリー強化学習(RL)エージェントのワールドモデルを学ぶことは、想像力で学習ポリシーを学習することで、サンプル効率を著しく向上させることができる。しかし,MARL(Multi-Agent RL)の世界モデルの構築は,多数のエージェントから生じる集中型アーキテクチャにおけるスケーラビリティの問題や,エージェント間の依存性から生じる分散型アーキテクチャにおける非定常性の問題から,特に困難である。両課題に対処するために,拡張性のための分散ローカルダイナミクスを学習するMARLの新たな世界モデルと,すべてのエージェントからの集中型表現集約を提案する。本研究では,異なるエージェント間で複雑な局所力学をモデル化し,正確かつ一貫した長期的想像力を提供するために,表現型トランスフォーマーアーキテクチャを活用することで,離散トークン上の自己回帰シーケンスモデリング問題として動的学習を論じる。マルチエージェントシステムのためのトランスフォーマーベースの世界モデルとして,Perceiver Transformer を有効解として導入し,このコンテキスト内での表現集約を実現する。 Starcraft Multi-Agent Challenge (SMAC) の結果は、サンプル効率と全体的な性能の両方において、強力なモデルフリーアプローチと既存のモデルベース手法よりも優れていることを示している。 Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations. 
As the first pioneering Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation within this context. Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 量子参照フレームのフレームバンドル定式化:視点の重ね合わせから幾何学の重ね合わせへ A frame-bundle formulation of quantum reference frames: from superposition of perspectives to superposition of geometries ( http://arxiv.org/abs/2406.15838v1 ) ライセンス: Link先を確認	Daniel A. Turolla Vanzella, Jeremy Butterfield,	(参考訳) 我々は、重力の文脈で適用されてきた量子参照フレーム(QRF)の中核となる考えについて、座標系のような不必要な(しかし便利な)要素からその定義を解放した、完全に幾何学的な定式化の一案を与える。我々の定式化は2つの主要な考えに基づいている。第一に、QRFは、各時空点(すなわち事象)における観測者の(したがって測定装置の)時間と空間の認識が何であるかについての不確かさを符号化する。このため、事象 $p$ における観測者は、通常どおり、接空間 $T_p$ のテトラッドとしてモデル化される。したがって、事象 $p$ におけるQRFは、$p$ におけるテトラッド上の複素関数である。第二に、各接空間に割り当てた基底が、指定したい計量におけるテトラッドであると規定することで、与えられた多様体上の計量を指定できるという結果を用いる。したがって、時空、すなわち多様体と計量の組は、その上の「視点」の選択とともに、基底の束の切断によって表され、各点に割り当てられた基底をテトラッドとみなすものとして理解される。したがって、時空の重ね合わせは、大まかに言えば、この束の切断への複素振幅の割り当てとして表される。ここで定義されるQRFは、事象における基底に割り当てられた複素振幅の集まり、すなわち多様体の基底の束上で定義された複素関数であり、これらの重ね合わせを局所的な方法で(すなわち、切断全体ではなく事象における基底に振幅を付与して)記述できる。我々は、この定式化が、QRFに関する現在の考え方のいくつかの概念的側面と可能な拡張に光を当てると考えている。例えば、幾何学的な用語で考えると、文献で扱われる重力的シナリオ(線形近似を超える場合)に適用されるQRFの考えは恣意性のために予測力を欠いており、我々の主張では、この恣意性は物理学からのさらなる入力によってのみ解消できることが明らかになる。 We provide a possible fully geometric formulation of the core idea of quantum reference frames (QRFs) as it has been applied in the context of gravity, freeing its definition from unnecessary (though convenient) ingredients, such as coordinate systems. Our formulation is based on two main ideas. First, a QRF encodes uncertainty about what is the observer's (and, hence, the measuring apparatus's) perception of time and space at each spacetime point (i.e., event). For this, an observer at an event $p$ is modeled, as usual, as a tetrad in the tangent space $T_p$. So a QRF at an event $p$ is a complex function on the tetrads at $p$. Second, we use the result that one can specify a metric on a given manifold by stipulating that a basis one assigns at each tangent space is to be a tetrad in the metric one wants to specify. Hence a spacetime, i.e. 
manifold plus metric, together with a choice of "point of view" on it, is represented by a section of the bundle of bases, understood as taking the basis assigned to each point to be a tetrad. Thus a superposition of spacetimes gets represented as, roughly speaking, an assignment of complex amplitudes to sections of this bundle. A QRF, defined here as the collection of complex amplitudes assigned to bases at events--i.e., a complex function defined on the bundle of bases of the manifold--can describe, in a local way (i.e., attributing the amplitudes to bases at events instead of to whole sections), these superpositions. We believe that this formulation sheds some light on some conceptual aspects and possible extensions of current ideas about QRFs. For instance, thinking in geometric terms makes it clear that the idea of QRFs applied to the gravitational scenarios treated in the literature (beyond linear approximation) lacks predictive power due to arbitrariness which, we argue, can only be resolved by some further input from physics.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# テキストベース説明可能なAIにおける局所サロゲートモデルの正確な安定性推定に及ぼす類似度尺度の影響 The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI ( http://arxiv.org/abs/2406.15839v1 ) ライセンス: Link先を確認	Christopher Burger, Charles Walter, Thai Le,	(参考訳) 最近の研究は、機械学習(ML)モデルの入力に対する敵対的摂動に対する局所サロゲート法の脆弱性について検討している。この摂動では、複雑なモデルの下で元の入力の意味と構造が類似したまま、説明が操作される。多くの手法にまたがって弱点が存在することが示されているが、その理由はいまだほとんど調査されていない。説明可能なAI(XAI)に対する敵対的攻撃の概念の中心は、ある説明が別の説明とどのように異なるかを計算するのに使用される類似度尺度である。類似度尺度の選択を誤ると、XAI手法の有効性について誤った結論を導きかねない。尺度が敏感すぎると脆弱性が誇張され、粗すぎると弱点が過小評価される。我々は、ケンドールのタウ、スピアマンのフットルール、ランクバイアスオーバーラップなど、テキストベースのランクリスト向けに設計された様々な類似度尺度を調べ、尺度の種類や成功のしきい値を大きく変えることが、一般的な敵対的攻撃プロセスから導かれる結論にどのように影響するかを明らかにする。一部の尺度は過度に敏感であり、安定性の誤った推定をもたらすことがわかった。 Recent work has investigated the vulnerability of local surrogate methods to adversarial perturbations on a machine learning (ML) model's inputs, where the explanation is manipulated while the meaning and structure of the original input remains similar under the complex model. While weaknesses across many methods have been shown to exist, the reasons behind why still remain little explored. Central to the concept of adversarial attacks on explainable AI (XAI) is the similarity measure used to calculate how one explanation differs from another. A poor choice of similarity measure can result in erroneous conclusions on the efficacy of an XAI method. Too sensitive a measure results in exaggerated vulnerability, while too coarse understates its weakness. We investigate a variety of similarity measures designed for text-based ranked lists including Kendall's Tau, Spearman's Footrule and Rank-biased Overlap to determine how substantial changes in the type of measure or threshold of success affect the conclusions generated from common adversarial attack processes. Certain measures are found to be overly sensitive, resulting in erroneous estimates of stability.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# デジタル公共財のプライバシ要件と現実 Privacy Requirements and Realities of Digital Public Goods ( http://arxiv.org/abs/2406.15842v1 ) ライセンス: Link先を確認	Geetika Gopi, Aadyaa Maddi, Omkhar Arasaratnam, Giulia Fanti,	(参考訳) 国際開発コミュニティでは、「デジタル公共財(digital public goods)」という用語が、国連(UN)の持続可能な開発目標への対応を目指すオープンソースのデジタル製品(ソフトウェア、データセットなど)を指すために使われる。 DPGは、世界中で政府サービス(ID管理、医療登録など)の提供にますます利用されている。 DPGは機密データを扱う可能性があるため、国連はユーザプライバシをDPGの第一級の要件として定めている。 DPGのプライバシーリスクは現在、DPGのプライバシー姿勢を評価するための質問を含む事前質問票を備えたDPG標準によって部分的に管理されている。本研究は、適切なプライバシー保護を確保するうえでの現行のDPG標準の有効性を検証する。ユーザのプライバシー保護に関するDPGからの回答を体系的に評価する。また、広く使用されている3つのDPGの詳細なケーススタディを示し、プライバシーの脅威を特定して、DPG標準に対する各DPGの回答と比較する。その結果、現行のDPG標準の評価手法の限界が明らかになった。最後に、プライバシーに関してDPG標準を強化するための予備的な提言と提案を示す。さらに、この研究が、エンドユーザーだけでなく、ユーザー向け技術を採用するサードパーティに対してもプライバシーを伝えることに関する、ユーザブルプライバシー研究を促進することを期待する。 In the international development community, the term "digital public goods" is used to describe open-source digital products (e.g., software, datasets) that aim to address the United Nations (UN) Sustainable Development Goals. DPGs are increasingly being used to deliver government services around the world (e.g., ID management, healthcare registration). Because DPGs may handle sensitive data, the UN has established user privacy as a first-order requirement for DPGs. The privacy risks of DPGs are currently managed in part by the DPG standard, which includes a prerequisite questionnaire with questions designed to evaluate a DPG's privacy posture. This study examines the effectiveness of the current DPG standard for ensuring adequate privacy protections. We present a systematic assessment of responses from DPGs regarding their protections of users' privacy. We also present in-depth case studies from three widely-used DPGs to identify privacy threats and compare this to their responses to the DPG standard. Our findings reveal limitations in the current DPG standard's evaluation approach. 
We conclude by presenting preliminary recommendations and suggestions for strengthening the DPG standard as it relates to privacy. Additionally, we hope this study encourages more usable privacy research on communicating privacy, not only to end users but also third-party adopters of user-facing technologies.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 進化のユニタリ性に埋め込まれた量子幾何学--スピン共鳴と結晶帯における量子振動と脱落としての影響を明らかにする Quantum geometry embedded in unitarity of evolution: revealing its impacts as quantum oscillation and dephasing in spin resonance and crystal bands ( http://arxiv.org/abs/2406.15845v1 ) ライセンス: Link先を確認	B. Q. Song, J. D. H. Smith, T. Jiang, Y. X. Yao, J. Wang,	(参考訳) 量子ホール効果は結晶中のトポロジーを明らかにする直感的な方法を提供する。ここでは、より広い概念である量子幾何学の「視覚化」の相手を探す。量子幾何学は、特定の詳細や近似から独立して、ユニタリ進化の本質的な結果として量子においてどのように現れるかを示し、量子幾何学が広く適用可能であることを示唆する。実際、スピンやバンドのシナリオにおいて、振動やデファスティングなどの幾何学的可観測物を例示する。これらの現象は幾何学の連続性のために頑健であり、幾何学的パラメータによって調整することができる。解析解と数値解の両方によって支持される異常は、幾何学的視点を採用するという利点を強調し、識別可能な実験的シグネチャをもたらす可能性がある。 Quantum Hall effects provide intuitive ways of revealing the topology in crystals, i.e., each quantized "step" represents a distinct topological state. Here, we seek a counterpart for "visualizing" quantum geometry, which is a broader concept. We show how geometry emerges in quantum as an intrinsic consequence of unitary evolution, independent of specific details or approximations, suggesting quantum geometry may have widespread applicability. Indeed, we exemplify geometric observables, such as oscillation, dephasing, in spin and band scenarios. These phenomena are robust owing to the continuity of geometry, and can be tuned by geometric parameters. Anomalies, supported by both analytic and numerical solutions, underscore the advantages of adopting a geometric perspective, potentially yielding distinguishable experimental signatures.	翻訳日:2024-06-25 20:25:27 公開日:2024-06-22
# 音声テキスト生成のための補間強化の再検討 Revisiting Interpolation Augmentation for Speech-to-Text Generation ( http://arxiv.org/abs/2406.15846v1 ) ライセンス: Link先を確認	Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang,	(参考訳) 音声テキスト生成システム(S2T)は、主にラベル付きデータセットが不足しているため、低リソースシナリオでしばしば課題に直面している。新たなソリューションの1つは、入力とラベルを補間することで仮想トレーニングサンプルを構築することである。その可能性にも拘わらず、S2Tタスクにおけるこの手法の適用は、まだ未調査のままである。本稿では,いくつかの重要な疑問に導かれる補間強化の有用性を探求する。その結果,補間強化に適切な戦略を採用することで,各種タスクやアーキテクチャ,データスケールのパフォーマンスが大幅に向上し,資源制約下でのより堅牢なS2Tシステムの実現が期待できることがわかった。 Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# 多変量変圧器によるソーラードライバ予測の強化 Enhancing Solar Driver Forecasting with Multivariate Transformers ( http://arxiv.org/abs/2406.15847v1 ) ライセンス: Link先を確認	Sergio Sanchez-Hurtado, Victor Rodriguez-Fernandez, Julia Briden, Peng Mun Siew, Richard Linares,	(参考訳) 本研究では,F10.7,S10.7,M10.7,Y10.7を時系列変換器(PatchTST)で予測する総合的なフレームワークを開発する。太陽活動の高レベルと低レベルを均等に表現するために、太陽運転者の歴史的分布とトレーニングセットの間の距離に基づいて、試料を重み付けするためのカスタム損失関数を構築した。ソーラードライバー予測フレームワークには、18日間の見返りウィンドウと6日間の将来の予測が含まれている。宇宙環境技術(SET)データセットに対してベンチマークを行うと、我々のモデルは、ほぼ全てのケースにおいて標準平均誤差が低い予測を常に生成し、高い太陽活動の期間における予測精度が向上する。すべてのコードはGithub https://github.com/ARCLab-MIT/sw-driver-forecasterで公開されている。 In this work, we develop a comprehensive framework for F10.7, S10.7, M10.7, and Y10.7 solar driver forecasting with a time series Transformer (PatchTST). To ensure an equal representation of high and low levels of solar activity, we construct a custom loss function to weight samples based on the distance between the solar driver's historical distribution and the training set. The solar driver forecasting framework includes an 18-day lookback window and forecasts 6 days into the future. When benchmarked against the Space Environment Technologies (SET) dataset, our model consistently produces forecasts with a lower standard mean error in nearly all cases, with improved prediction accuracy during periods of high solar activity. All the code is available on Github https://github.com/ARCLab-MIT/sw-driver-forecaster.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# ポートレート写真のための品質誘導型スキントーン強調 Quality-guided Skin Tone Enhancement for Portrait Photography ( http://arxiv.org/abs/2406.15848v1 ) ライセンス: Link先を確認	Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai,	(参考訳) 近年、学習ベースの写真の色・トーン強調手法が普及している。しかし、ほとんどの学習ベースの画像強調手法は、1つのデータセットに基づいてある分布から別の分布へのマッピングを学習するだけであり、画像を連続的かつ制御可能に調整する能力に欠ける。多くの場合、固定された1つの調整結果ではなく、より弱い、あるいはより強い強調効果を得たいことがあるため、学習ベースの強調モデルで画像を連続的に調整できるようにすることは重要である。本稿では、様々な品質評価を持つ画像の分布を画像強調モデルが学習できるようにする、品質誘導型の画像強調パラダイムを提案する。この分布を学習することで、画像強調モデルは画像特徴とそれに対応する知覚品質を関連付けることができ、これを用いて異なる品質スコアに応じて画像を連続的に調整できる。提案手法の有効性を検証するため、まずポートレート写真の肌色調(スキントーン)調整に着目した主観的品質評価実験を行った。この実験で得られた主観的品質評価に導かれて、本手法は異なる品質要求に応じて肌の色調を調整できる。さらに、10枚の自然なRAW画像に対する実験により、被写体や撮影枚数が少ない状況でのモデルの有効性が裏付けられ、自然画像への一般的な適用可能性も示された。プロジェクトページは https://github.com/IntMeGroup/quality-guided-enhancement である。 In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image continuously, since in many cases we may want to get a slighter or stronger enhancement effect rather than one fixed adjusted result. In this paper, we propose a quality-guided image enhancement paradigm that enables image enhancement models to learn the distribution of images with various quality ratings. By learning this distribution, image enhancement models can associate image features with their corresponding perceptual qualities, which can be used to adjust images continuously according to different quality scores. To validate the effectiveness of our proposed method, a subjective quality assessment experiment is first conducted, focusing on skin tone adjustment in portrait photography. 
Guided by the subjective quality ratings obtained from this experiment, our method can adjust the skin tone corresponding to different quality requirements. Furthermore, an experiment conducted on 10 natural raw images corroborates the effectiveness of our model in situations with fewer subjects and fewer shots, and also demonstrates its general applicability to natural images. Our project page is https://github.com/IntMeGroup/quality-guided-enhancement .	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# 単一スピンセンサを用いた磁気オートオシレーションのナノスケールマッピング Nanoscale Mapping of Magnetic Auto-oscillations with a single Spin Sensor ( http://arxiv.org/abs/2406.15849v1 ) ライセンス: Link先を確認	Toni Hache, Anshu Anshu, Tetyana Shalomayeva, Rainer Stöhr, Klaus Kern, Jörg Wrachtrup, Aparajita Singha,	(参考訳) 磁気自己振動は減衰補償磁化前兆である。これらはスピンホールナノオシレータ(SHNO)などで生成することができる。これらのデバイスに関する現在の研究は、次世代の通信技術のためのエネルギー効率の高いハードウェアを作成することを目的としている。しかし, 単一SHNOデバイスにおいて, 自動振動モードの形成, 出力出力, 線幅を規定する基礎物理学はいまだ解明されていない。我々は、単一スピン量子センサを用いて、金属SHNO中の磁気オートオシレーションの発生源を画像化した。センサスピンの電子スピン共鳴遷移を駆動することで、ナノスケールのオートオシレーションスポットが生成するマイクロ波を直接測定し、より高速な取得速度(100ミリ秒/ピクセル)を実現する。最大反制振点のみによって定義されるのではなく,磁場ミニマの位置によって自己振動点が決定されることを定量的磁気メソメトリーで実験的に実証した。後者はスピン波を閉じ込める局所ポテンシャル井戸として機能し、大きな振幅オートオシレーションをサポートする。これらの点における磁場の大きさを比較することにより、自動振動モードの異なる周波数を解読する。オートオシレーションモードとスピン波ポテンシャル井戸の相互作用に関する洞察は、実際のデバイスの高度なエンジニアリングを可能にする。 Magnetic auto-oscillations are damping-compensated magnetization precessions. They can be generated in spin Hall nano-oscillators (SHNO) among others. Current research on these devices is dedicated to create next generation energy-efficient hardware for communication technologies. However, the underlying physics governing the formation of auto-oscillation modes, their output power and line width in a single SHNO device have remained elusive so far. We image the sources of magnetic auto-oscillations in a metallic SHNO using a single spin quantum sensor. We directly measure the microwave field generated by an auto-oscillation spot at the nanoscale by driving the electron spin resonance transition of the sensor spin, enabling faster acquisition speed (100 ms/pixel). Instead of being defined by the points of the largest antidamping only, we experimentally demonstrate for the first time with quantitative magnetometry that the auto-oscillation spots are determined by the positions of the magnetic field minima. The latter act as local potential wells for confining spin-waves, thus supporting large amplitude auto-oscillations. 
By comparing the magnitude of the magnetic stray field at these spots, we decipher the different frequencies of the auto-oscillation modes. The insights gained regarding the interaction between auto-oscillation modes and spin-wave potential wells enable advanced engineering of real devices.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# オプションによる価値保全計画のための抽象世界モデル学習 Learning Abstract World Model for Value-preserving Planning with Options ( http://arxiv.org/abs/2406.15850v1 ) ライセンス: Link先を確認	Rafael Rodriguez-Sanchez, George Konidaris,	(参考訳) 汎用エージェントは、広範囲なタスクを実行するために、きめ細かい制御とリッチな感覚入力を必要とする。しかし、この複雑さはしばしば難解な意思決定につながる。伝統的にエージェントは、この課題を軽減するためにタスク固有のアクションと観察空間を提供するが、これは自律性を低下させる。その代わり、エージェントは感覚運動経験から適切な抽象化レベルで状態行動空間を構築することができる必要がある。我々は、時間的および状態的粒度のより高いレベルで動作する抽象マルコフ決定過程(MDP)を学ぶために、時間的拡張された一連の行動の構造を利用する。我々は、これらのスキルによる計画が、抽象MDPにおける軌跡をシミュレートすることによって、元のMDPにおける有界値損失のポリシーをもたらすことを確実にするために必要な状態抽象化を特徴付ける。目標をベースとしたナビゲーション環境では,連続的な抽象状態の計画が成功し,抽象モデル学習が計画と学習のサンプル効率を向上させることを示す。 General-purpose agents require fine-grained controls and rich sensory inputs to perform a wide range of tasks. However, this complexity often leads to intractable decision-making. Traditionally, agents are provided with task-specific action and observation spaces to mitigate this challenge, but this reduces autonomy. Instead, agents must be capable of building state-action spaces at the correct abstraction level from their sensorimotor experiences. We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs) that operate at a higher level of temporal and state granularity. We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP. We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# 階層型サポートグラフによる次世代メッセージパッシング Next Level Message-Passing with Hierarchical Support Graphs ( http://arxiv.org/abs/2406.15852v1 ) ライセンス: Link先を確認	Carlos Vonessen, Florian Grötschla, Roger Wattenhofer,	(参考訳) メッセージパッシングニューラルネットワーク(MPNN)は、グラフ学習タスクに広く使用されているが、各ラウンドのメッセージパッシング中に隣接するノードに制限されるため、情報交換の制限範囲のような制限に悩まされている。グローバルな情報交換を容易にするために仮想ノードを組み込むなど、これらの制限に対処する様々な戦略が提案されている。本研究では,元のグラフの再帰的粗大化によって生成された仮想ノードの概念の拡張である階層支援グラフ(HSG)を紹介する。このアプローチは、使用する特定のMPNN層とは独立して、グラフ内の情報フローを強化する柔軟なフレームワークを提供する。本稿では、HSGの理論的解析を行い、その経験的性能を検証し、HSGが仮想ノードで拡張された他の手法を超越し、複数のデータセットにまたがって最先端の結果を達成できることを実証する。 Message-Passing Neural Networks (MPNNs) are extensively employed in graph learning tasks but suffer from limitations such as the restricted scope of information exchange, by being confined to neighboring nodes during each round of message passing. Various strategies have been proposed to address these limitations, including incorporating virtual nodes to facilitate global information exchange. In this study, we introduce the Hierarchical Support Graph (HSG), an extension of the virtual node concept created through recursive coarsening of the original graph. This approach provides a flexible framework for enhancing information flow in graphs, independent of the specific MPNN layers utilized. We present a theoretical analysis of HSGs, investigate their empirical performance, and demonstrate that HSGs can surpass other methods augmented with virtual nodes, achieving state-of-the-art results across multiple datasets.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# Repeater-like Asynchronous Measurement-Device-Independent Quantum Conference Key Agreement Repeater-Like Asynchronous Measurement-Device-Independent Quantum Conference Key Agreement ( http://arxiv.org/abs/2406.15853v1 ) ライセンス: Link先を確認	Yu-Shuo Lu, Yuan-Mei Xie, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen,	(参考訳) 量子会議鍵契約は、複数のパーティ間のセキュアな通信を促進するもので、将来の量子ネットワークにとって重要な暗号プリミティブとして期待されている。しかし、多部交絡状態の同期検出に伴う実験的複雑さと低効率は、その実用化を著しく妨げている。本研究では,非同期なグリーンベルガー・ホルン・ザイリンガー状態測定を用いた計測デバイスに依存しない会議鍵契約プロトコルを提案する。本手法は,複数のパーティ間での会議鍵レートの線形スケーリングを実現し,量子ネットワークにおける単一リピータ方式と同じような性能を示す。非同期計測戦略は、複雑なグローバルフェーズロック技術の必要性を回避し、有限鍵方式における構成可能なセキュリティと都市間伝送距離を同時に拡張する。さらに、我々の研究は、マルチパーティ量子絡み合いにおける非同期ペアリングの概念の利点も示している。 Quantum conference key agreement facilitates secure communication among multiple parties through multipartite entanglement and is anticipated to be an important cryptographic primitive for future quantum networks. However, the experimental complexity and low efficiency associated with the synchronous detection of multipartite entangled states have significantly hindered their practical application. In this work, we propose a measurement-device-independent conference key agreement protocol that utilizes asynchronous Greenberger-Horne-Zeilinger state measurement. This approach achieves a linear scaling of the conference key rate among multiple parties, exhibiting performance similar to that of the single-repeater scheme in quantum networks. The asynchronous measurement strategy bypasses the need for complex global phase locking technologies, concurrently extending the intercity transmission distance with composable security in the finite key regime. Additionally, our work also showcases the advantages of the asynchronous pairing concept in multiparty quantum entanglement.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# ReLU層のインジェクティビティ:フレーム理論からの展望 Injectivity of ReLU-layers: Perspectives from Frame Theory ( http://arxiv.org/abs/2406.15856v1 ) ライセンス: Link先を確認	Daniel Haider, Martin Ehler, Peter Balazs,	(参考訳) インジェクティビティ(英: Injectivity)とは、情報を失うことなく、その出力からあらゆる入力を完全に再構成できるマッピングの定義特性である。ハードしきい値を設定することで、ReLU関数は自然にこの性質を阻害し、ニューラルネットワークにおけるReLU層のインジェクティビティ解析を、まだ完全に解決されていない挑戦的かつ興味深いタスクにする。本稿では,この問題にアプローチするためのフレーム理論的視点を確立する。主な目的は、ReLU層のインジェクティビティ挙動の最も一般的な特徴づけを、関与する3つの要素、すなわち(i)重み、(ii)バイアス、(iii)データが引き出される領域のすべての観点から構築することである。実用的応用に焦点を合わせながら、我々は有界領域への注意を制限し、与えられた重みとデータ領域に対する最大バイアスを数値的に近似する2つの方法を提案する。これらの手法はこれらの領域におけるReLU層のインジェクティビティについて十分条件を提供し、ReLU層の情報損失を研究するための新しい実践的手法を提供する。最後に、フレーム理論から双対性の概念に基づく明示的な再構成公式を導出する。 Injectivity is the defining property of a mapping that ensures no information is lost and any input can be perfectly reconstructed from its output. By performing hard thresholding, the ReLU function naturally interferes with this property, making the injectivity analysis of ReLU-layers in neural networks a challenging yet intriguing task that has not yet been fully solved. This article establishes a frame theoretic perspective to approach this problem. The main objective is to develop the most general characterization of the injectivity behavior of ReLU-layers in terms of all three involved ingredients: (i) the weights, (ii) the bias, and (iii) the domain where the data is drawn from. Maintaining a focus on practical applications, we limit our attention to bounded domains and present two methods for numerically approximating a maximal bias for given weights and data domains. These methods provide sufficient conditions for the injectivity of a ReLU-layer on those domains and yield a novel practical methodology for studying the information loss in ReLU layers. Finally, we derive explicit reconstruction formulas based on the duality concept from frame theory.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# LLMによる説明:サブグラフ推論による勧告の展開 LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning ( http://arxiv.org/abs/2406.15859v1 ) ライセンス: Link先を確認	Guangsi Shi, Xiaofeng Deng, Linhao Luo, Lijuan Xia, Lei Bao, Bei Ye, Fei Du, Shirui Pan, Yuxiao Li,	(参考訳) リコメンダシステムは、ユーザとアイテム間の複雑な関係を分析することによって、さまざまなWebアプリケーションにおけるユーザエクスペリエンスの向上に重要な役割を果たす。知識グラフ(KG)は、推薦システムの性能を高めるために広く使われている。しかしながら、KGsはノイズが多く不完全であることが知られており、推奨結果に対して信頼できる説明を提供するのは難しい。説明可能なレコメンデータシステムは、製品開発とその後の意思決定に不可欠である。これらの課題に対処するため,我々は,Large Language Models (LLMs) とKGsを相乗的に導入し,レコメンデーションを強化し,解釈可能な結果を提供する新しいレコメンデータを提案する。具体的には、まずLLMのパワーを活用してKG再構成を増強する。 LLMはユーザレビューを理解して、KGに追加される新しいトリプルに分解する。このようにして、ユーザの好みを表す説明可能なパスでKGを豊かにすることができる。拡張KGのレコメンデーションを強化するために,ノードの重要性を効果的に測定し,レコメンデーションの推論経路を発見する新しいサブグラフ推論モジュールを提案する。最後に、これらの推論経路をLLMに入力し、レコメンデーション結果の解釈可能な説明を生成する。提案手法はレコメンデータシステムの有効性と解釈性を両立させ,特に従来の手法が失敗するクロスセールスシナリオにおいて顕著に促進する。提案手法の有効性は4つのオープンな実世界のデータセットで厳密に検証され,従来の最先端技術よりも平均12%向上した。多国籍技術系企業のクロスセールスレコメンデーションシステムへの私たちのモデルの適用は、その実用性と、精度の向上とユーザ信頼を通じてレコメンデーションプラクティスを再定義する可能性をさらに強調する。 Recommender systems are pivotal in enhancing user experiences across various web applications by analyzing the complicated relationships between users and items. Knowledge graphs (KGs) have been widely used to enhance the performance of recommender systems. However, KGs are known to be noisy and incomplete, which makes it hard to provide reliable explanations for recommendation results. An explainable recommender system is crucial for product development and subsequent decision-making. To address these challenges, we introduce a novel recommender that synergizes Large Language Models (LLMs) and KGs to enhance the recommendation and provide interpretable results. Specifically, we first harness the power of LLMs to augment KG reconstruction. LLMs comprehend and decompose user reviews into new triples that are added into the KG. In this way, we can enrich KGs with explainable paths that express user preferences. 
To enhance the recommendation on augmented KGs, we introduce a novel subgraph reasoning module that effectively measures the importance of nodes and discovers reasoning paths for recommendation. Finally, these reasoning paths are fed into the LLMs to generate interpretable explanations of the recommendation results. Our approach significantly enhances both the effectiveness and interpretability of recommender systems, especially in cross-selling scenarios where traditional methods falter. The effectiveness of our approach has been rigorously tested on four open real-world datasets, where it outperforms contemporary state-of-the-art techniques by 12% on average. The application of our model in a multinational engineering and technology company's cross-selling recommendation system further underscores its practical utility and potential to redefine recommendation practices through improved accuracy and user trust.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# イタリアにおける言語品種の音声分析 Speech Analysis of Language Varieties in Italy ( http://arxiv.org/abs/2406.15862v1 ) ライセンス: Link先を確認	Moreno La Quatra, Alkis Koudounas, Elena Baralis, Sabato Marco Siniscalchi,	(参考訳) イタリアは、異なる地域で話される異なる地域言語のために、その領域に豊富な言語多様性を示す。近年の自己教師型学習の進歩は、音声データのみを用いてイタリアの言語品種を分析する新たな機会を提供する。これには、大量のデータから学んだ表現を活用して、密接に関連する言語品種間のニュアンスをよりよく調べる能力が含まれる。本研究では,イタリアの多様な言語品種から抽出された音声サンプルの発声領域の自動同定に焦点をあてる。我々は,この課題に対処するための自己教師付き学習モデルを活用し,イタリアの地域言語の違いと類似点を分析する。また,これらの多様で近縁な品種間の関係に関する新たな知見を探索し,言語学者が時間的・空間的に相互に相互に関係する進化と地域発展を理解するのに役立つかもしれない。学習表現の識別能力を向上させるため,教師付きコントラスト学習目標を事前学習ステップと追加の微調整目的として評価した。実験的な証拠は、事前訓練された自己教師付きモデルが音声記録から領域を効果的に識別できることを示している。さらに、微調整中に対照的な目的を取り入れることで、分類精度が向上し、個別に地域品種を分離する埋め込みが得られ、この課題に対して自己教師付き事前学習と対照的な学習を組み合わせる価値が示される。 Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech data alone. This includes the potential to leverage representations learned from large amounts of data to better examine nuances between closely related linguistic varieties. In this study, we focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties. We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages. In doing so, we also seek to uncover new insights into the relationships among these diverse yet closely related varieties, which may help linguists understand their interconnected evolution and regional development over time and space. To improve the discriminative ability of learned representations, we evaluate several supervised contrastive learning objectives, both as pre-training steps and additional fine-tuning objectives. 
Experimental evidence shows that pre-trained self-supervised models can effectively identify regions from speech recordings. Additionally, incorporating contrastive objectives during fine-tuning improves classification accuracy and yields embeddings that distinctly separate regional varieties, demonstrating the value of combining self-supervised pre-training and contrastive learning for this task.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# EmoAttack:感情的バックドア生成のための感情から画像への拡散モデル EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation ( http://arxiv.org/abs/2406.15863v1 ) ライセンス: Link先を確認	Tianyu Wei, Shanmin Pang, Qi Guo, Yizhuo Ma, Qing Guo,	(参考訳) テキストから画像への拡散モデルでは、入力されたテキストに基づいてリアルな画像を作成することができる。ユーザーは自分の意見を視覚的に伝えるオブジェクトを記述することができる。本研究は,拡散モデルを用いて画像を生成することによる,未認識かつ潜伏的なリスクを明らかにし,入力テキスト中の感情を利用してネガティブな内容を導入し,ユーザから好ましくない感情を引き出す。感情は、日々のやりとりにおいて個人的意見を表現する上で重要な役割を担い、悪意のあるネガティブなコンテンツを含めることで、ユーザーを混乱させ、ネガティブな感情を悪化させます。具体的には、画像生成中に感情テキストによって引き起こされる悪意のあるネガティブコンテンツを組み込むことができる感情認識バックドアアタック(EmoAttack)を特定する。拡散パーソナライズ問題としてこのような攻撃を定式化し、広範囲なモデル再訓練を避けるとともに、EmoBoothを提案する。従来のパーソナライズ手法とは異なり,情緒的単語群と悪意のある負のコンテンツを含む参照画像とのマッピングを確立することにより,事前学習した拡散モデルを微調整する。提案手法の有効性を検証するため,我々はデータセットを構築し,その有効性について広範な分析と議論を行った。消費者の拡散モデルの普及を考えると、この脅威を明らかにすることは社会にとって重要である。 Text-to-image diffusion models can create realistic images based on input texts. Users can describe an object to convey their opinions visually. In this work, we unveil a previously unrecognized and latent risk of using diffusion models to generate images; we utilize emotion in the input texts to introduce negative content, potentially eliciting unfavorable emotions in users. Emotions play a crucial role in expressing personal opinions in our daily interactions, and the inclusion of maliciously negative content can lead users astray, exacerbating negative emotions. Specifically, we identify the emotion-aware backdoor attack (EmoAttack) that can incorporate malicious negative content triggered by emotional texts during image generation. We formulate such an attack as a diffusion personalization problem to avoid extensive model retraining and propose EmoBooth. Unlike existing personalization methods, our approach fine-tunes a pre-trained diffusion model by establishing a mapping between a cluster of emotional words and a given reference image containing malicious negative content. 
To validate our method, we built a dataset and conducted extensive analysis and discussion of its effectiveness. Given consumers' widespread use of diffusion models, uncovering this threat is critical for society.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# DISHA:視覚障害者のための屋外ナビゲーションのためのエッジでの低エネルギースパーストランス DISHA: Low-Energy Sparse Transformer at Edge for Outdoor Navigation for the Visually Impaired Individuals ( http://arxiv.org/abs/2406.15864v1 ) ライセンス: Link先を確認	Praveen Nagil, Sumit K. Mandal,	(参考訳) 視覚障害者の補助技術は、日々の雑用や自信の行使において、他人から独立させるのに非常に有用である。補助技術の重要な側面の1つは、視覚障害者のための屋外ナビゲーションである。文献にはいくつかの屋外ナビゲーション技術があるが、主に障害物検出に限られている。しかし、歩道を通って視覚障害者(外を歩きながら)をナビゲートすることも重要である。さらに、この補助技術は、デバイスのバッテリー寿命を延ばすための低エネルギー運転を確保する必要がある。そこで本研究では,視覚障害者を支援するためにエッジデバイスに実装したエンドツーエンド技術を提案する。具体的には,歩道を検知するトランスフォーマーアルゴリズムのための新しいプルーニング手法を提案する。プルーニング技術は、プルーニングトランスフォーマーアルゴリズムがエッジデバイスにデプロイされる際に、実行の低レイテンシと低エネルギー消費を保証する。実験結果から,提案技術はベースライン技術の精度を最大32.49%向上し,バッテリ寿命を1.4時間延長することを示した。 Assistive technology for visually impaired individuals is extremely useful in making them independent of others in performing day-to-day chores and in instilling confidence in them. One of the important aspects of assistive technology is outdoor navigation for visually impaired people. While there exist several techniques for outdoor navigation in the literature, they are mainly limited to obstacle detection. However, navigating a visually impaired person along the sidewalk (while the person is walking outside) is important too. Moreover, the assistive technology should ensure low-energy operation to extend the battery life of the device. Therefore, in this work, we propose an end-to-end technology deployed on an edge device to assist visually impaired people. Specifically, we propose a novel pruning technique for a transformer algorithm that detects sidewalks. The pruning technique ensures low latency of execution and low energy consumption when the pruned transformer algorithm is deployed on the edge device. Extensive experimental evaluation shows that our proposed technology provides up to 32.49% improvement in accuracy and 1.4 hours of extension in battery life with respect to a baseline technique.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# マトリックス力学における平面回転子と量子物理学における状態の役割 Planar Rotor in Matrix Mechanics and the Role of States in Quantum Physics ( http://arxiv.org/abs/2406.15866v1 ) ライセンス: Link先を確認	Vlatko Vedral,	(参考訳) 平面量子ロータの例を用いて,ハイゼンベルクの行列力学の手法を説明する。システムの固有状態を使わずに、この単純なモデルのスペクトルを見つける方法を示す。このことから、ハイゼンベルク状態が量子力学において果たす役割を推測し、状態の必要性を完全に解消できるかどうかを問うことになる。 We illustrate Heisenberg's method of matrix mechanics using the planar quantum rotor example. We show how to find the spectrum of this simple model without the need to use the eigenstates of the system. This then leads us to speculate on the role the Heisenberg state plays in quantum mechanics and to ask whether one could even completely dispose of the need for states.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# アノテータの主観性を活かしてミソジニーを識別するマルチタスク学習フレームワーク A multitask learning framework for leveraging subjectivity of annotators to identify misogyny ( http://arxiv.org/abs/2406.15869v1 ) ライセンス: Link先を確認	Jason Angel, Segun Taofeek Aroyehun, Grigori Sidorov, Alexander Gelbukh,	(参考訳) 人工知能(AI)を用いたミソジニーの特定は、女性に対するオンライン毒性との戦いの一形態である。しかし、ミソジニーの解釈が主観的であることは、この現象をモデル化する上で重要な課題となっている。本稿では,この課題の主観性を活かしてミソジニー識別システムの性能を高めるマルチタスク学習手法を提案する。我々は,6つのプロファイルグループにまたがる性別や年齢を考慮し,アノテータから様々な視点を取り入れた上で,2つの言語モデルを用いた広範囲な実験と誤り解析を行い,英語のツイートにおけるミソジニー的内容を識別するマルチタスク学習手法の4つの代替設計を検証した。その結果,様々な視点を取り入れることで,異なる形態のミソジニーを解釈する言語モデルの能力が向上することが示唆された。本研究は、コンテンツモデレーションを推進し、効果的なオンラインモデレーションシステムを構築するための多様な視点を受け入れることの重要性を強調している。 Identifying misogyny using artificial intelligence is a form of combating online toxicity against women. However, the subjective nature of interpreting misogyny poses a significant challenge to modeling the phenomenon. In this paper, we propose a multitask learning approach that leverages the subjectivity of this task to enhance the performance of misogyny identification systems. We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups, and conducted extensive experiments and error analysis using two language models to validate our four alternative designs of the multitask learning technique to identify misogynistic content in English tweets. The results demonstrate that incorporating various viewpoints enhances the language models' ability to interpret different forms of misogyny. This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# 量子液体と固体の量子エレクトロニクス Quantum Electronics on Quantum Liquids and Solids ( http://arxiv.org/abs/2406.15870v1 ) ライセンス: Link先を確認	Wei Guo, Denis Konstantinov, Dafei Jin,	(参考訳) 軽い粒子質量と弱い粒子-粒子相互作用を持つ非極性原子または分子は、低温で量子液体と固体(QLS)を形成することができる。過剰電子は真空中のQLSの表面に自然に結合し、2次元と低次元でユニークな量子電子の挙動を示す。本稿では,この領域における歴史的研究と最近の進歩について概観する。このレビューで取り上げられた主なトピックは、液体ヘリウム、固体ネオン、固体水素の集合電子輸送、超流動ヘリウムの単一電子量子ビットへの理論的提案と実験的取り組み、固体ネオン上の単一電子電荷量子ビットの最近の実験的実現と関連する理論計算である。最後に、異種QLS上での量子エレクトロニクスの探究を概観する。 Nonpolar atoms or molecules with light particle mass and weak particle-particle interaction can form quantum liquids and solids (QLS) at low temperatures. Excess electrons can be naturally bound to the surface of a QLS in a vacuum and exhibit unique quantum electronic behaviors in two and lower dimensions. In this article, we review the historical study and recent progress in this area. The main topics covered in this review include the collective and individual electron transport on liquid helium, solid neon, and solid hydrogen, the theoretical proposal and experimental effort toward single electron qubits on superfluid helium, the recent experimental realization of single electron charge qubits on solid neon and the related theoretical calculation. In the end, we review and envision extended exploration of quantum electronics on heterogeneous QLS.	翻訳日:2024-06-25 20:15:22 公開日:2024-06-22
# 隠れた意図を明らかにする: 生成したテキストのより深い洞察のためのプロンプトリカバリを探る Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts ( http://arxiv.org/abs/2406.15871v1 ) ライセンス: Link先を確認	Louis Give, Timo Zaoral, Maria Antonietta Bruno,	(参考訳) 今日では、AI生成コンテンツの検出がますます注目を集めている。私たちの考えは、検出を超えて、テキストを生成するために使われるプロンプトを復元することです。本稿は、我々の知る限り、タスクのクローズドなセットなしに、この特定の領域における最初の調査を紹介します。私たちの目標は、このアプローチが有望かどうかを研究することです。ゼロショットと少数ショットのインコンテキスト学習に加えて,LoRAファインチューニングも試行する。その後、半合成データセットを使用することの利点を評価する。この最初の研究では、1つのモデルで生成されたテキストに限定する。その結果,元のプロンプトをある程度の精度で復元できることが示唆された。 Today, the detection of AI-generated content is receiving more and more attention. Our idea is to go beyond detection and try to recover the prompt used to generate a text. This paper, to the best of our knowledge, introduces the first investigation in this particular domain without a closed set of tasks. Our goal is to study if this approach is promising. We experiment with zero-shot and few-shot in-context learning but also with LoRA fine-tuning. After that, we evaluate the benefits of using a semi-synthetic dataset. For this first study, we limit ourselves to text generated by a single model. The results show that it is possible to recover the original prompt with a reasonable degree of accuracy.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# g-サーキュラント行列のMDS特性について On MDS Property of g-Circulant Matrices ( http://arxiv.org/abs/2406.15872v1 ) ライセンス: Link先を確認	Tapas Chatterjee, Ayantika Laha,	(参考訳) AESブロック暗号の拡散層への応用により、循環(circulant)最大距離分離(MDS)行列の重要性が高まっている。 2013年、Gupta と Ray は、次数が3より大きい循環インボリュート行列は MDS になり得ないことを証明した。この発見を受けて、様々な著者が循環行列と行列のインボリュート性の一般化を行った。 2016年、LiuとSimは循環行列の置換を変更して巡回行列を導入した。 1961年、Friedmanは巡回行列のサブクラスを形成する$g$-循環行列を導入した。本稿では、まず、インボリュート特性とMDS特性を持つ$g$-循環行列について論じる。次数$k \times k$ の $g$-循環インボリュート行列は、$g \equiv -1 \pmod k$ でなければ MDS にはならないことを証明する。次に、有限体の成分を持つ$g$-循環半インボリュート行列と半直交行列を調べる。次数$k \times k$の$g$-循環半直交(半インボリュート)行列に対応する対角行列の$k$乗がスカラー行列となることを証明する。これらの結果は、2022年にChatterjee et al.によって確立された循環行列に関する結果の拡張と見なすことができる。 Circulant Maximum Distance Separable (MDS) matrices have gained significant importance due to their applications in the diffusion layer of the AES block cipher. In 2013, Gupta and Ray established that circulant involutory matrices of order greater than 3 cannot be MDS. This finding prompted a generalization of circulant matrices and the involutory property of matrices by various authors. In 2016, Liu and Sim introduced cyclic matrices by changing the permutation of circulant matrices. In 1961, Friedman introduced $g$-circulant matrices, which form a subclass of cyclic matrices. In this article, we first discuss $g$-circulant matrices with involutory and MDS properties. We prove that $g$-circulant involutory matrices of order $k \times k$ cannot be MDS unless $g \equiv -1 \pmod k$. Next, we delve into $g$-circulant semi-involutory and semi-orthogonal matrices with entries from finite fields. We establish that the $k$-th power of the associated diagonal matrices of a $g$-circulant semi-orthogonal (semi-involutory) matrix of order $k \times k$ results in a scalar matrix. These findings can be viewed as an extension of the results concerning circulant matrices established by Chatterjee et al. in 2022.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# NeuralSCF:密度汎関数理論のためのニューラルネットワーク自己整合体 NeuralSCF: Neural network self-consistent fields for density functional theory ( http://arxiv.org/abs/2406.15873v1 ) ライセンス: Link先を確認	Feitong Song, Ji Feng,	(参考訳) コーンシャム密度汎関数理論(KS-DFT)は、正確な電子構造計算に広く応用されている。しかし、特に大規模シミュレーションには計算的に要求される可能性があり、機械学習(ML)アクセラレーションへの最近の取り組みを動機付けている。我々は,コーン・シャム密度マップを深層学習目的として確立し,コーン・シャム方程式の力学を符号化するニューラルネットワーク自己整合体(NeuralSCF)フレームワークを提案する。 SE(3)-同変グラフ変換器を用いてこの写像をモデル化し、NeuralSCFはコーン=シャムの自己整合反復をエミュレートして電子密度を得る。ニューラルSCFは電子密度予測と導出特性の最先端の精度を達成し、例外的なゼロショットの一般化を特徴とする。 NeuralSCFは、KS-DFTの内在力学からの学習がモデルの精度と伝達性を大幅に向上させ、メカニックラーニングを通じて電子構造計算を加速させる有望なステップストーンを提供することを明らかにした。 Kohn-Sham density functional theory (KS-DFT) has found widespread application in accurate electronic structure calculations. However, it can be computationally demanding especially for large-scale simulations, motivating recent efforts toward its machine-learning (ML) acceleration. We propose a neural network self-consistent fields (NeuralSCF) framework that establishes the Kohn-Sham density map as a deep learning objective, which encodes the mechanics of the Kohn-Sham equations. Modeling this map with an SE(3)-equivariant graph transformer, NeuralSCF emulates the Kohn-Sham self-consistent iterations to obtain electron densities, from which other properties can be derived. NeuralSCF achieves state-of-the-art accuracy in electron density prediction and derived properties, featuring exceptional zero-shot generalization to a remarkable range of out-of-distribution systems. NeuralSCF reveals that learning from KS-DFT's intrinsic mechanics significantly enhances the model's accuracy and transferability, offering a promising stepping stone for accelerating electronic structure calculations through mechanics learning.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# AIによる災害救助支援ドローン:課題と機会 AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities ( http://arxiv.org/abs/2406.15875v1 ) ライセンス: Link先を確認	Narek Papyan, Michel Kulhandjian, Hovannes Kulhandjian, Levon Hakob Aslanyan,	(参考訳) 本調査では,特にヒトの悲鳴やその他の苦難信号を識別することで,個人検出にドローンベースのシステムを活用することに重点を置いている。この研究は、地震、ハリケーン、軍事紛争、山火事などの災害後のシナリオに大きく関係している。これらのドローンは、救助隊が直接アクセスすることが困難な災害に遭った地域をホバリングすることができる。無人航空機(英: Unmanned air vehicle、UAV)は、災害時の捜索救助任務のためにしばしば配備される航空機である。通常、ドローンは空中画像をキャプチャして、構造的な損傷を評価し、災害の程度を識別する。また、熱画像技術を使って体温を検知し、個人を見つけるのに役立つ。大規模なドローンは、孤立した災害で立ち往生している人々に必須の物資を届けるために使われる場合もある。本論では, 空中音響による人間の位置推定にまつわる課題について考察する。聴覚システムは、動物の鳴き声や風など、自然に発生する人間の叫び声と音を区別しなければならない。さらに、人々が救助隊に合図しようとする、叫び声や拍手などの信号に関連する、異なるパターンを認識する能力も備えるべきである。この課題に対処するためには、人工知能(AI)を使用して音の周波数を分析し、一般的なオーディオシグネチャを識別する。畳み込みニューラルネットワーク(CNN)のようなディープラーニングベースのネットワークは、これらのシグネチャを使用して、ドローンモーターやその他の環境要因によって発生するノイズを除去する訓練が可能である。さらに、マイクロホンアレイ信号に基づく到着方向(DOA)のような信号処理技術を用いることで、人間の騒音の音源を追跡する精度を高めることができる。 In this survey we are focusing on utilizing drone-based systems for the detection of individuals, particularly by identifying human screams and other distress signals. This study has significant relevance in post-disaster scenarios, including events such as earthquakes, hurricanes, military conflicts, wildfires, and more. These drones are capable of hovering over disaster-stricken areas that may be challenging for rescue teams to access directly. Unmanned aerial vehicles (UAVs), commonly referred to as drones, are frequently deployed for search-and-rescue missions during disaster situations. Typically, drones capture aerial images to assess structural damage and identify the extent of the disaster. They also employ thermal imaging technology to detect body heat signatures, which can help locate individuals. In some cases, larger drones are used to deliver essential supplies to people stranded in isolated disaster-stricken areas. 
In our discussions, we delve into the unique challenges associated with locating humans through aerial acoustics. The auditory system must distinguish between human cries and sounds that occur naturally, such as animal calls and wind. Additionally, it should be capable of recognizing distinct patterns related to signals like shouting, clapping, or other ways in which people attempt to signal rescue teams. To tackle this challenge, one solution involves harnessing artificial intelligence (AI) to analyze sound frequencies and identify common audio signatures. Deep learning-based networks, such as convolutional neural networks (CNNs), can be trained using these signatures to filter out noise generated by drone motors and other environmental factors. Furthermore, employing signal processing techniques like the direction of arrival (DOA) based on microphone array signals can enhance the precision of tracking the source of human noises.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# BigCodeBench: さまざまな関数呼び出しと複雑な命令によるベンチマークコード生成 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ( http://arxiv.org/abs/2406.15877v1 ) ライセンス: Link先を確認	Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, David Lo, Binyuan Hui, Niklas Muennighoff, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra,	(参考訳) 自動化されたソフトウェアエンジニアリングは、プログラミングにおける最近のLarge Language Models(LLMs)の進歩によって、非常に力強くなっています。現在のベンチマークでは、LLMは人間の開発者のような様々なソフトウェアエンジニアリングタスクを実行できることが示されているが、その評価の大部分は、短くて自己完結したアルゴリズムタスクに限られている。困難で実用的なプログラミングタスクを解決するには、さまざまな関数呼び出しをデータ分析やWeb開発といった機能を効率的に実装するためのツールとして活用する必要がある。さらに、複数のツールを使ってタスクを解くには、複雑な命令を正確に理解することで構成的推論が必要である。これら2つの特徴を満たすことは、LLMにとって大きな課題となる。LLMが困難で実用的なプログラミングタスクをどの程度解決できるかを評価するために、139のライブラリと7つのドメインにわたる1,140のきめ細かいプログラミングタスクにおいて、複数の関数呼び出しをツールとして呼び出すことをLLMに求めるベンチマークであるBigCodeBenchを紹介する。 LLMを厳格に評価するために、各プログラミングタスクは平均5.6のテストケースを含み、平均的なブランチカバレッジは99%である。また、BigCodeBenchの自然言語指向の変種として、元のdocstringを必須情報のみの短い命令に自動変換するBigCodeBench-Instructを提案する。 60個のLLMを広範囲に評価したところ、LLMは関数呼び出しを正確に使用するための複雑な命令にまだ従うことができず、スコアは最大60%で、人間の97%よりも大幅に低かった。結果は、この分野のさらなる進歩の必要性を浮き彫りにした。 Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. 
Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task includes 5.6 test cases on average, with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions with only the essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# 低変位ランクからトポロジカルトランスへの高速木場積分器 Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers ( http://arxiv.org/abs/2406.15881v1 ) ライセンス: Link先を確認	Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi,	(参考訳) 本稿では,重み付き木上で定義されたテンソル場を積分するために,構造行列(特に低変位ランク)の理論に基づく新しい高速polylog線形アルゴリズムのクラスを提案する。得られた高速木場積分器(FTFI)のいくつかの応用として、(a)木メトリクスによるグラフメトリクスの近似、(b)グラフ分類、(c)メッシュ上でのモデリング、そして最後に(d)画像用トポロジカルトランスフォーマー(TT)(Choromanski et al., 2022)について述べる。トポロジカルトランスフォーマーでは、トランスフォーマー層ごとに3つの余分な学習可能なパラメータを持つ新しい相対位置符号化(RPE)マスキング機構を提案し、1.0-1.5%以上の精度向上を実現した。重要なことに、ほとんどのFTFIは正確な方法であり、したがって数値的にはそのブルートフォース版と等価である。何千ものノードを持つグラフに適用すると、それらの正確なアルゴリズムは5.7-13倍のスピードアップを提供する。また,本手法の広範な理論的解析も行う。 We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, and finally (d) Topological Transformers (TTs) (Choromanski et al., 2022) for images. For Topological Transformers, we propose new relative position encoding (RPE) masking mechanisms with as few as three extra learnable parameters per Transformer layer, leading to 1.0-1.5%+ accuracy gains. Importantly, most FTFIs are exact methods, thus numerically equivalent to their brute-force counterparts. When applied to graphs with thousands of nodes, those exact algorithms provide 5.7-13x speedups. We also provide an extensive theoretical analysis of our methods.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# SimSMoE: 類似度測定による表現的崩壊の解決 SimSMoE: Solving Representational Collapse via Similarity Measure ( http://arxiv.org/abs/2406.15883v1 ) ライセンス: Link先を確認	Giang Do, Hung Le, Truyen Tran,	(参考訳) SMoE(Sparse mixture of experts)は、計算コストを一定に保ちながら、大きな言語モデルをスケールするための効果的なアプローチとして登場した。 SMoEのいくつかの顕著な成功にもかかわらず、モデル性能を害しパラメータ冗長性を引き起こす表現崩壊問題のために、そのようなアーキテクチャを効果的に訓練することは依然として難しい。本研究では,ニューラルネットワークの類似度に基づく新しいアルゴリズムであるSimilarity-based Sparse Mixture of Experts (SimSMoE)を提案する。これは、固定されたFLOPs予算の下で、エキスパート間の表現崩壊問題に対する解を保証する。提案手法の有効性, 堅牢性, 拡張性を示すために, 3つの大規模言語モデルに対して, 事前学習タスクと微調整タスクの両方に対して広範な実験的な評価を行う。その結果、SimSMoEは既存のルーティングポリシーを大幅に改善し、タスクのパフォーマンスにおいて他のSMoEトレーニング手法よりも優れていることが示された。 Sparse mixture of experts (SMoE) has emerged as an effective approach for scaling large language models while keeping a constant computational cost. Despite several notable successes of SMoE, effectively training such architectures remains elusive due to the representation collapse problem, which in turn harms model performance and causes parameter redundancy. In this work, we present Similarity-based Sparse Mixture of Experts (SimSMoE), a novel algorithm based on neural network similarity that guarantees a solution to the representation collapse issue between experts given a fixed FLOPs budget. We conduct extensive empirical evaluations on three large language models for both pre-training and fine-tuning tasks to illustrate the efficacy, robustness, and scalability of our method. The results demonstrate that SimSMoE significantly enhances existing routing policies and outperforms other SMoE training methods on these tasks.	翻訳日:2024-06-25 20:03:15 公開日:2024-06-22
# The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models ( http://arxiv.org/abs/2406.15885v1 ) License: see link	Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao,	Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation of 16 LLMs to analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available on GitHub (https://github.com/zcli-charlie/ZIQI-Eval) and HuggingFace (https://huggingface.co/datasets/MYTH-Lab/ZIQI-Eval).	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Real-time Speech Summarization for Medical Conversations ( http://arxiv.org/abs/2406.15888v1 ) License: see link	Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy,	In doctor-patient conversations, identifying medically relevant information is crucial, creating the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Secondly, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Thirdly, we are the first to utilize LLMs and human annotators collaboratively to create gold standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Language Alignment via Nash-learning and Adaptive feedback ( http://arxiv.org/abs/2406.15890v1 ) License: see link	Ari Azarafrooz, Farshid Faal,	Recent research has shown the potential of Nash Learning via Human Feedback for large language model alignment by incorporating the notion of a preference model in a minimax game setup. We take this idea further by casting the alignment as a mirror descent algorithm against the adaptive feedback of an improved opponent, thereby removing the need for learning a preference model or the existence of an annotated dataset altogether. The resulting algorithm, which we refer to as Language Alignment via Nash-learning and Adaptive feedback (LANA), is capable of self-alignment without the need for a human-annotated preference dataset. We support this statement with various experiments and mathematical discussion.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario ( http://arxiv.org/abs/2406.15891v1 ) License: see link	Carlos Gómez-Rodríguez, Paul Williams,	This is a summary of the paper "A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing", which was published in Findings of EMNLP 2023. We evaluate a range of recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task, and compare them to human writers. For this purpose, we use a specifically tailored prompt (based on an epic combat between Ignatius J. Reilly, main character of John Kennedy Toole's "A Confederacy of Dunces", and a pterodactyl) to minimize the risk of training data leakage and force the models to be creative rather than reusing existing stories. The same prompt is presented to LLMs and human writers, and evaluation is performed by humans using a detailed rubric including various aspects like fluency, style, originality or humor. Results show that some state-of-the-art commercial LLMs match or slightly outperform our human writers in most of the evaluated dimensions. Open-source LLMs lag behind. Humans keep a close lead in originality, and only the top three LLMs can handle humor at human-like levels.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Statistical Models of Top-$k$ Partial Orders ( http://arxiv.org/abs/2406.15893v1 ) License: see link	Amel Awadelkarim, Johan Ugander,	In many contexts involving ranked preferences, agents submit partial orders over available alternatives. Statistical models often treat these as marginal in the space of total orders, but this approach overlooks information contained in the list length itself. In this work, we introduce and taxonomize approaches for jointly modeling distributions over top-$k$ partial orders and list lengths $k$, considering two classes of approaches: composite models that view a partial order as a truncation of a total order, and augmented ranking models that model the construction of the list as a sequence of choice decisions, including the decision to stop. For composite models, we consider three dependency structures for joint modeling of order and truncation length. For augmented ranking models, we consider different assumptions on how the stop-token choice is modeled. Using data consisting of partial rankings from San Francisco school choice and San Francisco ranked choice elections, we evaluate how well the models predict observed data and generate realistic synthetic datasets. We find that composite models, explicitly modeling length as a categorical variable, produce synthetic datasets with accurate length distributions, and an augmented model with position-dependent item utilities jointly models length and preferences in the training data best, as measured by negative log loss. Methods from this work have significant implications for the simulation and evaluation of real-world social systems that solicit ranked preferences.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval ( http://arxiv.org/abs/2406.15897v1 ) License: see link	Paul Primus, Gerhard Widmer,	Matching raw audio signals with textual descriptions requires understanding the audio's content and the description's semantics and then drawing connections between the two modalities. This paper investigates a hybrid retrieval system that utilizes audio metadata as an additional clue to understand the content of audio signals before matching them with textual queries. We experimented with metadata often attached to audio recordings, such as keywords and natural-language descriptions, and we investigated late and mid-level fusion strategies to merge audio and metadata. Our hybrid approach with keyword metadata and late fusion improved the retrieval performance over a content-based baseline by 2.36 and 3.69 pp. in mAP@10 on the ClothoV2 and AudioCaps benchmarks, respectively.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Defection-Free Collaboration between Competitors in a Learning System ( http://arxiv.org/abs/2406.15898v1 ) License: see link	Mariel Werner, Sai Praneeth Karimireddy, Michael I. Jordan,	We study collaborative learning systems in which the participants are competitors who will defect from the system if they lose revenue by collaborating. As such, we frame the system as a duopoly of competitive firms that are each engaged in training machine-learning models and selling their predictions to a market of consumers. We first examine a fully collaborative scheme in which both firms share their models with each other and show that this leads to a market collapse with the revenues of both firms going to zero. We next show that one-sided collaboration in which only the firm with the lower-quality model shares improves the revenue of both firms. Finally, we propose a more equitable, defection-free scheme in which both firms share with each other while losing no revenue, and we show that our algorithm converges to the Nash bargaining solution.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Tomita-Takesaki theory and quantum concurrence ( http://arxiv.org/abs/2406.15900v1 ) License: see link	Rupak Chatterjee,	The quantum entanglement measure of concurrence is shown to be directly calculable from a Tomita-Takesaki modular operator framework constructed from the local von Neumann algebras of observables for two quantum systems. Specifically, the Tomita-Takesaki modular conjugation operator $J$ that links two separate systems with respect to their von Neumann algebras is related to the quantum concurrence $C$ of a pure bi-variate entangled state composed from these systems. This concurrence relation provides a direct physical meaning to $J$ as both a symmetry operator and a quantitative measure of entanglement. This procedure is then demonstrated for a supersymmetric quantum mechanical system and a real scalar field interacting with two entangled spin-$\frac{1}{2}$ Unruh-DeWitt qubit detectors. For the latter system, the concurrence result is shown to be consistent with some known results on the Bell-CHSH inequality for such a system.	Translated: 2024-06-25 20:03:15 Published: 2024-06-22
# Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction ( http://arxiv.org/abs/2406.15904v1 ) License: see link	Kulunu Dharmakeerthi, YoonHaeng Hur, Tengyuan Liang,	Practitioners often deploy a learned prediction model in a new environment where the joint distribution of covariate and response has shifted. In observational data, the distribution shift is often driven by unobserved confounding factors lurking in the environment, with the underlying mechanism unknown. Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. This motivates us to study the domain adaptation problem with observational data: given labeled covariate and response pairs from a source environment, and unlabeled covariates from a target environment, how can one predict the missing target response reliably? We root the adaptation problem in a linear structural causal model to address endogeneity and unobserved confounding. We study the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction. This further motivates a new representation learning method for adaptation that optimizes for a lower-dimensional linear subspace and, subsequently, a prediction model confined to that subspace. The procedure operates on a non-convex objective, one that naturally interpolates between predictability and stability/invariance, constrained on the Stiefel manifold. We study the optimization landscape and prove that, when the regularization is sufficient, nearly all local optima align with an invariant linear subspace resilient to both concept and covariate shift. In terms of predictability, we show that a model using the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk. Three real-world data sets are investigated to validate our method and theory.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization ( http://arxiv.org/abs/2406.15906v1 ) License: see link	Siyuan Li, Xi Lin, Yaju Liu, Gaolei Li, Jianhua Li,	Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated policy design paradigm for optical networks. In detail, it is implemented as a novel DRL framework that utilizes generative models to learn the optimal policy network. Furthermore, we assess the performance of OpticGAI on two NP-hard optical network problems, Routing and Wavelength Assignment (RWA) and dynamic Routing, Modulation, and Spectrum Allocation (RMSA), to show the feasibility of the AI-generated policy paradigm. Simulation results show that OpticGAI achieves the highest reward and the lowest blocking rate on both the RWA and RMSA problems. OpticGAI poses a promising direction for future research on generative AI-enhanced flexible optical network optimization.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# Soft Masked Mamba Diffusion Model for CT to MRI Conversion ( http://arxiv.org/abs/2406.15910v1 ) License: see link	Zhenbin Wang, Lei Zhang, Lituan Wang, Zhenwei Zhang,	Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are the predominant modalities utilized in the field of medical imaging. Although MRI captures the complexity of anatomical structures in greater detail than CT, it entails higher financial costs and requires longer image acquisition times. In this study, we aim to train a latent diffusion model for CT to MRI conversion, replacing the commonly used U-Net or Transformer backbone with a State-Space Model (SSM) called Mamba that operates on latent patches. First, we note critical oversights in the scan scheme of most Mamba-based vision methods, including inadequate attention to the spatial continuity of patch tokens and the lack of consideration for their varying importance to the target task. Secondly, extending from this insight, we introduce Diffusion Mamba (DiffMa), employing a soft mask to integrate Cross-Sequence Attention into Mamba and performing the selective scan in a spiral manner. Lastly, extensive experiments demonstrate impressive performance by DiffMa in medical image generation tasks, with notable advantages in input scaling efficiency over existing benchmark models. The code and models are available at https://github.com/wongzbb/DiffMa-Diffusion-Mamba	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# Credit Attribution and Stable Compression ( http://arxiv.org/abs/2406.15916v1 ) License: see link	Roi Livni, Shay Moran, Kobbi Nissim, Chirag Pabbaraju,	Credit attribution is crucial across various fields. In academic research, proper citation acknowledges prior work and establishes original contributions. Similarly, in generative models, such as those trained on existing artworks or music, it is important to ensure that any generated content influenced by these works appropriately credits the original creators. We study credit attribution by machine learning algorithms. We propose new definitions, relaxations of Differential Privacy, that weaken the stability guarantees for a designated subset of $k$ datapoints. These $k$ datapoints can be used non-stably with permission from their owners, potentially in exchange for compensation. Meanwhile, the remaining datapoints are guaranteed to have no significant influence on the algorithm's output. Our framework extends well-studied notions of stability, including Differential Privacy ($k = 0$), differentially private learning with public data (where the $k$ public datapoints are fixed in advance), and stable sample compression (where the $k$ datapoints are selected adaptively by the algorithm). We examine the expressive power of these stability notions within the PAC learning framework, provide a comprehensive characterization of learnability for algorithms adhering to these principles, and propose directions and questions for future research.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# DISentangled Counterfactual Visual interpretER (DISCOVER) generalizes to natural images ( http://arxiv.org/abs/2406.15918v1 ) License: see link	Oded Rotem, Assaf Zaritsky,	We recently presented DISentangled COunterfactual Visual interpretER (DISCOVER), a method toward systematic visual interpretability of image-based classification models, and demonstrated its applicability to two biomedical domains. Here we demonstrate that DISCOVER can be applied to the domain of natural images. First, DISCOVER visually interpreted the nose size, the muzzle area, and the face size as semantic discriminative visual traits discriminating between facial images of dogs versus cats. Second, DISCOVER visually interpreted the cheeks and jawline, eyebrows and hair, and the eyes as discriminative facial characteristics. These successful visual interpretations across two natural image domains indicate that DISCOVER is a generalized interpretability method.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery ( http://arxiv.org/abs/2406.15920v1 ) License: see link	Jialang Xu, Nazir Sirajudeen, Matthew Boal, Nader Francis, Danail Stoyanov, Evangelos Mazomenos,	Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying durations. In addition, we deploy an established observational clinical human reliability assessment tool (OCHRA) to annotate the errors of suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50), constructing the first frame-level in-vivo surgical error detection dataset to support error detection in real-world scenarios. Experimental results demonstrate that SEDMamba outperforms state-of-the-art methods with gains of at least 1.82% in AUC and 3.80% in AP, at significantly reduced computational complexity.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection ( http://arxiv.org/abs/2406.15921v1 ) License: see link	Alvaro Lopez Pellicer, Yi Li, Plamen Angelov,	Deepfake techniques generate highly realistic data, making it challenging for humans to discern between actual and artificially generated images. Recent advancements in deep learning-based deepfake detection methods, particularly with diffusion models, have shown remarkable progress. However, there is a growing demand for real-world applications to detect unseen individuals, deepfake techniques, and scenarios. To address this limitation, we propose a Prototype-based Unified Framework for Deepfake Detection (PUDD). PUDD offers a detection system based on similarity, comparing input data against known prototypes for video classification and identifying potential deepfakes or previously unseen classes by analyzing drops in similarity. Our extensive experiments reveal three key findings: (1) PUDD achieves an accuracy of 95.1% on Celeb-DF, outperforming state-of-the-art deepfake detection methods; (2) PUDD leverages image classification as the upstream task during training, demonstrating promising performance in both image classification and deepfake detection tasks during inference; (3) PUDD requires only 2.7 seconds for retraining on new data and emits 10$^{5}$ times less carbon compared to the state-of-the-art model, making it significantly more environmentally friendly.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# Federated Adversarial Learning for Robust Autonomous Landing Runway Detection ( http://arxiv.org/abs/2406.15925v1 ) License: see link	Yi Li, Plamen Angelov, Zhengxin Yu, Alvaro Lopez Pellicer, Neeraj Suri,	As the development of deep learning techniques in autonomous landing systems continues to grow, one of the major challenges is trust and security in the face of possible adversarial attacks. In this paper, we propose a federated adversarial learning-based framework to detect landing runways using paired data comprising clean local data and its adversarial version. Firstly, the local model is pre-trained on a large-scale lane detection dataset. Then, instead of exploiting large instance-adaptive models, we apply a parameter-efficient fine-tuning method known as scale and shift deep features (SSF) to the pre-trained model. Secondly, in each SSF layer, the distributions of clean local data and its adversarial version are disentangled for accurate statistics estimation. To the best of our knowledge, this is the first federated learning work that addresses the adversarial sample problem in landing runway detection. Our experimental evaluations on both synthetic and real images of the Landing Approach Runway Detection (LARD) dataset consistently demonstrate the good performance of the proposed federated adversarial learning and its robustness to adversarial attacks.	Translated: 2024-06-25 19:53:14 Published: 2024-06-22
# セマンティックエントロピープローブ : LLMにおけるロバストおよびチープ幻覚検出 Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs ( http://arxiv.org/abs/2406.15927v1 ) ライセンス: Link先を確認	Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal,	(参考訳) 本研究では,Large Language Models (LLMs) における不確実性定量化手法である意味エントロピープローブ (SEPs) を提案する。幻覚は、もっともらしい音質であるが、実際は誤りであり、任意のモデル世代である。 Farquhar et al (2024) による最近の研究で意味論的エントロピー (SE) が提案されている。しかし、SE計算に伴う計算コストの5倍から10倍の増加は、実用化を妨げている。この問題に対処するため,一世代の隠れ状態からSEを直接近似するSEPを提案する。 SEPは訓練が簡単で、テスト時に複数のモデル生成をサンプリングする必要がなく、セマンティックな不確実性定量化のオーバーヘッドをほぼゼロに減らす。モデル精度を直接予測する従来の探索手法に比べて,SEPは幻覚検出の性能を保ち,分布外データに優れることを示す。我々のモデルとタスクにわたる結果は、モデルが隠された状態がSEを捉えていることを示唆し、私たちのアブレーション研究はトークンの位置とモデル層についてさらなる洞察を与えます。 We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.	
翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
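The abstract above describes semantic entropy as uncertainty measured over meanings rather than over token sequences. A minimal sketch of that idea, assuming each sampled generation has already been assigned a meaning-cluster label by some external step (for example an entailment check, which is not shown here): the function name `semantic_entropy` and the toy labels are illustrative assumptions, and this discrete count-based estimate is only one possible variant, not necessarily the exact estimator used in the paper.

```python
import math
from collections import Counter


def semantic_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    cluster_ids holds one label per sampled generation; generations that
    share a label are assumed to express the same meaning.
    """
    if not cluster_ids:
        raise ValueError("need at least one sampled generation")
    total = len(cluster_ids)
    counts = Counter(cluster_ids)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Five samples that all mean the same thing: no semantic uncertainty.
consistent = semantic_entropy(["paris"] * 5)

# Five samples spread over three meanings: higher semantic uncertainty.
scattered = semantic_entropy(["paris", "lyon", "paris", "nice", "lyon"])
```

Computing this quantity needs several sampled generations per question, which is the cost the abstract says SEPs avoid by predicting SE from the hidden states of a single generation.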
# 強化学習によるマイクロリアクターの多段階臨界探索と電力整形 Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning ( http://arxiv.org/abs/2406.15931v1 ) ライセンス: Link先を確認	Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias,	(参考訳) 運転コストとメンテナンスコストの削減は、一般的な先進的な原子炉と特にマイクロリアクターにとって重要な目標である。この削減を実現するためには、安全かつ自律的な原子炉運転を確保するために、堅牢な自律制御アルゴリズムの開発が不可欠である。近年、人工知能と機械学習アルゴリズム、特に強化学習(RL)アルゴリズムは、核融合トカマクにおけるプラズマ制御や建物のエネルギー管理などの制御問題に急速に応用されている。本稿では,原子力マイクロリアクターのインテリジェント制御におけるRLの利用について紹介する。 RLエージェントは、ウェスティングハウスeVinci\textsuperscript{TM}設計にインスパイアされたマイクロリアクター設計の高精度なシミュレーションに基づいて、PPOとA2C、最先端の深部RL技術を用いて訓練される。我々は、Serpentモデルを用いて、ドラム位置、コア臨界度、コア電力分布のデータを生成し、フィードフォワードニューラルネットワークサロゲートモデルをトレーニングした。このサロゲートモデルを用いて、PPOおよびA2C制御ポリシーを導出し、様々な原子炉燃焼状態における最適なドラム位置を決定し、臨界コア条件と6つのコア部分すべてに対称的な電力分布を確保する。その結果, 最適ドラム位置同定におけるPPOの優れた性能, 約1.002(制限値 < 1.02 以内)のヘクタントパワー傾き比を実現し, 臨界度を10 pcmの範囲内に維持できることが示唆された。 A2Cは、サイクルで考慮されたすべてのバーンアップステップのパフォーマンス指標に関して、PPOほどパフォーマンスの競争力を提供していません。さらに、よく訓練されたRL制御ポリシーが制御動作を迅速に識別する能力を強調し、デジタルツインによるリアルタイム自律制御を可能にするための有望なアプローチを提案する。 Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapidly increasing application to control problems, such as plasma control in fusion tokamaks and building energy management. In this work, we introduce the use of RL for intelligent control in nuclear microreactors. The RL agent is trained using proximal policy optimization (PPO) and advantage actor-critic (A2C), cutting-edge deep RL techniques, based on a high-fidelity simulation of a microreactor design inspired by the Westinghouse eVinci\textsuperscript{TM} design. 
We utilized a Serpent model to generate data on drum positions, core criticality, and core power distribution for training a feedforward neural network surrogate model. This surrogate model was then used to guide PPO and A2C control policies in determining the optimal drum position across various reactor burnup states, ensuring critical core conditions and symmetrical power distribution across all six core portions. The results demonstrate the excellent performance of PPO in identifying optimal drum positions, achieving a hextant power tilt ratio of approximately 1.002 (within the limit of < 1.02) and maintaining criticality within a 10 pcm range. A2C did not perform as competitively as PPO on the performance metrics for all burnup steps considered in the cycle. Additionally, the results highlight the capability of well-trained RL control policies to quickly identify control actions, suggesting a promising approach for enabling real-time autonomous control through digital twins.	翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
# 意識に基づく畳み込みニューラルネットワークを用いたSQLクエリ自動解析システム An Automated SQL Query Grading System Using An Attention-Based Convolutional Neural Network ( http://arxiv.org/abs/2406.15936v1 ) ライセンス: Link先を確認	Donald R. Schwartz, Pablo Rivas,	(参考訳) SQLクエリのグラディングは、特に学生の応募数が増加するにつれて、時間がかかり、面倒で困難なタスクになる可能性がある。これらの課題を軽減するために、いくつかのシステムが導入されたが、これらのシステムには独自の制限がある。本稿では,SQLクエリのグレード処理を自動化する新しいアプローチについて述べる。従来のアプローチとは異なり、我々は独自の畳み込みニューラルネットワークアーキテクチャを採用しており、異なる機械学習タスクに対してパラメータ共有アプローチを用いて、アーキテクチャがデータの異なる知識表現を誘導し、SQL文を理解する可能性を高めることができる。 Grading SQL queries can be a time-consuming, tedious and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have their own limitations. This paper describes our novel approach to automating the process of grading SQL queries. Unlike previous approaches, we employ a unique convolutional neural network architecture that employs a parameter-sharing approach for different machine learning tasks that enables the architecture to induce different knowledge representations of the data to increase its potential for understanding SQL statements.	翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
# RuleR:ルールベースのデータリサイクルによるLLM制御性の向上 RuleR: Improving LLM Controllability by Rule-based Data Recycling ( http://arxiv.org/abs/2406.15938v1 ) ライセンス: Link先を確認	Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou,	(参考訳) 大規模言語モデル(LLM)は応答に対する微妙な制御性に欠けており、パフォーマンスとユーザエクスペリエンスの向上に不可欠である。しかしながら、LLMの制御性を改善するための教師付き微調整(SFT)データセットのキュレーションは通常、追加のコストを必要とする人間の専門家やプロプライエタリなLLMに依存している。このギャップを埋めるため,ルールベースのデータリサイクリング(RuleR)を提案し,複数の制約を予め定義されたルールに従って元のデータサンプルに組み込んだデータ拡張手法を提案する。スクラッチから新しいデータを生成する代わりに、RuleRは応答にルールベースの編集を適用し、元の命令にルール命令を追加するだけで、既存のデータを「リサイクル」する。一般的な指示追従能力を維持しつつ,LLM制御性の向上におけるRuleRの有効性を示す実験結果が得られた。コードはhttps://github.com/MingLiiii/RuleRでリリースされる。 Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities. The code will be released on https://github.com/MingLiiii/RuleR.	翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
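The recycling step described in the RuleR abstract (edit an existing response with a rule, then append the matching rule-instruction to the original instruction) can be sketched as follows. This is an illustration only: `recycle_sample`, the dictionary layout, and both rules are hypothetical names made up here, not taken from the RuleR code base.

```python
def recycle_sample(sample, rule):
    """Return a new training sample that carries one extra constraint.

    sample: dict with "instruction" and "response" strings.
    rule:   dict with a "rule_instruction" string and an "edit" callable
            that rewrites a response so that it satisfies the instruction.
    """
    return {
        "instruction": sample["instruction"] + " " + rule["rule_instruction"],
        "response": rule["edit"](sample["response"]),
    }


# Hypothetical rules, invented for this illustration.
UPPERCASE_RULE = {
    "rule_instruction": "Answer in uppercase letters only.",
    "edit": str.upper,
}
END_MARKER_RULE = {
    "rule_instruction": "End your answer with the word DONE.",
    "edit": lambda response: response.rstrip() + " DONE",
}

original = {"instruction": "Name the capital of France.", "response": "Paris."}
recycled = recycle_sample(original, UPPERCASE_RULE)
# recycled["instruction"] is
#   "Name the capital of France. Answer in uppercase letters only."
# recycled["response"] is "PARIS."
```

Because the edit is a deterministic function of the existing response, no human annotator or additional model call is needed to produce the constrained sample, which is the cost argument the abstract makes.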
# 個々の要因を超えて:GPTモデルにおける分類学と気象概念の分類学的知識の局所性を探る Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models ( http://arxiv.org/abs/2406.15940v1 ) ライセンス: Link先を確認	Christopher Burger, Yifan Hu, Thai Le,	(参考訳) GPT(Generative Pre-trained Transformer)のようなモデルにおける知識の位置は、近年広く研究されている。しかし、研究の多くは個々の事実の場所を決定することに重点を置いており、最終的なゴールは、モデル全体を再トレーニングする時間と費用なしで、時代遅れ、誤った、あるいは有害な事実を編集することである。本研究では,個々の事実を異にするのではなく,知識の場所,概念や関連する情報の集合に関する広い視点を考察する。そこで我々はまず,約120Kの事実記述を,分類学と気象学という2つの階層的カテゴリに分けた34の概念を含む,DARCと呼ばれる新しいデータセットをキュレートした。次に,個々の事実の重要領域を決定するために開発された既存の因果媒介分析手法を応用し,それらを一連の関連カテゴリに適用し,概念がこれらのモデル内の異なる領域に関連付けられているかどうかを詳細に調査する。関連カテゴリは、類似の少ないカテゴリとは対照的に、類似した重要な領域を示す。しかし、個々の圏の部分集合の特定の領域への微粒化は明らかではない。 The location of knowledge within Generative Pre-trained Transformer (GPT)-like models has seen extensive recent investigation. However, much of the work is focused towards determining locations of individual facts, with the end goal being the editing of facts that are outdated, erroneous, or otherwise harmful, without the time and expense of retraining the entire model. In this work, we investigate a broader view of knowledge location, that of concepts or clusters of related information, instead of disparate individual facts. To do this, we first curate a novel dataset, called DARC, that includes a total of 34 concepts of ~120K factual statements divided into two types of hierarchical categories, namely taxonomy and meronomy. Next, we utilize existing causal mediation analysis methods developed for determining regions of importance for individual facts and apply them to a series of related categories to provide detailed investigation into whether concepts are associated with distinct regions within these models. We find that related categories exhibit similar areas of importance in contrast to less similar categories. However, fine-grained localization of individual category subsets to specific regions is not apparent.	
翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
# 誘導バイアスの厳密な計算に向けて Towards Exact Computation of Inductive Bias ( http://arxiv.org/abs/2406.15941v1 ) ライセンス: Link先を確認	Akhilan Boopathy, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete,	(参考訳) 機械学習における多くの研究は、タスクの一般化を促進するために適切な帰納バイアス(例えば畳み込みニューラルネットワーク、モーメントベースのオプティマイザ、トランスフォーマー)を見つけることである。しかしながら、これらのアーキテクチャやハイパーパラメータに関連する帰納バイアスの量の定量化は限られている。本稿では,与えられたトレーニングデータ予算を用いてタスクの一般化に必要な帰納バイアスを効率的に計算する手法を提案する。提案手法では、仮説空間から引き出されたランダム仮説の損失分布をモデル化し、これらの仮説に対するタスクに必要な帰納バイアスを推定する。従来の研究とは異なり、本手法は有界性を用いることなく帰納的バイアスを直接推定し、多様な仮説空間に適用できる。さらに, サンプル仮説の数から近似誤差境界を導出する。先行結果と一致して、我々の経験的結果は、高次元のタスクはより帰納的バイアスを必要とすることを示した。本稿では,他の表現型モデルクラスと比較して,モデルクラスとしてのニューラルネットワークが大量の帰納バイアスを符号化していることを示す。さらに,ニューラルネットワークアーキテクチャ間の帰納バイアスの相対的差を定量化する。提案した帰納的バイアス尺度は,特定のタスクに対する特定のモデルアーキテクチャの利点を情報理論で解釈し,より強力な帰納的バイアスを必要とするタスクを開発するための定量的ガイドを提供する。 Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. 
We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.	翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
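The abstract above defines the required inductive bias as the information needed to single out well-generalizing models within a hypothesis space, estimated from the losses of randomly drawn hypotheses. A deliberately naive sketch of that reading, assuming losses of sampled hypotheses are already available: it counts the fraction of samples below a loss threshold and reports its negative base-2 logarithm. The function name and the threshold are assumptions for illustration, and the paper models the loss distribution rather than counting hits, so treat this as intuition, not as the authors' estimator.

```python
import math


def inductive_bias_bits(sampled_losses, threshold):
    """Bits needed to pick out hypotheses with loss <= threshold.

    Uses the empirical fraction p of sampled hypotheses that reach the
    threshold and returns -log2(p). Raises if no sample qualifies, since
    the naive count cannot resolve probabilities below 1 / len(samples).
    """
    if not sampled_losses:
        raise ValueError("need at least one sampled hypothesis")
    hits = sum(1 for loss in sampled_losses if loss <= threshold)
    if hits == 0:
        raise ValueError("no sampled hypothesis reaches the threshold")
    return -math.log2(hits / len(sampled_losses))


# One of eight random hypotheses generalizes well: about 3 bits are needed.
bits = inductive_bias_bits([0.1] + [1.0] * 7, threshold=0.5)
```

The failure case in the sketch shows why a fitted loss distribution is useful: good hypotheses are usually far too rare to be hit by direct sampling.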
# LaneSegNet設計研究 LaneSegNet Design Study ( http://arxiv.org/abs/2406.15946v1 ) ライセンス: Link先を確認	William Stevens, Vishal Urs, Karthik Selvaraj, Gabriel Torres, Gaurish Lakhanpal,	(参考訳) 自動運転車の普及に伴い、コンピュータビジョンアルゴリズムはリアルタイムで道路の特徴を正確に評価することが不可欠である。本研究では,車線情報と車線データを統合して道路環境をより文脈的に理解する,車線トポロジー予測の新しいアプローチであるレーンセグネットアーキテクチャについて検討する。 LaneSegNetアーキテクチャには、機能抽出器、レーンエンコーダ、レーンデコーダ、予測ヘッドが含まれており、ResNet-50、BEVFormer、および様々な注意機構のコンポーネントを活用している。特徴抽出器およびトランスフォーマーエンコーダ-デコーダスタック修正によるLaneSegNetアーキテクチャの最適化実験を行った。エンコーダスタックとデコーダスタックを変更することで、トレーニング時間と予測精度の間に興味深いトレードオフが生じ、いくつかの組み合わせが有望な結果を示していることがわかった。我々の実装は1台のNVIDIA Tesla A100 GPUでトレーニングされ、2:4の比率でトレーニング時間を22.3%削減し、平均的精度は7.1%しか低下せず、4:8の比率でトレーニング時間を11.1%向上しなかったが、平均的精度は23.7%向上した。これらの結果から, 戦略的ハイパーパラメータチューニングは, 利用者の資源によって大幅に改善される可能性が示唆された。この研究は、利用可能な計算能力に応じてLaneSegNetを最適化し、限られたリソースを持つユーザにとってよりアクセスしやすくし、より強力なリソースを持つユーザの能力を高めるための貴重な洞察を提供する。 With the increasing prevalence of autonomous vehicles, it is essential for computer vision algorithms to accurately assess road features in real-time. This study explores the LaneSegNet architecture, a new approach to lane topology prediction which integrates topological information with lane-line data to provide a more contextual understanding of road environments. The LaneSegNet architecture includes a feature extractor, lane encoder, lane decoder, and prediction head, leveraging components from ResNet-50, BEVFormer, and various attention mechanisms. We experimented with optimizations to the LaneSegNet architecture through feature extractor modification and transformer encoder-decoder stack modification. We found that modifying the encoder and decoder stacks offered an interesting tradeoff between training time and prediction accuracy, with certain combinations showing promising results. 
Our implementation, trained on a single NVIDIA Tesla A100 GPU, found that a 2:4 ratio reduced training time by 22.3% with only a 7.1% drop in mean average precision, while a 4:8 ratio increased training time by only 11.1% but improved mean average precision by a significant 23.7%. These results indicate that strategic hyperparameter tuning can yield substantial improvements depending on the resources of the user. This study provides valuable insights for optimizing LaneSegNet according to available computation power, making it more accessible for users with limited resources and increasing the capabilities for users with more powerful resources.	翻訳日:2024-06-25 19:53:14 公開日:2024-06-22
# 多言語フィードバックによる言語横断のLLM教育 Teaching LLMs to Abstain across Languages via Multilingual Feedback ( http://arxiv.org/abs/2406.15948v1 ) ライセンス: Link先を確認	Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Orevaoghene Ahia, Shuyue Stella Li, Vidhisha Balachandran, Sunayana Sitaram, Yulia Tsvetkov,	(参考訳) マルチリンガル LLM は、しばしば言語間での知識格差があり、リソース不足の言語では大きなギャップがある。したがって、LLMが知識ギャップに直面することを禁ずることは、多言語環境における幻覚を軽減するための有望な戦略である。しかし、LLMの禁止に関する以前の研究は主に英語に焦点を合わせており、既存のソリューションを英語以外に直接適用すると、高リソース言語と低リソース言語の間に最大20.5%のパフォーマンスギャップが生じることが判明した。この目的のために,多言語からのフィードバックから学習し,LLMが複数のフィードバック項目を関連言語で生成することで,一つの言語で提案された回答を自己認識する手法を提案し,多様な言語,文化,コミュニティ間の知識ギャップを識別する上で有効であることを示す。大規模な実験により、我々の多言語フィードバックアプローチは、さまざまな強力なベースラインを上回り、オープンブック、クローズドブック、コモンセンスQAを備えた3つのデータセット上の3つのブラックボックスおよびオープンモデルで、9.2%の低リソース言語の改善を実現している。さらに分析したところ、多言語フィードバックは多様な言語話者に役立てるための効果的かつ公平な棄権戦略であり、文化的要因は言語選択やLLMの棄権行動に大きな影響を与え、多言語および多文化の信頼できる言語モデリングの今後の方向性を強調している。 Multilingual LLMs often have knowledge disparities across languages, with larger gaps in under-resourced languages. Teaching LLMs to abstain in the face of knowledge gaps is thus a promising strategy to mitigate hallucinations in multilingual settings. However, previous studies on LLM abstention primarily focus on English; we find that directly applying existing solutions beyond English results in up to 20.5% performance gaps between high and low-resource languages, potentially due to LLMs' drop in calibration and reasoning beyond a few resource-rich languages. To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identifying the knowledge gaps across diverse languages, cultures, and communities. 
Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines, achieving up to 9.2% improvement for low-resource languages across three black-box and open models on three datasets, featuring open-book, closed-book, and commonsense QA. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstain strategy to serve diverse language speakers, and cultural factors have great impact on language selection and LLM abstention behavior, highlighting future directions for multilingual and multi-cultural reliable language modeling.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# モジュール型多元主義:マルチLLMコラボレーションによる多元的アライメント Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration ( http://arxiv.org/abs/2406.15951v1 ) ライセンス: Link先を確認	Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov,	(参考訳) 既存のアライメントパラダイムは大規模言語モデル(LLM)の開発に不可欠であるが、LLMは平均的な人間の好みを学習し、文化、人口統計学、地域社会の様々な好みをモデル化するのに苦労することが多い。多元的アライメントのためのマルチLLM協調に基づくモジュラーフレームワークであるModular Pluralismを提案する。ベースLLMに、小規模だが特化したコミュニティLMのプールを「プラグイン」し、モデルが異なるモードで協調することで、3つの多元性モード(オーバートン、ステアブル、分散)を柔軟にサポートする。 Modular Pluralism はブラックボックス LLM と一意に互換性があり、以前は表現されていないコミュニティに新しいコミュニティ LM を追加するモジュールコントロールを提供する。我々は,6つのタスクと4つのデータセットによるモジュール型多元性の評価を行った。広汎な実験により、モジュラー多元論は6つのブラックボックスとオープンソース LLM にまたがる3つの多元主義の目的を推し進めることを示した。さらなる分析により、LLMはより小さなコミュニティのLMからのインプットに概して忠実であり、新しいコミュニティのLMを追加して、以前は表現されていなかったコミュニティをよりよくカバーすることでシームレスなパッチ適用を可能にすることが明らかとなった。 While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but specialized community LMs, where models collaborate in distinct modes to flexibly support three modes of pluralism: Overton, steerable, and distributional. Modular Pluralism is uniquely compatible with black-box LLMs and offers the modular control of adding new community LMs for previously underrepresented communities. We evaluate Modular Pluralism with six tasks and four datasets featuring questions/instructions with value-laden and perspective-informed responses. Extensive experiments demonstrate that Modular Pluralism advances the three pluralism objectives across six black-box and open-source LLMs. 
Further analysis reveals that LLMs are generally faithful to the inputs from smaller community LLMs, allowing seamless patching by adding a new community LM to better cover previously underrepresented communities.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# 知覚のドアを超えて:視覚変換器は物体間の関係を表現する Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects ( http://arxiv.org/abs/2406.15955v1 ) ライセンス: Link先を確認	Michael A. Lepori, Alexa R. Tartaglini, Wai Keen Vong, Thomas Serre, Brenden M. Lake, Ellie Pavlick,	(参考訳) 視覚変換器(ViT)は様々な環境で最先端のパフォーマンスを達成しているが、視覚的関係を含むタスクを実行する際に驚くほどの失敗を見せている。 ViTはどのようにしてオブジェクト間の視覚的関係の計算を必要とするタスクを実行しようとするのか? ViTを解釈する以前の取り組みは、関連する低レベルの視覚的特徴を特徴づけることに集中する傾向があった。対照的に、ViTが抽象的な視覚的推論を行うために使用する高レベルな視覚的アルゴリズムを研究するために、機械論的解釈可能性の手法を採用する。本稿では,2つの視覚的実体が同一であるか異なるのかを判断する,基本的な,しかし驚くほど難しい,関係推論タスクのケーススタディを示す。私たちは、このタスクで微調整された事前訓練されたViTは、明らかに誘導バイアスがないにもかかわらず、2つの質的に異なる処理段階を示すことが多いことに気付きました。 1) 局所対象物の特徴を抽出し、歪んだ表現に記憶する知覚段階 2)オブジェクト表現の比較を行う関係段階。第2段階では、ViTsがある程度抽象的な視覚関係を表現することができるという証拠が見つかる。最後に、各段階での障害点が、モデルが極めて単純なタスクに対して一般化可能な解を学ぶのを防ぐことを実証する。離散処理段階の観点からViTを理解することで、既存のモデルと将来のモデルの欠点をより正確に診断し、修正することができる。 Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform tasks that require computing visual relations between objects? Prior efforts to interpret ViTs tend to focus on characterizing relevant low-level visual features. In contrast, we adopt methods from mechanistic interpretability to study the higher-level visual algorithms that ViTs use to perform abstract visual reasoning. We present a case study of a fundamental, yet surprisingly difficult, relational reasoning task: judging whether two visual entities are the same or different. We find that pretrained ViTs fine-tuned on this task often exhibit two qualitatively different stages of processing despite having no obvious inductive biases to do so: 1) a perceptual stage wherein local object features are extracted and stored in a disentangled representation, and 2) a relational stage wherein object representations are compared. 
In the second stage, we find evidence that ViTs can learn to represent somewhat abstract visual relations, a capability that has long been considered out of reach for artificial neural networks. Finally, we demonstrate that failure points at either stage can prevent a model from learning a generalizable solution to our fairly simple tasks. By understanding ViTs in terms of discrete processing stages, one can more precisely diagnose and rectify shortcomings of existing and future models.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# Transfer Learning を用いた骨骨折の分類 Bone Fracture Classification using Transfer Learning ( http://arxiv.org/abs/2406.15958v1 ) ライセンス: Link先を確認	Shyam Gupta, Dhanisha Sharma,	(参考訳) 骨折に対するX線画像の手作業による検査は, 人為的ミスが生じやすく時間のかかるプロセスである。本研究では,骨折の分類に頑健だが簡単な訓練ループを導入し,既存の方法よりも優れていることを示す。提案手法は,10エポック未満で優れた性能を達成し,最新のデータセットを用いて,そのタスクに最適な性能モデルを提供する。我々は,ディープラーニングモデルを責任を持って効率的にトレーニングすることの重要性と,高品質なデータセットの選択における重要な役割を強調した。 The manual examination of X-ray images for fractures is a time-consuming process that is prone to human error. In this work, we introduce a robust yet simple training loop for the classification of fractures, which significantly outperforms existing methods. Our method achieves superior performance in less than ten epochs and utilizes the latest dataset to deliver the best-performing model for this task. We emphasize the importance of training deep learning models responsibly and efficiently, as well as the critical role of selecting high-quality datasets.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# 極端学習機械の重複しない領域分割法:楕円問題 A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems ( http://arxiv.org/abs/2406.15959v1 ) ライセンス: Link先を確認	Chang-Ock Lee, Youngkyu Lee, Byungeun Ryoo,	(参考訳) エクストリーム・ラーニング・マシン(ELM)は、単一層フィードフォワードニューラルネットワークを用いて偏微分方程式(PDE)を解く手法である。隠れ層の重み/バイアス係数をランダムな値でプリセットし、計算全体を通して固定され、ニューラルネットワークの出力層のパラメータをトレーニングするために線形最小二乗法を使用する。物理情報ニューラルネットワークよりもはるかに高速であることが知られている。しかしながら、古典的EMMは、最小二乗系を解く必要があるため、解法において高いレベルの表現が要求される場合、依然として計算コストがかかる。本稿では,EMMのトレーニング時間を短縮するだけでなく,並列計算にも適する非重複領域分解法(DDM)を提案する。数値解析において、DDMは並列計算により楕円型PDEの有限要素解を得る時間を削減するために広く研究されている。これらのアプローチの中で、重複しないDDMが最も注目を集めている。これらの手法により、対応するサブドメインでのみ有効な局所ニューラルネットワークと、インタフェースにおける補助変数を導入する。ローカルニューラルネットワークの変数とパラメータに基づくシステムを構築する。インタフェース上のシュア補体系は、出力層のパラメータを排除して導出することができる。次に、各局所ニューラルネットワークのパラメータを並列に解いた減算系を解くことで、補助変数を直接取得する。また,大規模システムにおける高い近似品質に適した隠蔽層パラメータを初期化する手法を提案する。提案手法の高速化性能をサブドメイン数で検証する数値結果を示す。 Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network. It presets the weight/bias coefficients in the hidden layer with random values, which remain fixed throughout the computation, and uses a linear least squares method for training the parameters of the output layer of the neural network. It is known to be much faster than Physics informed neural networks. However, classical ELM is still computationally expensive when a high level of representation is desired in the solution as this requires solving a large least squares system. In this paper, we propose a nonoverlapping domain decomposition method (DDM) for ELMs that not only reduces the training time of ELMs, but is also suitable for parallel computation. In numerical analysis, DDMs have been widely studied to reduce the time to obtain finite element solutions for elliptic PDEs through parallel computation. Among these approaches, nonoverlapping DDMs are attracting the most attention. 
Motivated by these methods, we introduce local neural networks, which are valid only at corresponding subdomains, and an auxiliary variable at the interface. We construct a system on the variable and the parameters of local neural networks. A Schur complement system on the interface can be derived by eliminating the parameters of the output layer. The auxiliary variable is then directly obtained by solving the reduced system after which the parameters for each local neural network are solved in parallel. A method for initializing the hidden layer parameters suitable for high approximation quality in large systems is also proposed. Numerical results that verify the acceleration performance of the proposed method with respect to the number of subdomains are presented.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
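The ELM recipe stated in the abstract above (random hidden-layer weights that stay fixed, output-layer weights found by linear least squares) can be illustrated on a much simpler problem than a PDE. The sketch below fits a one-dimensional function instead of solving an elliptic equation and does not implement the domain decomposition or the Schur complement step. Every name, the tanh activation, the weight range, the small ridge term, and the use of normal equations with plain Gaussian elimination are assumptions chosen to keep the example dependency-free.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def train_elm(xs, ys, n_hidden=10, ridge=1e-6, seed=0):
    """Fit only the output weights; hidden weights are random and fixed."""
    rng = random.Random(seed)
    weights = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]
    biases = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]
    hidden = [[math.tanh(w * x + b) for w, b in zip(weights, biases)]
              for x in xs]
    # Normal equations (H^T H + ridge * I) beta = H^T y; the ridge term is
    # an addition made here to keep the small system well conditioned.
    gram = [[sum(row[i] * row[j] for row in hidden) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(hidden, ys))
           for i in range(n_hidden)]
    beta = solve_linear_system(gram, rhs)
    return weights, biases, beta


def predict(model, x):
    weights, biases, beta = model
    return sum(c * math.tanh(w * x + b)
               for c, w, b in zip(beta, weights, biases))


xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [math.sin(math.pi * x) for x in xs]
model = train_elm(xs, ys)
train_mse = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline_mse = sum(y ** 2 for y in ys) / len(ys)  # error of predicting zero
```

The only trained quantity is `beta`, so training reduces to one linear solve. The size of that solve grows with the number of hidden units, which is the cost the abstract proposes to split across subdomains.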
# フェアクラスタリング - 批判と注意点,今後の方向性 Fair Clustering: Critique, Caveats, and Future Directions ( http://arxiv.org/abs/2406.15960v1 ) ライセンス: Link先を確認	John Dickerson, Seyed A. Esmaeili, Jamie Morgenstern, Claire Jie Zhang,	(参考訳) クラスタリングは、機械学習とオペレーションズ・リサーチにおける根本的な問題である。したがって、アルゴリズム設計において公平性が最重要視されていることを考えると、クラスタリングにおける公平性は研究コミュニティから大きな注目を集めている。フェアクラスタリングに関する文献は、興味深いフェアネスの概念と精巧なアルゴリズムのコレクションを生み出した。本稿では,フェアクラスタリングを批判的に捉え,明確なユーティリティ特性の欠如や,機械学習環境におけるフェアクラスタリングアルゴリズムの下流効果を考慮することの難しさなど,無視された問題の集合を同定する。いくつかのケースでは、公正クラスタリングアルゴリズムの適用が社会福祉に重大な悪影響を及ぼす例を示す。最終的に、公正クラスタリングにおけるより影響力のある研究につながるステップの集合を特定します。 Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# ファンクリアルデータ移動を用いたロボットタスクプランの自動移動 Automating Transfer of Robot Task Plans using Functorial Data Migrations ( http://arxiv.org/abs/2406.15961v1 ) ライセンス: Link先を確認	Angeline Aguinaldo, Evan Patterson, William Regli,	(参考訳) 本稿では,圏論からの関数的データ移動を用いたオントロジーに基づくロボット計画伝達手法を提案する。ファクタは、ソースドメインからターゲットドメインへのプランの転送に使用可能な、ドメインタイプと述語の間の構造化されたマップを提供する。特定のプランを転送するためのモデルを作成する方法とは異なり、我々のアプローチは特定のドメイン内の任意の計画に適用できる。本稿では,AI2-THOR Kitchen環境と互換性のあるタスクプランを標準Blocksworldドメインから移行することで,このアプローチを実証する。さらに,ロボット作業計画の適応性を高めるための実践的応用についても論じる。 This paper introduces a novel approach to ontology-based robot plan transfer using functorial data migrations from category theory. Functors provide structured maps between domain types and predicates which can be used to transfer plans from a source domain to a target domain without the need for replanning. Unlike methods that create models for transferring specific plans, our approach can be applied to any plan within a given domain. We demonstrate this approach by transferring a task plan from the canonical Blocksworld domain to one compatible with the AI2-THOR Kitchen environment. In addition, we discuss practical applications that may enhance the adaptability of robotic task planning in general.	翻訳日:2024-06-25 19:43:16 公開日:2024-06-22
# 大規模言語モデルによる旅行選択モデルの実現--Prompt-Learningアプローチ Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach ( http://arxiv.org/abs/2406.13558v2 ) ライセンス: Link先を確認	Xuehao Zhai, Hanlin Tian, Lintong Li, Tianyu Zhao,	(参考訳) 旅行選択分析は、知的交通システム(ITS)における適切な交通政策とレコメンデーションシステムを開発するために、個々の旅行行動を理解するために不可欠である。広範な研究にもかかわらず、この領域は2つの重要な課題に直面している。イ限られた調査データによるモデリング及びロ高いモデル説明可能性及び精度を同時に達成すること。本稿では,予測精度を大幅に向上させ,個々の予測に対して明確な説明を提供する,プロンプト学習に基づく大規模言語モデル(LLM)フレームワークを提案する。このフレームワークには、入力変数をテキスト形式に変換すること、オブジェクトに似たデモを構築すること、これらを十分に訓練されたLLMに適用すること、の3つの主要なステップが含まれている。スイスで収集されたLondon Passenger Mode Choice(LPMC)とOptima-Mode(Optima-Mode)の2つの選択肢データセットを用いて,フレームワークの有効性を検証した。その結果,LLMは人々の選択を予測する上で,最先端のディープラーニング手法や個別選択モデルよりも優れていたことが示唆された。さらに,LLMフレームワークが個々のレベルで理解し易く明示的な説明を生成する方法について解説する。 Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduce a novel prompt-learning-based Large Language Model(LLM) framework that significantly improves prediction accuracy and provides explicit explanations for individual predictions. This framework involves three main steps: transforming input variables into textual form; building of demonstrations similar to the object, and applying these to a well-trained LLM. We tested the framework's efficacy using two widely used choice datasets: London Passenger Mode Choice (LPMC) and Optima-Mode collected in Switzerland. The results indicate that the LLM significantly outperforms state-of-the-art deep learning methods and discrete choice models in predicting people's choices. 
Additionally, we present a case of explanation illustrating how the LLM framework generates understandable and explicit explanations at the individual level.	翻訳日:2024-06-25 13:26:35 公開日:2024-06-22
# 超高精細復元 : 新しいベンチマークとデュアルインタラクション優先型ソリューション Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution ( http://arxiv.org/abs/2406.13607v2 ) ライセンス: Link先を確認	Liyan Wang, Cong Wang, Jinshan Pan, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su,	(参考訳) 超高精細画像復元(UHD)は,その実用的需要から注目されている。本稿では, UHD-Snow と UHD-Rain という, UHD 雪と降雨のベンチマークを構築し, この分野での不足を解消する。 UHD-Snow/UHD-Rainは雨・雪の物理過程をシミュレーションして構築され、それぞれのベンチマークには4K解像度の3200の劣化/クラー画像対が含まれている。さらに,これらの先行画像の空間的および詳細的な寄与により,モデル設計の勾配や正規化を考慮し,有効なUHD画像復元ソリューションを提案する。具体的には,本手法は2つの枝を含む。 (a)高分解能空間における特徴融合再生枝 (b)低分解能空間における先行的特徴相互作用分岐。前者は高精細な特徴を学習し、前者は高精細な画像を再構成するために事前誘導された低精細な特徴を融合する。これらの先行処理をよりよく活用するために、前者は正常な特徴と勾配の先行処理を融合させ、後者は強化された先行処理の類似性を計算し、さらに二重誘導フィルタリングを利用して二重先行処理の特性相互作用を増強する、単一先行処理と二重先行処理を導入する。提案手法は,UHD画像の低照度化,UHD画像のデソイング,UHD画像のデコライニングについて,新規および既存両方の公開データセットの実験を行い,その最先端性能を実証する。ソースコードとベンチマークは \url{https://github.com/wlydlut/UHDDIP} で公開されている。 Ultra-High-Definition (UHD) image restoration has acquired remarkable attention due to its practical demand. In this paper, we construct UHD snow and rain benchmarks, named UHD-Snow and UHD-Rain, to remedy the deficiency in this field. The UHD-Snow/UHD-Rain is established by simulating the physics process of rain/snow into consideration and each benchmark contains 3200 degraded/clear image pairs of 4K resolution. Furthermore, we propose an effective UHD image restoration solution by considering gradient and normal priors in model design thanks to these priors' spatial and detail contributions. Specifically, our method contains two branches: (a) feature fusion and reconstruction branch in high-resolution space and (b) prior feature interaction branch in low-resolution space. The former learns high-resolution features and fuses prior-guided low-resolution features to reconstruct clear images, while the latter utilizes normal and gradient priors to mine useful spatial features and detail features to guide high-resolution recovery better. 
To better utilize these priors, we introduce single prior feature interaction and dual prior feature interaction, where the former respectively fuses normal and gradient priors with high-resolution features to enhance prior ones, while the latter calculates the similarity between enhanced prior ones and further exploits dual guided filtering to boost the feature interaction of dual priors. We conduct experiments on both new and existing public datasets and demonstrate the state-of-the-art performance of our method on UHD image low-light enhancement, UHD image desnowing, and UHD image deraining. The source codes and benchmarks are available at https://github.com/wlydlut/UHDDIP.	翻訳日:2024-06-25 13:26:35 公開日:2024-06-22
# Learning to Transfer for Evolutionary Multitasking ( http://arxiv.org/abs/2406.14359v2 ) License: check the linked page	Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan	Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. Implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. This learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs. Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions, are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs.	Translated: 2024-06-25 13:26:35 Published: 2024-06-22
# Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts) ( http://arxiv.org/abs/2406.14485v2 ) License: check the linked page	Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zijin Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni	This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. The workshop was held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.	Translated: 2024-06-25 13:16:50 Published: 2024-06-22
# An Experimental Characterization of Combined RowHammer and RowPress Read Disturbance in Modern DRAM Chips ( http://arxiv.org/abs/2406.13080v2 ) License: check the linked page	Haocong Luo, Ismail Emir Yüksel, Ataberk Olgun, A. Giray Yağlıkçı, Mohammad Sadrosadati, Onur Mutlu	DRAM read disturbance can break memory isolation, a fundamental property to ensure system robustness (i.e., reliability, security, safety). RowHammer and RowPress are two different DRAM read disturbance phenomena. RowHammer induces bitflips in physically adjacent victim DRAM rows by repeatedly opening and closing an aggressor DRAM row, while RowPress induces bitflips by keeping an aggressor DRAM row open for a long period of time. In this study, we characterize a DRAM access pattern that combines RowHammer and RowPress in 84 real DDR4 DRAM chips from all three major DRAM manufacturers. Our key results show that 1) this combined RowHammer and RowPress pattern takes a significantly smaller amount of time (up to 46.1% faster) to induce the first bitflip compared to the state-of-the-art RowPress pattern, and 2) at the minimum aggressor row activation count to induce at least one bitflip, the bits that flip are different across RowHammer, RowPress, and the combined patterns. Based on our results, we provide a key hypothesis that the read disturbance effect caused by RowPress from one of the two aggressor rows in a double-sided pattern is much more significant than the other.	Translated: 2024-06-25 11:16:10 Published: 2024-06-22
# Deep Optimal Experimental Design for Parameter Estimation Problems ( http://arxiv.org/abs/2406.14003v2 ) License: check the linked page	Md Shahriar Rahim Siddiqui, Arman Rahmim, Eldad Haber	Optimal experimental design is a well-studied field in applied science and engineering. Techniques for estimating such a design are commonly used within the framework of parameter estimation. Nonetheless, in recent years parameter estimation techniques have been changing rapidly with the introduction of deep learning techniques to replace traditional estimation methods. This in turn requires the adaptation of optimal experimental design that is associated with these new techniques. In this paper we investigate a new experimental design methodology that uses deep learning. We show that the training of a network as a Likelihood Free Estimator can be used to significantly simplify the design process and circumvent the need for the computationally expensive bi-level optimization problem that is inherent in optimal experimental design for non-linear systems. Furthermore, deep design improves the quality of the recovery process for parameter estimation problems. As proof of concept we apply our methodology to two different systems of Ordinary Differential Equations.	Translated: 2024-06-25 11:16:10 Published: 2024-06-22

# Integrating Attentional Factors and Spacing in Logistic Knowledge Tracing Models to Explore the Impact of Training Sequences on Category Learning

arXiv: http://arxiv.org/abs/2407.15020v1

License: check the linked page

Meng Cao, Philip I. Pavlik Jr., Wei Chu, Liang Zhang

In category learning, a growing body of literature explores the impact of interleaving in contrast to blocking. The sequential attention hypothesis posits that interleaving draws attention to the differences between categories, while blocking directs attention toward similarities within categories. Although a recent study underscores the joint influence of memory and attentional factors on sequencing effects, there remains a scarcity of effective computational models integrating both attentional and memory considerations to comprehensively understand the effect of training sequences on students' performance. This study introduces a novel integration of attentional factors and spacing into logistic knowledge tracing (LKT) models to monitor students' performance across different training sequences (interleaving and blocking). Attentional factors were incorporated by recording the counts of comparisons between adjacent trials, considering whether they belong to the same or different categories. Several features were employed to account for temporal spacing. We used cross-validation to test the model fit and predictions on the learning session and posttest. Our findings reveal that incorporating both attentional factors and spacing features in the Additive Factors Model (AFM) significantly enhances its capacity to capture the effects of interleaving and blocking and demonstrates superior predictive accuracy for students' learning outcomes. By bridging the gap between attentional factors and memory processes, our computational approach offers a more comprehensive framework for understanding and predicting category learning outcomes in educational settings.

Translated: 2024-07-28 18:39:09 Published: 2024-06-22
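
The attentional factors described in the abstract above (counts of same-category and different-category comparisons between adjacent trials) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `attentional_counts`, and the choice to count the comparison between trial i-1 and trial i at trial i, are assumptions made for this example.

```python
from typing import List, Tuple


def attentional_counts(categories: List[str]) -> List[Tuple[int, int]]:
    """Return, for each trial, the running (same, diff) comparison counts.

    `same` counts adjacent-trial pairs that share a category and `diff`
    counts pairs from different categories. The pair (i-1, i) is added
    to the running totals at trial i (an assumption of this sketch).
    """
    same = 0
    diff = 0
    features = []
    for i, category in enumerate(categories):
        if i > 0:
            if category == categories[i - 1]:
                same += 1
            else:
                diff += 1
        features.append((same, diff))
    return features


# Hypothetical training sequences with two categories, A and B.
blocked = ["A", "A", "A", "B", "B", "B"]
interleaved = ["A", "B", "A", "B", "A", "B"]

print(attentional_counts(blocked)[-1])      # (4, 1)
print(attentional_counts(interleaved)[-1])  # (0, 5)
```

A blocked sequence accumulates mostly same-category comparisons and an interleaved one mostly different-category comparisons, which is the contrast such features would hand to a logistic model.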

# Regulating AI Adaptation: An Analysis of AI Medical Device Updates

arXiv: http://arxiv.org/abs/2407.16900v1

License: check the linked page

Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou

While the pace of development of AI has rapidly progressed in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators, as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner in regulating and approving hundreds of AI medical devices. To better understand how AI is updated and its regulatory considerations, we systematically analyze the frequency and nature of updates in FDA-approved AI medical devices. We find that fewer than 2% of all devices report having been updated by being re-trained on new data. Meanwhile, nearly a quarter of devices report updates in the form of new functionality and marketing claims. As an illustrative case study, we analyze pneumothorax detection models and find that while model performance can degrade by as much as 0.18 AUC when evaluated on new sites, re-training on site-specific data can mitigate this performance drop, recovering up to 0.23 AUC. However, we also observed significant degradation on the original site after re-training using data from new sites, providing insight from one example that challenges the current one-model-fits-all approach to regulatory approvals. Our analysis provides an in-depth look at the current state of FDA-approved AI device updates and insights for future regulatory policies toward model updating and adaptive AI.

Translated: 2024-07-28 18:19:29 Published: 2024-06-22

# Adaptive Digital Twin and Communication-Efficient Federated Learning Network Slicing for 5G-enabled Internet of Things

arXiv: http://arxiv.org/abs/2407.10987v1

License: check the linked page

Daniel Ayepah-Mensah, Guolin Sun, Yu Pang, Wei Jiang

Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has introduced digital twins to map physical systems to digital models for accurate decision-making. In our approach, we first use graph-attention networks to build a digital twin environment for network slices, enabling real-time traffic analysis, monitoring, and demand forecasting. Based on these predictions, we formulate the resource allocation problem as a federated multi-agent reinforcement learning problem and employ a deep deterministic policy gradient to determine the resource allocation policy while preserving the privacy of the slices. Our results demonstrate that the proposed approaches can improve the accuracy of demand prediction for network slices and reduce the communication overhead of dynamic network slicing.

Translated: 2024-07-22 12:39:32 Published: 2024-06-22

# LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations

arXiv: http://arxiv.org/abs/2407.02514v1

License: check the linked page

Shashank Kirtania, Priyanshu Gupta, Arjun Radhakirshna

In this paper we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. While current approaches leverage formal languages as an intermediate representation of reasoning problems, they struggle with generating intermediate formal specifications and refining these representations. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and LLM-based techniques on natural language reasoning tasks on two datasets, FOLIO and AR-LSAT. Logic-LM++ shows an average improvement of 13.5% over standard prompting, 11% over chain-of-thought prompting, and 5% over Logic-LM.

Translated: 2024-07-07 13:14:55 Published: 2024-06-22
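
The pairwise-comparison idea in the abstract above can be sketched as a refinement loop that keeps a proposed refinement only when a comparator judges it better than the current formulation. This is a hedged sketch and not the Logic-LM++ implementation: `refine` and `prefer_b` are hypothetical callables standing in for LLM calls, and the toy stand-ins (appending a parenthesis, preferring better-balanced parentheses) exist only to make the example runnable.

```python
from typing import Callable


def refine_with_pairwise_check(
    formulation: str,
    refine: Callable[[str], str],          # hypothetical: proposes a refinement
    prefer_b: Callable[[str, str], bool],  # hypothetical: True if the 2nd argument is judged better
    max_steps: int = 3,
) -> str:
    """Accept a refinement only if the comparator prefers it to the current one."""
    current = formulation
    for _ in range(max_steps):
        candidate = refine(current)
        if candidate == current or not prefer_b(current, candidate):
            break  # refinement rejected: keep the current formulation
        current = candidate
    return current


def toy_refine(formula: str) -> str:
    """Toy stand-in for an LLM refinement step: always appends ')'."""
    return formula + ")"


def toy_prefer_b(a: str, b: str) -> bool:
    """Toy stand-in for an LLM comparison: prefer better-balanced parentheses."""
    def imbalance(s: str) -> int:
        return abs(s.count("(") - s.count(")"))
    return imbalance(b) < imbalance(a)


result = refine_with_pairwise_check("all x (P(x) -> Q(x", toy_refine, toy_prefer_b)
print(result)  # all x (P(x) -> Q(x))
```

The third proposed refinement would unbalance the formula again, so the comparator rejects it and the loop stops; guarding each step this way prevents a refinement from making the formulation worse.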

# A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

arXiv: http://arxiv.org/abs/2406.17801v1

License: check the linked page

Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers. Evaluation included both mono-lingual and cross-lingual synthesis across all seven languages, with subjective tests assessing naturalness and speaker similarity. Our system uses the VITS2 architecture, augmented with a multi-lingual ID and a BERT model to enhance contextual language comprehension. In Track 1, where no additional data usage was permitted, our model achieved a Speaker Similarity score of 4.02. In Track 2, which allowed the use of extra data, it attained a Speaker Similarity score of 4.17.

Translated: 2024-06-27 17:46:26 Published: 2024-06-22

# Understanding the Role of User Profile in the Personalization of Large Language Models

arXiv: http://arxiv.org/abs/2406.17803v1

License: check the linked page

Bin Wu, Zhengyan Shi, Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz

Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance performance on a wide range of tasks. However, the precise role of user profiles and the mechanism by which they affect LLMs remain unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized responses produced or approved by users that play a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, user profiles placed closer to the beginning have a greater effect on the personalization of LLMs. Our findings reveal the role of user profiles in the personalization of LLMs and show how incorporating user profiles impacts performance, providing insight into how to leverage user profiles effectively.

Translated: 2024-06-27 17:46:26 Published: 2024-06-22

# A Review of Electromagnetic Elimination Methods for low-field portable MRI scanner

arXiv: http://arxiv.org/abs/2406.17804v1

License: check the linked page

Wanyu Bian

This paper presents a comprehensive analysis of both conventional and deep learning methods for eliminating electromagnetic interference (EMI) in MRI systems. We explore the underlying principles and implementation of traditional analytical and adaptive EMI elimination techniques, as well as cutting-edge deep learning approaches. Through a detailed comparison, the strengths and limitations of each method are highlighted. Recent advancements in active EMI elimination utilizing multiple external EMI receiver coils and analytical techniques are discussed alongside the superior performance of deep learning methods, which leverage neural networks trained on extensive MRI data. While deep learning methods demonstrate significant improvements in EMI suppression, enhancing diagnostic capabilities and accessibility of MRI technology, they also introduce potential security and safety concerns, especially in production and commercial applications. This study underscores the need to address these challenges to fully realize the benefits of deep learning in EMI elimination. The findings suggest a balanced approach, combining the reliability of conventional methods with the advanced capabilities of deep learning, to develop more robust and effective EMI suppression strategies in MRI systems.

Translated: 2024-06-27 17:46:26 Published: 2024-06-22

# Can LLMs Generate Visualizations with Dataless Prompts?

arXiv: http://arxiv.org/abs/2406.17805v1

License: check the linked page

Darius Coelho, Harshit Barot, Naitik Rathod, Klaus Mueller

Recent advancements in large language models have revolutionized information access, as these models harness data available on the web to address complex queries, becoming the preferred information source for many users. In certain cases, queries are about publicly available data, which can be effectively answered with data visualizations. In this paper, we investigate the ability of large language models to provide accurate data and relevant visualizations in response to such queries. Specifically, we investigate the ability of GPT-3 and GPT-4 to generate visualizations with dataless prompts, where no data accompanies the query. We evaluate the results of the models by comparing them to visualization cheat sheets created by visualization experts.

Translated: 2024-06-27 17:46:26 Published: 2024-06-22

# MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

arXiv: http://arxiv.org/abs/2406.17806v1

License: check the linked page

Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

Humans are prone to cognitive distortions: biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts. As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT). Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages (perception, intent reasoning, and safety judgement) in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications. We make our project available at https://turningpoint-ai.github.io/MOSSBench/.

Translated: 2024-06-27 17:46:26 Published: 2024-06-22

# Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics

arXiv: http://arxiv.org/abs/2002.01987v4

License: check the linked page

Belinda Tzen, Maxim Raginsky

We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian" in the sense of Kullback-Leibler divergence. Our setting is the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuous ensemble. We show that the problem can be phrased as global minimization of a free energy functional on the space of (finite-length) paths over probability measures on the weights. This functional trades off the $L^2$ approximation risk of the terminal measure against the KL divergence of the path with respect to an isotropic Brownian motion prior. We characterize the unique global minimizer and examine the dynamics in the space of probability measures over weights that can achieve it. In particular, we show that the optimal path-space measure corresponds to the Föllmer drift, the solution to a McKean-Vlasov optimal control problem closely related to the classic Schrödinger bridge problem. While the Föllmer drift cannot in general be obtained in closed form, thus limiting its potential algorithmic utility, we illustrate the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization. Specifically, we show that it closely tracks the Föllmer drift when the regularization is such that the minimizing density is log-concave.

Translated: 2024-06-26 23:34:57 Published: 2024-06-22

# MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication

arXiv: http://arxiv.org/abs/2406.16962v1

License: check the linked page

Shubhabrata Mukherjee, Cory Beard, Sejun Song

Semantic Communication can transform the way we transmit information, prioritizing meaningful and effective content over individual symbols or bits. This evolution promises significant benefits, including reduced latency, lower bandwidth usage, and higher throughput compared to traditional communication. However, the development of Semantic Communication faces a crucial challenge: the need for universal metrics to benchmark the joint effects of semantic information loss and energy consumption. This research introduces an innovative solution: the "Energy-Optimized Semantic Loss" (EOSL) function, a novel multi-objective loss function that effectively balances semantic information loss and energy consumption. Through comprehensive experiments on transformer models, including energy benchmarking, we demonstrate the remarkable effectiveness of EOSL-based model selection. We have established that EOSL-based transformer model selection achieves up to 83% better similarity-to-power ratio (SPR) compared to BLEU score-based selection and 67% better SPR compared to selection based solely on lowest power usage. Furthermore, we extend the applicability of EOSL to diverse and varying contexts, inspired by the principles of Meta-Learning. By cumulatively applying EOSL, we enable the model selection system to adapt to this change, leveraging historical EOSL values to guide the learning process. This work lays the foundation for energy-efficient model selection and the development of green semantic communication.

Translated: 2024-06-26 19:10:10 Published: 2024-06-22
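
The similarity-to-power ratio (SPR) mentioned in the abstract above can be illustrated with a small model-selection sketch. This is not the EOSL function, whose form the abstract does not give. It assumes only that SPR is a semantic-similarity score divided by power draw, and the candidate names and all numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    similarity: float  # semantic similarity in [0, 1] (made-up value)
    power_w: float     # average power draw in watts (made-up value)


def spr(candidate: Candidate) -> float:
    """Similarity-to-power ratio, assumed here to be similarity / power."""
    return candidate.similarity / candidate.power_w


candidates: List[Candidate] = [
    Candidate("small", 0.40, 10.0),
    Candidate("medium", 0.85, 20.0),
    Candidate("large", 0.90, 60.0),
]

most_similar = max(candidates, key=lambda c: c.similarity)
lowest_power = min(candidates, key=lambda c: c.power_w)
best_spr = max(candidates, key=spr)

print(most_similar.name, lowest_power.name, best_spr.name)  # large small medium
```

In this toy setting, picking by similarity alone or by power alone lands on the extremes, while the ratio picks the middle candidate. That is the kind of trade-off a joint metric is meant to expose.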

# Large Language Models for Link Stealing Attacks Against Graph Neural Networks

arXiv: http://arxiv.org/abs/2406.16963v1

License: check the linked page

Faqian Guan, Tianqing Zhu, Hui Sun, Wanlei Zhou, Philip S. Yu

Graph data contains rich node features and unique edge information and has been applied across various domains, such as citation networks and recommendation systems. Graph Neural Networks (GNNs) are specialized for handling such data and have shown impressive performance in many applications. However, GNNs may contain sensitive information and be susceptible to privacy attacks. For example, link stealing is a type of attack in which attackers infer whether two nodes are linked or not. Previous link stealing attacks primarily relied on posterior probabilities from the target GNN model, neglecting the significance of node features. Additionally, variations in node classes across different datasets lead to different dimensions of posterior probabilities. Handling these varying data dimensions made it difficult to use a single model to conduct link stealing attacks effectively on different datasets. To address these challenges, we introduce Large Language Models (LLMs) to perform link stealing attacks on GNNs. LLMs can effectively integrate textual features and exhibit strong generalizability, enabling attacks to handle diverse data dimensions across various datasets. We design two distinct LLM prompts to effectively combine textual features and posterior probabilities of graph nodes. Through these designed prompts, we fine-tune the LLM to adapt to the link stealing attack task. Furthermore, we fine-tune the LLM using multiple datasets and enable the LLM to learn features from different datasets simultaneously. Experimental results show that our approach significantly enhances the performance of existing link stealing attack tasks in both white-box and black-box scenarios. Our method can execute link stealing attacks across different datasets using only a single model, making link stealing attacks more applicable to real-world scenarios.

Translated: 2024-06-26 19:10:10 Published: 2024-06-22

# Are Language Models Actually Useful for Time Series Forecasting?

arXiv: http://arxiv.org/abs/2406.16964v1

License: check the linked page

Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, Thomas Hartvigsen

Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are language models actually useful for time series? After a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results; in most cases the results even improved. We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and reveal that patching and attention structures perform similarly to state-of-the-art LLM-based forecasters.

Translated: 2024-06-26 19:10:10 Published: 2024-06-22

# 再生可能エネルギー分野におけるAIの現状と将来 : 総合調査

Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey ( http://arxiv.org/abs/2406.16965v1 )

ライセンス: Link先を確認

Abdur Rashid, Parag Biswas, Angona Biswas, MD Abdullah Al Nasim, Kishor Datta Gupta, Roy George,

(参考訳) 人工知能(AI)は、近年のデジタル化の結果として、電力システムを含む様々な産業におけるプロセスの合理化に欠かせない手段となっている。人工知能のアルゴリズムは、統計学習理論に基づくデータ駆動モデルであり、電力システムとそのユーザが生成するデータを利用するためのツールとして使用される。当初我々は,再生可能エネルギー(RE)に関連する人工知能(AI)応用の詳細な文献分析を行った。次に、再生可能エネルギー工場の徹底的な分析を行い、その適合性と、最も広く使われている適切なAIアルゴリズムのリストを示す。現代の電力システムの再生可能エネルギー(RE)を支援するため、9つのAIベースの戦略がここで特定されている。本研究は、再生可能エネルギーに使用されるいくつかのAI技術と、再生可能エネルギーの異なる分野にわたる様々なインテリジェントシステム応用ドメインの研究のための文献の方法論的分析を含む。本研究は,9種類の研究手法の性能と成果を評価し,その強度と限界に関する貴重な知見を抽出することを目的としている。この研究は、再生可能エネルギー生成にAI技術を使用すること、再生可能エネルギー予測にAIを活用すること、エネルギーシステムの最適化という3つの主要なトピックについても論じている。さらに、制御性、データハンドリング、サイバーアタック防止、スマートグリッド実装、ロボティクスといった従来のモデルよりもAIの方が、エネルギー産業の未来を形作る上で重要であることを探求した。さらに、再生可能エネルギーのためのAIの統合における今後の方向性について概説する。

Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. AI algorithms are data-driven models based on statistical learning theory and are used as a tool to make use of the data that the power system and its users generate. First, we perform a thorough literature analysis of AI applications related to renewable energy (RE). Next, we present a thorough analysis of renewable energy factories and assess their suitability, along with a list of the most widely used and appropriate AI algorithms. Nine AI-based strategies are identified here to assist RE in contemporary power systems. This survey paper comprises an extensive review of the several AI techniques used for renewable energy as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of renewable energy. This literature review assesses the performance and outcomes of nine different research methods and aims to distill valuable insights into their strengths and limitations. This study also addresses three main topics: using AI technology for renewable power generation, utilizing AI for renewable energy forecasting, and optimizing energy systems. Additionally, it explores AI's superiority over conventional models in controllability, data handling, cyberattack prevention, smart grid implementation, and robotics, as well as AI's significance in shaping the future of the energy industry. Furthermore, this article outlines future directions in the integration of AI for renewable energy.

翻訳日:2024-06-26 19:10:10 公開日:2024-06-22

# ソフトラベルを用いた合成サンプルによるノイズのある教師信号の緩和

Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels ( http://arxiv.org/abs/2406.16966v1 )

ライセンス: Link先を確認

Yangdi Lu, Wenbo He,

(参考訳) ノイズラベルは、特にクラウドソーシングやWeb検索から派生した大規模データセットにおいて、現実世界のデータセットにおいてユビキタスである。トレーニング中にノイズラベルを過度に適合させる傾向にあるため、ノイズの多いデータセットでディープニューラルネットワークをトレーニングすることは困難であり、結果として一般化性能は低下する。早期学習期間中、深層ニューラルネットワークは、誤ってラベル付けされたサンプルを記憶する前にクリーンなサンプルに適合することが観察されている。本稿では,初期学習段階における表現分布を深く掘り下げ,ノイズラベルによらず,同じカテゴリのイメージの学習表現がいっしょに集まっていることを見出した。そこで本研究では,ノイズラベルの影響を軽減するために,新しい合成サンプルを用いてモデルを訓練するフレームワークを提案する。具体的には, 原試料を最寄りの近傍に凝集させて合成試料を合成する方法を提案し, 試料当たりの損失分布から学習した混合モデルを用いて重量を算出する。極端ラベルノイズの存在下での性能を高めるため,ノイズラベルを徐々に補正することにより,ソフトターゲットを推定する。さらに, 推定したソフトターゲットは, より正確な真実ラベルの近似を導出し, 提案手法は, より分離された, 明確に有界なクラスタを持つ学習表現の優れた品質が得られることを示した。 2つのベンチマーク(CIFAR-10とCIFAR-100)と2つの大規模実世界のデータセット(Clothing1MとWebvision)の広範な実験により、我々のアプローチは、最先端の手法と学習表現の堅牢性より優れていることが示された。

Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training, resulting in poor generalization performance. During an early learning phase, deep neural networks have been observed to fit the clean samples before memorizing the mislabeled samples. In this paper, we dig deeper into the representation distributions in the early learning phase and find that, regardless of their noisy labels, learned representations of images from the same category still congregate together. Inspired by this, we propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels. Specifically, we propose a mixing strategy to create the synthetic samples by aggregating original samples with their top-K nearest neighbours, wherein the weights are calculated using a mixture model learned from the per-sample loss distribution. To enhance the performance in the presence of extreme label noise, we estimate the soft targets by gradually correcting the noisy labels. Furthermore, we demonstrate that the estimated soft targets yield a more accurate approximation to ground truth labels and the proposed method produces a superior quality of learned representations with more separated and clearly bounded clusters. Extensive experiments on two benchmarks (CIFAR-10 and CIFAR-100) and two large-scale real-world datasets (Clothing1M and Webvision) demonstrate that our approach outperforms the state-of-the-art methods and improves the robustness of the learned representations.

翻訳日:2024-06-26 19:10:10 公開日:2024-06-22

# 抑うつ認識のためのマルチスケールコントラストを用いたマルチモーダル生理信号表現学習

Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition ( http://arxiv.org/abs/2406.16968v1 )

ライセンス: Link先を確認

Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen,

(参考訳) 機能的近赤外分光法(fNIRS)や脳波法(EEG)などの生理的信号に基づく抑うつ認識は大きな進歩を遂げている。しかし、既存のほとんどの研究は、複雑な時空間パターンにおける同じ刺激課題の下での多モード生理的信号の相補性と意味的一貫性を無視している。本稿では,抑うつ認識のためのマルチスケールコントラストを用いたシームズアーキテクチャを用いたマルチモーダル生理学的信号表現学習フレームワークを提案する。まず、fNIRSとEEGは、時間領域データ拡張戦略に基づいて異なるが相関したデータに変換される。そして,重み共有型マルチスケール時空間畳み込みにより,fNIRSとEEGの表現を学習する時空間コントラストモジュールを設計する。さらに、刺激タスクに関連する意味表現の学習を強化するために、fNIRSとEEGの意味的類似性を最大化することを目的とした意味一貫性コントラストモジュールを提案する。公開および自己収集された多モード生理信号データセットに関する大規模な実験は、MRLMCが最先端のモデルよりも優れていることを示している。さらに,提案するフレームワークは,下流タスクをマルチモーダル時系列に転送することができる。

Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.

翻訳日:2024-06-26 19:10:10 公開日:2024-06-22

# 正規化フローのための柔軟な裾

Flexible Tails for Normalizing Flows ( http://arxiv.org/abs/2406.16971v1 )

ライセンス: Link先を確認

Tennessee Hickling, Dennis Prangle,

(参考訳) 正規化フローは、単純な基底分布の変換として表される確率分布の柔軟なクラスである。標準的な正規化フローの限界は、密度推定と変分推論の両方の応用で現れる、重い裾を持つ分布の表現にある。この問題に対する現在の一般的な解決策は、重い裾を持つ基底分布を使用することである。例えば、Laszkiewicz et al. (2022) の tail adaptive flow (TAF) 法がある。我々は、正規化フローのようなニューラルネットワークを重い裾の入力の下で最適化することは難しいため、これが性能低下につながり得ると主張する。この問題は我々の論文で実証されている。代替案として、ガウス基底分布と、重い裾を生成可能な最終変換層を用いる方法を提案する。我々はこのアプローチを tail transform flow (TTF) と呼ぶ。実験により、本手法は、特にターゲット分布の次元または裾の重さが大きい場合に、現在の手法よりも優れることが示された。

Normalizing flows are a flexible class of probability distributions, expressed as transformations of a simple base distribution. A limitation of standard normalizing flows is representing distributions with heavy tails, which arise in applications to both density estimation and variational inference. A popular current solution to this problem is to use a heavy-tailed base distribution. Examples include the tail adaptive flow (TAF) methods of Laszkiewicz et al. (2022). We argue this can lead to poor performance due to the difficulty of optimising neural networks, such as normalizing flows, under heavy-tailed input. This problem is demonstrated in our paper. We propose an alternative: use a Gaussian base distribution and a final transformation layer which can produce heavy tails. We call this approach tail transform flow (TTF). Experimental results show this approach outperforms current methods, especially when the target distribution has large dimension or tail weight.

翻訳日:2024-06-26 19:10:10 公開日:2024-06-22

# 効率的なNASに基づく不均衡データセット処理手法

An Efficient NAS-based Approach for Handling Imbalanced Datasets ( http://arxiv.org/abs/2406.16972v1 )

ライセンス: Link先を確認

Zhiwei Yao,

(参考訳) クラス不均衡は実世界のデータ配信において一般的な問題であり、正確な分類器の訓練に悪影響を及ぼす。この問題を緩和する伝統的なアプローチは、クラス再バランス、情報伝達、表現学習の3つの主要なカテゴリに分類される。本稿では,ニューラルアーキテクチャサーチ(NAS)によるバックボーンアーキテクチャの最適化により,長い尾を持つデータセットの性能を向上させる新しい手法を提案する。我々の研究は、バランスの取れたデータセット上でのアーキテクチャの精度が、バランスの取れていないデータセットのパフォーマンスを確実に予測できないことを示している。これにより、計算コストのかかる長いデータセット上で完全なNASを実行する必要がある。 IMB-NASは、バランスの取れたソースデータセットでトレーニングされたNASスーパーネットワークを、不均衡なターゲットデータセットに効率的に適応することを提案する。本報告では, IMB-NASの基本技術について, NASやアーキテクチャ転送など, 詳細な解説を行う。様々な適応戦略の中で、最も効果的なアプローチは、バランスの取れたソースデータセットでトレーニングされたバックボーンNASスーパーネットワークを凍結させながら、線形分類ヘッドを再重み付き損失で再訓練することである。最後に,不均衡なCIFARデータセットを用いて評価実験を行った。結論はIMB-NAS論文で提案されているものと同じである。

Class imbalance is a common issue in real-world data distributions, negatively impacting the training of accurate classifiers. Traditional approaches to mitigate this problem fall into three main categories: class re-balancing, information transfer, and representation learning. This paper introduces a novel approach to enhance performance on long-tailed datasets by optimizing the backbone architecture through neural architecture search (NAS). Our research shows that an architecture's accuracy on a balanced dataset does not reliably predict its performance on imbalanced datasets. This necessitates a complete NAS run on long-tailed datasets, which can be computationally expensive. To address this computational challenge, we focus on existing work, called IMB-NAS, which proposes efficiently adapting a NAS super-network trained on a balanced source dataset to an imbalanced target dataset. A detailed description of the fundamental techniques for IMB-NAS is provided in this paper, including NAS and architecture transfer. Among various adaptation strategies, we find that the most effective approach is to retrain the linear classification head with reweighted loss while keeping the backbone NAS super-network trained on the balanced source dataset frozen. Finally, we conducted a series of experiments on the imbalanced CIFAR dataset for performance evaluation. Our conclusions are the same as those proposed in the IMB-NAS paper.
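
The adaptation strategy named above, retraining only the linear classification head with a reweighted loss, can be sketched as follows. The abstract does not say which reweighting scheme is used, so the inverse-frequency weights here are an assumption, and all function names are hypothetical. The sketch is plain Python and leaves out the frozen backbone entirely; it only shows how tail classes end up dominating the loss.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights N / (C * n_c), so rare classes get larger weights.

    Assumption: inverse-frequency weighting is one common choice and is not
    necessarily the scheme used in the paper.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
        weight_sum += weights[y]
    return total / weight_sum


# Toy long-tailed label set: class 0 is the head, class 1 is the tail.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# Class probabilities from a hypothetical retrained linear head.
probs = [{0: 0.9, 1: 0.1}] * 9 + [{0: 0.6, 1: 0.4}]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights the single tail-class sample carries as much total weight as the nine head-class samples, which is the intended effect of reweighting.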

翻訳日:2024-06-26 19:00:25 公開日:2024-06-22

# 循環行列のMDS特性に関する一考察

A note on MDS Property of Circulant Matrices ( http://arxiv.org/abs/2406.16973v1 )

ライセンス: Link先を確認

Tapas Chatterjee, Ayantika Laha,

(参考訳) 2014年、Gupta と Ray は有限体 $\mathbb{F}_{2^m}$ 上の循環対合行列が最大距離分離(MDS)になり得ないことを証明した。この非存在性は、標数 $2$ の有限体上の位数 $2^d \times 2^d$ の循環直交行列にも拡張される。これらの知見は、実用化を念頭に、軽量MDS行列を構築するために循環性を一般化するきっかけとなった。最近、2022年に Chatterjee と Laha は半対合性および半直交性を考慮した循環行列の研究を開始した。彼らの研究を拡張して、本稿では有限体 $\mathbb{F}_{2^m}$ 上でこれらの性質を持つ循環行列を掘り下げる。特に、関連する対角行列のトレースと行列のMDS性との相関関係を確立する。この相関が、偶数次の半直交行列とすべての次数の半対合行列に対して成り立つことを証明する。さらに、標数 $2$ の有限体上の奇数次の循環半直交行列に対して、関連する対角行列のトレースがゼロでない値を取り得ることを例で示す。

In $2014$, Gupta and Ray proved that the circulant involutory matrices over the finite field $\mathbb{F}_{2^m}$ cannot be maximum distance separable (MDS). This non-existence also extends to circulant orthogonal matrices of order $2^d \times 2^d$ over finite fields of characteristic $2$. These findings inspired many authors to generalize the circulant property for constructing lightweight MDS matrices with practical applications in mind. Recently, in $2022$, Chatterjee and Laha initiated a study of circulant matrices by considering semi-involutory and semi-orthogonal properties. Expanding on their work, this article delves into circulant matrices possessing these characteristics over the finite field $\mathbb{F}_{2^m}$. Notably, we establish a correlation between the trace of associated diagonal matrices and the MDS property of the matrix. We prove that this correlation holds true for even order semi-orthogonal matrices and semi-involutory matrices of all orders. Additionally, we provide examples showing that for circulant, semi-orthogonal matrices of odd orders over a finite field with characteristic $2$, the trace of associated diagonal matrices may possess non-zero values.
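
To make the objects in this abstract concrete, here is a minimal sketch that builds circulant matrices over a field of characteristic 2 and checks the involutory and MDS properties by brute force. It uses the standard criterion that a square matrix is MDS exactly when every square submatrix is nonsingular. The field $\mathbb{F}_{2^8}$ with the AES reduction polynomial and the AES MixColumns row $(2, 3, 1, 1)$ are choices made here for illustration only and do not come from the paper; all function names are hypothetical.

```python
from itertools import combinations

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1; an example field, not from the paper


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^8 - 2)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def circulant(first_row):
    """Row i is the first row cyclically shifted right by i positions."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def is_involutory(m):
    """Check M * M == I, with addition in characteristic 2 done by XOR."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc ^= gf_mul(m[i][k], m[k][j])
            if acc != (1 if i == j else 0):
                return False
    return True


def is_nonsingular(m):
    """Gaussian elimination over GF(2^8); True if a pivot exists in every column."""
    a = [row[:] for row in m]
    n = len(a)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return False
        a[col], a[pivot] = a[pivot], a[col]
        inv = gf_inv(a[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(a[r][col], inv)
            if factor:
                for c in range(col, n):
                    a[r][c] ^= gf_mul(factor, a[col][c])
    return True


def is_mds(m):
    """A square matrix is MDS iff all of its square submatrices are nonsingular."""
    n = len(m)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if not is_nonsingular(sub):
                    return False
    return True


mix_columns = circulant([2, 3, 1, 1])  # circulant and MDS, but not involutory
identity = circulant([1, 0, 0, 0])     # circulant and involutory, but not MDS
```

The two example matrices are consistent with the Gupta and Ray result quoted above: each has one of the two properties, neither has both. The brute-force check is only practical for small orders.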

翻訳日:2024-06-26 19:00:25 公開日:2024-06-22

# SHDB-AF : 心房細動のホルター心電図データベース

SHDB-AF: a Japanese Holter ECG database of atrial fibrillation ( http://arxiv.org/abs/2406.16974v1 )

ライセンス: Link先を確認

Kenta Tsutsui, Shany Biton Brimer, Noam Ben-Moshe, Jean Marc Sellal, Julien Oster, Hitoshi Mori, Yoshifumi Ikeda, Takahide Arai, Shintaro Nakano, Ritsushi Kato, Joachim A. Behar,

(参考訳) 心房細動 (AF) は、生活の質を損なう一般的な心房不整脈であり、塞栓性脳卒中、心不全、その他の合併症を引き起こす。機械学習(ML)とディープラーニング(DL)の最近の進歩は、診断精度を高める可能性を示している。 DLモデルは、民族、年齢、性別、その他の要因の違いに対して堅牢で一般化可能であることが不可欠である。多くのECGデータベースが研究コミュニティで利用可能になっているが、日本人集団のサンプルを含むものはない。 Saitama Heart Database Atrial Fibrillation (SHDB-AF)は、日本発の新規オープンソースホルター心電図データベースであり、発作性AF患者100名のデータを含む。 SHDB-AFのそれぞれのレコードは24時間の長さで200Hzでサンプリングされ、合計で2400万秒のECGデータである。

Atrial fibrillation (AF) is a common atrial arrhythmia that impairs quality of life and causes embolic stroke, heart failure and other complications. Recent advancements in machine learning (ML) and deep learning (DL) have shown potential for enhancing diagnostic accuracy. It is essential for DL models to be robust and generalizable across variations in ethnicity, age, sex, and other factors. Although a number of ECG databases have been made available to the research community, none includes a Japanese population sample. Saitama Heart Database Atrial Fibrillation (SHDB-AF) is a novel open-source Holter ECG database from Japan, containing data from 100 unique patients with paroxysmal AF. Each record in SHDB-AF is 24 hours long and sampled at 200 Hz, totaling 24 million seconds of ECG data.

翻訳日:2024-06-26 19:00:25 公開日:2024-06-22

# 多出力回帰のための厳密および近似コンフォーマル推論

Exact and Approximate Conformal Inference for Multi-Output Regression ( http://arxiv.org/abs/2210.17405v2 )

ライセンス: Link先を確認

Chancellor Johnstone, Eugene Ndiaye,

(参考訳) 機械学習では、共変量情報$x$が与えられたときの応答$y$を推定するのが一般的である。しかし、これらの予測だけでは、予測に関連する不確実性は定量化されない。この欠点を克服する1つの方法は、観測されていない応答$y$を所定の確率で含む集合を構成するコンフォーマル推論法である。残念なことに、1次元の応答であっても、最近の有望な進歩にもかかわらず、コンフォーマル推論は計算コストが高い。本稿では多出力回帰を扱い、予測モデルが$y$の線形関数として記述できる場合に、コンフォーマル推論の$p$値の厳密な導出を与える。さらに、線形および非線形の両方を含む幅広い多出力予測器に対して、計算上の利点を保ちながらコンフォーマル予測領域を効率よく近似する方法として、\texttt{unionCP} と \texttt{rootCP} の多変量拡張を提案する。また、実世界のデータとシミュレーションデータの両方を用いて、これらの手法の有効性に関する理論的および実証的な証拠を提供する。

It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these predictions alone do not quantify any uncertainty associated with said predictions. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference is computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference $p$-values when the predictive model can be described as a linear function of $y$. Additionally, we propose \texttt{unionCP} and a multivariate extension of \texttt{rootCP} as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We also provide both theoretical and empirical evidence of the effectiveness of these methods using both real-world and simulated data.
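
As a point of reference for what a conformal prediction region looks like with a multi-dimensional response, here is a minimal sketch of split conformal prediction with an $\ell^2$ residual score. This is a simpler, well-known variant and not the exact full-conformal derivation, \texttt{unionCP}, or \texttt{rootCP} described in the abstract; the function names and the choice of score are assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in vec))


def split_conformal_radius(calib_predictions, calib_targets, alpha):
    """Radius r such that the ball of radius r around a new prediction
    covers the unseen response with probability at least 1 - alpha,
    assuming calibration and test points are exchangeable."""
    scores = sorted(
        l2_norm([p - t for p, t in zip(pred, target)])
        for pred, target in zip(calib_predictions, calib_targets)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this level: the region is unbounded.
        return math.inf
    return scores[rank - 1]


def in_region(prediction, candidate, radius):
    """True if the candidate response lies inside the prediction ball."""
    return l2_norm([p - c for p, c in zip(prediction, candidate)]) <= radius


# Toy calibration set with a two-dimensional response.
calib_predictions = [(0.0, 0.0)] * 10
calib_targets = [(float(i), 0.0) for i in range(1, 11)]
radius = split_conformal_radius(calib_predictions, calib_targets, alpha=0.2)
```

The region here is a ball around the point prediction. The methods in the abstract avoid holding out a calibration set and can produce regions of other shapes.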

翻訳日:2024-06-26 05:28:15 公開日:2024-06-22

# 視野を超えて:Clip-recurrent Transformerによるシーンの可視性と知覚を高める

Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer ( http://arxiv.org/abs/2211.11293v3 )

ライセンス: Link先を確認

Hao Shi, Qi Jiang, Kailun Yang, Xiaoting Yin, Ze Wang, Kaiwei Wang,

(参考訳) 視覚センサーは車両、ロボット、道路インフラストラクチャーに広く応用されている。しかし、ハードウェアコストとシステムサイズに制限があるため、FoV(Field-of-View)はしばしば制限され、十分なカバレッジを提供できないことがある。それでも、時空間的観点からは、過去のビデオストリームからカメラの物理的FoVの外側の情報を得ることができる。本稿では、自動運転車の視野を拡大し、シーンの可視性、知覚、システム安全性を向上するオンラインビデオインペインティングの概念を提案する。これを実現するために、光学フローを明示的に用い、特徴伝搬のための新規なクリップリカレント変換器を暗黙的に組み込んだFlowLensアーキテクチャを導入する。 FlowLensには2つの重要な機能がある。 1) FlowLensは、3D-Decoupled Cross Attention(DDCA)を備えた新たに設計されたClip-Recurrent Hubを含み、時間とともに蓄積されたグローバル情報を段階的に処理する。 2) マルチブランチのMix Fusion Feed Forward Network(MixF3N)を統合し、局所的な特徴の正確な空間フローを向上させる。トレーニングと評価を容易にするため、FoVの外側および内側への拡張シナリオをカバーする様々なFoVマスクを用いて、KITTI360データセットを派生させた。また、FoV外のセマンティクスとFoV外のオブジェクト検出について、異なるモデル間で定量的評価と定性比較を行う。FlowLensを用いて見えないシーンを再構成することで、信頼性の高いセマンティックコンテキストが提供され、視野内での認識さえも向上することを示す。オフラインおよびオンラインのビデオインペインティングとFoV外の知覚タスクを含む大規模な実験とユーザスタディは、FlowLensが最先端のパフォーマンスを達成することを実証している。ソースコードとデータセットは https://github.com/MasterHow/FlowLens で公開されている。

Vision sensors are widely applied in vehicles, robots, and roadside infrastructure. However, due to limitations in hardware cost and system size, camera Field-of-View (FoV) is often restricted and may not provide sufficient coverage. Nevertheless, from a spatiotemporal perspective, it is possible to obtain information beyond the camera's physical FoV from past video streams. In this paper, we propose the concept of online video inpainting for autonomous vehicles to expand the field of view, thereby enhancing scene visibility, perception, and system safety. To achieve this, we introduce the FlowLens architecture, which explicitly employs optical flow and implicitly incorporates a novel clip-recurrent transformer for feature propagation. FlowLens offers two key features: 1) FlowLens includes a newly designed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated over time. 2) It integrates a multi-branch Mix Fusion Feed Forward Network (MixF3N) to enhance the precise spatial flow of local features. To facilitate training and evaluation, we derive the KITTI360 dataset with various FoV masks, which cover both outer- and inner-FoV expansion scenarios. We also conduct both quantitative assessments and qualitative comparisons of beyond-FoV semantics and beyond-FoV object detection across different models. We illustrate that employing FlowLens to reconstruct unseen scenes even enhances perception within the field of view by providing reliable semantic context. Extensive experiments and user studies involving offline and online video inpainting, as well as beyond-FoV perception tasks, demonstrate that FlowLens achieves state-of-the-art performance. The source code and dataset are made publicly available at https://github.com/MasterHow/FlowLens.

翻訳日:2024-06-26 05:28:15 公開日:2024-06-22

# 確率分布の近接性と$k$-wise一様性に対する簡潔な量子テスター

Succinct quantum testers for closeness and $k$-wise uniformity of probability distributions ( http://arxiv.org/abs/2304.12916v3 )

ライセンス: Link先を確認

Jingquan Luo, Qisheng Wang, Lvzhou Li,

(参考訳) 確率分布の近接性と$k$-wise一様性という性質をテストする基本的な問題に対する、潜在的な量子スピードアップについて検討する。近接性テスト(closeness testing)は、2つの$n$次元分布が同一であるか、$\ell^1$-または$\ell^2$-距離で少なくとも$\varepsilon$離れているかを区別する問題である。我々は、$\ell^1$-および$\ell^2$-近接性テストの量子クエリ複雑性がそれぞれ$O(\sqrt{n}/\varepsilon)$と$O(1/\varepsilon)$であり、いずれも$\varepsilon$への最適な依存を達成し、Gilyén と Li (2019) による従来の最良の結果を改善することを示す。$k$-wise一様性テスト($k$-wise uniformity testing)は、$\{0, 1\}^n$上の分布が、任意の$k$座標に制限したときに一様であるか、そのようなどの分布からも$\varepsilon$離れているかを区別する問題である。我々は、この問題に対する最初の量子アルゴリズムをクエリ複雑性$O(\sqrt{n^k}/\varepsilon)$で提案し、O'Donnell と Zhao (2018) によるサンプル複雑性$O(n^k/\varepsilon^2)$の最先端の古典的アルゴリズムに対して2次の高速化を達成する。さらに、$k = 2$のとき、古典的下界$\Omega(n/\varepsilon^2)$により、我々の量子アルゴリズムはいかなる古典的アルゴリズムよりも優れる。我々の量子アルゴリズムはすべて、振幅推定のような基本的な量子サブルーチンのみを用いており、かなり単純で時間効率が高い。

We explore potential quantum speedups for the fundamental problem of testing the properties of closeness and $k$-wise uniformity of probability distributions. Closeness testing is the problem of distinguishing whether two $n$-dimensional distributions are identical or at least $\varepsilon$-far in $\ell^1$- or $\ell^2$-distance. We show that the quantum query complexities for $\ell^1$- and $\ell^2$-closeness testing are $O(\sqrt{n}/\varepsilon)$ and $O(1/\varepsilon)$, respectively, both of which achieve optimal dependence on $\varepsilon$, improving the prior best results of Gilyén and Li (2019). $k$-wise uniformity testing is the problem of distinguishing whether a distribution over $\{0, 1\}^n$ is uniform when restricted to any $k$ coordinates or $\varepsilon$-far from any such distribution. We propose the first quantum algorithm for this problem with query complexity $O(\sqrt{n^k}/\varepsilon)$, achieving a quadratic speedup over the state-of-the-art classical algorithm with sample complexity $O(n^k/\varepsilon^2)$ by O'Donnell and Zhao (2018). Moreover, when $k = 2$ our quantum algorithm outperforms any classical one because of the classical lower bound $\Omega(n/\varepsilon^2)$. All our quantum algorithms are fairly simple and time-efficient, using only basic quantum subroutines such as amplitude estimation.
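
The two properties defined in this abstract can be illustrated with a small classical sketch that works on fully specified distributions. It computes the $\ell^1$-distance and checks $k$-wise uniformity exactly by enumerating marginals; it is not a tester in the paper's sense, since it reads the whole distribution instead of using samples or queries. The function names and the even-parity example are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def l1_distance(p, q):
    """Sum of |p(x) - q(x)| over the union of both supports."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys)


def is_k_wise_uniform(dist, n, k):
    """True if the marginal on every set of k coordinates is uniform on {0,1}^k."""
    target = Fraction(1, 2 ** k)
    for coords in combinations(range(n), k):
        marginal = {}
        for x, prob in dist.items():
            key = tuple(x[i] for i in coords)
            marginal[key] = marginal.get(key, 0) + prob
        if any(marginal.get(pattern, 0) != target
               for pattern in product((0, 1), repeat=k)):
            return False
    return True


# Uniform distribution over the 3-bit strings with an even number of ones.
even_parity = {x: Fraction(1, 4)
               for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
# Fully uniform distribution over all 3-bit strings.
uniform = {x: Fraction(1, 8) for x in product((0, 1), repeat=3)}
```

The even-parity distribution is 2-wise uniform but not 3-wise uniform, which shows that $k$-wise uniformity is strictly weaker than full uniformity. Exact rational arithmetic via `Fraction` avoids floating-point comparisons.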

翻訳日:2024-06-26 05:18:24 公開日:2024-06-22

# CLImage: 補完的なラベル学習のための人間アノテーションデータセット

CLImage: Human-Annotated Datasets for Complementary-Label Learning ( http://arxiv.org/abs/2305.08295v3 )

ライセンス: Link先を確認

Hsiu-Hsuan Wang, Tan-Ha Mai, Nai-Xuan Ye, Wei-I Lin, Hsuan-Tien Lin,

(参考訳) 補完ラベル学習(complementary-label learning, CLL)は、インスタンスが属さないクラスを示す補完ラベルのみを用いて多クラス分類器を訓練することを目的とした、弱教師付き学習パラダイムである。CLLのアルゴリズムは数多く提案されているにもかかわらず、その実用性は2つの理由により検証されていない。第一に、これらのアルゴリズムはしばしば補完ラベルの生成に関する仮定に依存しており、その仮定が現実からどれほど離れているかは明らかでない。第二に、それらの評価は合成データセットに限られている。 CLLアルゴリズムの実際の性能に関する知見を得るため、人間のアノテータから補完ラベルを収集するプロトコルを開発した。その結果、よく知られた分類データセットであるCIFAR10、CIFAR100、TinyImageNet200から派生した、CLCIFAR10、CLCIFAR20、CLMicroImageNet10、CLMicroImageNet20の4つのデータセットを作成した。これらは最初の実世界のCLLデータセットである。大規模なベンチマーク実験により、合成データセットから実世界のデータセットに移行する際の顕著な性能低下が判明した。データセットレベルの詳細なアブレーション研究により、この低下に寄与する重要な要因を調査した。我々の分析は、実世界のデータセットにおいてアノテーションノイズが最も影響の大きい要因であることを強調している。さらに、人間が付与した補完ラベルの偏りと、補完ラベルのみによる検証の難しさが、実用的なCLLに対する2つの際立った障壁であることが判明した。これらの結果は、ノイズや偏りのある補完ラベル分布に頑健なCLLアルゴリズムと検証手法の開発に、コミュニティがより多くの研究努力を向けるべきであることを示唆している。

Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. Firstly, these algorithms often rely on assumptions about the generation of complementary labels, and it is not clear how far the assumptions are from reality. Secondly, their evaluation has been limited to synthetic datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in the creation of four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets represent the very first real-world CLL datasets. Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetic datasets to real-world datasets. We investigated the key factors contributing to the decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we discover that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community focus more research efforts on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.
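
The gap between synthetic and human-annotated complementary labels is easier to see with a concrete generator. The sketch below produces synthetic complementary labels under a uniform assumption, where every wrong class is equally likely and the label is never the true class. The abstract does not name the specific assumptions it refers to, so treating uniform generation as a representative one is an assumption here, and the function name is hypothetical.

```python
import random


def uniform_complementary_label(true_label, num_classes, rng):
    """Sample a class the instance does NOT belong to, uniformly at random.

    Assumption: noise-free, uniform generation. Human annotators may be
    both biased and occasionally wrong, which this generator cannot model.
    """
    offset = rng.randrange(1, num_classes)  # 1 .. num_classes - 1
    return (true_label + offset) % num_classes


rng = random.Random(0)
num_classes = 10
true_labels = [rng.randrange(num_classes) for _ in range(1000)]
comp_labels = [uniform_complementary_label(y, num_classes, rng)
               for y in true_labels]
```

By construction a synthetic complementary label never equals the true label. The annotation noise highlighted in the abstract corresponds to exactly the cases where a human-provided complementary label does.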

翻訳日:2024-06-26 05:18:24 公開日:2024-06-22

# ナノスケールにおける効率的な量子ワーク貯留層

Efficient Quantum Work Reservoirs at the Nanoscale ( http://arxiv.org/abs/2305.17815v4 )

ライセンス: Link先を確認

Jinghao Lyu, Alexander B. Boyd, James P. Crutchfield,

(参考訳) 資源理論として再編成された場合、熱力学は単発状態における系の挙動を分析することができる。この場合、状態遷移を実装するのに必要な作業は {\alpha}-Renyi の発散によって制限されるため、確率的熱力学と比較して効率的な演算の特定が異なる。したがって、確率と資源理論の熱力学の違いを詳細に理解する必要がある。この目的のために, 単発式作業貯水池の可逆性について検討し, 多段式作業貯水池に使用される2段式作業貯水池を一般化した。これにより、単発体制におけるあらゆる遷移において可逆性が達成される。そこで我々は, 触媒を使わずとも非散逸状態の多層作業貯水池を体系的に開発する。資源理論的な結果から、ランダウアーの制約下にある2段階の作業貯水池は、計算中のエネルギー散逸を誤解を招くことを示している。対照的に、多層作業貯水池はランダウアーの境界を任意に低エントロピーを発生させながら達成できることを実証する。

When reformulated as a resource theory, thermodynamics can analyze system behaviors in the single-shot regime. In this regime, the work required to implement state transitions is bounded by $\alpha$-Rényi divergences, and so the identification of efficient operations differs from that of stochastic thermodynamics. Thus, a detailed understanding of the difference between stochastic and resource-theoretic thermodynamics is needed. To this end, we explore reversibility in the single-shot regime, generalizing the two-level work reservoirs used there to multi-level work reservoirs. This achieves reversibility in any transition in the single-shot regime. Building on this, we systematically develop multi-level work reservoirs in the nondissipation regime with and without catalysts. The resource-theoretic results show that two-level work reservoirs undershoot Landauer's bound, misleadingly implying energy dissipation during computation. In contrast, we demonstrate that multi-level work reservoirs achieve Landauer's bound while producing arbitrarily low entropy.

翻訳日:2024-06-26 05:18:24 公開日:2024-06-22

# Stable Diffusionプロンプトの埋め込みの操作

Manipulating Embeddings of Stable Diffusion Prompts ( http://arxiv.org/abs/2308.12059v2 )

ライセンス: Link先を確認

Niklas Deckers, Julia Peters, Martin Potthast,

(参考訳) プロンプトエンジニアリングは、生成テキスト・画像モデルのユーザが生成画像を狙い通りに操作するための、依然として主要な方法である。モデルを連続関数として扱い、画像空間とプロンプト埋め込み空間の間で勾配を受け渡すことにより、プロンプトテキストの代わりにプロンプトの埋め込みを直接操作する新しい手法を提案し、解析する。次に、画像生成を支援するための3つの実践的インタラクションツールを導出する。 (1) 画像空間で定義された、例えば画像スタイルを測るメトリックの最適化。 (2) 「近傍」のプロンプト埋め込みから選んだ方向に沿って画像空間内を移動できるようにすることで、創造的なタスクにおいてユーザを支援する。 (3) ユーザが特定のシードで目にしたがプロンプトでは記述しにくい情報を含むように、プロンプトの埋め込みを変更する。プロンプトエンジニアリングと比較して、ユーザ主導のプロンプト埋め込み操作は、ユーザの意図を統合する、よりきめ細かなターゲット制御を可能にする。ユーザスタディでは、我々の手法は煩雑さが少ないと評価され、結果の画像がしばしば好まれることが示されている。

Prompt engineering is still the primary way for users of generative text-to-image models to manipulate generated images in a targeted way. Based on treating the model as a continuous function and by passing gradients between the image space and the prompt embedding space, we propose and analyze a new method to directly manipulate the embedding of a prompt instead of the prompt text. We then derive three practical interaction tools to support users with image generation: (1) Optimization of a metric defined in the image space that measures, for example, the image style. (2) Supporting a user in creative tasks by allowing them to navigate in the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that a user has seen in a particular seed but has difficulty describing in the prompt. Compared to prompt engineering, user-driven prompt embedding manipulation enables a more fine-grained, targeted control that integrates a user's intentions. Our user study shows that our methods are considered less tedious and that the resulting images are often preferred.

翻訳日:2024-06-26 04:58:37 公開日:2024-06-22

# ロングビデオにおける時間的キャラクタグループ化のための統一動的グラフ

Unified and Dynamic Graph for Temporal Character Grouping in Long Videos ( http://arxiv.org/abs/2308.14105v3 )

ライセンス: Link先を確認

Xiujun Shu, Wei Wen, Liangsheng Xu, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun,

(参考訳) ビデオ時間的キャラクタグループ化は、ビデオ内の主要なキャラクタの出現モーメントを、そのアイデンティティに応じて特定する。この目的のために、最近の研究は教師なしクラスタリングからグラフベースの教師付きクラスタリングへと進化してきた。しかし、グラフ法は固定親和性グラフの前提の上に構築されており、多くの不正確な接続をもたらす。さらに、複数種類のモデルでマルチモーダルな特徴を抽出するため、デプロイに不向きである。本稿では、時間的キャラクタグループ化のための統一動的グラフ(UniDG)フレームワークを提案する。これはまず、同じ空間内で複数のモダリティの表現を学習し、同時にモダリティの固有性を保存する統一表現ネットワークによって達成される。第2に、各ノードごとに異なる数の近傍を循環マッチング戦略により動的に構築し、より信頼性の高い親和性グラフを生成する動的グラフクラスタリングを提案する。第3に、異なるモダリティ間の空間的・時間的文脈を活用するためのプログレッシブアソシエーション手法を導入し、マルチモーダルクラスタリング結果をうまく融合させる。現在のデータセットは事前抽出された特徴しか提供しないため、各キャラクタの顔と体の出現クリップおよび発話音声トラックを含む、MTCGと呼ばれる収集データセット上でUniDG法の評価を行う。また、既存のクラスタリングおよび検索データセットで主要コンポーネントを評価し、一般化能力を検証する。実験結果から、提案手法は有望な結果を達成し、いくつかの最先端手法より優れることが示された。

Video temporal character grouping locates appearing moments of major characters within a video according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built upon the premise of fixed affinity graphs, bringing many inexact connections. Besides, they extract multi-modal features with several kinds of models, which is unfriendly to deployment. In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping. This is accomplished firstly by a unified representation network that learns representations of multiple modalities within the same space while still preserving each modality's uniqueness. Secondly, we present a dynamic graph clustering where the neighbors of different quantities are dynamically constructed for each node via a cyclic matching strategy, leading to a more reliable affinity graph. Thirdly, a progressive association method is introduced to exploit spatial and temporal contexts among different modalities, allowing multi-modal clustering results to be well fused. As current datasets only provide pre-extracted features, we evaluate our UniDG method on a collected dataset named MTCG, which contains each character's appearing clips of face and body and speaking voice tracks. We also evaluate our key components on existing clustering and retrieval datasets to verify the generalization ability. Experimental results show that our method can achieve promising results and outperform several state-of-the-art approaches.

翻訳日:2024-06-26 04:58:37 公開日:2024-06-22

# 認証ロバスト性向上のためのレシピ

A Recipe for Improved Certifiable Robustness ( http://arxiv.org/abs/2310.02513v2 )

ライセンス: Link先を確認

Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson,

(参考訳) 近年の研究は、敵対的攻撃に対して認証可能な形で堅牢なニューラルネットワークを訓練する、リプシッツベースの手法の可能性を強調している。理論的にも経験的にも支持される重要な課題は、ロバストネスが通常のトレーニングよりも大きなネットワーク容量と多くのデータを必要とすることだ。しかし、厳密なリプシッツ制約の下で効果的に容量を追加することは、思われている以上に難しいことがわかっており、これは最先端の手法が過適合よりも過少適合(underfitting)に傾きがちであるという事実からも明らかである。さらに、リプシッツベースのアプローチの設計空間が十分に慎重に探索されてこなかったために、得られるはずの性能向上が取り残されていると我々は考える。本研究では、リプシッツベースの認証手法の可能性を明らかにするため、より包括的な評価を行う。新たな手法、設計最適化、先行研究の統合を組み合わせることで、さまざまなベンチマークデータセットとさまざまな摂動サイズにおいて、決定論的認証に対する最先端のVRAを大幅に改善することができる。特に、既存の最先端のリプシッツ制御ResNetアーキテクチャの終端に大きな「Cholesky-orthogonalized residual dense」層を追加することが、ネットワーク容量と性能の向上に特に有効であることを発見した。フィルタリングした生成データ拡張と組み合わせることで、最終結果は最先端の決定論的VRAを最大8.5ポイント向上させる。コードは https://github.com/hukkai/liresnet で公開されている。

Recent studies have highlighted the potential of Lipschitz-based methods for training certifiably robust neural networks against adversarial attacks. A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training. However, effectively adding capacity under stringent Lipschitz constraints has proven more difficult than it may seem, as evidenced by the fact that state-of-the-art approaches tend more towards underfitting than overfitting. Moreover, we posit that a lack of careful exploration of the design space for Lipschitz-based approaches has left potential performance gains on the table. In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and synthesis of prior work, we are able to significantly improve the state-of-the-art verified robust accuracy (VRA) for deterministic certification on a variety of benchmark datasets, and over a range of perturbation sizes. Of particular note, we discover that the addition of large "Cholesky-orthogonalized residual dense" layers to the end of existing state-of-the-art Lipschitz-controlled ResNet architectures is especially effective for increasing network capacity and performance. Combined with filtered generative data augmentation, our final results further the state-of-the-art deterministic VRA by up to 8.5 percentage points. Code is available at https://github.com/hukkai/liresnet.
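To illustrate why an orthogonalized dense layer fits a Lipschitz constraint, the sketch below orthogonalizes the rows of a weight matrix through a Cholesky factorization. This is an assumption about what "Cholesky-orthogonalized" refers to, not code taken from the paper's repository, and the function names `cholesky` and `orthogonalize_rows` are hypothetical. If L Lᵀ = W Wᵀ, then Q = L⁻¹ W satisfies Q Qᵀ = I, so the linear map has spectral norm 1 and is 1-Lipschitz in the L2 norm.

```python
import math

def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def orthogonalize_rows(w):
    """Return Q = L^{-1} W where L L^T = W W^T, so the rows of Q are orthonormal."""
    m = len(w)
    gram = [[sum(x * y for x, y in zip(w[i], w[j])) for j in range(m)]
            for i in range(m)]
    low = cholesky(gram)
    q = []
    # Forward substitution: solve L Q = W one row at a time.
    for i in range(m):
        row = list(w[i])
        for k in range(i):
            row = [r - low[i][k] * qk for r, qk in zip(row, q[k])]
        q.append([r / low[i][i] for r in row])
    return q

# Illustrative weights only: 2 output units, 3 input features.
w = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
q = orthogonalize_rows(w)
```

The sketch requires W to have full row rank, since otherwise the Gram matrix is not positive definite and the factorization fails.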

翻訳日:2024-06-26 04:48:52 公開日:2024-06-22

# 報酬バックプロパゲーションを用いたテキスト・画像拡散モデルの調整

Aligning Text-to-Image Diffusion Models with Reward Backpropagation ( http://arxiv.org/abs/2310.03739v2 )

ライセンス: Link先を確認

Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki,

(参考訳) テキスト・ツー・イメージの拡散モデルは、画像生成の最前線に登場し、非常に大規模な教師なしまたは弱い教師付きテキスト・ツー・イメージのトレーニングデータセットによって実現されている。教師なしの訓練のため、人間の知覚された画像品質、画像テキストアライメント、倫理的画像生成などの下流作業における行動を制御することは困難である。近年の研究では、バニラ強化学習を用いて拡散モデルを下流の報酬関数に合わせて微調整しているが、この手法は勾配推定器の分散が大きいことで知られている。本稿では,デノイジング過程を通じた報酬勾配のエンドツーエンドのバックプロパゲーションにより,拡散モデルを下流の報酬関数に整合させる手法であるAlignPropを提案する。このようなバックプロパゲーションの素朴な実装は、現代のテキスト・ツー・イメージモデルの偏微分を格納するために法外なメモリリソースを必要とするが、AlignPropは低ランクのアダプタ重みモジュールを微調整し、勾配チェックポインティングを使用することでメモリ使用量を実用的な範囲に抑える。画像テキストのセマンティックアライメント,美学,圧縮性,オブジェクト数の制御性,およびそれらの組み合わせなど,さまざまな目的に対する拡散モデルの微調整でAlignPropをテストする。また,AlignPropは,代替手法よりも少ない学習ステップでより高い報酬を達成し,概念的にもシンプルであるため,微分可能な報酬関数に対して拡散モデルを最適化するための素直な選択肢となることを示す。コードと視覚化結果は https://align-prop.github.io/ で公開されている。

Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing, to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and Visualization results are available at https://align-prop.github.io/.

翻訳日:2024-06-26 04:48:52 公開日:2024-06-22

# 常識推論のための忠実な知識グラフ記述法

Faithful Knowledge Graph Explanations for Commonsense Reasoning ( http://arxiv.org/abs/2310.04910v4 )

ライセンス: Link先を確認

Weihe Zhai, Arkaitz Zubiaga,

(参考訳) 言語モデル(LM)と知識グラフ(KG)の融合は、常識的な質問応答において広く用いられているが、忠実な説明を生み出すことは依然として難しい。現在の手法は、しばしばパスデコーディングの忠実性を見落とし、グラフエンコーダ出力とモデル予測の間に乖離をもたらす。本研究は,交絡効果とLM-KGのミスアライメントを,見かけ上の(spurious)説明を引き起こす重要な要因として同定する。そこで本研究では,KG表現の信頼性を評価するためのLM-KG Fidelity尺度を導入し,説明忠実度を改善するためのLM-KG Distribution-aware Alignment(LKDA)アルゴリズムを提案する。正解データがない状況で,提案したFidelity-Sparsity Trade-off Curveを用いてKGの説明を評価する。 CommonsenseQAとOpenBookQAの実験では、LKDAは説明の忠実度とモデル性能を著しく向上させ、信頼性の高い常識推論のために分布のミスアライメントに対処する必要性を強調している。

The fusion of language models (LMs) and knowledge graphs (KGs) is widely used in commonsense question answering, but generating faithful explanations remains challenging. Current methods often overlook path decoding faithfulness, leading to divergence between graph encoder outputs and model predictions. We identify confounding effects and LM-KG misalignment as key factors causing spurious explanations. To address this, we introduce the LM-KG Fidelity metric to assess KG representation reliability and propose the LM-KG Distribution-aware Alignment (LKDA) algorithm to improve explanation faithfulness. Without ground truth, we evaluate KG explanations using the proposed Fidelity-Sparsity Trade-off Curve. Experiments on CommonsenseQA and OpenBookQA show that LKDA significantly enhances explanation fidelity and model performance, highlighting the need to address distributional misalignment for reliable commonsense reasoning.

翻訳日:2024-06-26 04:48:52 公開日:2024-06-22

# ニューラルコード生成のための機能的オーバーラップ・リランキング

Functional Overlap Reranking for Neural Code Generation ( http://arxiv.org/abs/2311.03366v3 )

ライセンス: Link先を確認

Hung Quoc To, Minh Huynh Nguyen, Nghi D. Q. Bui,

(参考訳) Code Large Language Models (CodeLLMs) は、コード生成の進歩の新たな時代を支えている。しかし、可能なすべてのCodeLLM出力から最高のコードソリューションを選択することは、依然として困難である。それまでの手法では、複雑な機能的類似性やソリューションクラスタ間の相互作用を見落としていた。 SRankは、ソリューションのクラスタ間の関係をモデル化することに焦点を当てた、コード生成から最良のソリューションを選択するための、新しい優先順位付け戦略である。ソリューションクラスタ間の機能の重複を定量化することにより、私たちのアプローチは、コードソリューションのより良いランキング戦略を提供します。実験結果から,pass@1のスコアで顕著な結果が得られることがわかった。例えば、Human-Evalベンチマークでは、Codex002で69.66%、WizardCoderで75.31%、StarCoderで53.99%、CodeGenで60.55%、同じCodeLLMでCodeTやCoder-Reviewerのような最先端のコード生成メソッドをかなり上回っている(平均で約6.1%改善)。サンプル化されたソリューションやテストケースが限られているシナリオであっても、私たちのアプローチは堅牢性と優位性を示し、コード生成の新たなベンチマークを再評価します。私たちの実装はhttps://github.com/FSoft-AI4Code/SRank-CodeRankerで確認できます。

Code Large Language Models (CodeLLMs) have ushered in a new era in code generation advancements. However, selecting the best code solutions from all possible CodeLLM outputs remains a challenge. Previous methods often overlooked the intricate functional similarities and interactions between solution clusters. We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation, focusing on modeling the relationships between clusters of solutions. By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions. Empirical results show that our method achieves remarkable results on the pass@1 score. For instance, on the Human-Eval benchmark, we achieve 69.66% in pass@1 with Codex002, 75.31% with WizardCoder, 53.99% with StarCoder, and 60.55% with CodeGen, surpassing state-of-the-art code generation reranking methods such as CodeT and Coder-Reviewer on the same CodeLLM by a significant margin (approximately 6.1% improvement on average). Even in scenarios with a limited number of sampled solutions and test cases, our approach demonstrates robustness and superiority, marking a new benchmark in code generation reranking. Our implementation can be found at https://github.com/FSoft-AI4Code/SRank-CodeRanker.
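The reranking idea described above can be sketched as follows. Candidate solutions are grouped into clusters by their outputs on a set of test inputs, the functional overlap between two clusters is counted as the number of test inputs on which their outputs agree, and each cluster is scored by its overlap with every cluster. Weighting overlaps by cluster size is an assumption made here for illustration, because the abstract does not state the scoring formula. The candidate solutions and the names `cluster_by_outputs` and `rank_clusters` are hypothetical.

```python
from collections import defaultdict

def cluster_by_outputs(solutions, test_inputs):
    """Group solutions that produce identical outputs on every test input."""
    clusters = defaultdict(list)
    for name, fn in solutions.items():
        signature = tuple(fn(x) for x in test_inputs)
        clusters[signature].append(name)
    return clusters

def rank_clusters(clusters):
    """Order clusters by summed functional overlap, weighted by cluster size."""
    signatures = list(clusters)
    scores = {}
    for sig_i in signatures:
        total = 0
        for sig_j in signatures:
            # Functional overlap: number of test inputs with matching outputs.
            overlap = sum(a == b for a, b in zip(sig_i, sig_j))
            total += overlap * len(clusters[sig_j])
        scores[sig_i] = total
    return sorted(signatures, key=lambda s: scores[s], reverse=True)

# Hypothetical candidates for an "absolute value" task.
solutions = {
    "a": lambda x: abs(x),
    "b": lambda x: x if x >= 0 else -x,
    "c": lambda x: x,    # wrong for negative inputs
    "d": lambda x: -x,   # wrong for positive inputs
}
test_inputs = [-2, 0, 3]
clusters = cluster_by_outputs(solutions, test_inputs)
best = rank_clusters(clusters)[0]
print(clusters[best])
```

In this toy run the cluster holding the two correct solutions ranks first: it is the largest, and each wrong cluster still agrees with it on two of the three inputs.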

翻訳日:2024-06-26 04:39:08 公開日:2024-06-22

# iACOS: 情報的かつ適応的な負例を用いた暗黙的感情抽出の改善

iACOS: Advancing Implicit Sentiment Extraction with Informative and Adaptive Negative Examples ( http://arxiv.org/abs/2311.03896v2 )

ライセンス: Link先を確認

Xiancai Xu, Jia-Dong Zhang, Lei Xiong, Zhishang Liu,

(参考訳) アスペクトベース感情分析(ABSA)は広く研究されているが,4つの基本要素(アスペクト,カテゴリ,意見,感情)から構成される四つ組の抽出,特に暗黙的なアスペクトと意見を含む場合にはほとんど光が当たっていない。本稿では,暗黙的なアスペクトをカテゴリとともに,意見を感情とともに抽出する新しい手法iACOS(Implicit Aspects with Categories and Opinions with Sentiments)を提案する。まず、iACOSはテキストの最後に2つの暗黙のトークンを付加し、暗黙のアスペクトや意見を含むすべてのトークンのコンテキスト認識表現をキャプチャする。第2に、iACOSは、明示的および暗黙的なアスペクトと意見の共抽出のために、コンテキスト認識トークン表現上のシーケンスラベリングモデルを開発する。第3に、iACOSはアスペクトと意見のペアを発見し、そのカテゴリと感情を同時に予測する、特別なマルチヘッドアテンションを持つマルチラベル分類器を考案した。第4に、iACOSは情報的かつ適応的な負の例を利用して、マルチタスク学習によって、マルチラベル分類器とカテゴリおよび感情に関する他の2つの分類器を共同で訓練する。最後に、実験結果から、iACOSは2つの公開ベンチマークデータセットのF1スコアにおいて、他の四つ組抽出ベースラインを著しく上回っていることが示された。

Aspect-based sentiment analysis (ABSA) has been extensively studied, but little light has been shed on the quadruple extraction consisting of four fundamental elements: aspects, categories, opinions and sentiments, especially with implicit aspects and opinions. In this paper, we propose a new method iACOS for extracting Implicit Aspects with Categories and Opinions with Sentiments. First, iACOS appends two implicit tokens at the end of a text to capture the context-aware representation of all tokens including implicit aspects and opinions. Second, iACOS develops a sequence labeling model over the context-aware token representation to co-extract explicit and implicit aspects and opinions. Third, iACOS devises a multi-label classifier with a specialized multi-head attention for discovering aspect-opinion pairs and predicting their categories and sentiments simultaneously. Fourth, iACOS leverages informative and adaptive negative examples to jointly train the multi-label classifier and the other two classifiers on categories and sentiments by multi-task learning. Finally, the experimental results show that iACOS significantly outperforms other quadruple extraction baselines according to the F1 score on two public benchmark datasets.

翻訳日:2024-06-26 04:39:08 公開日:2024-06-22

# コンセンサスによる高次元自由エネルギー表面の構築

Consensus-based construction of high-dimensional free energy surface ( http://arxiv.org/abs/2311.05009v3 )

ライセンス: Link先を確認

Liyao Lyu, Huan Lei,

(参考訳) 分子系の集合的挙動の定量化における重要な問題は、自由エネルギー表面(FES)の正確な構築にある。主な課題は、エネルギー障壁の存在と高次元性から生じる。既存のアプローチは、位相空間全体の効率的な探索を確立するための洗練されたサンプリング手法に基づいていることが多い。一方、FESの数値近似のための最適なサンプル点の収集は、多くの集合変数 (CV) を持つシステムでは離散化誤差が支配的になりうるにもかかわらず、ほとんど未探索のままである。本稿では,関数表現とトレーニングセットを同時に最適化するミニマックス問題として構築を再定式化し,コンセンサスサンプリングに基づくアプローチを提案する。特に、最大化ステップは、現在の損失関数のラプラス近似の活用と未踏の位相空間の探索を調節することで、残差が最大となる領域の適応サンプリングを達成する確率的相互作用粒子系を確立し、最小化ステップは新しいトレーニングセットでFES近似を更新する。本手法は,ミニマックス問題を反復的に解くことにより,位相空間探索と事後誤差を強調したサンプリングの両方を統一的に扱う,FESの敵対的学習を実現する。最大30個のCVを持つ分子系のFESを構築し,本手法を実証した。

One essential problem in quantifying the collective behaviors of molecular systems lies in the accurate construction of free energy surfaces (FESs). The main challenges arise from the prevalence of energy barriers and the high dimensionality. Existing approaches are often based on sophisticated enhanced sampling methods to establish efficient exploration of the full phase space. On the other hand, the collection of optimal sample points for the numerical approximation of FESs remains largely under-explored, even though the discretization error could become dominant for systems with a large number of collective variables (CVs). We propose a consensus sampling-based approach by reformulating the construction as a minimax problem which simultaneously optimizes the function representation and the training set. In particular, the maximization step establishes a stochastic interacting particle system to achieve the adaptive sampling of the max-residue regime by modulating the exploitation of the Laplace approximation of the current loss function and the exploration of the uncharted phase space; the minimization step updates the FES approximation with the new training set. By iteratively solving the minimax problem, the present method essentially achieves an adversarial learning of the FESs with unified tasks for both phase space exploration and posterior error-enhanced sampling. We demonstrate the method by constructing the FESs of molecular systems with up to 30 CVs.

翻訳日:2024-06-26 04:39:08 公開日:2024-06-22

# グラフ上の弱いエキスパートと強いエキスパートの混合

Mixture of Weak & Strong Experts on Graphs ( http://arxiv.org/abs/2311.05185v2 )

ライセンス: Link先を確認

Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo,

(参考訳) 現実的なグラフは(1) ノードの豊富な自己特徴と(2) 近隣の情報構造の両方を含み、典型的な設定ではグラフニューラルネットワーク(GNN)が共同で処理する。本稿では,弱いエキスパートを軽量な多層パーセプトロン (MLP),強いエキスパートを既製のGNNとする,弱いエキスパートと強いエキスパートの混合 (Mowst) により2つのモダリティを分離することを提案する。エキスパートの協力関係を異なる目標ノードに適応させるために,弱いエキスパートの予測ロジットのばらつきに基づく「信頼度」機構を提案する。強いエキスパートは、ノードの分類が近隣情報に依存するか、弱いエキスパートのモデル品質が低い場合に、低信頼度領域で条件的に活性化される。我々は,信頼度関数が損失に与える影響を分析することによって,興味深いトレーニングダイナミクスを明らかにする。すなわち,我々のトレーニングアルゴリズムは,グラフのソフトな分割を実質的に生成することで,各エキスパートの専門化を促す。さらに、我々の「信頼度」設計は、GNNのより良い一般化能力の恩恵を受けるために、強いエキスパートに対して望ましいバイアスを課す。 Mowstは最適化が容易で、単一のGNNに匹敵する計算コストで強力な表現力を実現する。 4つのバックボーンGNNアーキテクチャ上のMowstは、ホモフィリックグラフとヘテロフィリックグラフを含む6つの標準ノード分類ベンチマークにおいて、大幅な精度向上を示している(https://github.com/facebookresearch/mowst-gnn)。

Realistic graphs contain both (1) rich self-features of nodes and (2) informative structures of neighborhoods, jointly handled by a Graph Neural Network (GNN) in the typical setup. We propose to decouple the two modalities by Mixture of weak and strong experts (Mowst), where the weak expert is a light-weight Multi-layer Perceptron (MLP), and the strong expert is an off-the-shelf GNN. To adapt the experts' collaboration to different target nodes, we propose a "confidence" mechanism based on the dispersion of the weak expert's prediction logits. The strong expert is conditionally activated in the low-confidence region when either the node's classification relies on neighborhood information, or the weak expert has low model quality. We reveal interesting training dynamics by analyzing the influence of the confidence function on loss: our training algorithm encourages the specialization of each expert by effectively generating soft splitting of the graph. In addition, our "confidence" design imposes a desirable bias toward the strong expert to benefit from GNN's better generalization capability. Mowst is easy to optimize and achieves strong expressive power, with a computation cost comparable to a single GNN. Empirically, Mowst on 4 backbone GNN architectures shows significant accuracy improvement on 6 standard node classification benchmarks, including both homophilous and heterophilous graphs (https://github.com/facebookresearch/mowst-gnn).
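A minimal sketch of a dispersion-based confidence gate, under stated assumptions: the abstract does not give the confidence function or the rule for combining the experts, so the normalized variance used in `confidence` and the convex combination in `mix` are illustrative choices rather than the authors' definitions. The weak (MLP) and strong (GNN) experts are represented only by their logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(probs):
    """Dispersion-based confidence in [0, 1]: 0 for uniform, 1 for one-hot."""
    n = len(probs)
    var = sum((p - 1.0 / n) ** 2 for p in probs) / n
    max_var = (n - 1) / n ** 2   # variance of a one-hot distribution
    return var / max_var

def mix(weak_logits, strong_logits):
    """Blend the two experts: the less confident the weak expert, the more
    weight the strong expert receives (assumed combination rule)."""
    p_weak = softmax(weak_logits)
    p_strong = softmax(strong_logits)
    c = confidence(p_weak)
    return [c * pw + (1.0 - c) * ps for pw, ps in zip(p_weak, p_strong)]
```

With this choice, a node on which the weak expert outputs near-uniform probabilities is handled almost entirely by the strong expert, which matches the "low-confidence region" behavior the abstract describes.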

翻訳日:2024-06-26 04:39:08 公開日:2024-06-22

# Social Bias Probing: 言語モデルのフェアネスベンチマーク

Social Bias Probing: Fairness Benchmarking for Language Models ( http://arxiv.org/abs/2311.09090v3 )

ライセンス: Link先を確認

Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein,

(参考訳) 言語モデルにおける社会的バイアスの影響は認識されているが、偏見評価の先行手法は、小さなデータセット上でのバイナリアソシエーションテストに限られており、偏見の複雑さに対する理解が制限されている。本稿では,異なる扱い(disparate treatment),すなわちセンシティブな人口統計グループへの所属に応じて個人を異なる形で扱うことを評価することにより,言語モデルの社会的偏見を探る新しい枠組みを提案する。既存のフェアネスコレクションの制限に対処するために設計された大規模なベンチマークであるSOFAをキュレートする。 SOFAは、ステレオタイプ的なアイデンティティとアンチステレオタイプ的なアイデンティティのバイナリ比較を超えて分析を拡張し、多様なアイデンティティとステレオタイプを含む。提案手法を既存のベンチマークと比較したところ,言語モデル内のバイアスは認識されているよりも微妙であり,これまで考えられていたよりも広い範囲のバイアスが符号化されていることが判明した。 SOFA上でのLMのベンチマークにより、異なる宗教を表現しているアイデンティティが、すべてのモデルにおいて最も顕著な異なる扱いにつながることを明らかにした。最後に,女性や障害者などの多様な集団が直面する現実の逆境が,これらのモデルの行動に反映されていることを示す。

While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SOFA, a large-scale benchmark designed to address the limitations of existing fairness collections. SOFA expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SOFA, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.

翻訳日:2024-06-26 04:39:08 公開日:2024-06-22

# 大規模言語モデルの多段階推論を導くためのグラフ・エリシテーション

Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models ( http://arxiv.org/abs/2311.09762v2 )

ライセンス: Link先を確認

Jinyoung Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim,

(参考訳) サブクエスチョンの生成と回答を伴うCoT(Chain-of-Thought)プロンプティングは、LLM(Large Language Models)の多段階推論能力を強化した。しかし、LLMに直接サブクエスチョンを生成するように促すことは、冗長または無関係な質問が生成されることがあるため、最適ではない。そこで我々はGE-Reasoning法を提案する。GE-Reasoning法はLLMに対して適切なサブクエスチョンとそれに対応する回答を生成するよう指示する。具体的には,入力された質問に対して,まず LLM に知識三重項を生成するように促し,質問のグラフ表現を形成する。従来の知識三重項とは異なり、本手法は変数を頭や尾の実体として許容し、質問を知識三重項として効果的に表現する。第2に、各三重項に対して、LLMは知識検索を利用しながら対応するサブクエスチョンと回答を生成する。予測信頼度がしきい値を超えると、サブクエスチョンと予測がその後の処理のプロンプトに組み込まれる。このアプローチは、サブクエスチョンが抽出された知識三重項に根ざすことを促し、冗長性と無関係性を減少させる。実験により,提案手法は,マルチホップ質問応答ベンチマークデータセットにおいて,従来のCoTプロンプト手法とその変種よりも優れていることが示された。

Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To address this, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answers it using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages sub-questions to be grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.
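The triplet-with-variables representation can be sketched as a small data structure plus a resolution loop. Everything here is illustrative: the `?x` variable convention, the `solve` function, and the toy lookup table that stands in for the LLM and the retriever are assumptions, not the paper's implementation. The loop answers a triplet once it has exactly one open variable and keeps the answer only when its confidence reaches the threshold.

```python
def is_var(term):
    """Variables are written with a leading '?', e.g. '?x' (a convention chosen here)."""
    return term.startswith("?")

def solve(triplets, answer_fn, threshold=0.5):
    """Resolve variables triplet by triplet; answer_fn stands in for LLM plus retrieval."""
    bindings = {}
    pending = list(triplets)
    while pending:
        progressed = False
        for t in list(pending):
            # Substitute variables that earlier sub-questions already resolved.
            head, rel, tail = (bindings.get(x, x) for x in t)
            unknown = [x for x in (head, tail) if is_var(x)]
            if len(unknown) != 1:
                continue  # not answerable yet; wait for more bindings
            value, conf = answer_fn(head, rel, tail)
            pending.remove(t)
            progressed = True
            if conf >= threshold:
                bindings[unknown[0]] = value
        if not progressed:
            break  # remaining triplets cannot be resolved
    return bindings

# Toy knowledge base standing in for the LLM and retriever (hypothetical).
TOY_KB = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def toy_answer(head, rel, tail):
    """Return (answer, confidence) for a triplet whose tail is the open variable."""
    value = TOY_KB.get((head, rel))
    return (value, 0.9) if value is not None else (None, 0.0)

# "Where was the director of Inception born?"
question_graph = [("?x", "born_in", "?y"), ("Inception", "directed_by", "?x")]
bindings = solve(question_graph, toy_answer)
print(bindings["?y"])
```

The triplets are listed out of order on purpose: the second one is answered first because it is the only one with a single open variable, and its answer then grounds the first.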

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# RecExplainer: 推奨モデルを説明するための大規模言語モデルの調整

RecExplainer: Aligning Large Language Models for Explaining Recommendation Models ( http://arxiv.org/abs/2311.10947v2 )

ライセンス: Link先を確認

Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, Xing Xie,

(参考訳) リコメンダシステムはオンラインサービスで広く使われており、埋め込みベースのモデルは複雑な信号を表現することの表現力から特に人気がある。しかしながら、これらのモデルはブラックボックスとして機能することが多く、ユーザと開発者にとって透明性が低く、信頼性も低い。近年,大規模言語モデル (LLM) は理解,推論,指示追従において顕著な知性を示している。本稿では, ブラックボックスレコメンデータモデルを説明するために, LLM を代理モデルとして利用することについて初期的な検討を行う。第一の概念は、ターゲットレコメンデータモデルの振る舞いを理解し、エミュレートするためにLLMを訓練することである。 LLMの広い世界知識と多段階の推論能力を活用することで、これらの整合されたLLMは高度なサロゲートとして機能し、観測について推論することができる。さらに、自然言語をインターフェースとして使用することで、個々のユーザの好みに合わせてカスタマイズ可能な説明を作成することができる。効果的なアライメントを容易にするために,行動アライメント,意図アライメント,ハイブリッドアライメントという3つの手法を導入する。行動アライメントは言語空間で動作し、ユーザの好みとアイテム情報をテキストとして表現してターゲットモデルの振る舞いを模倣する。意図アライメントはレコメンデーションモデルの潜在空間で動作し、ユーザとアイテムの表現を使ってモデルの振る舞いを理解する。ハイブリッドアライメントは言語空間と潜在空間の両方を組み合わせる。 3つの公開データセットで実施された総合的な実験により、我々のアプローチは、ターゲットモデルの理解と模倣において有望な結果をもたらし、高品質で、高忠実で、独特な説明を生成することが示された。私たちのコードは https://github.com/microsoft/RecAI で公開されている。

Recommender systems are widely used in online services, with embedding-based models being particularly popular due to their expressiveness in representing complex signals. However, these models often function as a black box, making them less transparent and reliable for both users and developers. Recently, large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following. This paper presents an initial exploration of using LLMs as surrogate models to explain black-box recommender models. The primary concept involves training LLMs to comprehend and emulate the behavior of target recommender models. By leveraging LLMs' own extensive world knowledge and multi-step reasoning abilities, these aligned LLMs can serve as advanced surrogates, capable of reasoning about observations. Moreover, employing natural language as an interface allows for the creation of customizable explanations that can be adapted to individual user preferences. To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to mimic the target model's behavior; intention alignment works in the latent space of the recommendation model, using user and item representations to understand the model's behavior; hybrid alignment combines both language and latent spaces. Comprehensive experiments conducted on three public datasets show that our approach yields promising results in understanding and mimicking target models, producing high-quality, high-fidelity, and distinct explanations. Our code is available at https://github.com/microsoft/RecAI.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# 身体運動のアーカイブ:中国書道の集合的生成

Archiving Body Movements: Collective Generation of Chinese Calligraphy ( http://arxiv.org/abs/2311.13770v3 )

ライセンス: Link先を確認

Aven Le Zhou, Jiayi Ye, Tianchen Liu, Kang Zhang,

(参考訳) コミュニケーションチャネルとして、身体運動は行動研究やキネシクスで広く研究されている。演技と視覚芸術は同じ関心を共有しているが、ダンス表記や視覚作品の作成など、人間の身体運動の文書化と表現に重点を置いている。本稿では, 東洋書道における身体運動と書道の原理を適用し, 身体運動を刺激し, アーカイブする方法について検討する。著者らは、アートワーク(武州)を通じて、読者の身体参加と身体運動のアーカイブ化を対話的で創造的なアプローチで試みた。読者は書き手と読み手の両方の役割を前提としており、生成した書道(読み手)は、この無限の「書」の中で循環的な過程となり、漢字や書道に関するさらなる注意と議論の動機となる。

As a communication channel, body movements have been widely explored in behavioral studies and kinesics. Performing and visual arts share the same interests but focus on documenting and representing human body movements, such as for dance notation and visual work creation. This paper investigates body movements in oriental calligraphy and how to apply calligraphy principles to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach to engage the audience's bodily participation and archive the body movements as a compendium of generated calligraphy. The audience assumes the role of both writers and readers; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention and discussions concerning Chinese characters and calligraphy.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# 高等教育におけるChatGPTの社会的バイアスの可能性:スコーピング・レビュー

Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review ( http://arxiv.org/abs/2311.14381v2 )

ライセンス: Link先を確認

Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng,

(参考訳) 目的: ChatGPTのような生成人工知能(Generative Artificial Intelligence, GAI)モデルは、広範囲なデータセットでのトレーニングによって社会的バイアスを継承または増幅する可能性がある。高等教育機関(HEIs)における学生、教員、職員のGAI利用の増加に伴い、これらの技術に関連する倫理的問題や潜在的なバイアスについて検討することが急務である。デザイン/アプローチ/方法: このスコーピングレビューは、近年の学術論文で、HEIsにおけるGAIに関するバイアスがどのように研究され議論されているかを明らかにすることを目的としている。我々は、高等教育分野において、GAIが引き起こす可能性のある社会的バイアスを分類した。本レビューでは,4つの主要データベースにまたがる英語,中国語,日本語の記事を取り上げ,高等教育におけるGAI活用と偏見に着目した。結果: 我々の発見は、AI分野におけるLLMに関するバイアスと差別に関する有意義な学術的な議論がある一方で、高等教育を扱うほとんどの記事はこの問題に表面的にしかアプローチしていないことを示している。異なる状況下で特定の種類の偏見を識別する記事はほとんどなく、実証研究の欠如が顕著である。レビュー対象の論文の多くは、主に医学・工学に関する教育・研究分野に焦点をあてており、一部は英語教育について論じている。しかし、人文科学や社会科学に関する議論はほとんどない。さらに、現在の議論の大部分は英語で書かれており、主に英語圏の文脈を扱う。独自性/価値: 我々の知る限り、本研究は高等教育における潜在的な社会的バイアスを初めて要約したものである。このレビューは、GAIが教育環境で導入または増幅する可能性のある特定のバイアスを理解するために、より深い研究と実証的な研究の必要性を強調し、高等教育におけるより倫理的なAIアプリケーションの開発を導く。

Purpose: Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods: This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. Findings: Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts. Originality/Value: To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# 多体散逸量子物質における創発的トポロジー

Emergent Topology in Many-Body Dissipative Quantum Matter ( http://arxiv.org/abs/2311.14640v3 )

ライセンス: Link先を確認

Antonio M. García-García, Lucas Sá, Jacobus J. M. Verbaarschot, Can Yin,

(参考訳) トポロジカル特徴の同定、記述、分類は、物理学のいくつかの分野における発見と革新のエンジンである。この研究は、凝縮物質中の整数および分数チャーン絶縁体から光学における複雑なフォトニック格子における保護状態、QCD真空の構造まで幅広いシステムを含む。ここでは、擬エルミート多体量子系の散逸ダイナミクスという、トポロジーの別の遊び場を紹介する。そこで我々は,散逸型Sachdev-Ye-Kitaev(SYK)モデルと量子カオス的なデフェージングスピン鎖の2つの異なるシステムについて検討した。 2つの異なる多体モデルに対して、幅広いパラメータ範囲で同じ位相的特徴が見いだされ、それらが普遍的であることが示唆される。 SYKモデルでは、擬エルミート性に関連する4つの普遍性クラスを同定する。これらは、フェルミオン交換を行うユニタリ作用素の異常なトレースの存在に直接関係する、ベクトル化されたリウヴィリアンの矩形ブロック表現によって特徴づけられる。この矩形化の結果、対称性にのみ依存する位相指標 $\nu$ が特定される。矩形化のもう1つの顕著な結果として、浴との任意の結合の強さに対して、リウヴィリアンに純粋に実のトポロジカルモードが観測される。これらの実モードのレベル統計は対応するランダム行列のアンサンブルと一致し、4つの位相対称性クラスを特徴づけるために用いることができる。浴との弱結合極限において、トポロジカルモードは平衡へのアプローチを支配し、散逸性多体量子カオス系におけるトポロジーの実験的確認への直接的な道を開く可能性がある。

The identification, description, and classification of topological features is an engine of discovery and innovation in several fields of physics. This research encompasses a broad variety of systems, from the integer and fractional Chern insulators in condensed matter, to protected states in complex photonic lattices in optics, and the structure of the QCD vacuum. Here, we introduce another playground for topology: the dissipative dynamics of pseudo-Hermitian many-body quantum systems. For that purpose, we study two different systems, the dissipative Sachdev-Ye-Kitaev (SYK) model, and a quantum chaotic dephasing spin chain. For the two different many-body models, we find the same topological features for a wide range of parameters suggesting that they are universal. In the SYK model, we identify four universality classes, related to pseudo-Hermiticity, characterized by a rectangular block representation of the vectorized Liouvillian that is directly related to the existence of an anomalous trace of the unitary operator implementing fermionic exchange. As a consequence of this rectangularization, we identify a topological index $\nu$ that only depends on symmetry. Another distinct consequence of the rectangularization is the observation, for any coupling to the bath, of purely real topological modes in the Liouvillian. The level statistics of these real modes agree with that of the corresponding random matrix ensemble and therefore can be employed to characterize the four topological symmetry classes. In the limit of weak coupling to the bath, topological modes govern the approach to equilibrium, which may enable a direct path for experimental confirmation of topology in dissipative many-body quantum chaotic systems.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# Feature Speed Formula: ディープニューラルネットワークのハイパーパラメータをスケーリングするための柔軟なアプローチ

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks ( http://arxiv.org/abs/2311.18718v3 )

ライセンス: Link先を確認

Lénaïc Chizat, Praneeth Netrapalli,

(参考訳) ディープラーニングは階層的な特徴学習によって成功するが、初期化スケールや学習率などのハイパーパラメータ(HP)のチューニングは、この振る舞いを間接的に制御するだけである。本稿では,特徴学習を予測し制御するための重要な概念として,(層インデックス$\ell$における)特徴更新と後方パスの間の角度$\theta_\ell$を導入する。この角度$\theta_\ell$, 損失減衰, 後方パスの大きさを用いて, 任意のトレーニング時間において, 1回のGDステップ後の特徴更新の大きさを, 単純かつ一般的な feature speed formula で表すことができることを示す。この角度 $\theta_\ell$ は層から層へのヤコビアンの条件付けによって制御され、ランダム初期化時にはあるカーネルのスペクトルによって決定される。このカーネルは $\ell=\text{depth}$ のとき Neural Tangent Kernel と一致する。 $\theta_\ell$ が与えられたとき、feature speed formula はHP(スケールと学習率)を調整し、特徴学習や損失減衰といった特定の力学特性を満たすためのルールを提供する。本研究では,幅を先に,次に深さを大きくする極限におけるReLU MLPとResNetに対して,このアプローチが持つ意味を検討する。先行研究に基づき、 iid 初期化を伴う ReLU MLP において、角度は深さとともに $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$ で縮退することを示す。対照的に、ブランチスケール $O(1/\sqrt{\text{depth}})$ の ResNets は非退化角 $\cos(\theta_\ell)=\Theta(1)$ を維持する。我々はこれらの知見を用いて、既知のHPスケーリングの重要な特性を復元し、また、理論的性質が好ましい大深度ReLU MLPのための新しいHPスケーリングを導入する。

Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HP) such as initialization scales, learning rates, etc., only gives indirect control over this behavior. In this paper, we introduce a key notion to predict and control feature learning: the angle $\theta_\ell$ between the feature updates and the backward pass (at layer index $\ell$). We show that the magnitude of feature updates after one GD step, at any training time, can be expressed via a simple and general feature speed formula in terms of this angle $\theta_\ell$, the loss decay, and the magnitude of the backward pass. This angle $\theta_\ell$ is controlled by the conditioning of the layer-to-layer Jacobians and at random initialization, it is determined by the spectrum of a certain kernel, which coincides with the Neural Tangent Kernel when $\ell=\text{depth}$. Given $\theta_\ell$, the feature speed formula provides us with rules to adjust HPs (scales and learning rates) so as to satisfy certain dynamical properties, such as feature learning and loss decay. We investigate the implications of our approach for ReLU MLPs and ResNets in the large width-then-depth limit. Relying on prior work, we show that in ReLU MLPs with iid initialization, the angle degenerates with depth as $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$. In contrast, ResNets with branch scale $O(1/\sqrt{\text{depth}})$ maintain a non-degenerate angle $\cos(\theta_\ell)=\Theta(1)$. We use these insights to recover key properties of known HP scalings and also to introduce a new HP scaling for large depth ReLU MLPs with favorable theoretical properties.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# 敵対的インコンテキスト学習によるプロンプト最適化

Prompt Optimization via Adversarial In-Context Learning ( http://arxiv.org/abs/2312.02614v3 )

ライセンス: Link先を確認

Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He,

(参考訳) 本稿では,1つの LLM をジェネレータとして,もう1つを判別器として,さらに1つをプロンプト修飾器として用いることで,文脈内学習(ICL)のプロンプトを最適化する新しい手法であるadv-ICL(Adversarial In-Context Learning)を提案する。従来の敵対的学習と同様に、adv-ICLはジェネレータと判別器の間の2人プレイヤゲームとして実装され、ジェネレータは判別器を騙せるほど本物らしい出力を生成しようとする。各ラウンドにおいて、タスク命令といくつかの例をプレフィックスとした入力が与えられたとき、ジェネレータは出力を生成する。次に、判別器は、ジェネレータの入出力ペアをモデル生成または実データとして分類する。判別器の損失に基づいて、プロンプト修飾器はジェネレータと判別器のプロンプトに対する編集案を提案し、敵対的損失を最も改善する編集が選択される。本稿では,adv-ICLにより,要約,算術的推論,機械翻訳,データ-テキスト生成,MMLU およびBIG-Bench Hardベンチマークなどを含む11の生成タスクと分類タスクにおいて,オープンソースモデルとクローズドソースモデルの両方で,最先端のプロンプト最適化手法に比べて大幅な改善が得られることを示す。さらに,本手法では事前学習モデルを用いて,モデルパラメータではなくプロンプトのみを更新するので,計算効率が良く,どのLLMやタスクにも容易に拡張でき,低リソース設定でも有効である。

We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough output to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and big-bench hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
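
The propose-then-select round described in the abstract can be sketched as a greedy search over candidate prompt edits. This is a minimal sketch under stated assumptions, not the authors' implementation: `propose_edits` and `loss` are hypothetical stand-ins for the prompt-modifier LLM and for the discriminator-based adversarial loss, and updating a single prompt per call is a simplification of the two-player setup.

```python
from typing import Callable, Iterable


def select_best_edit(
    prompt: str,
    candidates: Iterable[str],
    loss: Callable[[str], float],
    minimize: bool = True,
) -> str:
    """Return the candidate that most improves `loss`, or `prompt` if none does."""
    sign = 1.0 if minimize else -1.0
    best_prompt, best_score = prompt, sign * loss(prompt)
    for candidate in candidates:
        score = sign * loss(candidate)
        if score < best_score:
            best_prompt, best_score = candidate, score
    return best_prompt


def optimize_prompt(prompt, propose_edits, loss, rounds=3, minimize=True):
    """Run a few rounds of propose-then-select on a single prompt."""
    for _ in range(rounds):
        prompt = select_best_edit(prompt, propose_edits(prompt), loss, minimize)
    return prompt


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): the "modifier" appends a phrase and the
    # "loss" simply prefers prompts that are about 30 characters long.
    def toy_edits(p):
        return [p + " briefly", p + " step by step"]

    def toy_loss(p):
        return abs(len(p) - 30)

    print(optimize_prompt("Summarize the text", toy_edits, toy_loss))
```

In a two-player setting, one would presumably call the selection step with `minimize=True` for one player's prompt and `minimize=False` for the other's; which player takes which direction is left open here because the abstract does not specify it.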

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# コラボレーションかコーポレート・キャプチャーか? NLPの産業アーティファクトと貢献への依存の定量化

Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions ( http://arxiv.org/abs/2312.03912v2 )

ライセンス: Link先を確認

Will Aitken, Mohamed Abdalla, Karen Rudie, Catherine Stinson,

(参考訳) 事前訓練されたモデルの印象的なパフォーマンスは大衆の注目を集め、近年ニュースの見出しを飾っている。ほぼ常に、これらのモデルは産業界によって、あるいは産業界と共同で生産される。それらの使用は、自然言語処理(NLP)ベンチマークで競争し、NLP研究において関連性を維持するために重要である。我々は,EMNLP 2022で公表された100の論文を調査し,研究者がNLPの権威ある会議で発表するために産業界のモデル,その他のアーティファクト,貢献にどの程度依存しているかを調べ,それらの引用の割合が期待される値の少なくとも3倍であることを見出した。私たちの研究は、将来の研究者が次の問いにより正確に取り組むための足場として役立つ。 1) 代替手段がない状況での産業界との協力は、依然として協力と言えるのか。 2) NLPの研究は、民間企業の動機と研究の方向性に取り込まれてしまったのか。

Impressive performance of pre-trained models has garnered public attention and made news headlines in recent years. Almost always, these models are produced by or in collaboration with industry. Using them is critical for competing on natural language processing (NLP) benchmarks and, correspondingly, for staying relevant in NLP research. We surveyed 100 papers published at EMNLP 2022 to determine the degree to which researchers rely on industry models, other artifacts, and contributions to publish in prestigious NLP venues and found that the ratio of their citation is at least three times greater than what would be expected. Our work serves as a scaffold to enable future researchers to more accurately address whether 1) collaboration with industry is still collaboration in the absence of an alternative, or 2) NLP inquiry has been captured by the motivations and research direction of private corporations.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# 地球観測のためのデータ中心機械学習

Better, Not Just More: Data-Centric Machine Learning for Earth Observation ( http://arxiv.org/abs/2312.05327v2 )

ライセンス: Link先を確認

Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl, Devis Tuia,

(参考訳) 現代の機械学習における最近の発展と研究は、地理空間分野の大幅な改善につながっている。多くのディープラーニングアーキテクチャとモデルが提案されているが、その大半は、強力な現実世界の関連性を欠いたベンチマークデータセット上でのみ開発されている。さらに、これらのデータセットには、すでに多くのメソッドのパフォーマンスが飽和している。モデル中心の視点から補完的なデータ中心の視点へのシフトは、より正確性、一般化能力、そしてエンドユーザーアプリケーションへの影響を高めるために必要である。さらに、問題定義からモデルデプロイメントまでのマシンラーニングサイクル全体を考慮すれば、予期せぬ状況で信頼性のあるマシンラーニングモデルを強化する上で、極めて重要です。本研究は、地理空間データに対する自動データ中心学習手法の正確な分類と概要と、その定義を提示する。これは、より大きな機械学習デプロイメントサイクルにおけるモデル中心の学習に対するデータ中心学習の補完的な役割を強調している。地理空間領域全体にわたる論文をレビューし、それらを異なるグループに分類する。代表的な実験のセットは具体的な実装例を示している。これらの例は、データ中心の機械学習アプローチで地理空間データに作用する具体的なステップを提供する。

Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle - from problem definition to model deployment with feedback - is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.

翻訳日:2024-06-26 02:42:42 公開日:2024-06-22

# マルチモーダル・インフォメーション・ボトルネック属性による画像テキスト表現の視覚的説明

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution ( http://arxiv.org/abs/2312.17174v2 )

ライセンス: Link先を確認

Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson,

(参考訳) 視覚言語で事前訓練されたモデルは目覚ましい成功を収めてきたが、安全クリティカルな設定への適用は、解釈可能性の欠如によって制限されている。 CLIPのような視覚言語モデルの解釈性を改善するために,関連性のある視覚的特徴とテキスト的特徴を保存しつつ,無関係な情報を圧縮する潜時表現を学習するマルチモーダル情報ボトルネック(M2IB)アプローチを提案する。本稿では,M2IBを視覚言語事前学習モデルの帰属分析に適用し,帰属精度を高め,医療などの安全クリティカル領域に適用した場合の解釈可能性を向上させる方法を示す。重要な点として、一般的に使われるユニモーダル属性法とは異なり、M2IBは基礎的な真理ラベルを必要としないため、複数のモダリティがあるが、基礎的真実データがない場合に、視覚言語事前訓練されたモデルの表現を監査することができる。 CLIPを例として、M2IB属性の有効性を示し、勾配に基づく、摂動に基づく、注意に基づく属性法を質的かつ定量的に上回ることを示す。

Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively.

翻訳日:2024-06-26 02:32:50 公開日:2024-06-22

# 異常検出のための拡散モデルにおける雑音の動的付加

Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection ( http://arxiv.org/abs/2401.04463v2 )

ライセンス: Link先を確認

Justin Tebbe, Jawad Tayyub,

(参考訳) 拡散モデルは、名目データ分布を捕捉し、再構成を通して異常を識別することで、異常検出に有用な応用を見出した。それらの利点にもかかわらず、彼らは様々なスケールの異常、特に欠落した成分全体のような大きな異常をローカライズするのに苦労している。そこで我々は,従来の暗黙的条件付け手法であるMeng et al(2022)を3つの重要な方法で拡張することにより,拡散モデルの能力を高める新しい枠組みを提案する。まず,初期異常予測によって導かれるフォワードプロセスにおける可変ノイズ発生ステップを動的ステップサイズ計算に組み込む。第二に、ノイズが加わらずにのみスケールした入力をデノナイズすることが従来のデノナイズ処理より優れていることを示す。第三に、我々は、大きな欠落したコンポーネントの再構築を妨害する細部を抽象化するために、潜伏した空間に画像を投影する。さらに,対象領域のニュアンスを効果的に把握するための微調整機構を提案する。本手法は,VisA,BTAD,MVTecなどの異常検出データセットの厳密な評価を行い,高い性能を示した。重要な点として,本フレームワークは,拡散に基づく異常検出における重要な進歩を示すため,スケールに関わらず,効果的に異常の局所化を行う。

Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies such as entire missing components. To address this, we present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach of Meng et al. (2022) in three significant ways. First, we incorporate a dynamic step size computation that allows for variable noising steps in the forward process, guided by an initial anomaly prediction. Second, we demonstrate that denoising an input that is only scaled, without any added noise, outperforms the conventional denoising process. Third, we project images into a latent space to abstract away from fine details that interfere with the reconstruction of large missing components. Additionally, we propose a fine-tuning mechanism that helps the model effectively grasp the nuances of the target domain. Our method undergoes rigorous evaluation on the prominent anomaly detection datasets VisA, BTAD, and MVTec, yielding strong performance. Importantly, our framework effectively localizes anomalies regardless of their scale, marking a pivotal advancement in diffusion-based anomaly detection.

翻訳日:2024-06-26 02:32:50 公開日:2024-06-22

# 推論ステップ長が大規模言語モデルに及ぼす影響

The Impact of Reasoning Step Length on Large Language Models ( http://arxiv.org/abs/2401.04925v4 )

ライセンス: Link先を確認

Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du,

(参考訳) 思考の連鎖(CoT)は、大きな言語モデル(LLM)の推論能力を改善する上で重要である。しかし, プロンプトにおけるCoTの有効性と推論ステップの長さの相関はよく分かっていない。これを明らかにするために,我々はいくつかの実験を行い,その関係について検討した。具体的には、CoTの実演(デモンストレーション)において、他のすべての要因を一定に保ちながら、論拠となる推論ステップを拡張および圧縮する実験を設計する。主な発見は以下の通りである。まず、プロンプトに新たな情報を加えることなく、プロンプトにおける推論ステップを延長することで、複数のデータセットにまたがるLLMの推論能力が大幅に向上することを示す。逆に、キー情報を保存しながらも推論ステップを短縮することは、モデルの推論能力を著しく低下させる。この発見は、CoTプロンプトにおけるステップ数の重要性を強調し、複雑な問題解決シナリオにおけるLLMのポテンシャルをよりよく活用するための実践的なガイダンスを提供する。次に,CoTの性能と実演で用いる論拠との関係について検討した。驚くべきことに、たとえ誤った論拠であっても、推論に必要な長さを維持すれば、有利な結果が得られることが示される。第三に、推論ステップを増やすことの利点はタスクに依存する。より単純なタスクはより少ないステップを必要とするのに対して、複雑なタスクはより長い推論シーケンスから著しく向上する。コードは https://github.com/MingyuJ666/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models で公開されている。

Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of reasoning steps in prompts remains largely unknown. To shed light on this, we have conducted several empirical experiments to explore the relations. Specifically, we design experiments that expand and compress the rationale reasoning steps within CoT demonstrations while keeping all other factors constant. We have the following key findings. First, the results indicate that lengthening the reasoning steps in prompts, even without adding new information into the prompt, considerably enhances LLMs' reasoning abilities across multiple datasets. Alternatively, shortening the reasoning steps, even while preserving the key information, significantly diminishes the reasoning abilities of models. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance to make better use of LLMs' potential in complex problem-solving scenarios. Second, we also investigated the relationship between the performance of CoT and the rationales used in demonstrations. Surprisingly, the result shows that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences. The code is available at https://github.com/MingyuJ666/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models

翻訳日:2024-06-26 02:32:50 公開日:2024-06-22

# シルエット・アグリゲーションの再検討

Revisiting Silhouette Aggregation ( http://arxiv.org/abs/2401.05831v3 )

ライセンス: Link先を確認

John Pavlopoulos, Georgios Vardakas, Aristidis Likas,

(参考訳) シルエット係数(Silhouette coefficient)は、データポイントごとにスコアを生成し、そのクラスタ割り当ての品質を評価する、確立された内部クラスタリング評価尺度である。データセット全体のクラスタリングの品質を評価するために、データセットのすべてのポイントのスコアは通常、1つの値に(マイクロ)平均化される。しかし、滅多に採用されない代替手段として、まずクラスタレベルで平均し、次にクラスタ間で(マクロ)平均する方法がある。本研究で合成例を用いて示すように、典型的なマイクロ平均化戦略はクラスタ不均衡に敏感であり、見過ごされてきたマクロ平均化戦略ははるかに頑健である。マクロシルエットをさらに調査することで、既存のライブラリで唯一利用可能な戦略である一様サブサンプリングが、不均衡に対する尺度の頑健さを損なうことが判明した。クラスタごとのサンプリング手法を提案することでこの問題に対処する。 さらに、8つの実世界のデータセットに関する実験的研究により、2つのクラスタリングタスクにおいて両方の係数を分析する。

The Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are typically (micro) averaged into a single value. A rarely employed alternative, however, is to average first at the cluster level and then (macro) average across clusters. As we illustrate in this work with a synthetic example, the typical micro-averaging strategy is sensitive to cluster imbalance while the overlooked macro-averaging strategy is far more robust. By investigating macro-Silhouette further, we find that uniform sub-sampling, the only available strategy in existing libraries, harms the measure's robustness against imbalance. We address this issue by proposing a per-cluster sampling method. An experimental study on eight real-world datasets is then used to analyse both coefficients in two clustering tasks.
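
The difference between the two aggregation strategies is easy to see on a small imbalanced example. The sketch below assumes the per-point Silhouette scores have already been computed (for instance with a library routine such as scikit-learn's `silhouette_samples`); the function names and the toy scores are illustrative and do not come from the paper.

```python
from collections import defaultdict
from statistics import fmean


def micro_silhouette(scores):
    """Average the per-point scores over all points (the usual aggregation)."""
    return fmean(scores)


def macro_silhouette(scores, labels):
    """Average within each cluster first, then average the cluster means."""
    by_cluster = defaultdict(list)
    for score, label in zip(scores, labels):
        by_cluster[label].append(score)
    return fmean(fmean(values) for values in by_cluster.values())


if __name__ == "__main__":
    # Toy data: a large, well-separated cluster and a small, poorly separated one.
    scores = [0.9] * 8 + [0.1] * 2
    labels = [0] * 8 + [1] * 2
    print("micro:", round(micro_silhouette(scores), 2))
    print("macro:", round(macro_silhouette(scores, labels), 2))
```

On this toy input the micro average is about 0.74, dominated by the large cluster, while the macro average is about 0.50, which weights both clusters equally.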

翻訳日:2024-06-26 02:22:43 公開日:2024-06-22

# 適応ファイナフィルタによる基底状態生成のための一元行列の量子固有値変換のスケーラビリティ向上

Enhancing Scalability of Quantum Eigenvalue Transformation of Unitary Matrices for Ground State Preparation through Adaptive Finer Filtering ( http://arxiv.org/abs/2401.09091v3 )

ライセンス: Link先を確認

Erenay Karacan, Yanbin Chen, Christian B. Mendl,

(参考訳) ハミルトニアンシミュレーション(英: Hamiltonian Simulation)は、量子コンピュータが、その固有の量子的振る舞いのために古典的な計算能力を上回る能力を持つ領域である。このような量子アルゴリズムの主な課題の1つは、意味のある量子優位性を達成するために必要とされるシステムサイズをアップスケーリングすることである。本研究では,与えられたハミルトニアンの基底状態作成のための固有空間フィルタリングのスケーラビリティ向上のためのアプローチを提案する。本手法は,低エネルギー状態の小さなスペクトルギャップと高縮退によって生じる制約に対処することを目的としている。本手法は,ユニタリ行列の量子固有値変換(QETU)による固有空間フィルタリングとそれに続くスペクトルプロファイリングの適応的な系列に基づく。提案アルゴリズムと最先端の位相推定法を組み合わせることで,局所的な2量子ビットゲートの脱分極確率が最大$10^{-4}$の場合でも,基底状態エネルギーの良好な近似が得られた。本研究の重要な成果を示すために,Qiskit を用いた古典計算機上で横磁場イジングモデルのシミュレーションを行った。提案手法とQETUの静的実装を比較し,絶対誤差率の3～4桁の改善を一貫して達成可能であることを示す。

Hamiltonian simulation is a domain where quantum computers have the potential to outperform their classical counterparts due to their inherent quantum behavior. One of the main challenges of such quantum algorithms is up-scaling the system size, which is necessary to achieve meaningful quantum advantage. In this work, we present an approach to improve the scalability of eigenspace filtering for the ground state preparation of a given Hamiltonian. Our method aims to tackle limitations introduced by a small spectral gap and high degeneracy of low energy states. It is based on an adaptive sequence of eigenspace filtering through Quantum Eigenvalue Transformation of Unitary Matrices (QETU) followed by spectrum profiling. By combining our proposed algorithm with state-of-the-art phase estimation methods, we achieved good approximations for the ground state energy with local, two-qubit gate depolarizing probability up to $10^{-4}$. To demonstrate the key results in this work, we ran simulations with the transverse-field Ising Model on classical computers using Qiskit. We compare the performance of our approach with the static implementation of QETU and show that we can consistently achieve three to four orders of magnitude improvement in the absolute error rate.

翻訳日:2024-06-26 02:22:43 公開日:2024-06-22

# ブリッジング進化アルゴリズムと強化学習:ハイブリッドアルゴリズムに関する総合的な調査

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms ( http://arxiv.org/abs/2401.11963v4 )

ライセンス: Link先を確認

Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang,

(参考訳) 進化的アルゴリズム(EA)と強化学習(RL)を統合した進化的強化学習(ERL)は、目覚ましい性能向上を示した。両方のアプローチを融合させることで、ERLは有望な研究方向として浮上した。本調査では,ERLの多様な研究分野について概観する。具体的には, 関連アルゴリズムの最近の進歩を体系的に要約し, EA支援によるRL最適化, RL支援によるEA最適化, EAとRLの相乗的最適化の3つの研究方向を特定する。その後、各研究の方向性を詳細に分析し、複数の研究部門を編成する。それぞれのブランチが取り組もうとしている問題と、EAとRLの統合がこれらの課題にどのように対処するかを明らかにする。結論として,様々な研究方向性にまたがる潜在的な課題と今後の研究方向性について議論する。研究者によるERLの探究を容易にするため, https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learningに関するアルゴリズムとコードを整理した。

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in related algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing it into multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EAs and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions. To help researchers explore ERL, we collect the relevant algorithms and code at https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.

翻訳日:2024-06-26 02:22:43 公開日:2024-06-22

# 強化学習エージェントにおける創発的支配階層

Emergent Dominance Hierarchies in Reinforcement Learning Agents ( http://arxiv.org/abs/2401.12258v7 )

ライセンス: Link先を確認

Ram Rachum, Yonatan Nakar, Bill Tomlinson, Nitay Alon, Reuth Mirsky,

(参考訳) 現代の強化学習(RL)アルゴリズムは、様々なタスクにおいて人間より優れている。マルチエージェント強化学習(MARL)の設定には新たな課題があり、エージェントの混合モチベーションにおける協調の成功は、個人とグループ間の微妙なバランスをとる行為に依存する。社会慣習や規範は、しばしば人間の制度にインスパイアされ、このバランスを打つための道具として使用される。本稿では,動物社会と人間社会の連携の基盤となる,基礎的でよく研究された社会慣行,支配階層について考察する。我々は、支配階層の倫理理論を人工エージェントに適用し、確立された用語と定義を可能な限り少ない修正で借用する。明示的なプログラミングや本質的な報酬なしに活動するRLエージェントの集団は、新しい集団に支配階層を発明し、学習し、強制し、伝達することができることを実証する。支配的な階層構造は、鶏、マウス、魚、その他の種で研究されるものと類似した構造を持つ。

Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.

翻訳日:2024-06-26 02:22:43 公開日:2024-06-22

# OMPGPT: OpenMPのための生成事前学習型トランスモデル

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP ( http://arxiv.org/abs/2401.16445v3 )

ライセンス: Link先を確認

Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo, Ali Jannesari,

(参考訳) ChatGPTのような大規模言語モデル(LLM)は自然言語処理(NLP)の分野を大きく進歩させた。この傾向は、StarCoder、WizardCoder、CodeLlamaといったコードベースの大規模言語モデルの開発につながった。これらのコードの汎用的な能力は、コード生成のようなタスクにおいて多くのプログラマにとって有用であるが、ハイパフォーマンスコンピューティング(HPC)の領域は、より小さく、よりドメイン固有のモデルをよりスマートな選択にするための、より狭い要求セットを持っている。本稿では,OpenMPプラグマ生成のための言語モデル固有の強みを巧みに活用したドメイン固有モデルであるOMPGPTを提案する。さらに、我々は、NLPドメインからの迅速なエンジニアリング技術を活用して、OMPGPTの有効性を高めるために設計された革新的な戦略であるChain-of-OMPを作成する。 OMPGPTはOpenMPタスクに特化している既存の大規模言語モデルよりも優れており、HPC環境の典型的なハードウェア制約とより密に一致している。我々は、言語モデルの利点とHPCタスクの特定の要求を結びつけるために、我々の貢献を重要な橋と考えます。

Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are trained extensively on vast repositories of code and programming languages. While the generic abilities of these code LLMs are useful for many programmers in tasks like code generation, the area of high-performance computing (HPC) has a narrower set of requirements that make a smaller and more domain-specific model a smarter choice. This paper presents OMPGPT, a novel domain-specific model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation. Furthermore, we leverage prompt engineering techniques from the NLP domain to create Chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive evaluations demonstrate that OMPGPT outperforms existing large language models specialized in OpenMP tasks and maintains a notably smaller size, aligning it more closely with the typical hardware constraints of HPC environments. We consider our contribution a pivotal bridge, connecting the advantages of language models with the specific demands of HPC tasks.

翻訳日:2024-06-26 02:22:43 公開日:2024-06-22

# スキルセット最適化:トランスファー可能なスキルによる言語モデル行動の強化

Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills ( http://arxiv.org/abs/2402.03244v2 )

ライセンス: Link先を確認

Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox,

(参考訳) 大規模言語モデル(LLM)は、インタラクティブ環境でのシーケンシャルな意思決定に最近使用されている。しかし,環境報酬信号の連続的LLMアクター改善への活用は容易ではない。トランスファー可能なスキルセットの構築と精細化を通じて,LLMアクターのパフォーマンスを向上させるためのスキルセット最適化(SSO)を提案する。 SSOは、報酬の高い共通のサブトラジェクトリを抽出し、各スキルを表すサブゴールと命令を生成することで、スキルを構築する。これらのスキルは、高い報酬で行動を強化するために、LLMアクターにコンテキストで提供される。そして、SSOは、高い報酬を得られない技術を切り刻むことによって設定されたスキルをさらに洗練する。我々は,従来のビデオゲームNetHackとテキスト環境ScienceWorldで,SSOのスキルセットを最適化し,コンテキスト内ポリシーの改善を行う能力を実証するために,本手法を評価した。 SSOは当社のカスタムNetHackタスクのベースラインを40%上回り、ScienceWorldの最先端を35%上回ります。

Large language models (LLMs) have recently been used for sequential decision making in interactive environments. However, leveraging environment reward signals for continual LLM actor improvement is not straightforward. We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. SSO constructs skills by extracting common subtrajectories with high rewards and generating subgoals and instructions to represent each skill. These skills are provided to the LLM actor in-context to reinforce behaviors with high rewards. Then, SSO further refines the skill set by pruning skills that do not continue to result in high rewards. We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement. SSO outperforms baselines by 40% in our custom NetHack task and outperforms the previous state-of-the-art in ScienceWorld by 35%.

翻訳日:2024-06-26 02:11:02 公開日:2024-06-22

# ダンス生成のための双方向自己回帰拡散モデル

Bidirectional Autoregressive Diffusion Model for Dance Generation ( http://arxiv.org/abs/2402.04356v4 )

ライセンス: Link先を確認

Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang,

(参考訳) ダンスは人間の感情を表現するための強力な媒体として機能するが、人生のようなダンスの生成は依然としてかなりの課題である。近年、拡散モデルは様々な領域で顕著な生成能力を示した。彼らは、適応可能な多対多の性質のために、人間のモーションジェネレーションを約束します。それにもかかわらず、現在の拡散に基づく運動生成モデルは、局所的および双方向的な拡張による動きに焦点を絞らず、直接かつ一方向の運動列を直接生成することが多い。高品質な舞踊の動きを振る舞う際には、音楽的文脈だけでなく、近隣の音楽的な舞踊の動きも考慮する必要がある。本研究では,音楽間距離生成のための双方向自己回帰拡散モデル (BADM) を提案する。生成したダンス動作をよりスムーズにするため、局所運動強調のための局所情報デコーダを構築する。提案フレームワークは入力条件と近傍の動作に基づいて新しい動きを生成することができ、個々の動きスライスを反復的に予測し、全ての予測を統合する。生成されたダンスとビートとの同期性を更に向上させるため、ビート情報を入力として組み込んで、より優れた音楽整列ダンス動作を生成する。実験結果から,提案モデルが既存の一方向アプローチと比較して最先端性能を達成できることが示唆された。

Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.

翻訳日:2024-06-26 02:11:02 公開日:2024-06-22

# 確率微分方程式によるスコアベース拡散モデル-技術チュートリアル

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial ( http://arxiv.org/abs/2402.07487v2 )

ライセンス: Link先を確認

Wenpin Tang, Hanyang Zhao,

(参考訳) 本稿は、スコアベース拡散モデルに関する解説記事であり、特に確率微分方程式(SDE)による定式化に焦点を当てている。簡単な導入の後,拡散モデリングの2つの柱であるサンプリングとスコアマッチングについて論じる。これらはSDE/ODEサンプリング,スコアマッチングの効率,一貫性モデル,強化学習を含む。述べられた結果の主要なアイデアを説明するための短い証明が与えられる。この記事は、主にこの分野の技術的な紹介であり、実践者も、新しいモデルやアルゴリズムを設計するのに有用な分析を見出すかもしれない。

This is an expository article on the score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars in the diffusion modeling -- sampling and score matching, which encompass the SDE/ODE sampling, score matching efficiency, the consistency models, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily a technical introduction to the field, and practitioners may also find some analysis useful in designing new models or algorithms.
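
For orientation, the SDE formulation mentioned in the abstract is usually written as a forward noising process, its time reversal, and a score-matching objective. The notation below ($f$, $g$, $p_t$, $s_\theta$) follows the common convention in the score-based diffusion literature and is an assumption here, since the abstract itself fixes no symbols.

```latex
\begin{aligned}
\text{forward:}\quad & \mathrm{d}X_t = f(t, X_t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t, \qquad X_0 \sim p_{\mathrm{data}},\\
\text{reverse:}\quad & \mathrm{d}\bar X_t = \bigl[f(t,\bar X_t) - g(t)^2\,\nabla_x \log p_t(\bar X_t)\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar W_t,\\
\text{score matching:}\quad & \min_\theta\; \mathbb{E}_{t}\,\mathbb{E}_{X_t \sim p_t}\bigl\| s_\theta(t, X_t) - \nabla_x \log p_t(X_t) \bigr\|^2 .
\end{aligned}
```

Here $p_t$ denotes the marginal law of $X_t$, the reverse equation is run backward in time from $t = T$ to $t = 0$ with $\bar W$ a Brownian motion in reversed time, and the learned network $s_\theta$ replaces the unknown score $\nabla_x \log p_t$ when sampling.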

翻訳日:2024-06-26 02:01:18 公開日:2024-06-22

# All in One and One for All: クロスドメイングラフ事前トレーニングのためのシンプルで効果的な方法

All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining ( http://arxiv.org/abs/2402.09834v2 )

ライセンス: Link先を確認

Haihong Zhao, Aochuan Chen, Xiangguo Sun, Hong Cheng, Jia Li,

(参考訳) 大規模言語モデル (LLM) はコンピュータビジョン (CV) と自然言語処理 (NLP) の分野に革命をもたらした。 LLMの最も注目すべき進歩の1つは、単一のモデルが、複数のドメインにまたがる広範囲で多様なデータセットでトレーニングされていることだ("All in One"と呼ぶパラダイム)。この方法論は、LLMに非常に高い一般化能力を与え、さまざまなデータ分布の包括的理解を促進する。これらの能力を活用することで、単一のLLMは、さまざまなドメインにまたがる顕著な汎用性を示す("One for All"と呼ぶパラダイム)。しかし、このアイデアをグラフ分野に適用することは、ドメイン間の事前学習がしばしば負の転移をもたらすため、依然として非常に難しい課題である。この問題は、トレーニングデータの不足により外部知識源の組み入れが必要となる、数ショット学習のシナリオにおいて特に重要である。この課題に対応するために,多種多様なグラフデータセット間の共通性を生かして数ショット学習を強化する新しい手法 Graph COordinators for PrEtraining (GCOPE) を提案する。我々の手法は、事前学習の段階で異なるグラフデータセットを統合し、有意義な知識を蒸留して目的のタスクに転移する統一フレームワークを含む。複数のグラフデータセットにまたがる大規模な実験は、我々のアプローチの優れた有効性を示す。複数のグラフデータセットの相乗的ポテンシャルを事前学習に活用することにより、我々の研究はグラフ基盤モデルの領域への先駆的な貢献となる。

Large Language Models (LLMs) have revolutionized the fields of computer vision (CV) and natural language processing (NLP). One of the most notable advancements of LLMs is that a single model is trained on vast and diverse datasets spanning multiple domains -- a paradigm we term 'All in One'. This methodology empowers LLMs with super generalization capabilities, facilitating an encompassing comprehension of varied data distributions. Leveraging these capabilities, a single LLM demonstrates remarkable versatility across a variety of domains -- a paradigm we term 'One for All'. However, applying this idea to the graph field remains a formidable challenge, with cross-domain pretraining often resulting in negative transfer. This issue is particularly important in few-shot learning scenarios, where the paucity of training data necessitates the incorporation of external knowledge sources. In response to this challenge, we propose a novel approach called Graph COordinators for PrEtraining (GCOPE), which harnesses the underlying commonalities across diverse graph datasets to enhance few-shot learning. Our methodology involves a unification framework that amalgamates disparate graph datasets during the pretraining phase to distill and transfer meaningful knowledge to target tasks. Extensive experiments across multiple graph datasets demonstrate the superior efficacy of our approach. By successfully leveraging the synergistic potential of multiple graph datasets for pretraining, our work stands as a pioneering contribution to the realm of graph foundation models.

翻訳日:2024-06-26 02:01:18 公開日:2024-06-22

# 異種情報ネットワークにおける大規模言語モデル駆動型メタ構造発見

Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network ( http://arxiv.org/abs/2402.11518v2 )

ライセンス: Link先を確認

Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui,

(参考訳) 異種情報ネットワーク(HIN)は近年,多様なノード間の複雑な関係を捉えることで人気が高まっている。メタ構造は、HINの重要なパターンを特定するのに有用なツールとして提案されているが、手作りのメタ構造はスケールアップに重大な課題をもたらし、自動検索アルゴリズムの開発に広範囲の研究が注がれている。それまでの取り組みは主に、人間の理解性と一般化性の重要性を見過ごして、経験的性能のよいメタ構造を探すことに焦点を当てていた。この課題に対処するため,大規模言語モデル(LLM)の創発的推論能力から着想を得た。本稿では,LLM推論を進化過程に統合するメタ構造探索フレームワークReStructを提案する。 ReStructは文法トランスレータを使用して、メタ構造を自然言語文にエンコードし、LLMの推論能力を活用して、それらの意味的な実現可能性を評価する。さらに、ReStructはパフォーマンス指向の進化操作も採用している。これら2つの競合する力により、ReStructはメタ構造のセマンティックな説明可能性と経験的なパフォーマンスを共同で最適化することができる。さらに、ReStructは、検索履歴を解析することで、発見されたメタ構造の自然言語説明を生成し、洗練するための差分LLM説明器を含んでいる。 8つの代表的HINデータセットの実験は、ReStructが推薦タスクとノード分類タスクの両方で最先端のパフォーマンスを達成することを示した。さらに、73人の大学院生を対象にした調査の結果、ReStructによるメタ構造と生成した説明は、かなり理解しやすいことがわかった。コードとアンケートは https://github.com/LinChen-65/ReStruct で公開されている。

Heterogeneous information networks (HIN) have gained increasing popularity in recent years for capturing complex relations between diverse types of nodes. Meta-structures are proposed as a useful tool to identify the important patterns in HINs, but hand-crafted meta-structures pose significant challenges for scaling up, drawing wide research attention towards developing automatic search algorithms. Previous efforts primarily focused on searching for meta-structures with good empirical performance, overlooking the importance of human comprehensibility and generalizability. To address this challenge, we draw inspiration from the emergent reasoning abilities of large language models (LLMs). We propose ReStruct, a meta-structure search framework that integrates LLM reasoning into the evolutionary procedure. ReStruct uses a grammar translator to encode the meta-structures into natural language sentences, and leverages the reasoning power of LLMs to evaluate their semantic feasibility. Besides, ReStruct also employs performance-oriented evolutionary operations. These two competing forces allow ReStruct to jointly optimize the semantic explainability and empirical performance of meta-structures. Furthermore, ReStruct contains a differential LLM explainer to generate and refine natural language explanations for the discovered meta-structures by reasoning through the search history. Experiments on eight representative HIN datasets demonstrate that ReStruct achieves state-of-the-art performance in both recommendation and node classification tasks. Moreover, a survey study involving 73 graduate students shows that the discovered meta-structures and generated explanations by ReStruct are substantially more comprehensible. Our code and questionnaire are available at https://github.com/LinChen-65/ReStruct.

Translated: 2024-06-26 01:51:30 Published: 2024-06-22

# Best of Many in Both Worlds: Online Resource Allocation with Predictions under Unknown Arrival Model ( http://arxiv.org/abs/2402.13530v2 )

License: see the linked page

Lin An, Andrew A. Li, Benjamin Moseley, Gabriel Visotsky,

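(Reference translation) Online decision-makers often obtain predictions about future variables such as arrivals, demands, and inventories. These predictions can come from anything between simple forecasting algorithms for univariate time series and state-of-the-art machine learning models that exploit multiple time series and additional feature information. However, decision-makers do not know the prediction accuracy in advance, so blindly following the predictions can be harmful. This paper addresses the problem by developing algorithms that use predictions in a way that is robust to the unknown prediction accuracy. We consider the Online Resource Allocation Problem, a generic model of online decision-making in which a limited amount of resources is used to satisfy a sequence of arriving requests. Prior work has characterized the best achievable performance when arrivals are generated either stochastically (i.i.d.) or fully adversarially, and has shown that there are algorithms matching these bounds under both arrival models without "knowing" the underlying model. Against this backdrop, we introduce predictions in the form of shadow prices for each type of resource. Prediction accuracy is naturally defined as the distance between the predicted and the actual shadow prices. Through a formal lower bound, we tightly characterize the extent to which any algorithm can optimally exploit predictions (that is, "follow" them when they are accurate and "ignore" them when they are inaccurate) without knowing the prediction accuracy or the underlying arrival model. Our main contribution is an algorithm that achieves this lower bound. Finally, we validate the algorithm empirically with a large-scale experiment on real data from the retailer H&M.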

Online decision-makers often obtain predictions on future variables, such as arrivals, demands, inventories, and so on. These predictions can be generated from simple forecasting algorithms for univariate time-series, all the way to state-of-the-art machine learning models that leverage multiple time-series and additional feature information. However, the prediction accuracy is unknown to decision-makers a priori, hence blindly following the predictions can be harmful. In this paper, we address this problem by developing algorithms that utilize predictions in a manner that is robust to the unknown prediction accuracy. We consider the Online Resource Allocation Problem, a generic model for online decision-making, in which a limited amount of resources may be used to satisfy a sequence of arriving requests. Prior work has characterized the best achievable performances when the arrivals are either generated stochastically (i.i.d.) or completely adversarially, and shown that algorithms exist which match these bounds under both arrival models, without "knowing" the underlying model. To this backdrop, we introduce predictions in the form of shadow prices on each type of resource. Prediction accuracy is naturally defined to be the distance between the predictions and the actual shadow prices. We tightly characterize, via a formal lower bound, the extent to which any algorithm can optimally leverage predictions (that is, to "follow" the predictions when accurate, and "ignore" them when inaccurate) without knowing the prediction accuracy or the underlying arrival model. Our main contribution is then an algorithm which achieves this lower bound. Finally, we empirically validate our algorithm with a large-scale experiment on real data from the retailer H&M.
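To make the role of shadow prices concrete, the following is a minimal sketch of a generic price-threshold rule: a request is accepted when its reward exceeds the priced cost of the resources it consumes. This is not the algorithm from the paper, which the abstract does not specify. The function names, the request format `(reward, usage)`, and the use of an L1 distance for prediction error are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Request = Tuple[float, Sequence[float]]  # (reward, per-resource usage); hypothetical format


def accept_request(reward: float, usage: Sequence[float], prices: Sequence[float]) -> bool:
    """Accept when the reward exceeds the shadow-priced cost of the resources used."""
    cost = sum(p * u for p, u in zip(prices, usage))
    return reward > cost


def allocate(requests: Sequence[Request], budgets: Sequence[float],
             prices: Sequence[float]) -> Tuple[float, List[int]]:
    """Process requests in arrival order with fixed (predicted) shadow prices."""
    remaining = list(budgets)
    total_reward = 0.0
    accepted: List[int] = []
    for index, (reward, usage) in enumerate(requests):
        # A request is only eligible if enough of every resource is left.
        feasible = all(u <= r for u, r in zip(usage, remaining))
        if feasible and accept_request(reward, usage, prices):
            remaining = [r - u for r, u in zip(remaining, usage)]
            total_reward += reward
            accepted.append(index)
    return total_reward, accepted


def price_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Distance between predicted and actual prices (L1 is an assumed choice)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))


if __name__ == "__main__":
    demo_requests = [(5.0, [1, 0]), (1.0, [1, 1]), (4.0, [0, 2])]
    print(allocate(demo_requests, budgets=[1, 2], prices=[2.0, 1.0]))
    print(price_error([2.0, 1.0], [1.5, 1.0]))
```

With accurate prices, a rule of this kind tends to reserve scarce resources for high-value requests; with inaccurate prices it can reject too much or too little, which is the failure mode a robust algorithm has to guard against.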

Translated: 2024-06-26 01:51:30 Published: 2024-06-22

# Automated Statistical Model Discovery with Language Models ( http://arxiv.org/abs/2402.17879v2 )

License: see the linked page

Michael Y. Li, Emily B. Fox, Noah D. Goodman,

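(Reference translation) Statistical model discovery is a difficult search over a vast space of models subject to domain-specific constraints. Searching this space efficiently requires expertise in both modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we propose a method for automated statistical model discovery driven by language models. We cast the procedure within the framework of Box's Loop: the LM alternates between proposing statistical models expressed as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. Using LMs removes the need to define a domain-specific language of models or to design a hand-crafted search procedure, both key restrictions of previous systems. We evaluate the method in three probabilistic modeling settings: searching within a restricted space of models, searching an open-ended space, and improving expert models under natural language constraints (for example, that the model should be interpretable to an ecologist). The method identifies models on par with those designed by human experts and extends classic models in interpretable ways. The results highlight the promise of LM-driven model discovery.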

Statistical model discovery is a challenging search over a vast space of models subject to domain-specific constraints. Efficiently searching over this space requires expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We cast our automated procedure within the principled framework of Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. By leveraging LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure, which are key restrictions of previous systems. We evaluate our method in three settings in probabilistic modeling: searching within a restricted space of models, searching over an open-ended space, and improving expert models under natural language constraints (e.g., this model should be interpretable to an ecologist). Our method identifies models on par with human expert designed models and extends classic models in interpretable ways. Our results highlight the promise of LM-driven model discovery.

Translated: 2024-06-26 01:41:44 Published: 2024-06-22

# Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing ( http://arxiv.org/abs/2402.19462v2 )

License: see the linked page

Pranav Shetty, Aishat Adeboye, Sonakshi Gupta, Chao Zhang, Rampi Ramprasad,

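(Reference translation) We simulate various active learning strategies for discovering polymer solar cell donor/acceptor pairs, using data extracted by a natural language processing pipeline from literature spanning about 20 years. Data-driven methods are well established as a way to discover new materials faster than Edisonian trial and error, but their benefits have not been quantified for material discovery problems that can take decades. Our approach shows a potential reduction in discovery time of about 75%, equivalent to a 15-year acceleration in material innovation. The pipeline extracts data from more than 3300 papers, about 5 times more than similar data sets reported by others and therefore more diverse. We also trained machine learning models to predict power conversion efficiency and used them to identify promising donor-acceptor combinations that have not yet been reported. We thus demonstrate a pipeline that goes from published literature to extracted material property data, which is in turn used to obtain data-driven insights. These insights include active learning strategies that can train strong predictive models of material properties or that are robust to the initial material system used. This work provides a valuable framework for data-driven research in materials science.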

We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs using data extracted from the literature spanning about 20 years by a natural language processing pipeline. While data-driven methods have been well established to discover novel materials faster than Edisonian trial-and-error approaches, their benefits have not been quantified for material discovery problems that can take decades. Our approach demonstrates a potential reduction in discovery time by approximately 75%, equivalent to a 15-year acceleration in material innovation. Our pipeline enables us to extract data from greater than 3300 papers which is about 5 times larger and therefore more diverse than similar data sets reported by others. We also trained machine learning models to predict the power conversion efficiency and used our model to identify promising donor-acceptor combinations that are as yet unreported. We thus demonstrate a pipeline that goes from published literature to extracted material property data which in turn is used to obtain data-driven insights. Our insights include active learning strategies that can be used to train strong predictive models of material properties or be robust to the initial material system used. This work provides a valuable framework for data-driven research in materials science.

Translated: 2024-06-26 01:41:44 Published: 2024-06-22

# Learning to Use Tools via Cooperative and Interactive Agents ( http://arxiv.org/abs/2403.03031v4 )

License: see the linked page

Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren,

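(Reference translation) Tool learning enables large language models (LLMs) to act as agents that use external tools and extend their utility. Existing methods use a single LLM-based agent to select and execute tools iteratively and then feed the execution results into the next action prediction. Despite their progress, these methods suffer performance degradation on practical tasks because of (1) a pre-defined pipeline with limited flexibility for calibrating incorrect actions and (2) the difficulty of adapting a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework that coordinates three specialized agents for tool selection, tool execution, and action calibration. ConAgents introduces two communication protocols that enable flexible cooperation among the agents. To generalize ConAgents effectively to open-source models, we also propose specialized action distillation, which improves their ability to perform specialized actions within the framework. Extensive experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin (up to a 14% higher success rate).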

Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility. Existing methods employ one single LLM-based agent to iteratively select and execute tools, thereafter incorporating execution results into the next action prediction. Despite their progress, these methods suffer from performance degradation when addressing practical tasks due to: (1) the pre-defined pipeline with restricted flexibility to calibrate incorrect actions, and (2) the struggle to adapt a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately. ConAgents introduces two communication protocols to enable the flexible cooperation of agents. To effectively generalize the ConAgents into open-source models, we also propose specialized action distillation, enhancing their ability to perform specialized actions in our framework. Our extensive experiments on three datasets show that the LLMs, when equipped with the ConAgents, outperform baselines with substantial improvement (i.e., up to 14% higher success rate).

Translated: 2024-06-26 01:41:44 Published: 2024-06-22

# GPTopic: Dynamic and Interactive Topic Representations ( http://arxiv.org/abs/2403.03628v2 )

License: see the linked page

Arik Reuter, Anton Thielmann, Christoph Weisser, Sebastian Fischer, Benjamin Säfken,

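(Reference translation) Topic modeling seems almost synonymous with generating lists of top words that represent the topics in a large text corpus. Deducing a topic from such a list of individual terms, however, can require substantial expertise and experience, which makes topic modeling less accessible to people unfamiliar with the particularities and pitfalls of top-word interpretation. A topic representation limited to top words may also fall short of a comprehensive and easily accessible characterization of the various aspects, facets, and nuances a topic can have. To address these challenges, we introduce GPTopic, a software package that uses large language models (LLMs) to create dynamic, interactive topic representations. GPTopic provides an intuitive chat interface for exploring, analyzing, and refining topics interactively, making topic modeling more accessible and comprehensive. The corresponding code is available at https://github.com/ArikReuter/TopicGPT.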

Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such list of individual terms can require substantial expertise and experience, making topic modelling less accessible to people unfamiliar with the particularities and pitfalls of top-word interpretation. A topic representation limited to top-words might further fall short of offering a comprehensive and easily accessible characterization of the various aspects, facets and nuances a topic might have. To address these challenges, we introduce GPTopic, a software package that leverages Large Language Models (LLMs) to create dynamic, interactive topic representations. GPTopic provides an intuitive chat interface for users to explore, analyze, and refine topics interactively, making topic modeling more accessible and comprehensive. The corresponding code is available here: https://github.com/ArikReuter/TopicGPT.

Translated: 2024-06-26 01:41:44 Published: 2024-06-22

# Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance ( http://arxiv.org/abs/2403.06265v2 )

License: see the linked page

Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty,

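(Reference translation) Although compression is the cornerstone of BPE, the most common tokenization algorithm, its importance in the tokenization process remains unclear. This paper argues for the theoretical importance of compression, which can be viewed as 0-gram language modeling in which every token is assigned equal probability. We also demonstrate empirically that compression matters for the downstream success of pre-trained language models. We control the compression ability of several BPE tokenizers by varying the number of documents available during their training, from 1 million documents down to a character-based tokenizer, which is equivalent to no training data at all. We then pre-train English language models on those tokenizers and fine-tune them on several tasks. We show that tokenizer compression correlates with downstream model performance, which suggests that compression is a reliable intrinsic indicator of tokenization quality. The correlations are more pronounced for generation tasks (than for classification) and for smaller models (than for large ones). We replicated a representative part of the experiments on Turkish and obtained similar results, confirming that the findings hold for languages typologically dissimilar to English. We conclude that building tokenizers that compress better is a fruitful avenue for further research and for improving overall model performance.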

Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear. In this paper, we argue for the theoretical importance of compression, that can be viewed as 0-gram language modeling where equal probability is assigned to all tokens. We also demonstrate the empirical importance of compression for downstream success of pre-trained language models. We control the compression ability of several BPE tokenizers by varying the amount of documents available during their training: from 1 million documents to a character-based tokenizer equivalent to no training data at all. We then pre-train English language models based on those tokenizers and fine-tune them over several tasks. We show that there is a correlation between tokenizers' compression and models' downstream performance, suggesting that compression is a reliable intrinsic indicator of tokenization quality. These correlations are more pronounced for generation tasks (over classification) or for smaller models (over large ones). We replicated a representative part of our experiments on Turkish and found similar results, confirming that our results hold for languages with typological characteristics dissimilar to English. We conclude that building better compressing tokenizers is a fruitful avenue for further research and for improving overall model performance.
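The notion of tokenizer compression can be illustrated with a small sketch. The code below measures characters per token for two toy tokenizers and computes the log-likelihood of a token sequence under a uniform ("0-gram") model. The whitespace tokenizer is only a stand-in for a trained BPE tokenizer, and the metric name `chars_per_token` is an assumption; the abstract does not state how the paper measures compression.

```python
import math
from typing import Callable, List, Sequence


def char_tokenizer(text: str) -> List[str]:
    """Character-level tokenizer: the no-training-data end of the spectrum."""
    return list(text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Toy stand-in for a trained subword tokenizer (not BPE)."""
    return text.split()


def chars_per_token(texts: Sequence[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of characters covered by one token; higher means more compression."""
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(tokenize(t)) for t in texts)
    return total_chars / total_tokens


def zero_gram_log_likelihood(n_tokens: int, vocab_size: int) -> float:
    """Log-likelihood when every token has probability 1 / vocab_size."""
    return -n_tokens * math.log(vocab_size)


if __name__ == "__main__":
    corpus = ["the cat sat"]
    print(chars_per_token(corpus, char_tokenizer))
    print(chars_per_token(corpus, whitespace_tokenizer))
    print(zero_gram_log_likelihood(3, 1000), zero_gram_log_likelihood(11, 1000))
```

For a fixed vocabulary size, the uniform model assigns a higher likelihood to the encoding with fewer tokens, which is one way to read the link between compression and 0-gram language modeling.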

Translated: 2024-06-26 01:31:59 Published: 2024-06-22

# Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs ( http://arxiv.org/abs/2403.07398v2 )

License: see the linked page

Tianqing Fang, Zeming Chen, Yangqiu Song, Antoine Bosselut,

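(Reference translation) Event commonsense reasoning requires the ability to reason about the relationship between events and to infer the implicit context underlying that relationship. Data scarcity, however, makes it hard for language models to learn to generate commonsense inferences for contexts and questions that involve interactions between complex events. To address this need, we present COM2 (COMplex COMmonsense), a new dataset created by sampling multi-hop logical queries (for example, the joint effect or cause of events A and B, or the effect of the effect of event C) from an existing commonsense knowledge graph (CSKG) and verbalizing them, with handcrafted rules and large language models, into multiple-choice and text generation questions. Experiments show that language models trained on COM2 improve significantly in complex reasoning ability, which yields better zero-shot performance on both in-domain and out-of-domain tasks for question answering and generative commonsense reasoning, without expensive human annotation. Code and data are available at https://github.com/tqfang/complex-commonsense-reasoning.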

Event commonsense reasoning requires the ability to reason about the relationship between events, as well as infer implicit context underlying that relationship. However, data scarcity makes it challenging for language models to learn to generate commonsense inferences for contexts and questions involving interactions between complex events. To address this demand, we present COM2 (COMplex COMmonsense), a new dataset created by sampling multi-hop logical queries (e.g., the joint effect or cause of both event A and B, or the effect of the effect of event C) from an existing commonsense knowledge graph (CSKG), and verbalizing them using handcrafted rules and large language models into multiple-choice and text generation questions. Our experiments show that language models trained on COM2 exhibit significant improvements in complex reasoning ability, resulting in enhanced zero-shot performance in both in-domain and out-of-domain tasks for question answering and generative commonsense reasoning, without expensive human annotations. Code and data are available at https://github.com/tqfang/complex-commonsense-reasoning.

Translated: 2024-06-26 01:31:59 Published: 2024-06-22

# Fast, accurate and lightweight sequential simulation-based inference using Gaussian locally linear mappings ( http://arxiv.org/abs/2403.07454v3 )

License: see the linked page

Henrik Häggström, Pedro L. C. Rodrigues, Geoffroy Oudoumanessah, Florence Forbes, Umberto Picchini,

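(Reference translation) Bayesian inference for complex models with an intractable likelihood can be tackled with algorithms that make many calls to a computer simulator. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods use neural networks (NNs) to provide approximate yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much room for improvement. In this work, we propose an alternative that approximates both the likelihood and the posterior distribution using structured mixtures of probability distributions. Compared with state-of-the-art NN-based SBI methods, the approach produces accurate posterior inference, even for multimodal posteriors, with a much smaller computational footprint. We illustrate the results on several benchmark models from the SBI literature and on a biological model of translation kinetics after mRNA transfection.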

Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much space for improvement. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.

Translated: 2024-06-26 01:31:59 Published: 2024-06-22

# Knowledge Conflicts for LLMs: A Survey ( http://arxiv.org/abs/2403.08319v2 )

License: see the linked page

Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu,

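(Reference translation) This survey analyzes knowledge conflicts in large language models (LLMs) in depth, highlighting the complex challenges that arise when contextual and parametric knowledge are blended. We focus on three categories of knowledge conflict: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly affect the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing the conflicts, exploring their causes, examining how LLMs behave under them, and reviewing the available solutions, the survey aims to shed light on strategies for improving the robustness of LLMs and to serve as a valuable resource for advancing research in this evolving area.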

This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area.

Translated: 2024-06-26 01:31:59 Published: 2024-06-22

# Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription ( http://arxiv.org/abs/2403.10953v2 )

License: see the linked page

Hongxiang Zhao, Xili Dai, Jianan Wang, Shengbang Tong, Jingyuan Zhang, Weida Wang, Lei Zhang, Yi Ma,

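(Reference translation) Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground-truth poses and appearances, even on the training set. This limits performance on downstream tasks such as image-to-multiview generation and 3D reconstruction. The inconsistency arises largely because accurate pose and appearance alignment is hard to enforce directly in diffusion training, as most existing methods such as Zero123 attempt to do. To remedy this, we propose Ctrl123, a closed-loop transcription-based NVS diffusion method that enforces alignment between the generated view and the ground truth in a pose-sensitive feature space. Extensive experiments demonstrate the effectiveness of Ctrl123 on NVS and 3D reconstruction, with significant improvements over existing methods in both multiview consistency and pose consistency.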

Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground truth poses and appearances, even on the training set. This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D reconstruction. We realize that such inconsistency is largely due to the fact that it is difficult to enforce accurate pose and appearance alignment directly in the diffusion training, as mostly done by existing methods such as Zero123. To remedy this problem, we propose Ctrl123, a closed-loop transcription-based NVS diffusion method that enforces alignment between the generated view and ground truth in a pose-sensitive feature space. Our extensive experiments demonstrate the effectiveness of Ctrl123 on the tasks of NVS and 3D reconstruction, achieving significant improvements in both multiview-consistency and pose-consistency over existing methods.

Translated: 2024-06-26 01:22:15 Published: 2024-06-22

# Language Models in Dialogue: Conversational Maxims for Human-AI Interactions ( http://arxiv.org/abs/2403.15115v2 )

License: see the linked page

Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards,

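(Reference translation) Modern language models are sophisticated but exhibit inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to the violation of one or more conversational principles. Drawing on extensive research from both the social science and AI communities, we propose a set of maxims (quantity, quality, relevance, manner, benevolence, and transparency) for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice) to human-AI interaction. We then argue that two new maxims, benevolence (concerning the generation of, and engagement with, harmful content) and transparency (concerning recognition of one's knowledge boundaries, operational constraints, and intents), are needed to address behavior unique to modern human-AI interaction. We evaluate how well various language models understand these maxims and find that models have an internal prioritization of principles that can significantly affect their ability to interpret the maxims accurately.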

Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims -- quantity, quality, relevance, manner, benevolence, and transparency -- for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice) in the context of human-AI interactions. We then argue that two new maxims, benevolence (concerning the generation of, and engagement with, harmful content) and transparency (concerning recognition of one's knowledge boundaries, operational constraints, and intents), are necessary for addressing behavior unique to modern human-AI interactions. We evaluate the degree to which various language models are able to understand these maxims and find that models possess an internal prioritization of principles that can significantly impact their ability to interpret the maxims accurately.

Translated: 2024-06-26 01:22:15 Published: 2024-06-22

# Qibo: A Large Language Model for Traditional Chinese Medicine ( http://arxiv.org/abs/2403.16056v3 )

License: see the linked page

Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo,

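(Reference translation) Large language models (LLMs) have made significant progress in many professional fields, including medicine, law, and finance. In traditional Chinese medicine (TCM), however, there are challenges: the essential differences between its theory and modern medicine, the lack of specialized corpus resources, and the risk that relying only on supervised fine-tuning leads to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training with supervised fine-tuning. A notable contribution of the study is the processing of a 2 GB corpus dedicated to TCM and the construction of a pre-training dataset and an instruction fine-tuning dataset for TCM. We have also developed Qibo-Benchmark, a tool that evaluates LLM performance in TCM along multiple dimensions, including subjective evaluation, objective evaluation, and three TCM NLP tasks. The medical LLM trained with our pipeline, named Qibo, shows significant performance gains. Compared with the baselines, the average subjective win rate is 63%, the average objective accuracy improves by 23% to 58%, and the Rouge-L scores on the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline for applying Qibo to TCM consultation and demonstrate the model's performance through a case study.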

Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2GB corpus dedicated to TCM, constructing pre-training and instruction fine-tuning datasets for TCM, respectively. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLMs in TCM on multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named Qibo, exhibits significant performance boosts. Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improved by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline to apply Qibo to TCM consultation and demonstrate the model performance through the case study.
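For readers unfamiliar with the Rouge-L scores quoted above, here is a minimal sketch of the idea behind the metric: it scores a candidate text by the longest common subsequence (LCS) it shares with a reference. The version below uses whitespace tokens and a plain F1 combination, both of which are simplifying assumptions; it is not the evaluation code behind the reported numbers, and Chinese text would need a different tokenization.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    previous: List[int] = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """LCS-based F1 between whitespace-tokenized reference and candidate."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("a b c d", "a b c d"))
    print(rouge_l_f1("a b c d", "a x c"))
```

Because the LCS does not require tokens to be adjacent, the score rewards in-order overlap even when the candidate inserts or drops words.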

Translated: 2024-06-26 01:22:15 Published: 2024-06-22

# Human-compatible driving partners through data-regularized self-play reinforcement learning ( http://arxiv.org/abs/2403.19648v2 )

License: see the linked page

Daphne Cornelisse, Eugene Vinitsky,

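(Reference translation) A central challenge for autonomous vehicles is coordinating with humans. Incorporating realistic human agents is therefore essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically show high collision rates when run in a multi-agent closed-loop setting. To build agents that are both realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm in which agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, the approach is RL-first and uses only 30 minutes of imperfect human demonstrations. We evaluate the agents on a large set of multi-agent traffic scenes. The results show that HR-PPO agents are highly effective at reaching their goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents improve considerably on proxy measures of coordination with human driving, particularly in highly interactive scenarios. We open-source the code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behavior at https://sites.google.com/view/driving-partners.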

A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.
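The phrase "a small penalty for deviating from a human reference policy" can be sketched as a regularized objective: the usual RL objective minus a weighted divergence between the agent's action distribution and the human reference distribution. The sketch below assumes discrete action distributions, a KL divergence from agent to human policy, and a penalty weight of 0.05; the abstract specifies none of these, so treat them as illustrative choices rather than the HR-PPO definition.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_objective(ppo_objective: float,
                          policy_dists: Sequence[Sequence[float]],
                          human_dists: Sequence[Sequence[float]],
                          penalty_weight: float = 0.05) -> float:
    """RL objective minus a weighted mean divergence from the human reference.

    policy_dists and human_dists hold one action distribution per visited state.
    """
    penalties = [kl_divergence(p, h) for p, h in zip(policy_dists, human_dists)]
    mean_penalty = sum(penalties) / len(penalties)
    return ppo_objective - penalty_weight * mean_penalty


if __name__ == "__main__":
    human = [[0.5, 0.5], [0.8, 0.2]]
    same = regularized_objective(1.0, human, human)
    drifted = regularized_objective(1.0, [[0.9, 0.1], [0.8, 0.2]], human)
    print(same, drifted)
```

A small weight keeps task reward dominant while still pulling the learned policy toward human-like behavior; a large weight would push the agent back toward pure imitation.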

Translated: 2024-06-26 01:12:30 Published: 2024-06-22

# RL for Consistency Models: Faster Reward Guided Text-to-Image Generation ( http://arxiv.org/abs/2404.03673v2 )

License: see the linked page

Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun,

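(Reference translation) Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction-following ability. However, the resulting generative policies inherit the iterative sampling process of diffusion models, which makes generation slow. To overcome this limitation, consistency models proposed learning a new class of generative models that map noise directly to data, so that an image can be generated in as few as one sampling iteration. In this work, we propose a framework for fine-tuning consistency models with RL in order to optimize text-to-image generative models for task-specific rewards while enabling fast training and inference. The framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. Compared with RL-fine-tuned diffusion models, RLCM trains significantly faster, improves generation quality as measured under the reward objectives, and speeds up inference by producing high-quality images in as few as two inference steps. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are hard to express through prompting, such as image compressibility, and to objectives derived from human feedback, such as aesthetic quality. The code is available at https://rlcm.owenoertell.com.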

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. Compared to RL fine-tuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high quality images with as few as two inference steps. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Our code is available at https://rlcm.owenoertell.com.

Translated: 2024-06-26 01:12:30 Published: 2024-06-22

# Superoperator master equations for depolarizing dynamics ( http://arxiv.org/abs/2404.06595v2 )

License: see the linked page

A. E. Teretenkov,

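(Reference translation) This work is devoted to superoperator master equations. Specifically, we derive the superoperator master equations for the case of the twirling hyperprojector with respect to the whole unitary group. To be consistent with such a hyperprojector, the free dynamics is assumed to be depolarizing, and it is perturbed by an arbitrary Gorini-Kossakowski-Sudarshan-Lindblad generator. The explicit form of the second-order master equations is presented for this case.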

The work is devoted to superoperator master equations. Namely, the superoperator master equations in the case of the twirling hyperprojector with respect to the whole unitary group are derived. To be consistent with such a hyperprojector, the free dynamics is assumed to be depolarizing, and it is perturbed by an arbitrary Gorini-Kossakowski-Sudarshan-Lindblad generator. The explicit form of the second-order master equations is presented in this case.

Translated: 2024-06-26 01:12:30 Published: 2024-06-22

# Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions ( http://arxiv.org/abs/2404.07308v2 )

License: see the linked page

Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu,

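(Reference translation) Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing public health concern and is difficult to estimate in developing countries (data-poor regions) because ground sensors are lacking. Transfer learning models can help solve this problem because they draw knowledge from alternate data sources (that is, data from data-rich regions). However, current transfer learning methods do not account for dependencies between the source and target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature, the Latent Dependency Factor (LDF), which captures the spatial and semantic dependencies of both domains and is then added to the feature space of each domain. We generate the LDF with a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Experiments show that transfer learning models using the LDF improve on the baselines by 19.34%. We additionally support the experiments with qualitative findings.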

Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF have a 19.34% improvement over the baselines. We additionally support our experiments with qualitative findings.

Translated: 2024-06-26 01:02:45 Published: 2024-06-22

# The Impact of Speech Anonymization on Pathology and Its Limits ( http://arxiv.org/abs/2404.08064v2 )

License: see the linked page

Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang,

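(Reference translation) The integration of speech into healthcare has intensified privacy concerns, because speech can serve as a non-invasive biomarker that carries individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, an area where privacy is especially vital, has not been examined extensively. This study investigates the impact of anonymization on pathological speech for more than 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore anonymization methods based on deep learning and on signal processing, and document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate showed minimal changes in utility, while Dysglossia showed slight improvements. The findings underscore that the impact of anonymization varies substantially across disorders, which calls for disorder-specific anonymization strategies that optimally balance privacy with diagnostic utility. In addition, the fairness analysis revealed consistent anonymization effects across most demographic groups. The study demonstrates that anonymization is effective in enhancing privacy for pathological speech and highlights the importance of customized, disorder-specific approaches that account for inversion attacks.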

Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods, and document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.
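
The abstract reports privacy gains as a relative increase in the equal error rate (EER) of a speaker-verification system. The following is a minimal sketch of how an EER and its relative increase could be computed from verification scores. The function names, the threshold-sweep approximation, and the scores are all illustrative assumptions, not the authors' evaluation code or data.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine:  similarity scores for same-speaker trials
    impostor: similarity scores for different-speaker trials
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_increase_percent(before, after):
    """Relative change of `after` with respect to `before`, in percent."""
    return 100.0 * (after - before) / before


# Made-up scores for illustration only (not data from the paper).
genuine_orig = [0.9, 0.8, 0.85, 0.7]
impostor_orig = [0.1, 0.2, 0.3, 0.75]
genuine_anon = [0.5, 0.4, 0.6, 0.3]
impostor_anon = [0.45, 0.35, 0.55, 0.5]

eer_orig = equal_error_rate(genuine_orig, impostor_orig)
eer_anon = equal_error_rate(genuine_anon, impostor_anon)
increase = relative_increase_percent(eer_orig, eer_anon)
```

Under this definition, a 1933% increase means the anonymized EER is about 20.33 times the original EER. A higher EER after anonymization means the verifier has a harder time linking a voice to a speaker.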

翻訳日:2024-06-26 01:02:45 公開日:2024-06-22

# AIライフサイクルに沿ったフェアネスのための説明可能な人工知能(XAI)の可能性のマッピング

Mapping the Potential of Explainable Artificial Intelligence (XAI) for Fairness Along the AI Lifecycle ( http://arxiv.org/abs/2404.18736v3 )

ライセンス: Link先を確認

Luca Deck, Astrid Schomäcker, Timo Speith, Jakob Schöffer, Lena Kästner, Niklas Kühl,

(参考訳) さまざまな領域で人工知能(AI)システムが広く使われるようになり、特にハイステークスなシナリオにおいて、アルゴリズムの公平性に関する問題がますます注目されている。したがって、AIシステムの公平性をどのように改善できるか、またそのプロセスを支援するためにどのような手段が利用できるかについての批判的な検討は、もっと早く行われるべきであった。多くの研究者や政策立案者は、説明可能なAI(XAI)をAIシステムの公平性を高める有望な方法と見なしている。しかし、XAIの手法や公平性の概念は多種多様で、それぞれ異なる要件(desiderata)を表しており、XAIと公平性の正確な関係は依然としてほとんど不明瞭である。さらに、アルゴリズムの公平性を高めるさまざまな手段は、AIシステムのライフサイクルの異なる時点で適用できる可能性がある。しかし、AIライフサイクルに沿った公平性の要件の一貫したマッピングは現在存在しない。本稿では、この2つのギャップを埋めることを目指す。我々は8つの公平性の要件を抽出し、AIライフサイクルに沿ってそれらをマッピングし、XAIがそれぞれにどのように対処しうるかを議論する。実践的応用のための指針を提供し、これらの公平性の要件に特化したXAI研究を触発することを期待する。

The widespread use of artificial intelligence (AI) systems across various domains is increasingly highlighting issues related to algorithmic fairness, especially in high-stakes scenarios. Thus, critical considerations of how fairness in AI systems might be improved, and what measures are available to aid this process, are overdue. Many researchers and policymakers see explainable AI (XAI) as a promising way to increase fairness in AI systems. However, there is a wide variety of XAI methods and fairness conceptions expressing different desiderata, and the precise connections between XAI and fairness remain largely nebulous. Besides, different measures to increase algorithmic fairness might be applicable at different points throughout an AI system's lifecycle. Yet, there currently is no coherent mapping of fairness desiderata along the AI lifecycle. In this paper, we set out to bridge both these gaps: We distill eight fairness desiderata, map them along the AI lifecycle, and discuss how XAI could help address each of them. We hope to provide orientation for practical applications and to inspire XAI research specifically focused on these fairness desiderata.

翻訳日:2024-06-26 01:02:45 公開日:2024-06-22

# DiffMatch:ビジュアルランゲージガイダンスは、半教師付き変更検出器を改善

DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector ( http://arxiv.org/abs/2405.04788v2 )

ライセンス: Link先を確認

Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang,

(参考訳) Change Detection (CD) は、画像間で意味的な変化が生じたピクセルを識別することを目的としている。しかし、大量の画像にピクセルレベルの注釈を付けることは、特に人間の専門家によるピクセル単位の比較を必要とするマルチテンポラル画像では、労働集約的でコストがかかる。ゼロショットやオープンボキャブラリなどにおいて、プロンプトベースの推論を用いた視覚言語モデル(VLM)が優れた性能を示していることを考えると、限られたラベル付きデータの下でVLMを利用してより良いCDを実現することが期待できる。本稿では、VLMガイダンスに基づく半教師付きCD手法であるDiffMatchを提案する。DiffMatchの着想は、VLMを用いて変更ラベルを無償で合成し、ラベルなしデータに追加の教師信号を与えることにある。しかし、現在のほとんどのVLMは単一時点の画像用に設計されており、2時点または複数時点の画像に直接適用することはできない。そこで我々はまず、VLMに基づく混合変化イベント生成(CEG)戦略を提案し、ラベルなしCDデータに擬似ラベルを付与する。これらのVLM駆動の擬似ラベルが与える追加の教師信号は、一貫性正則化パラダイム(例えば FixMatch)の擬似ラベルと矛盾する可能性があるため、異なる信号源を分離するための二重投影ヘッドを提案する。さらに、同じくVLMによってガイドされる2つの補助セグメンテーションデコーダを通して、2時点画像の意味表現を明示的に分離する。最後に、モデルが変化表現をより適切に捉えられるように、補助ブランチにおける特徴レベルの対照損失によるメトリック認識型の教師信号を導入する。広範な実験によりDiffMatchの利点が示された。例えば、5%のラベルを用いた場合、DiffMatchはFixMatchベースラインをWHU-CDで+5.3 IoU、LEVIR-CDで+2.4 IoU改善する。さらに、我々のCEG戦略は、教師なしの設定でも、最先端の教師なしCD手法をはるかに上回る性能を達成できる。

Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) for zero-shot, open-vocabulary, etc. with prompt-based reasoning, it is promising to utilize VLMs to make better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely DiffMatch. The insight of DiffMatch is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervised signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g. FixMatch), we propose the dual projection head for de-entangling different signal sources. Further, we explicitly decouple the bi-temporal images semantic representation through two auxiliary segmentation decoders, which are also guided by VLM. Finally, to make the model more adequately capture change representations, we introduce metric-aware supervision by feature-level contrastive loss in auxiliary branches. Extensive experiments show the advantage of DiffMatch. For instance, DiffMatch improves the FixMatch baseline by +5.3 IoU on WHU-CD and by +2.4 IoU on LEVIR-CD with 5% labels. In addition, our CEG strategy, in an un-supervised manner, can achieve performance far superior to state-of-the-art un-supervised CD methods.
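
The gains above are reported in IoU of the predicted change mask. The following is a minimal sketch of that metric for binary masks, assuming masks are nested lists of 0/1. The function name and the convention of returning 1.0 when both masks are empty are illustrative choices, not taken from the paper.

```python
def change_iou(pred, target):
    """Intersection-over-union of the 'changed' class for two binary masks.

    Both masks are nested lists of 0/1 with identical shape.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy 2x3 masks for illustration only.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
iou = change_iou(predicted, reference)
```

Here two pixels are marked as changed in both masks and four pixels are marked in at least one, so the IoU is 2/4.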

翻訳日:2024-06-26 00:53:00 公開日:2024-06-22

# 誤特定設定下でのユニバーサルバッチ学習

Universal Batch Learning Under The Misspecification Setting ( http://arxiv.org/abs/2405.07252v2 )

ライセンス: Link先を確認

Shlomi Vituri, Meir Feder,

(参考訳) 本稿では、対数損失を用いた誤特定(misspecification)設定におけるユニバーサルバッチ学習の問題を考察する。この設定では、仮説クラスはモデルの集合 $\Theta$ である。しかし、データは、この集合に属するとは限らず、より大きなモデルの集合 $\Phi \supset \Theta$ に属する未知の分布によって生成される。訓練サンプルが与えられると、ユニバーサル学習器は次の結果の確率分布を予測するよう求められ、対数損失を被る。ユニバーサル学習器の性能は、$\Theta$ から選ばれた、データに最もよく適合する仮説に対するリグレットによって測定される。ミニマックス定理と情報理論的手法を用いて、データ生成分布の集合上の混合である最適ユニバーサル学習器を導出し、min-maxリグレットの閉形式の表現を得る。このリグレットは、データとその生成分布の集合との間の条件付き容量の制約付き版とみなせることを示す。このmin-maxリグレットに対するタイトな界を示し、これは問題の複雑さが仮説モデル $\Theta$ の豊かさに支配され、データ生成分布の集合 $\Phi$ には支配されないことを意味する。リグレットとその容量達成事前分布を数値的に評価するために、有本・Blahutアルゴリズムの拡張を開発する。観測が $K$ パラメータの多項分布から得られ、仮説クラス $\Theta$ がこの分布族の部分集合にすぎない場合について結果を例示する。

In this paper we consider the problem of universal batch learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypothesis models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameter multinomial distribution while the hypothesis class $\Theta$ is only a subset of this family of distributions.
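
For orientation, the following is a minimal sketch of the classic, unconstrained Arimoto-Blahut iteration for the capacity of a discrete memoryless channel and its capacity-achieving prior. It is not the authors' extension for the constrained regret, which the abstract does not specify. The function names, the fixed iteration count, and the binary symmetric channel example are assumptions made for illustration.

```python
import math


def mutual_information(prior, channel):
    """I(X;Y) in nats for input distribution `prior` and rows channel[x][y] = P(y|x)."""
    n_out = len(channel[0])
    out = [sum(p * row[y] for p, row in zip(prior, channel)) for y in range(n_out)]
    return sum(
        p * w * math.log(w / out[y])
        for p, row in zip(prior, channel)
        for y, w in enumerate(row)
        if p > 0 and w > 0
    )


def blahut_arimoto(channel, iterations=200):
    """Classic Arimoto-Blahut iteration; returns (prior, capacity estimate in nats)."""
    n_in, n_out = len(channel), len(channel[0])
    prior = [1.0 / n_in] * n_in  # start from the uniform input distribution
    for _ in range(iterations):
        # Output distribution induced by the current prior.
        out = [sum(p * row[y] for p, row in zip(prior, channel)) for y in range(n_out)]
        # KL divergence between each channel row and the output distribution.
        divergence = [
            sum(w * math.log(w / out[y]) for y, w in enumerate(row) if w > 0)
            for row in channel
        ]
        # Multiplicative update: favor inputs whose rows are far from the output.
        weights = [p * math.exp(d) for p, d in zip(prior, divergence)]
        total = sum(weights)
        prior = [w / total for w in weights]
    return prior, mutual_information(prior, channel)


# Binary symmetric channel with crossover probability 0.1 (illustrative).
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
prior, capacity_nats = blahut_arimoto(bsc)
```

For the binary symmetric channel the uniform prior is already optimal, so the estimate matches the closed form $\ln 2 - H(0.1)$ in nats, which makes it a convenient sanity check.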

翻訳日:2024-06-26 00:53:00 公開日:2024-06-22

# Human-AIの安全性: 生成AIと制御システムの安全性の子孫

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety ( http://arxiv.org/abs/2405.09794v2 )

ライセンス: Link先を確認

Andrea Bajcsy, Jaime F. Fisac,

(参考訳) 人工知能(AI)は前例のない規模で人々と対話し、大きなポジティブな影響をもたらす新たな道を提供する一方で、個人や社会的な害の可能性を広く懸念している。今日、人間-AI安全のための主要なパラダイムは、人が提供する例やフィードバックによりよく一致するように生成モデルの出力を微調整することに焦点を当てている。しかし、実際には、AIモデルのアウトプットの結果は独立して決定することはできない。本稿では,AIの安全性と制御システムの安全性から重要な補完的教訓を抽出し,オープンな課題と両分野間の重要なシナジーを強調した。そして、高度なAI技術に対する有意義な安全保証には、AI出力と人間の振る舞いによって形成されるフィードバックループが、どのようにして異なる結果に向かって相互作用を駆動するかについての推論が必要である、と論じる。この目的のために、動的で安全クリティカルな人間-AIインタラクションをキャプチャするための統一的なフォーマリズムを導入し、次世代の人間中心AI安全性に向けた具体的な技術的なロードマップを提案する。

Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human-AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.

翻訳日:2024-06-26 00:53:00 公開日:2024-06-22

# 因果発見のための適応型オンライン実験設計

Adaptive Online Experimental Design for Causal Discovery ( http://arxiv.org/abs/2405.11548v3 )

ライセンス: Link先を確認

Muhammad Qasim Elahi, Lai Wei, Murat Kocaoglu, Mahsa Ghasemi,

(参考訳) 因果発見は、観察データ、介入データ、またはそれらの組み合わせを利用して、因果グラフに符号化された因果関係を明らかにすることを目的としている。既存の因果発見法の大部分は、無限の介入データを仮定して開発されている。我々は介入データの効率に焦点を当て、バンディット問題における純粋探索から着想を得て、オンライン学習の観点から因果発見を定式化する。グラフのすべての辺を少なくとも一度は切断する介入からなるグラフ分離システムは、無限の介入データが利用できる場合、最悪の場合であっても因果グラフを学習するのに十分である。本稿では、グラフ分離システムからの介入をアロケーションマッチングによって適応的に選択し、サンプリング履歴に基づいて因果グラフを学習するトラック・アンド・ストップ型の因果発見アルゴリズムを提案する。任意の所望の信頼度が与えられると、アルゴリズムは終了条件を決定し、それが満たされるまで実行される。提案アルゴリズムを解析し、必要な介入サンプル数の期待値に対する問題依存の上界を確立する。提案アルゴリズムは、ランダムに生成したさまざまな因果グラフでのシミュレーションにおいて、既存の手法よりも優れている。学習した因果グラフと正解グラフの間の構造的ハミング距離(SHD)で測定して、大幅に少ないサンプルでより高い精度を達成する。

Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs by leveraging observational data, interventional data, or their combination. The majority of existing causal discovery methods are developed assuming infinite interventional data. We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning, inspired by pure exploration in bandit problems. A graph separating system, consisting of interventions that cut every edge of the graph at least once, is sufficient for learning causal graphs when infinite interventional data is available, even in the worst case. We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system via allocation matching and learns the causal graph based on sampling history. Given any desired confidence value, the algorithm determines a termination condition and runs until it is met. We analyze the algorithm to establish a problem-dependent upper bound on the expected number of required interventional samples. Our proposed algorithm outperforms existing methods in simulations across various randomly generated causal graphs. It achieves higher accuracy, measured by the structural Hamming distance (SHD) between the learned causal graph and the ground truth, with significantly fewer samples.
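
Two notions in this abstract lend themselves to a short illustration: the graph separating system and the structural Hamming distance. The sketch below assumes graphs are sets of directed edges, that an intervention "cuts" an edge when exactly one endpoint is in the intervened set, and that a reversed edge counts as a single SHD error. These conventions and the function names are illustrative assumptions; the paper may define them differently.

```python
def is_separating_system(interventions, edges):
    """True if every edge is cut by at least one intervention set.

    An edge (u, v) is cut by a set s when exactly one of u, v is in s.
    """
    return all(
        any((u in s) != (v in s) for s in interventions)
        for u, v in edges
    )


def structural_hamming_distance(learned, truth):
    """Count node pairs whose edge status differs between two directed graphs.

    A missing, extra, or reversed edge each counts as one error
    (one common convention). Graphs must not contain self-loops.
    """
    pairs = {frozenset(edge) for edge in learned | truth}
    distance = 0
    for pair in pairs:
        u, v = tuple(pair)
        state_learned = ((u, v) in learned, (v, u) in learned)
        state_truth = ((u, v) in truth, (v, u) in truth)
        distance += state_learned != state_truth
    return distance


# Toy graphs for illustration only.
truth = {("A", "B"), ("B", "C"), ("A", "C")}
learned = {("A", "B"), ("C", "B")}  # B-C reversed, A-C missing

shd = structural_hamming_distance(learned, truth)
separates = is_separating_system([{"A"}, {"B"}], truth)
fails = is_separating_system([{"A", "B"}], truth)
```

In the toy example, intervening on {A} and {B} separately cuts all three edges, whereas the single set {A, B} leaves the edge between A and B uncut.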

翻訳日:2024-06-26 00:43:06 公開日:2024-06-22

# 階層的セマンティックグラフを用いた3次元復元におけるガウス制御

Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery ( http://arxiv.org/abs/2405.12477v3 )

ライセンス: Link先を確認

Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Jing Li, Zhanyun Tang, Shengyu Zhang, Fei Wu, Feng Lin,

(参考訳) 3D Gaussian Splatting (3DGS)は、最近3Dの人間の再構築に進歩を遂げているが、主に2Dピクセルレベルの監視に依存しており、異なる部位の幾何学的複雑さとトポロジ的関係を見越している。このギャップに対処するために,高忠実度3次元再構成を実現するための階層型人ガウス制御(HUGS)フレームワークを導入する。我々のアプローチは、幾何学的トポロジーの整合性を確保するために、身体部分の明確な意味的先行を活用することにより、身体部分間の複雑な幾何学的およびトポロジ的関連の捕捉を可能にする。さらに,大域的な人体の特徴から高周波の特徴を引き離し,表面の細部を洗練させる。広範囲な実験により,本手法は人体再建において優れた性能を示し,特に表面の細部の改善と体部接合部の精密再構築に有効であることが示された。コードはhttps://wanghongsheng01.github.io/HUGS/で公開されている。

Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicit semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.

翻訳日:2024-06-26 00:43:06 公開日:2024-06-22

# MOSS:単眼ビデオからのモーションベース3D着衣人体合成

MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video ( http://arxiv.org/abs/2405.12806v3 )

ライセンス: Link先を確認

Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu,

(参考訳) 単一視点からの着衣人体の再構成は、仮想現実の応用、特に複雑な人間の動きを伴う文脈において中心的な位置を占める。現実的な衣服の変形を実現することには顕著な課題がある。現在の手法は、動きが表面の変形に与える影響をしばしば見落としており、その結果、表面は大域的な動きによって課される制約を欠いている。これらの制限を克服するために、運動学的情報を用いて人体表面上で動作を考慮したガウス分割を実現する革新的なフレームワーク、Motion-Based 3D Clothed Humans Synthesis (MOSS) を導入する。本フレームワークは、Kinematic Gaussian Locating Splatting (KGAS) と Surface Deformation Detector (UID) の2つのモジュールから構成される。KGASは、matrix-Fisher分布を取り入れて、大域的な動きを体表面全体に伝播させる。この分布の密度因子と回転因子がガウスを明示的に制御し、再構成された表面の現実感を高める。さらに、単一視点における局所的な遮蔽に対処するため、UIDはKGASに基づいて重要な表面を特定し、これらの変形を補うために幾何学的再構成を行う。実験の結果、MOSSは単眼ビデオからの3D着衣人体合成において最先端の視覚的品質を達成することが示された。特に、LPIPS*においてHuman NeRFとGaussian Splattingをそれぞれ33.94%、16.75%改善した。コードは https://wanghongsheng01.github.io/MOSS/ で公開されている。

Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.

翻訳日:2024-06-26 00:43:06 公開日:2024-06-22

# ロボット言語接地に関する調査:シンボルと埋め込みのトレードオフ

A Survey of Robotic Language Grounding: Tradeoffs between Symbols and Embeddings ( http://arxiv.org/abs/2405.13245v2 )

ライセンス: Link先を確認

Vanya Cohen, Jason Xinyu Liu, Raymond Mooney, Stefanie Tellex, David Watkins,

(参考訳) 大きな言語モデルでは、ロボットは言語をより柔軟に理解し、これまで以上に能力を高めることができる。この調査は、最近の文献を2つの極を持つスペクトルにレビューし、整理する。 1)言語といくつかの手作業で定義された意味の形式表現のマッピング 2)低レベルロボットポリシーに直接変換する言語と高次元ベクトル空間のマッピング。形式表現を使用することで、言語の意味を正確に表現することができ、学習の問題のサイズを制限し、解釈可能性と形式的安全性を保証するためのフレームワークにつながる。言語や知覚データを高次元空間に埋め込む手法は、手動で指定した記号構造を回避し、十分なデータを供給するとより一般的な可能性を持つが、訓練により多くのデータや計算を必要とする。我々は、それぞれのアプローチの利点とトレードオフについて議論し、両方の世界のベストを達成するための今後の仕事の方向性を提供することで、仕上げる。

With large language models, robots can understand language more flexibly and more capably than ever before. This survey reviews and situates recent literature into a spectrum with two poles: 1) mapping between language and some manually defined formal representation of meaning, and 2) mapping between language and high-dimensional vector spaces that translate directly to low-level robot policy. Using a formal representation allows the meaning of the language to be precisely represented, limits the size of the learning problem, and leads to a framework for interpretability and formal safety guarantees. Methods that embed language and perceptual data into high-dimensional spaces avoid this manually specified symbolic structure and thus have the potential to be more general when fed enough data but require more data and computing to train. We discuss the benefits and tradeoffs of each approach and finish by providing directions for future work that achieves the best of both worlds.

翻訳日:2024-06-26 00:43:06 公開日:2024-06-22

# SketchQLデモ - Sketchesによるゼロショットビデオモーメントクエリ

SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches ( http://arxiv.org/abs/2405.18334v2 )

ライセンス: Link先を確認

Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong,

(参考訳) 本稿では、スケッチベースのクエリインタフェースでビデオモーメントを検索するビデオデータベース管理システム(VDBMS)であるSketchQLについて述べる。このインターフェースでは、単純なマウスドラッグアンドドロップ操作でオブジェクトのトラジェクトリイベントを指定できる。複雑なイベントを構成するために、単一のオブジェクトのトラジェクトリをビルディングブロックとして使用することができる。トラジェクトリ類似性を符号化した事前トレーニングモデルを使用して、SketchQLは、ビデオ上で類似性検索を実行してゼロショットビデオモーメント検索を実現し、ビジュアルクエリに最も近いクリップを識別する。このデモでは、SketchQLのグラフィックユーザインタフェースを導入し、その機能とインタラクションメカニズムを詳述する。また,クエリ合成からリアルタイムシナリオを用いたビデオモーメント検索まで,SketchQLのエンドツーエンド使用例を示す。

In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moments retrieval by performing similarity searches over the video to identify clips that are the most similar to the visual query. In this demonstration, we introduce the graphic user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL from query composition to video moments retrieval using real-world scenarios.

翻訳日:2024-06-26 00:33:22 公開日:2024-06-22

# マンハッタン世界仮説を用いた構造ガウスSLAM

Structure Gaussian SLAM with Manhattan World Hypothesis ( http://arxiv.org/abs/2405.20031v2 )

ライセンス: Link先を確認

Shuhong Liu, Heng Zhou, Liuzhuozheng Li, Yun Liu, Tianchen Deng, Yiming Zhou, Mingrui Li,

(参考訳) ガウスのSLAMシステムは、リアルタイム再構築の効率性と忠実性を向上させるために大きな進歩を遂げた。しかし、これらのシステムは複雑な屋内環境において、障害物や限られた視角によって引き起こされる未観測の幾何学により、実質的な穴を特徴とする不完全な再構成に遭遇することが多い。この課題に対処するために,マンハッタンワールド仮説を利用したRGB-DシステムであるManhattan Gaussian SLAM(MG-SLAM)を提案する。 MG-SLAMは、構造されたシーンから導かれた融合した線分をシームレスに統合することにより、テクスチャレス屋内領域におけるロバストな追跡を確実にする。さらに、抽出された線と平面面仮定により、欠測した幾何学領域における新しいガウスの戦略的補間が可能となり、効率的なシーン補完が可能となった。合成シーンと実世界のシーンの両方で行われた大規模な実験により、これらの手法が最先端の性能を実現し、ガウスSLAMシステムの能力を大幅に向上することを示す。

Gaussian SLAM systems have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM (MG-SLAM), an RGB-D system that leverages the Manhattan World hypothesis to enhance geometric accuracy and completeness. By seamlessly integrating fused line segments derived from structured scenes, MG-SLAM ensures robust tracking in textureless indoor areas. Moreover, the extracted lines and planar surface assumption allow strategic interpolation of new Gaussians in regions of missing geometry, enabling efficient scene completion. Extensive experiments conducted on both synthetic and real-world scenes demonstrate that these advancements enable our method to achieve state-of-the-art performance, marking a substantial improvement in the capabilities of Gaussian SLAM systems.

翻訳日:2024-06-26 00:33:22 公開日:2024-06-22

# ワンステップテキスト・ツー・イメージ生成のためのスコアアイデンティティ蒸留における長短誘導

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation ( http://arxiv.org/abs/2406.01561v2 )

ライセンス: Link先を確認

Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang,

(参考訳) 大規模なテキスト画像ペアで訓練された拡散ベースのテキスト画像生成モデルは、テキスト記述と整合したフォトリアリスティックな画像を生成する能力を示している。しかし、これらのモデルの大きな制限はサンプル生成の遅さであり、同じネットワークを通した反復的な改良を必要とする。本稿では、Score identity Distillation (SiD) を強化し、long and short classifier-free guidance (LSG) を開発することで、実際の訓練データを用いずに事前学習済みのStable Diffusionモデルを効率的に蒸留する。SiDはモデルベースの明示的スコアマッチング損失の最適化を目的としており、実用的な計算のために、提案するLSGと併せてスコア恒等式に基づく近似を用いる。ワンステップ生成器で合成した偽画像のみを用いて訓練することで、LSGを備えたSiDはFIDとCLIPスコアを急速に改善し、競争力のあるCLIPスコアを維持しながら最先端のFID性能を達成する。具体的には、Stable Diffusion 1.5のデータフリー蒸留は、COCO-2014検証セットにおいて、LSGスケール1.5でCLIPスコア0.304とともに過去最低のFID 8.15を達成し、LSGスケール2でCLIPスコア0.313とともにFID 9.56を達成する。我々のSiD-LSGコードと蒸留したワンステップのテキスト画像生成器は https://github.com/mingyuanzhou/SiD-LSG で入手できる。

Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and a FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. Our SiD-LSG code and distilled one-step text-to-image generators are available at https://github.com/mingyuanzhou/SiD-LSG

翻訳日:2024-06-26 00:33:22 公開日:2024-06-22

# 生成モデルにおける世界モデル含意の評価

Evaluating the World Model Implicit in a Generative Model ( http://arxiv.org/abs/2406.03689v2 )

ライセンス: Link先を確認

Keyon Vafa, Justin Y. Chen, Jon Kleinberg, Sendhil Mullainathan, Ashesh Rambachan,

(参考訳) 最近の研究は、大きな言語モデルが暗黙的に世界モデルを学ぶことを示唆している。この可能性をどのように評価するか。この問題は、基礎となる現実が決定論的有限オートマトンによって支配されている場合に公式化する。これには、単純な論理的推論、地理的ナビゲーション、ゲームプレイング、化学といった問題が含まれる。我々は,古典的なマイヒル・ネローデ定理に触発された世界モデル回復のための新しい評価指標を提案する。ゲームプレイ,ロジックパズル,ナビゲーションの3つの領域でそれらの実用性を解説する。すべての領域において、我々が検討する生成モデルは、世界モデルを評価するための既存の診断に優れているが、我々の評価指標は、世界モデルが現れるよりもはるかに一貫性が低いことを示している。生成モデルを使って、関連するが微妙に異なるタスクを解くと、それがひどく失敗する。モデルの基礎となるロジックを有意義に捉えた生成モデルを構築することは、非常に価値があるでしょう。

Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton. This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead it to fail badly. Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.

翻訳日:2024-06-26 00:23:38 公開日:2024-06-22

# 効率性を超えて: 持続可能なAIのスケーリング

Beyond Efficiency: Scaling AI Sustainably ( http://arxiv.org/abs/2406.05303v2 )

ライセンス: Link先を確認

Carole-Jean Wu, Bilge Acun, Ramya Raghavendra, Kim Hazelwood,

(参考訳) バローゾのエネルギーに比例した倉庫規模のコンピューティングへの献身的な貢献は、現代のデータセンターがこれまで以上にエネルギー効率とコスト効率を高めてきた時代を幕開けた。同時に、現代のAIアプリケーションは、ディープラーニングモデル開発サイクル全体にわたって効率を最適化することの重要性を強調しながら、コンピューティングにおける需要を継続的に増加させてきた。本稿では、トレーニングと推論からの運転中の二酸化炭素排出量と、データセンターの構築とハードウェア製造から排出した炭素排出量の両方を含む、AIのカーボンインパクトを特徴付ける。我々は、ディープラーニングレコメンデーションモデルからマルチモーダル生成AIタスクまで、最先端AI技術における主要な効率最適化機会を強調します。 AIを継続的にスケールアップするには、ハードウェア製造からデータセンタ運用、ハードウェアの終末処理に至るまで、コンピュータインフラストラクチャのライフサイクル全体にわたって、効率性を超えて最適化しなければなりません。

Barroso's seminal contributions in energy-proportional warehouse-scale computing launched an era where modern datacenters have become more energy efficient and cost effective than ever before. At the same time, modern AI applications have driven ever-increasing demands in computing, highlighting the importance of optimizing efficiency across the entire deep learning model development cycle. This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from datacenter construction and hardware manufacturing. We highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operations and end-of-life processing for the hardware.

翻訳日:2024-06-26 00:23:38 公開日:2024-06-22

# Ctrl-V:バウンディングボックス制御オブジェクトモーションによる高忠実度映像生成

Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion ( http://arxiv.org/abs/2406.05630v2 )

ライセンス: Link先を確認

Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal,

(参考訳) 近年の映像予測の進歩により、制御可能な映像生成が注目されている。単純でフレキシブルな条件付けによる高忠実度ビデオの生成は特に興味深い。そこで本研究では,2次元または3次元境界ボックスの画素レベルのレンダリングを条件付けとして,制御可能な映像生成モデルを提案する。さらに,初期フレームと終端フレームのバウンディングボックスを考慮すれば,フレーム毎に最大15個のバウンディングボックスを25フレームクリップで予測できるバウンディングボックス予測器も作成した。私たちは、KITTI、Virtual-KITTI 2、BDD100kという3つの有名なAVビデオデータセットで実験を行います。

With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor that, given the initial and ending frames' bounding boxes, can predict up to 15 bounding boxes per frame for all the frames in a 25-frame clip. We perform experiments across 3 well-known AV video datasets: KITTI, Virtual-KITTI 2 and BDD100k.

翻訳日:2024-06-26 00:23:38 公開日:2024-06-22

# fNIRSによる画像の復号化に向けて

Progress Towards Decoding Visual Imagery via fNIRS ( http://arxiv.org/abs/2406.07662v3 )

ライセンス: Link先を確認

Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu,

(参考訳) 我々は、fNIRSの脳活動から画像を再構成できる可能性を示し、必要な仕様を満たすプロトタイプの構築に着手する。ダウンサンプリングしたfMRIデータで画像再構成モデルを訓練した結果、cmスケールの空間分解能で画像生成には十分であることがわかった。検索精度は、フル解像度のfMRIでは93%、2cm分解能では20%であったのに対し、1cm分解能では71%であった。シミュレーションと高密度トモグラフィにより、連続波fNIRSの分解能が2cmであるのに対し、時間領域fNIRSは1cmの分解能を達成できることがわかった。最後に、レーザードライバ、単一光子検出器、時間デジタル変換器システムからなる時間領域fNIRSデバイスのプロトタイプの設計を共有する。

We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.

翻訳日:2024-06-26 00:13:51 公開日:2024-06-22

# 高等教育におけるGenAIを用いたボットプープの活用--学習に影響を及ぼす検索型世代チャットボットの検討

Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning ( http://arxiv.org/abs/2406.07796v2 )

ライセンス: Link先を確認

Maung Thway, Jose Recatala-Gomez, Fun Siong Lim, Kedar Hippalgaonkar, Leonard W. T. Ng,

(参考訳) 生成人工知能(GenAI)と大規模言語モデル(LLM)は、人間の学習を強化する新たな道を開くと同時に、学生の回答における質の低い情報(Botpoopと呼ばれる)の蔓延を招いている。本研究は、Botpoopを減らしながら教育を強化するために設計された、カスタムビルドのシングリッシュを話すRetrieval Augmented Generation (RAG) チャットボットであるProfessor Leodarを紹介する。シンガポールの南洋理工大学に導入されたProfessor Leodarは、パーソナライズされたガイダンス、24時間365日の可用性、文脈に即した情報を提供し、AI支援学習の未来を垣間見せる。混合研究法を用いて、Professor Leodarが学習、エンゲージメント、試験準備に及ぼす影響を検討したところ、参加者の97.1%が肯定的な経験を報告した。これらの知見は、教育におけるAIの果たしうる役割を定義し、カスタムGenAIチャットボットの可能性を浮き彫りにするのに役立つ。チャットボットの開発、授業への導入、成果の調査を組み合わせた本研究は、GenAI教育ツールのベンチマークを提供し、AIと人間の学習の相互作用を再定義するための足がかりとなる。

Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student responses, termed Botpoop. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance education while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.

翻訳日:2024-06-26 00:13:51 公開日:2024-06-22

# Deep Learning Domain Adaptation to Understand Physico-Chemical Processes from Fluorescence Spectroscopy Small Datasets: Application to Ageing of Olive Oil ( http://arxiv.org/abs/2406.10031v2 )

License: see the linked page

Umberto Michelucci, Francesca Venturini,

Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges because the available datasets are typically small and sparse. Furthermore, the analysis of EEMs is difficult due to their high dimensionality and overlapping spectral features. This study proposes a new approach that exploits domain adaptation with pretrained vision models, alongside a novel interpretability algorithm, to address these challenges. Thanks to the specialised feature engineering of the neural networks described in this work, we are now able to provide deeper insights into the physico-chemical processes underlying the data. The proposed approach is demonstrated through the analysis of the oxidation process in extra virgin olive oil (EVOO) during ageing, showing its effectiveness in predicting quality indicators and identifying the spectral bands, and thus the molecules, involved in the process. This work describes a significantly innovative approach to the use of deep learning for spectroscopy, transforming it from a black box into a tool for understanding complex biological and chemical processes.

Translated: 2024-06-26 00:13:51 Published: 2024-06-22

# Localizing Events in Videos with Multimodal Queries ( http://arxiv.org/abs/2406.10079v2 )

License: see the linked page

Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, Yansong Tang, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu,

Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset, ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the image's semantics. To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.

Translated: 2024-06-26 00:04:06 Published: 2024-06-22

# Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling ( http://arxiv.org/abs/2406.10425v2 )

License: see the linked page

Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, Suhang Wang,

In this paper, we tackle a new problem of multi-source unsupervised domain adaptation (MSUDA) for graphs, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph, with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to selected source data both in the embedding space, by minimizing the optimal transport distance, and at the classification level, by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method.

Translated: 2024-06-26 00:04:06 Published: 2024-06-22

# Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game ( http://arxiv.org/abs/2406.11012v3 )

License: see the linked page

Prisha Samadarshi, Mariam Mustafa, Anushka Kulkarni, Raven Rothkopf, Tuhin Chakrabarty, Smaranda Muresan,

The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games. Both novice and expert players perform better than GPT-4o, with expert human players significantly outperforming it. To deepen our understanding, we create a taxonomy of the knowledge types required to successfully categorize words in the Connections game, revealing that LLMs struggle with associative, encyclopedic, and linguistic knowledge. Our findings establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems.

Translated: 2024-06-26 00:04:06 Published: 2024-06-22
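The headline metric in the abstract above, the share of games that are fully solved, can be illustrated roughly as follows. This is a minimal sketch and not the authors' evaluation code: the puzzle representation (four groups of four words) and the function names `is_fully_solved` and `full_solve_rate` are assumptions made for illustration, and the words are made up.

```python
def normalize(groups):
    """Turn a list of word groups into a set of frozensets so that order is ignored."""
    return {frozenset(w.strip().lower() for w in g) for g in groups}


def is_fully_solved(predicted, gold):
    """A game counts as fully solved only if every predicted group matches a gold group."""
    return normalize(predicted) == normalize(gold)


def full_solve_rate(predictions, golds):
    """Fraction of games in which all groups were recovered exactly."""
    if not golds:
        return 0.0
    solved = sum(is_fully_solved(p, g) for p, g in zip(predictions, golds))
    return solved / len(golds)


# Hypothetical toy puzzle (not taken from the paper's data).
gold = [
    ["apple", "pear", "plum", "fig"],
    ["red", "blue", "green", "pink"],
    ["cat", "dog", "cow", "pig"],
    ["oak", "elm", "ash", "yew"],
]
# Same groups, listed in a different order: still a full solve.
perfect = [list(reversed(g)) for g in reversed(gold)]
# One word swapped between two groups: no longer a full solve.
one_swap = [g[:] for g in gold]
one_swap[0][0], one_swap[1][0] = one_swap[1][0], one_swap[0][0]

print(full_solve_rate([perfect, one_swap], [gold, gold]))
```

Under this strict definition a single misplaced word makes the whole game count as unsolved, which is one reason a full-solve rate can be low even when most individual groupings are right.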

# Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment ( http://arxiv.org/abs/2406.11551v2 )

License: see the linked page

Jianan Jiang, Di Wu, Zhilin Jiang, Weiren Yu,

Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose a simple yet efficient approach to narrow the gap between the two modalities. It mainly facilitates unified mutual information sharing both within and across samples, rather than treating the task as a single feature alignment problem between modalities. Specifically, our approach includes: (i) employing dual weight-sharing networks to optimize alignment within the sketch and image domains, which also effectively mitigates model learning saturation issues; (ii) introducing an objective optimization function based on contrastive loss to enhance the model's ability to align features within and across samples; (iii) presenting a learnable TRSM that combines self-attention and cross-attention to promote feature representations among tokens, further enhancing sample alignment in the embedding space. Our framework achieves excellent results on CNN- and ViT-based backbones. Extensive experiments demonstrate its superiority over existing methods. We also introduce Cloths-V1, the first professional fashion sketches and images dataset, which we use to validate our method and which should benefit other applications.

Translated: 2024-06-26 00:04:06 Published: 2024-06-22
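The abstract above mentions an objective based on contrastive loss but does not give its form. As a generic illustration only, the sketch below uses an InfoNCE-style loss, a common contrastive objective, over paired sketch and image embeddings. The choice of InfoNCE, the cosine similarity, the temperature value, and the names `cosine` and `info_nce` are all assumptions, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(sketch_embs, image_embs, temperature=0.1):
    """Mean InfoNCE loss, assuming sketch i is the positive pair of image i."""
    losses = []
    for i, sketch in enumerate(sketch_embs):
        logits = [cosine(sketch, img) / temperature for img in image_embs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings: each sketch points roughly where its image does.
sketches = [[1.0, 0.0], [0.0, 1.0]]
images_aligned = [[0.9, 0.1], [0.1, 0.9]]
images_shuffled = [[0.1, 0.9], [0.9, 0.1]]

print(info_nce(sketches, images_aligned))   # small: pairs are already close
print(info_nce(sketches, images_shuffled))  # large: pairs are mismatched
```

Minimizing a loss of this kind pulls each sketch toward its own image and pushes it away from the other images in the batch, which is the general behavior the abstract attributes to its contrastive objective.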

# RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians ( http://arxiv.org/abs/2406.11836v2 )

License: see the linked page

Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong,

In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model-parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS in terms of primitive count and training resolution, which were difficult to explore before, and to surpass previous state-of-the-art reconstruction quality. We observe a clear positive trend of increasing visual quality as the number of primitives increases with our method. We also demonstrate the first attempt at training a 3DGS model with more than one billion primitives on the full MatrixCity dataset, attaining promising visual quality.

Translated: 2024-06-25 23:54:21 Published: 2024-06-22

# Learning Molecular Representation in a Cell ( http://arxiv.org/abs/2406.12056v2 )

License: see the linked page

Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh,

Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, and zero-shot molecule-morphology matching.

Translated: 2024-06-25 23:54:21 Published: 2024-06-22

# DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection ( http://arxiv.org/abs/2406.12285v2 )

License: see the linked page

Haodong Li, Haicheng Qu,

The detection of small objects in aerial images is a fundamental task in the field of computer vision. Moving objects in aerial photography present problems such as varying shapes and sizes, dense overlap, occlusion by the background, and object blur. However, the original YOLO algorithm has low overall detection accuracy because of its weak ability to perceive targets of different scales. To improve the detection accuracy of densely overlapping small targets and fuzzy targets, this paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images. First, we propose a dynamic scale sequence feature fusion (DSSFF) module that improves the up-sampling mechanism and reduces computational load. Second, an x-small object detection head is added specifically to enhance the detection of small targets. Finally, to improve the expressive ability for targets of different types and sizes, we use the dynamic head (DyHead). The proposed model addresses small target detection in aerial images and can be applied to multiple versions of the YOLO algorithm. Experimental results show that, when the DASSF method is applied to YOLOv8, the model improves the mean average precision (mAP) over YOLOv8n by 9.2% on the VisDrone-2019 dataset and by 2.4% on the DIOR dataset, and outperforms the current mainstream methods.

Translated: 2024-06-25 23:54:21 Published: 2024-06-22

# BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity ( http://arxiv.org/abs/2406.12723v2 )

License: see the linked page

Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo A. Millan, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang,

As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, and geographical information. We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on classification and clustering accuracy. First, we pretrain a masked language model on the DNA barcode sequences of the BIOSCAN-5M dataset, and demonstrate the impact of using this large reference library on species- and genus-level classification performance. Second, we propose a zero-shot transfer learning task applied to images and DNA barcodes to cluster feature embeddings obtained from self-supervised learning, to investigate whether meaningful clusters can be derived from these representation embeddings. Third, we benchmark multi-modality by performing contrastive learning on DNA barcodes, image data, and taxonomic information. This yields a general shared embedding space enabling taxonomic classification using multiple types of information and modalities. The code repository of the BIOSCAN-5M Insect dataset is available at https://github.com/zahrag/BIOSCAN-5M.

Translated: 2024-06-25 23:54:21 Published: 2024-06-22

# Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection ( http://arxiv.org/abs/2406.15694v1 )

License: see the linked page

Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang,

The bitemporal supervised learning paradigm, which relies on numerous labeled bitemporal image pairs, dominates remote sensing change detection, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (STAR) for universal remote sensing change detection from a new perspective of exploiting changes between unpaired images as supervisory signals. STAR enables us to train a high-accuracy change detector using only unpaired labeled images, and the resulting detector can generalize to real-world bitemporal image pairs. To demonstrate the flexibility and scalability of STAR, we design a simple yet unified change detector, termed ChangeStar2, capable of addressing binary change detection, object change detection, and semantic change detection in one architecture. ChangeStar2 achieves state-of-the-art performance on eight public remote sensing change detection datasets, covering two supervised settings, multiple change types, and multiple scenarios. The code is available at https://github.com/Z-Zheng/pytorch-change-models.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22

# SS-Bench: A Benchmark for Social Story Generation and Evaluation ( http://arxiv.org/abs/2406.15695v1 )

License: see the linked page

Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu,

Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these areas. However, Social Stories are costly to create and often limited in diversity and timeliness. As Large Language Models (LLMs) become increasingly powerful, there is a growing need for more automated, affordable, and accessible methods to generate Social Stories in real time with broad coverage. Adapting LLMs to meet the unique and strict constraints of Social Stories is a challenging issue. To this end, we propose SS-Bench, a Social Story Benchmark for generating and evaluating Social Stories. Specifically, we develop a constraint-driven strategy named StarSow to hierarchically prompt LLMs to generate Social Stories and build a benchmark, which has been validated through experiments to fine-tune smaller models for generating qualified Social Stories. Additionally, we introduce Quality Assessment Criteria, employed in human and GPT evaluations, to verify the effectiveness of the generated stories. We hope this work benefits the autism community and catalyzes future research focusing on particular groups.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22

# Self-Supervised Alignment Learning for Medical Image Segmentation ( http://arxiv.org/abs/2406.15699v1 )

License: see the linked page

Haofeng Li, Yiming Ouyang, Xiang Wan,

Recently, self-supervised learning (SSL) methods have been used to pre-train segmentation models for 2D and 3D medical images. Most of these methods are based on reconstruction, contrastive learning, and consistency regularization. However, the spatial correspondence of 2D slices from a 3D medical image has not been fully exploited. In this paper, we propose a novel self-supervised alignment learning framework to pre-train the neural network for medical image segmentation. The proposed framework consists of a new local alignment loss and a global positional loss. We observe that in the same 3D scan, two close 2D slices usually contain similar anatomic structures. Thus, the local alignment loss is proposed to make the pixel-level features of matched structures close to each other. Experimental results show that the proposed alignment learning is competitive with existing self-supervised pre-training approaches on CT and MRI datasets, under the setting of limited annotations.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22
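The idea of pulling together pixel-level features of matched structures in two nearby slices can be sketched as below. The abstract above does not specify the distance measure or how matches are found, so the squared Euclidean distance, the precomputed `matches` list, and the name `local_alignment_loss` are assumptions made purely for illustration.

```python
def local_alignment_loss(feats_a, feats_b, matches):
    """Mean squared Euclidean distance between features of matched pixels.

    feats_a, feats_b: per-pixel feature vectors (lists of floats) for two slices.
    matches: list of (i, j) pairs meaning pixel i in slice A matches pixel j in slice B.
    How the matches are obtained is outside this sketch and is assumed to be given.
    """
    if not matches:
        return 0.0
    total = 0.0
    for i, j in matches:
        total += sum((x - y) ** 2 for x, y in zip(feats_a[i], feats_b[j]))
    return total / len(matches)


# Hypothetical 2-D features for two pixels in each of two adjacent slices.
slice_a = [[1.0, 0.0], [0.0, 1.0]]
slice_b = [[1.0, 0.0], [0.0, 1.0]]

print(local_alignment_loss(slice_a, slice_b, [(0, 0), (1, 1)]))  # correct matches
print(local_alignment_loss(slice_a, slice_b, [(0, 1), (1, 0)]))  # wrong matches
```

A loss of this shape is zero when matched pixels already share the same features and grows as they drift apart, so minimizing it encourages consistent features for the same anatomical structure across neighboring slices.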

# High-Efficiency Low-Noise Optomechanical Crystal Photon-Phonon Transducers ( http://arxiv.org/abs/2406.15701v1 )

License: see the linked page

Sameer Sonar, Utku Hatipoglu, Srujan Meesala, David Lake, Hengjiang Ren, Oskar Painter,

Optomechanical crystals (OMCs) enable coherent interactions between optical photons and microwave acoustic phonons, and represent a platform for implementing quantum transduction between microwave and optical signals. Optical absorption-induced thermal noise at cryogenic (millikelvin) temperatures is one of the primary limitations on the performance of OMC-based quantum transducers. Here, we address this challenge with a two-dimensional silicon OMC resonator that is side-coupled to a mechanically detached optical waveguide, realizing a six-fold reduction in the heating rate of the acoustic resonator compared to the prior state of the art, while operating in a regime of high optomechanical back-action and millikelvin base temperature. This reduced heating translates into a demonstrated phonon-to-photon conversion efficiency of 93.1 ± 0.8% at an added noise of 0.25 ± 0.01 quanta, representing a significant advance toward quantum-limited microwave-optical frequency conversion and optically controlled quantum acoustic memories.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22

# video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models ( http://arxiv.org/abs/2406.15704v1 )

License: see the linked page

Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang,

Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events, and music, but speech as well. To obtain the fine-grained temporal information required by speech understanding while remaining efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model. Moreover, dedicated training approaches, including the diversity loss and the unpaired audio-visual mixed training scheme, are proposed to avoid frame or modality dominance. On the introduced speech-audio-visual evaluation benchmark, video-SALMONN achieves more than 25% absolute accuracy improvements on the video-QA task and over 30% absolute accuracy improvements on audio-visual QA tasks with human speech. In addition, video-SALMONN demonstrates remarkable video comprehension and reasoning abilities on tasks that are unprecedented for other av-LLMs. Our training code and model checkpoints are available at https://github.com/bytedance/SALMONN/.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22

# psPRF:Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery ( http://arxiv.org/abs/2406.15707v1 )

License: see the linked page

Tongtong Zhang, Yuanxiang Li,

Most current NeRF variants for satellites are designed for one specific scene and fall short of generalizing to new geometry. Additionally, the RGB images require pan-sharpening as an independent preprocessing step. This paper introduces psPRF, a Planar Neural Radiance Field designed for paired low-resolution RGB (LR-RGB) and high-resolution panchromatic (HR-PAN) images from satellite sensors with Rational Polynomial Cameras (RPC). To capture the cross-modal prior from both the LR-RGB and HR-PAN images, we adapt the encoder of the Unet-shaped architecture with explicit spectral-to-spatial convolution (SSConv) to enhance its multimodal representation ability. To support the generalization ability of psPRF across scenes, we adopt a projection loss to ensure strong geometry self-supervision. The proposed method is evaluated on multi-scene WorldView-3 LR-RGB and HR-PAN pairs and achieves state-of-the-art performance.

Translated: 2024-06-25 21:04:37 Published: 2024-06-22

# Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization ( http://arxiv.org/abs/2406.15708v1 )

License: see the linked page

Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik,

(参考訳) 大規模言語モデルは目覚ましい能力を示してきたが、その性能は効果的なプロンプトエンジニアリングに大きく依存している。自動プロンプト最適化(APO)手法は、これを自動化するために設計されており、命令(命令最適化、IO)を対象とする命令(例選択、ES)に対して広範囲に分類することができる。彼らの共通の目的にもかかわらず、これらは比較的独立して進化しており、IOは最近より研究の注目を集めている。本稿では,このギャップを解消するために,多様な課題に対して,代表的IO技術とES技術(分離と組み合わせの両方)のパフォーマンスを総合的に比較する。実験結果によると, モデル生成した入力出力ペアを, 検証セット上でのプロンプトの評価からインテリジェントに再利用することで, IO法よりも連続的に性能が向上するが, 未検討であることがわかった。また,最近の IO に焦点が当てられているにも拘わらず,ES ストラテジーは,最適化を伴わないシード命令で最先端の IO メソッドをランダムに検索するのと同じように,命令の最適化方法を上回ることができることがわかった。さらに,ESとIOの相乗効果を観察し,各コントリビューションを超越した最適な組み合わせを示す。予備的な手法としての模範選択の学習と命令最適化との最適な組み合わせは、APOの重要な側面であり、高度に有能な命令追従モデルの時代においても、将来の研究においてより考慮すべきである、と結論付けている。

Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
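
The random-search exemplar selection mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `random_search_exemplars`, the toy pool, and the scoring function `toy_score` are hypothetical stand-ins, and a real setup would score each candidate subset by prompting the model on a validation set.

```python
import random
from typing import Callable, Sequence

Pair = tuple[str, str]


def random_search_exemplars(
    pool: Sequence[Pair],
    score_fn: Callable[[list[Pair]], float],
    k: int = 3,
    n_trials: int = 20,
    seed: int = 0,
) -> tuple[list[Pair], float]:
    """Sample k-exemplar subsets at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_subset: list[Pair] = []
    best_score = float("-inf")
    for _ in range(n_trials):
        subset = rng.sample(list(pool), k)  # draw k exemplars without replacement
        score = score_fn(subset)            # assumed: validation metric of the prompt
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy pool of (input, output) pairs. In the setting described above these would be
# model-generated pairs collected while evaluating prompts on the validation set.
pool = [(f"q{i}", f"a{i}") for i in range(10)]


def toy_score(subset: list[Pair]) -> float:
    """Hypothetical stand-in for validation accuracy: favors even-indexed exemplars."""
    return sum(1 for q, _ in subset if int(q[1:]) % 2 == 0) / len(subset)


best, score = random_search_exemplars(pool, toy_score, k=3, n_trials=50)
```

The seed instruction stays fixed throughout; only the exemplar subset changes between trials, which is what makes this an ES-only baseline.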

翻訳日:2024-06-25 21:04:37 公開日:2024-06-22

# 10以上のDeFi詐欺を体験した:DeFiユーザーによるセキュリティの認識と対策

I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures ( http://arxiv.org/abs/2406.15709v1 )

ライセンス: Link先を確認

Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim,

(参考訳) Decentralized Finance(DeFi)は、まったく新しい投資経験を提供し、CeFi(Centralized Finance)に代わる魅力的な選択肢として急速に台頭した。しかし、市場規模とアクティブユーザー数の急激な増加により、DeFiは詐欺やハッキングの有利なターゲットとなり、2023年には19.5億USドルが失われた。残念ながら、DeFiユーザーのセキュリティリスク認識レベルとリスク軽減戦略の妥当性を徹底的に調査した先行研究はない。半構造化インタビュー研究 (N = 14) とフォローアップ調査 (N = 493) に基づいて,DeFi利用者のセキュリティ意識と一般的に採用されているプラクティス,および過去の詐欺やハッキングの被害者(DeFi被害者)がどのように対応し,損失を回復しようとするかを検討する。分析の結果,ユーザーは分散型の性質と高い収益性を理由に,CeFiよりもDeFiを好むことが多いことがわかった。 DeFiはCeFiに比べてより深刻な攻撃を受けやすいと認識しているにもかかわらず、ユーザーは新たな投資機会を探るためにこうしたリスクを進んで受け入れている。憂慮すべきことに、ほとんどの被害者は過去の経験から学んでいない。従来のシステムで調査された被害者とは異なり、DeFiの被害者はセキュリティの慣行を見直さずに、損失を素早く回復するために新しいサービスを探す傾向にある。さまざまなDeFiサービスや機会が豊富にあることで、被害者は新しい金融機会を継続的に探究することができ、この現実がセキュリティの優先順位を曇らせているようである。実際、私たちの結果は、ギャンブル依存の人々と同様に、DeFiユーザーの強い経済的モチベーションがセキュリティ上の懸念を上回っていることを示唆しています。事故後の行動に関する我々の観察は、DeFiユーザーを将来の侵害から守るためには、業界規制の形でより強力なコントロールが必要であることを示唆している。

Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness levels and the adequacy of their risk mitigation strategies. Based on a semi-structured interview study (N = 14) and a follow-up survey (N = 493), this paper investigates DeFi users' security perceptions and commonly adopted practices, and how those affected by previous scams or hacks (DeFi victims) respond and try to recover their losses. Our analysis shows that users often prefer DeFi over CeFi due to its decentralized nature and strong profitability. Despite being aware that DeFi, compared to CeFi, is prone to more severe attacks, users are willing to take those risks to explore new investment opportunities. Worryingly, most victims do not learn from previous experiences; unlike victims studied through traditional systems, DeFi victims tend to find new services, without revising their security practices, to recover their losses quickly. The abundance of various DeFi services and opportunities allows victims to continuously explore new financial opportunities, and this reality seems to cloud their security priorities. Indeed, our results indicate that DeFi users' strong financial motivations outweigh their security concerns, much like those who are addicted to gambling. Our observations about victims' post-incident behaviors suggest that stronger control in the form of industry regulations would be necessary to protect DeFi users from future breaches.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 超放射光駆動型フォトニック量子エンジン

A photonic quantum engine driven by superradiance ( http://arxiv.org/abs/2406.15710v1 )

ライセンス: Link先を確認

Jinuk Kim, Seung-hoon Oh, Daeho Yang, Junki Kim, Moonjoo Lee, Kyungwon An,

(参考訳) ナノおよびマイクロスケールの熱機関の性能は、量子力学的現象の助けを借りて改善することができる。近年, 単一の熱浴であってもカルノー限界を超えてエンジン性能を向上させるために, 量子コヒーレンスを有する熱浴が提案されている。しかし、これまでのところ、物理的な実現は行われていない。本稿では、原子とフォトニック真空からなる単一の熱浴を用いた超放射駆動型フォトニック量子エンジンの原理実証実験を初めて報告する。量子コヒーレント重ね合わせ状態で調製された熱浴原子は、共振器を通過する間に超放射を起こした。この結果、エンジンの有効温度は約40倍に上昇し、エンジン効率はほぼ1に達した。さらに, 観測されたエンジン出力は, 原子注入速度に対して2次的に増大した。我々の研究は、量子力学的熱伝達やエンジン出力の向上に利用でき、熱浴に埋め込まれた量子コヒーレンスで動作する光メカニカルデバイスの開発への道を開くことができる。

The performance of nano- and micro-scale heat engines can be improved with help from quantum mechanical phenomena. Recently, heat reservoirs with quantum coherence have been proposed to enhance engine performance beyond the Carnot limit even with a single reservoir. However, no physical realizations have been achieved so far. Here, we report the first proof-of-principle experimental demonstration of a photonic quantum engine driven by superradiance employing a single heat reservoir composed of atoms and photonic vacuum. Reservoir atoms prepared in a quantum coherent superposition state underwent superradiance while traversing the cavity. This led to an approximately 40-fold increase in the effective engine temperature, resulting in a near-unity engine efficiency. Moreover, the observed engine output power grew quadratically with respect to the atomic injection rate. Our work can be utilized in quantum mechanical heat transfer as well as in boosting engine power, opening a pathway to the development of photomechanical devices that run on quantum coherence embedded in heat baths.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 加速反復再重み付け核ノルム最小化による効率的な低ランク同定

Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization ( http://arxiv.org/abs/2406.15713v1 )

ライセンス: Link先を確認

Hao Wang, Ye Wang, Xiangyu Yang,

(参考訳) 本稿では、滑らかな函数の和と行列のSchatten-$p$ノルムを最小化する問題を考察する。我々の貢献は、非凸な低ランク化問題を解くために設計された、反復的に再重み付けされた核ノルム法を提案することである。 2つの主要な小説が我々のアプローチを特徴づけている。まず、提案手法はランク識別特性を持ち、有限個の反復で定常点の「正しい」ランクを証明できる。次に,パラメータの平滑化のための適応的更新手法を提案する。この戦略は、「正しい」ランクを検出すると、ゼロ特異値に関連するパラメータを定数として自動的に修正し、残りのパラメータを0に素早く駆動する。この適応的な振る舞いは、アルゴリズムを数回繰り返した後にスムーズな問題を効果的に解決するアルゴリズムに変換し、我々の作業を、低ランク最適化のための既存の反復的に重み付けされた方法とは切り離す。提案アルゴリズムのグローバル収束を証明し、反復のすべての極限点が臨界点であることを保証する。さらに、Kurdyka-{\L}ojasiewicz性質の下で局所収束速度解析を行う。合成データと実データの両方を用いて数値実験を行い、既存の手法よりもアルゴリズムの効率と優越性を実証する。

This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of a matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enabling the provable identification of the "correct" rank of the stationary point within a finite number of iterations. Secondly, we introduce an adaptive updating strategy for smoothing parameters. This strategy automatically fixes parameters associated with zero singular values as constants upon detecting the "correct" rank while quickly driving the remaining parameters to zero. This adaptive behavior transforms the algorithm into one that effectively solves smooth problems after a few iterations, setting our work apart from existing iteratively reweighted methods for low-rank optimization. We prove the global convergence of the proposed algorithm, guaranteeing that every limit point of the iterates is a critical point. Furthermore, a local convergence rate analysis is provided under the Kurdyka-Łojasiewicz property. We conduct numerical experiments using both synthetic and real data to showcase our algorithm's efficiency and superiority over existing methods.
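
To make the reweighting idea concrete, here is a minimal NumPy sketch of one plain (non-accelerated) iteratively reweighted nuclear norm step for the objective f(X) + lam * sum_i sigma_i(X)^p. It is an illustration under stated assumptions, not the authors' algorithm: the function `irnn_step`, the fixed smoothing value `eps`, and the toy least-squares objective are all hypothetical, and the paper's acceleration and adaptive smoothing-parameter update are not reproduced.

```python
import numpy as np


def irnn_step(X, grad_f, lipschitz, lam, p, eps):
    """One proximal step of iteratively reweighted nuclear norm minimization."""
    # Weights are the derivative of (sigma + eps)^p at the current singular values.
    # Singular values are sorted in descending order, so the weights are ascending,
    # which is the condition under which weighted thresholding has a closed form.
    sigma = np.linalg.svd(X, compute_uv=False)
    weights = p * (sigma + eps) ** (p - 1.0)
    # Gradient step on the smooth term f.
    Y = X - grad_f(X) / lipschitz
    # Weighted singular value thresholding of the gradient-step point.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = np.maximum(s - lam * weights / lipschitz, 0.0)
    return (U * s_new) @ Vt


# Toy problem (assumed): f(X) = 0.5 * ||X - M||_F^2 with M close to rank 2.
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((8, 2)))
Q2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
M = Q1 @ np.diag([5.0, 3.0]) @ Q2.T + 0.01 * rng.standard_normal((8, 6))


def grad_f(X):
    return X - M


X = M.copy()
for _ in range(10):
    X = irnn_step(X, grad_f, lipschitz=1.0, lam=0.5, p=0.5, eps=1e-3)

rank = np.linalg.matrix_rank(X, tol=1e-8)
```

Small singular values receive large weights and are thresholded to zero, while large ones are barely shrunk; in this toy run the iterate settles on the rank of the dominant component, which is the behavior the rank identification property formalizes.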

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# Light My Cells チャレンジにおけるpix2pixと適応損失を用いたラベルフリー顕微鏡画像の蛍光ラベル予測

Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge ( http://arxiv.org/abs/2406.15716v1 )

ライセンス: Link先を確認

Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz,

(参考訳) 蛍光ラベリングは、顕微鏡画像のための細胞構造やその他の細胞内成分を明らかにするための標準的なアプローチである。しかし、この侵襲的な処置は細胞を摂動させるか、あるいは死滅させる可能性があり、その手順自体は非常に時間がかかり複雑である。近年,in silicoラベリングが有望な代替手段として登場し,機械学習モデルを用いてラベルフリー顕微鏡画像から蛍光標識画像を直接予測することを目指している。本稿では,Light My Cells チャレンジのための深層学習に基づくin silicoラベリング手法を提案する。提案手法は,pix2pix上に構築され,適応損失を用いて部分的にラベル付けされたデータセットで学習することができる。さらに、異なる入力モダリティを扱うための複数のトレーニング戦略(まとめて学習するか個別に学習するかなど)の有効性について検討する。その結果,本手法はin silicoラベリングにおいて有望な性能を達成できることが示唆された。私たちのコードはhttps://github.com/MedICL-VU/LightMyCellsで利用可能です。

Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the fluorescently labeled images from label-free microscopy. In this paper, we propose a deep learning-based in silico labeling method for the Light My Cells challenge. Built upon pix2pix, our proposed method can be trained using the partially labeled datasets with an adaptive loss. Moreover, we explore the effectiveness of several training strategies to handle different input modalities, such as training them together or separately. The results show that our method achieves promising performance for in silico labeling. Our code is available at https://github.com/MedICL-VU/LightMyCells.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# ターンベースゲームを超えて - ダブルプレックスモデルによるリアルタイム会話の実現

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models ( http://arxiv.org/abs/2406.15718v1 )

ライセンス: Link先を確認

Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu,

(参考訳) 大規模言語モデル(LLM)が日々の生活に浸透するにつれて、人間の会話を反映するリアルタイムインタラクションへの需要が高まっている。 LLMによって駆動される従来のターンベースのチャットシステムでは、システムが応答を生成している間、ユーザが音声で対話することができない。これらの制限を克服するため,既存のLLMをduplex modelに適応させ,出力を生成しながらユーザの発話を聞き取り,動的に自身を調整してユーザに即座にフィードバックを提供できるようにする。具体的には、会話のクエリとレスポンスを複数のタイムスライスに分割し、それらを擬似同時的に処理するために時間分割多重化(TDM)符号化・復号戦略を採用する。さらに,LLMをリアルタイムな会話を処理できるほど熟達させるために,クエリとレスポンスのタイムスライスが交互に並び,即時的な対話における典型的なフィードバックの種類を網羅する微調整データセットを構築した。実験の結果,会話のクエリと応答は処理のために不完全なスライスに分割されているものの,LLMは我々のデータセットで数ステップ微調整するだけで標準ベンチマークでの元の性能を維持できることがわかった。自動評価と人手評価は、duplex modelがユーザとAIのインタラクションをより自然で人間らしいものにし、バニラLLMに比べてユーザ満足度を大幅に向上させることを示している。我々のデュプレックスモデルとデータセットは公開予定である。

As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen for users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluation indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
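
The time-slicing idea can be sketched in a few lines of Python. This is only an assumed illustration of interleaving query and response slices into a single stream; the helper names `split_slices` and `interleave_slices`, the slice size, and the `"user"`/`"model"` tags are hypothetical and do not reflect the paper's actual encoding format.

```python
from itertools import zip_longest


def split_slices(tokens, size):
    """Cut a token list into consecutive slices of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def interleave_slices(query_tokens, response_tokens, size):
    """Alternate query and response slices, as in time-division multiplexing."""
    query_slices = split_slices(query_tokens, size)
    response_slices = split_slices(response_tokens, size)
    stream = []
    for q_slice, r_slice in zip_longest(query_slices, response_slices):
        if q_slice is not None:
            stream.append(("user", q_slice))   # what the model hears in this slot
        if r_slice is not None:
            stream.append(("model", r_slice))  # what the model emits in this slot
    return stream


stream = interleave_slices("a b c d e".split(), "x y z".split(), size=2)
```

Each slot carries an incomplete fragment of the query or response, which is why the abstract stresses that the model must be fine-tuned to work with partial slices.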

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# より深く学ぶには?ハイパースペクトル画像分類のためのコルモゴロフ・アルノルドネットワークの探索

How to Learn More? Exploring Kolmogorov-Arnold Networks for Hyperspectral Image Classification ( http://arxiv.org/abs/2406.15719v1 )

ライセンス: Link先を確認

Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Bing Lu, Pedram Ghamisi,

(参考訳) 畳み込みニューラルネットワーク(CNN)と視覚変換器(ViT)は、複雑なハイパースペクトル画像(HSI)分類において優れた能力を示している。しかし、これらのモデルは大量のトレーニングデータと計算資源を必要とする。一方、現代のマルチ層パーセプトロン(MLP)は、優れた分類能力を示している。現代のMLPベースのモデルは、CNNやViTに比べて必要なトレーニングデータが大幅に少なく、最先端の分類精度を達成する。近年,MLPの代替としてKAN(Kolmogorov-Arnold Networks)が提案されている。内部的にはスプラインに、外部的にはMLPに類似しているため、KANは新しい特徴を学習できるだけでなく、学習した特徴を顕著な精度で最適化することができる。そこで本研究では,複雑なHSIデータ分類におけるKANの有効性を評価する。さらに,KANによるHSI分類精度を向上させるために,1D,2D,3DのKANを用いたハイブリッドアーキテクチャを開発し,提案する。提案アーキテクチャの有効性を実証するため,新たに作成された3つのHSIベンチマークデータセット,QUH-Pingan,QUH-Tangdaowan,QUH-Qingyunについて広範な実験を行った。結果は、これらのベンチマークデータセットにおいて、1D-CNN、2DCNN、3D CNN、VGG-16、ResNet-50、EfficientNet、RNN、ViTなど、他のCNNおよびViTベースのアルゴリズムと比較して、開発されたハイブリッドKANベースのモデルが同等以上の能力を持つことを強調した。コードはhttps://github.com/aj1365/HSIConvKANで公開されている。

Convolutional Neural Networks (CNNs) and vision transformers (ViTs) have shown excellent capability in complex hyperspectral image (HSI) classification. However, these models require a large amount of training data and substantial computational resources. On the other hand, modern Multi-Layer Perceptrons (MLPs) have demonstrated great classification capability. These modern MLP-based models require significantly less training data compared to CNNs and ViTs, achieving state-of-the-art classification accuracy. Recently, Kolmogorov-Arnold Networks (KANs) were proposed as viable alternatives to MLPs. Because of their internal similarity to splines and their external similarity to MLPs, KANs are able to optimize learned features with remarkable accuracy in addition to being able to learn new features. Thus, in this study, we assess the effectiveness of KANs for complex HSI data classification. Moreover, to enhance the HSI classification accuracy obtained by the KANs, we develop and propose a hybrid architecture utilizing 1D, 2D, and 3D KANs. To demonstrate the effectiveness of the proposed KAN architecture, we conducted extensive experiments on three newly created HSI benchmark datasets: QUH-Pingan, QUH-Tangdaowan, and QUH-Qingyun. The results underscored the competitive or better capability of the developed hybrid KAN-based model across these benchmark datasets over several other CNN- and ViT-based algorithms, including 1D-CNN, 2DCNN, 3D CNN, VGG-16, ResNet-50, EfficientNet, RNN, and ViT. The code is publicly available at https://github.com/aj1365/HSIConvKAN.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 大規模言語モデルのファクト記憶のためのスケーリング法則

Scaling Laws for Fact Memorization of Large Language Models ( http://arxiv.org/abs/2406.15720v1 )

ライセンス: Link先を確認

Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu,

(参考訳) 事実的かつ信頼性の高い応答を生成するために,Large Language Models (LLM) には,ファクト知識の記憶が不可欠である。しかし, LLMの事実記憶の挙動は未解明のままである。本稿では,LLMの事実知識のスケーリング法則と,異なる種類の事実を記憶するLLMの挙動を解析する。 LLMの事実知識容量は,モデルサイズに対して線形の,トレーニングエポック数に対して負の指数法則の関係を持つことがわかった。構築したスケーリング法則による推定では,Wikidataの事実全体を記憶するためには、1000Bの非埋め込みパラメータを持つLLMを100エポックトレーニングする必要があり,一般的な事前学習の設定でLLMにすべての公開事実を記憶させることはほぼ非現実的であることが示唆される。一方,LLMは未知の事実知識に対して一般化することができ,そのスケーリング法則は一般的な事前学習と類似している。さらに,LLMの事実記憶の互換性と選好について分析する。互換性については、LLMは冗長な事実を統一的に記憶するのに苦労している。相関する事実が同じ方向と構造を持つ場合のみ、LLMはそれらを両立して記憶することができる。このことは、冗長な事実に対するLLM記憶の非効率性を示している。選好については、LLMはより頻繁で困難な事実を記憶することにより多くの注意を払っており、後続の事実は先行する事実の記憶を上書きしうるため、低頻度の事実の記憶を著しく妨げている。本研究は,LLMの事実知識学習の容量と特徴を明らかにし,LLMの事実知識拡張の方向性を示す。

Fact knowledge memorization is crucial for Large Language Models (LLMs) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLMs' fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole of Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure can the LLM compatibly memorize them. This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders low-frequency facts memorization. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# Clapton: 変分量子アルゴリズムにおける誤り除去のためのクリフォード支援問題変換

Clapton: Clifford-Assisted Problem Transformation for Error Mitigation in Variational Quantum Algorithms ( http://arxiv.org/abs/2406.15721v1 )

ライセンス: Link先を確認

Lennart Maximilian Seifert, Siddharth Dangwal, Frederic T. Chong, Gokul Subramanian Ravi,

(参考訳) 変分量子アルゴリズム(VQA)は、量子コンピューティングの短期において量子優位性を示すが、NISQデバイスの現在の能力を超える精度のレベルを要求する。量子デバイスエラーがVQAに与える影響を系統的に緩和するために,変分量子アルゴリズムにおける誤り除去のためのクラプトン:クリフォード支援問題変換を提案する。クラプトンは、与えられたVQA問題、古典的なデバイスノイズのシミュレーション可能なモデル、およびVQAの変動原理に対して古典的に推定された良い量子状態を利用する。これは、モデル化されたデバイスノイズの存在下で、既知の良いVQA状態のエネルギー推定を低くするために、VQA問題のハミルトニアンに変換を適用する。クラプトン仮説は、VQA問題の既知良状態が問題の理想的な基底状態に近く、デバイスノイズモデリングが合理的に正確である限り(どちらも概ね正しい)、クラプトン変換はVQA問題の基底状態に対するデバイスノイズの影響を著しく減少させ、VQA解の精度を増大させる。 Claptonはエンド・ツー・エンドのアプリケーション・ツー・デバイス・フレームワークとして構築され、1.7xから3.7xまでの平均VQA初期化改善を最大13.3xまで達成している。

Variational quantum algorithms (VQAs) show potential for quantum advantage in the near term of quantum computing, but demand a level of accuracy that surpasses the current capabilities of NISQ devices. To systematically mitigate the impact of quantum device error on VQAs, we propose Clapton: Clifford-Assisted Problem Transformation for Error Mitigation in Variational Quantum Algorithms. Clapton leverages classically estimated good quantum states for a given VQA problem, classical simulable models of device noise, and the variational principle for VQAs. It applies transformations on the VQA problem's Hamiltonian to lower the energy estimates of known good VQA states in the presence of the modeled device noise. The Clapton hypothesis is that as long as the known good states of the VQA problem are close to the problem's ideal ground state and the device noise modeling is reasonably accurate (both of which are generally true), then the Clapton transformation substantially decreases the impact of device noise on the ground state of the VQA problem, thereby increasing the accuracy of the VQA solution. Clapton is built as an end-to-end application-to-device framework and achieves mean VQA initialization improvements of 1.7x to 3.7x, and up to a maximum of 13.3x, over the state-of-the-art baseline when evaluated for a variety of scientific applications from physics and chemistry on noise models and real quantum devices.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# バランスの取れた複数アスペクトの発音評価のための音響的特徴の混合

Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment ( http://arxiv.org/abs/2406.15723v1 )

ライセンス: Link先を確認

Heejin Do, Wonjun Lee, Gary Geunbae Lee,

(参考訳) 自動発音評価において、近年は、豊富なフィードバックを提供するために複数の側面を評価することに重点が置かれつつある。しかし、非母語学習者の発話に対するマルチアスペクトスコアラベル付きデータを取得することは困難であり、さらに、しばしばスコアの不均衡な分布につながる。本稿では,データ不足とスコア・ラベルの不均衡に対処するため,バッチ内平均特徴と線形および非線形に補間する2つの音響特徴ミックスアップ(Acoustic Feature Mixup)手法を提案する。主に発音の良さ(goodness-of-pronunciation)を音響的特徴として用いて,発音評価に適したミックスアップ設計を調整した。さらに,音声認識結果と元の正解音素を比較することで得られる細粒度の誤り率特徴を統合し,誤発音の直接的な手がかりを与える。音響特徴を効果的に混合することにより,speechocean762データセットにおける総合的なスコアリング性能が顕著に向上し,詳細な解析により未知の歪みを予測できる可能性が示された。

In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly interpolating with the in-batch averaged feature, to address data scarcity and score-label imbalances. Primarily using goodness-of-pronunciation as an acoustic feature, we tailor mixup designs to suit pronunciation assessment. Further, we integrate fine-grained error-rate features by comparing speech recognition results with the original answer phonemes, giving direct hints for mispronunciation. Effective mixing of the acoustic features notably enhances overall scoring performances on the speechocean762 dataset, and detailed analysis highlights our potential to predict unseen distortions.
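
The linear variant of interpolating with the in-batch averaged feature might look like the following NumPy sketch. It rests on assumptions not stated in the abstract: the function name `mixup_with_batch_mean`, the fixed mixing coefficient `lam`, and the choice to mix the score labels in the same way as the features are all hypothetical, and the non-linear variant is not shown.

```python
import numpy as np


def mixup_with_batch_mean(features, scores, lam):
    """Linearly interpolate each sample with the in-batch mean feature and score."""
    mean_feature = features.mean(axis=0, keepdims=True)  # shape (1, feature_dim)
    mean_score = scores.mean(axis=0, keepdims=True)      # shape (1, num_aspects)
    mixed_features = lam * features + (1.0 - lam) * mean_feature
    mixed_scores = lam * scores + (1.0 - lam) * mean_score  # assumed label mixing
    return mixed_features, mixed_scores


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8))           # e.g. goodness-of-pronunciation features
scores = rng.uniform(0.0, 10.0, size=(4, 3))     # toy multi-aspect scores
mixed_features, mixed_scores = mixup_with_batch_mean(features, scores, lam=0.7)
```

Mixing toward the batch mean pulls rare extreme scores toward the middle of the distribution, which yields synthetic samples in score regions that are sparsely labeled.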

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 多重蛍光画像における細胞の特徴抽出のための半教師付き変異オートエンコーダ

Semi-supervised variational autoencoder for cell feature extraction in multiplexed immunofluorescence images ( http://arxiv.org/abs/2406.15727v1 )

ライセンス: Link先を確認

Piumi Sandarenu, Julia Chen, Iveta Slapetova, Lois Browne, Peter H. Graham, Alexander Swarbrick, Ewan K. A. Millar, Yang Song, Erik Meijering,

(参考訳) デジタルイメージング技術の進歩は、細胞レベルでの腫瘍ミクロ環境と特定の免疫フェノタイプ間の相互作用を可視化し識別するために、多重免疫蛍光(mIF)画像を使うことへの関心を高めている。現在最先端の多重蛍光画像解析パイプラインは、単純な統計的および機械学習ベースのツールを用いて生成された形態的および染色強度に基づくメトリクスによって特徴づけられる細胞の特徴表現に依存している。しかし、これらの方法は細胞の複雑な表現を生成できない。我々は,mIF画像中のセルの特徴を抽出するために,潜伏部分空間を用いた教師付き変分オートエンコーダを用いた深層学習に基づくセル特徴抽出モデルを提案する。乳がん患者の1,093個の組織マイクロアレイコアから抽出した44,000個以上の多重多重蛍光細胞像のコホートを用いて細胞表現型分類を行い,本モデルの有効性を実証した。

Advancements in digital imaging technologies have sparked increased interest in using multiplexed immunofluorescence (mIF) images to visualise and identify the interactions between specific immunophenotypes with the tumour microenvironment at the cellular level. Current state-of-the-art multiplexed immunofluorescence image analysis pipelines depend on cell feature representations characterised by morphological and stain intensity-based metrics generated using simple statistical and machine learning-based tools. However, these methods are not capable of generating complex representations of cells. We propose a deep learning-based cell feature extraction model using a variational autoencoder with supervision using a latent subspace to extract cell features in mIF images. We perform cell phenotype classification using a cohort of more than 44,000 multiplexed immunofluorescence cell image patches extracted across 1,093 tissue microarray cores of breast cancer patients, to demonstrate the success of our model against current and alternative methods.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 安全な集合を破る:フェデレーションラーニングにおける集合的グラディエントからのラベル漏洩

Breaking Secure Aggregation: Label Leakage from Aggregated Gradients in Federated Learning ( http://arxiv.org/abs/2406.15731v1 )

ライセンス: Link先を確認

Zhibo Wang, Zhiwei Chang, Jiahui Hu, Xiaoyi Pang, Jiacheng Du, Yongle Chen, Kui Ren,

(参考訳) Federated Learning (FL)は、個々の勾配から個人情報を抽出できる勾配反転攻撃(GIA)に対してプライバシー上の脆弱性を示す。プライバシーを強化するため、FLはセキュアアグリゲーション(SA)を導入し、サーバが個別の勾配を得るのを防ぐことで、GIAに効果的に対抗する。本稿では,SAをバイパスし,個々のクライアントのプライベートラベルを復元する,ステルスなラベル推論攻撃を提案する。具体的には、SAの実装後にのみ得られる集約勾配からのラベル推論を理論的に解析する。その結果、最終完全連結層(FCL)の入力(埋め込み)と出力(ロジット)が勾配の分解とラベルの復元に寄与することが判明した。 FCLの埋め込みとロジットをプリセットするために、元のモデルの単一のバッチ正規化(BN)層のパラメータのみを修正してフィッシングモデル(fishing model)を構築する。クライアント固有のフィッシングモデルを配布することで、サーバは、期待される埋め込みと集約された勾配を係数とする線形システムを解くことにより、FCLのバイアスに関する個々の勾配を導出することができる。すると、各クライアントのラベルは、プリセットされたロジットとFCLのバイアスの勾配に基づいて正確に計算できる。大規模な実験により,本攻撃は様々なデータセットやモデルアーキテクチャ上で,100%の精度で大規模なラベル復元を達成することが示された。

Federated Learning (FL) exhibits privacy vulnerabilities under gradient inversion attacks (GIAs), which can extract private information from individual gradients. To enhance privacy, FL incorporates Secure Aggregation (SA) to prevent the server from obtaining individual gradients, thus effectively resisting GIAs. In this paper, we propose a stealthy label inference attack to bypass SA and recover individual clients' private labels. Specifically, we conduct a theoretical analysis of label inference from the aggregated gradients that are exclusively obtained after implementing SA. The analysis results reveal that the inputs (embeddings) and outputs (logits) of the final fully connected layer (FCL) contribute to gradient disaggregation and label restoration. To preset the embeddings and logits of FCL, we craft a fishing model by solely modifying the parameters of a single batch normalization (BN) layer in the original model. Distributing client-specific fishing models, the server can derive the individual gradients regarding the bias of FCL by resolving a linear system with expected embeddings and the aggregated gradients as coefficients. Then the labels of each client can be precisely computed based on preset logits and gradients of FCL's bias. Extensive experiments show that our attack achieves large-scale label recovery with 100% accuracy on various datasets and model architectures.
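
The last step described above, computing labels from preset logits and the gradient of the final layer's bias, follows from a standard identity: for softmax cross-entropy, the per-sample gradient with respect to the bias is softmax(logits) minus the one-hot label. The NumPy sketch below illustrates only that step under simplifying assumptions (one client, a mean-reduced loss, identical preset logits for every sample); the function names are hypothetical, and the fishing model and the disaggregation of gradients across clients are not reproduced.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def bias_gradient(logits, labels, num_classes):
    """Mean cross-entropy gradient w.r.t. the final-layer bias for one client.

    Assumes every sample yields the same (preset) logits, so the per-sample
    gradient is softmax(logits) - one_hot(label).
    """
    probs = softmax(logits)
    counts = np.bincount(labels, minlength=num_classes)
    return probs - counts / len(labels)


def recover_label_counts(grad_b, logits, batch_size):
    """Invert the identity above to recover how often each label occurs."""
    probs = softmax(logits)
    return np.rint(batch_size * (probs - grad_b)).astype(int)


labels = np.array([0, 2, 2, 1, 2, 0, 3, 3])       # private labels of one client
preset_logits = np.array([0.5, -1.0, 2.0, 0.0])   # assumed known to the server
grad_b = bias_gradient(preset_logits, labels, num_classes=4)
recovered = recover_label_counts(grad_b, preset_logits, batch_size=len(labels))
```

The recovery is exact here because the server knows the logits in advance; without preset logits the softmax term is unknown and the same gradient no longer determines the label counts.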

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 消費電力を最適化するためのAI駆動アプローチ:総合的な調査

AI-Driven Approaches for Optimizing Power Consumption: A Comprehensive Survey ( http://arxiv.org/abs/2406.15732v1 )

ライセンス: Link先を確認

Parag Biswas, Abdur Rashid, Angona Biswas, Md Abdullah Al Nasim, Kishor Datta Gupta, Roy George,

(参考訳) 電力最適化が重要である主な理由は、環境効果の低減、運転コストの低減、そして、現在の世代と将来の世代に対する安定的で持続可能なエネルギー供給である。電力最適化は、エネルギーをより効果的に利用し、廃棄物を削減し、資源の利用を最適化する。今日の世界では、電力最適化と人工知能(AI)の統合は、エネルギーの生成、使用、分散の方法を変えるために不可欠である。 AI駆動のアルゴリズムと予測分析により、電力使用傾向のリアルタイム監視と分析が可能となり、動的修正によって需要を効果的に満たすことができる。インテリジェントシステムの使用により、電力消費が異なるセクターで最適化されると、効率性と持続可能性が向上する。本研究は、電力最適化に使用されるいくつかのAI技術と、電力消費の異なる分野にわたる様々なインテリジェントシステム応用ドメインの研究のための文献の方法論的分析を網羅し、それらを評価することにより17種類の研究手法の性能と成果を特定し、その強度と限界に関する貴重な知見を抽出することを目的とする。さらに、電力消費最適化のためのAIの統合における今後の方向性について概説する。

Reduced environmental impact, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization ensures that energy is used more effectively, cutting down on waste and optimizing the utilization of resources. In today's world, power optimization and artificial intelligence (AI) integration are essential to changing the way energy is produced, used, and distributed. Real-time monitoring and analysis of power usage trends is made possible by AI-driven algorithms and predictive analytics, which enable dynamic modifications to effectively satisfy demand. Efficiency and sustainability are increased when power consumption is optimized in different sectors thanks to the use of intelligent systems. This survey paper comprises an extensive review of the several AI techniques used for power optimization as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of power consumption. This literature review identifies the performance and outcomes of 17 different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. Furthermore, this article outlines future directions in the integration of AI for power consumption optimization.

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# RankAdaptor: 構造的にプルーニングされたLLMのための階層的動的低ランク適応

RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs ( http://arxiv.org/abs/2406.15734v1 )

ライセンス: Link先を確認

Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng Jin,

(参考訳) 大規模言語モデル(LLM)の効率的な圧縮は、ますます人気が高まっている。しかし, 圧縮LLMの精度の回復は依然として大きな課題である。標準低ランク適応 (LoRA) を用いた構造的プルーニングは、現在のLLM圧縮において一般的な手法である。構造的プルーニングでは、モデルアーキテクチャが不均一に修正されるため、固定ランクの標準LoRAでは様々な下流タスクにおいて最適でない性能しか得られない。この問題に対処するために, プルーニングされたLLMのための, 階層的動的ランクスケジューリングを用いた効率的な微調整手法である RankAdaptor を導入する。軽量な性能モデルを用いて、微調整時に異なるランクを決定するエンド・ツー・エンドの自動最適化フローを開発した。一般的なベンチマークに関する総合的な実験によると、RankAdaptorは、異なるプルーニング設定において、構造的プルーニングと組み合わせた標準LoRAより一貫して優れている。トレーニング可能なパラメータを増やさなくても、RankAdaptorは、標準的なLoRAと比較して、プルーニング後に回復したモデルとオリジナルのモデルの間の精度ギャップをさらに減らすことができる。

The efficient compression of large language models (LLMs) is becoming increasingly popular. However, recovering the accuracy of compressed LLMs is still a major challenge. Structural pruning with standard Low-Rank Adaptation (LoRA) is a common technique in current LLM compression. In structural pruning, the model architecture is modified unevenly, resulting in suboptimal performance in various downstream tasks via standard LoRA with fixed rank. To address this problem, we introduce RankAdaptor, an efficient fine-tuning method with hierarchical dynamic rank scheduling for pruned LLMs. An end-to-end automatic optimization flow is developed that utilizes a lightweight performance model to determine the different ranks during fine-tuning. Comprehensive experiments on popular benchmarks show that RankAdaptor consistently outperforms standard LoRA with structural pruning over different pruning settings. Without increasing the trainable parameters, RankAdaptor further reduces the accuracy performance gap between the recovery of the pruned model and the original model compared to standard LoRA.
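
The contrast between fixed-rank LoRA and a per-layer rank schedule can be sketched with a small NumPy example. Everything here is illustrative: the class `LoRALinear`, the layer names, and the rank values in `rank_schedule` are hypothetical, and the lightweight performance model that RankAdaptor uses to choose the ranks is not reproduced.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update W + scale * B @ A."""

    def __init__(self, weight, rank, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen base weight
        self.A = rng.standard_normal((rank, in_dim)) * 0.01   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def num_trainable(self):
        return self.A.size + self.B.size

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


# Hypothetical layer-wise schedule: each layer gets its own rank instead of
# one fixed rank shared by all layers.
rank_schedule = {"layer0": 4, "layer1": 8, "layer2": 2}

rng = np.random.default_rng(1)
layers = {
    name: LoRALinear(rng.standard_normal((16, 16)), rank=r)
    for name, r in rank_schedule.items()
}

x = rng.standard_normal((3, 16))
y = x
for layer in layers.values():
    y = layer(y)

total_trainable = sum(layer.num_trainable() for layer in layers.values())
```

Because the trainable parameter count of each adapter grows linearly with its rank, a schedule can shift capacity toward layers that were pruned more heavily while keeping the total budget comparable to a fixed-rank setup.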

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 画像-映像拡散モデルにおける条件付き画像漏洩の同定と解法

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model ( http://arxiv.org/abs/2406.15735v1 )

ライセンス: Link先を確認

Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu,

(参考訳) 拡散モデルは、画像から動画への(I2V)生成においてかなり進歩している。しかし、そのようなモデルは完全には理解されていない。本稿では,I2V拡散モデル(I2V-DMs)における重要だがこれまで見過ごされてきた問題,すなわち条件付き画像リークについて報告する。 I2V-DMは、大きな時間ステップで条件画像に過度に依存し、ノイズの多い入力からクリーンなビデオを予測するという重要なタスクを軽視する傾向があり、その結果、動的で鮮やかな動きに欠ける動画が生成される。さらに,プラグイン・アンド・プレイ戦略を提示することで,推論とトレーニングの両面からこの課題に対処する。まず、I2V-DMの信頼性の低い後半の時間ステップを回避するために、より早い時間ステップから生成プロセスを開始するトレーニングフリーの推論戦略を導入し、あわせて、トレーニングと推論のギャップを効果的に埋めるために、実際の周辺分布とのKLダイバージェンスを最小化することで最適な解析式を持つ初期ノイズ分布(Analytic-Init)を導出する。第2に,トレーニング時の条件付き画像リークを軽減するため,条件画像に対する時間依存のノイズ分布を設計し,条件画像に十分干渉するために,大きな時間ステップでは高いノイズレベルを優先する。収集したオープンドメイン画像ベンチマークとUCF101データセットを用いて,様々なI2V-DM上でこれらの戦略を検証する。広範な結果から,画像のアライメントや時間的一貫性を損なうことなく、よりダイナミックで自然な動きの動画を生成することで、本手法がベースラインより優れていることを示す。プロジェクトページ: https://cond-image-leak.github.io/

Diffusion models have made substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean video from noisy inputs, which results in videos lacking dynamic and vivid motion. We further address this challenge from both inference and training aspects by presenting plug-and-play strategies accordingly. First, we introduce a training-free inference strategy that starts the generation process from an earlier time step to avoid the unreliable late-time steps of I2V-DMs, as well as an initial noise distribution with optimal analytic expressions (Analytic-Init) by minimizing the KL divergence between it and the actual marginal distribution to effectively bridge the training-inference gap. Second, to mitigate conditional image leakage during training, we design a time-dependent noise distribution for the conditional image, which favors high noise levels at large time steps to sufficiently interfere with the conditional image. We validate these strategies on various I2V-DMs using our collected open-domain image benchmark and the UCF101 dataset. Extensive results demonstrate that our methods outperform baselines by producing videos with more dynamic and natural motion without compromising image alignment and temporal consistency. The project page: https://cond-image-leak.github.io/
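
The training-side idea, adding more noise to the conditional image at larger time steps, can be illustrated with a deliberately simplified NumPy sketch. The abstract describes a time-dependent noise distribution; the version below replaces it with a deterministic linear schedule purely for illustration, and the function name `noisy_condition`, `sigma_max`, and the schedule itself are assumptions rather than the paper's design.

```python
import numpy as np


def noisy_condition(image, t, num_steps, sigma_max=1.0, rng=None):
    """Perturb the conditional image with noise whose scale grows with step t."""
    rng = rng if rng is not None else np.random.default_rng(0)
    sigma_t = sigma_max * (t / num_steps)  # assumed linear schedule in t
    return image + sigma_t * rng.standard_normal(image.shape)


image = np.ones((4, 4))  # toy stand-in for the conditional frame
early = noisy_condition(image, t=100, num_steps=1000, rng=np.random.default_rng(0))
late = noisy_condition(image, t=900, num_steps=1000, rng=np.random.default_rng(0))
```

At large time steps the heavily perturbed condition carries less exact pixel information, so the denoiser has less opportunity to copy the conditional image instead of predicting the clean video.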

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# 子どもの数学オリンピックにおける視覚・言語モデルの評価

Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads ( http://arxiv.org/abs/2406.15736v1 )

ライセンス: Link先を確認

Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum,

(参考訳) 近年,ChatGPTやGeminiなど,大規模視覚・言語モデル(LVLM)の汎用的問題解決能力が著しく進歩している。これらのブレークスルーのいくつかは、高次の認知スキルを必要とするさまざまなタスクにおいて、AIモデルが人間の能力を上回っているようにさえ見えます。現在の大きなAIモデルは、人間のように一般化された問題解決が可能か? しかし、ジョイントビジョンとテキスト推論のためのAI能力の体系的な分析は、現在の科学文献に欠けている。本稿では, このギャップを埋めるために, 子どものオリンピアードのビジュオ言語問題を用いて, 数学的, アルゴリズム的推論能力について, 最先端のLVLMを評価した。具体的には,1～12年生の子どもを対象とする国際コンペである数学カンガルー(MK)オリンピアード(Olympiad)の問題について考察する。 MKのパズルを用いて、2020-2024年の840個の問題からなるSMART-840というデータセットを作成しました。我々のデータセットを用いて,LVLMのパワーを数学的推論に基づいて分析し,パズルに対する反応は,子供のそれと直接比較する方法を提供する。以上の結果から,近代のLVLMは,高学年の問題解決において,より強力な推論能力を示す一方で,幼児向けの問題に正しく答える基盤が欠如していることが示唆された。さらに分析したところ、AIモデルの推論能力と幼児の推論能力の間に有意な相関は見られず、それらの能力は、子どもの数学や論理のスキルの根底にある累積的知識とは異なるタイプの推論に基づいているようである。

Recent years have seen significant progress in the general-purpose problem-solving abilities of large vision and language models (LVLMs), such as ChatGPT and Gemini; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as humans are? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing from the current scientific literature. In this paper, we make an effort towards filling this gap by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, a popular international competition targeted at children in grades 1-12 that tests children's deeper mathematical abilities using puzzles appropriately gauged to their age and skills. Using the puzzles from MK, we created a dataset, dubbed SMART-840, consisting of 840 problems from the years 2020-2024. With our dataset, we analyze the power of LVLMs in mathematical reasoning; their responses to our puzzles offer a direct way to compare against those of children. Our results show that modern LVLMs do demonstrate increasingly powerful reasoning skills in solving problems for higher grades, but lack the foundations to correctly answer problems designed for younger children. Further analysis shows that there is no significant correlation between the reasoning capabilities of AI models and those of young children, and that the models' capabilities appear to be based on a different type of reasoning than the cumulative knowledge that underlies children's mathematics and logic skills.
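
The comparison between model and child performance can be pictured as a per-problem correlation. The sketch below computes a Pearson coefficient between two hypothetical lists: the fraction of children who solved each puzzle and the fraction of repeated model runs that answered it correctly. The numbers are invented for illustration and the abstract does not state which correlation statistic the authors used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-problem solve rates (not data from SMART-840).
child_rate = [0.9, 0.7, 0.5, 0.3, 0.1]
model_rate = [0.2, 0.6, 0.4, 0.6, 0.2]
r = pearson(child_rate, model_rate)
```

A coefficient near zero, as in this toy data, would mean that the puzzles children find easy are not systematically the ones the model finds easy.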

翻訳日:2024-06-25 20:54:52 公開日:2024-06-22

# Ladder: LLMベースの機械翻訳を次のレベルに上げるモデルに依存しないフレームワーク

Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level ( http://arxiv.org/abs/2406.15741v1 )

ライセンス: Link先を確認

Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu,

(参考訳) GPT-4のような汎用大規模言語モデル(LLM)は、広範囲なウェブコンテンツを活用することで機械翻訳(MT)において顕著な進歩を遂げた。一方、翻訳特化LLMは、ドメイン固有の単言語コーパスでの事前学習と、人手でアノテーションされた翻訳データによる微調整によって構築される。優れた性能にもかかわらず、これらの手法は前例のない規模の計算とデータを必要とするか、多大な人手による編集とアノテーションの労力を必要とする。本稿では,MT 用汎用 LLM の性能を改善する新しいモデル非依存・コスト効率ツールである Ladder を開発した。Ladderは、追加の人的コストなしに既存のLLMから容易に得られる擬似精錬三重項(pseudo-refinement triplets)で訓練される。トレーニングでは、易から難へ(easy-to-hard)のスキーマによる階層的な微調整戦略を提案し、Ladderの精錬性能を段階的に改善する。トレーニングされたLadderは、任意の汎用LLMとシームレスに統合され、その翻訳性能を向上させる。 Gemma-2B/7B をバックボーンとして使用することにより、Ladder-2B は生の翻訳を最上位のオープンソースモデルのレベルに引き上げ(例えば、XX-En で BigTranslate-13B を +6.91 BLEU、+3.52 COMET 改善)、Ladder-7B は最先端の GPT-4 と同等までモデル性能をさらに向上させることができる。広範囲にわたるアブレーションと分析は、様々な設定における Ladder の有効性を裏付ける。私たちのコードはhttps://github.com/fzp0424/Ladderで利用可能です。

General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. On the other hand, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite the superior performance, these methods either demand an unprecedented scale of computing and data or substantial human editing and annotation efforts. In this paper, we develop Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT. Ladder is trained on pseudo-refinement triplets which can be easily obtained from existing LLMs without additional human cost. During training, we propose a hierarchical fine-tuning strategy with an easy-to-hard schema, improving Ladder's refining performance progressively. The trained Ladder can be seamlessly integrated with any general-purpose LLMs to boost their translation performance. By utilizing Gemma-2B/7B as the backbone, Ladder-2B can elevate raw translations to the level of top-tier open-source models (e.g., refining BigTranslate-13B with +6.91 BLEU and +3.52 COMET for XX-En), and Ladder-7B can further enhance model performance to be on par with the state-of-the-art GPT-4. Extensive ablation and analysis corroborate the effectiveness of Ladder in diverse settings. Our code is available at https://github.com/fzp0424/Ladder
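
A rough sketch of how pseudo-refinement triplets and an easy-to-hard schedule might be organized follows. The `Triplet` fields, the `quality` score, and the choice to treat low-quality drafts as hard are all assumptions made for illustration; the abstract does not specify the scoring metric or which end of the scale counts as easy.

```python
from dataclasses import dataclass

@dataclass
class Triplet:
    source: str        # sentence in the source language
    intermediate: str  # draft translation sampled from an existing LLM
    reference: str     # gold translation used as the refinement target
    quality: float     # hypothetical score of the draft, in [0, 1]

def easy_to_hard_stages(triplets, difficulty, n_stages=3):
    """Sort by a caller-supplied difficulty and split into at most n_stages buckets."""
    ordered = sorted(triplets, key=difficulty)
    if not ordered:
        return []
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def refinement_prompt(t):
    """Format the model input; the training target would be t.reference."""
    return f"Source: {t.source}\nDraft: {t.intermediate}\nRefined translation:"

# Toy data; only the quality scores matter for the staging.
data = [Triplet(f"src{i}", f"draft{i}", f"ref{i}", q)
        for i, q in enumerate([0.9, 0.3, 0.6, 0.2, 0.8, 0.5])]
# Assumption: a worse draft is harder to refine.
stages = easy_to_hard_stages(data, difficulty=lambda t: 1.0 - t.quality)
```

Fine-tuning would then visit `stages[0]`, `stages[1]`, and `stages[2]` in order, so that the refiner sees progressively harder examples.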

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# プログラム可能な変分推論による確率的プログラミング

Probabilistic Programming with Programmable Variational Inference ( http://arxiv.org/abs/2406.15742v1 )

ライセンス: Link先を確認

McCoy R. Becker, Alexander K. Lew, Xiaoyan Wang, Matin Ghavami, Mathieu Huot, Martin C. Rinard, Vikash K. Mansinghka,

(参考訳) 現代の確率的プログラミング言語 (PPL) でサポートされている高度なモンテカルロ法と比較すると、PPLは変分推論 (VI) をサポートしていない: ユーザーは通常、PPLバックエンドでモノリシックに実装される変分目的と勾配推定器の事前定義された選択に制限される。本稿では,構成プログラム変換に基づくPPLの変分推論を支援するための,よりモジュラーなアプローチを提案する。提案手法では,変動目的をプログラムとして表現し,ユーザが定義したモデルと変分族の下での期待値の密度の計算に一級構成を用いる。次に、これらのプログラムを体系的に非バイアス勾配推定器に変換し、それらが定義する目的を最適化する。我々の設計は、自動微分、密度蓄積、トレーシング、非バイアス勾配推定戦略の適用など、多くの相互作用する関心事に関するモジュラー推論を可能にする。さらに,PPL における VI の既存サポートと比較して,その設計は3つの軸に沿った表現性の向上を図っている。(1) オプションの固定メニューではなく,ユーザ定義の変動目標のオープンなセットのサポート,(2) 現在の PPL では自動化されていない勾配推定戦略の組合せ空間のサポート,(3) 近似境界化と正規化(モンテカルロ推論のみに導入)のための構成をサポートするため,より広範なモデルと変動家族のクラスをサポートする。我々は、Gen確率型プログラミングシステムの拡張(JAXで実装されたgenjax.vi)にアプローチを実装し、いくつかの深い生成モデリングタスクを評価し、手書き実装と比較してパフォーマンスのオーバーヘッドが最小限であり、オープンソースのPPLと競合する性能を示す。

Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs, that may employ first-class constructs for computing densities of and expected values under user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. Our design enables modular reasoning about many interacting concerns, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today's PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced only for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (genjax.vi, implemented in JAX), and evaluate on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.
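
To make the notion of interchangeable unbiased gradient estimators concrete, here is a plain-Python sketch that estimates the gradient of E[f(z)] with respect to mu for z ~ N(mu, 1) in two ways. It does not use genjax.vi or any of its API; the function names are hypothetical, and the derivative of f is supplied by hand rather than by automatic differentiation.

```python
import random

def reparam_grad(f_prime, mu, n, rng):
    """Pathwise estimator: z = mu + eps, so d/dmu f(z) = f'(mu + eps)."""
    return sum(f_prime(mu + rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def score_grad(f, mu, n, rng):
    """Score-function (REINFORCE) estimator: f(z) * d/dmu log N(z; mu, 1)."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, 1.0)
        total += f(z) * (z - mu)
    return total / n

rng = random.Random(0)
mu = 1.5
# Objective E[z^2] = mu^2 + 1, so the exact gradient is 2 * mu.
g_reparam = reparam_grad(lambda z: 2.0 * z, mu, 20000, rng)
g_score = score_grad(lambda z: z * z, mu, 20000, rng)
```

Both estimators target the same gradient, but the score-function version has much higher variance here, which is one reason a system that lets users pick and combine estimation strategies per objective can be useful.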

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# CasModaTest: 単体テスト生成のためのケースドとモデルに依存しないセルフダイレクトフレームワーク

CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation ( http://arxiv.org/abs/2406.15743v1 )

ライセンス: Link先を確認

Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang,

(参考訳) 多くの機械学習(ML)ベースのユニットテスト生成アプローチが提案され、実際に顕著なパフォーマンスを達成しているが、有効性や実用性にはまだいくつかの制限がある。より正確には、既存のMLベースのアプローチは、(1) 主にテストオラクル生成に焦点を当て、単体テストの部分的な内容しか生成せず、(2) テストプレフィックスとテストオラクルが意味的に一致せず、(3) クローズドソースモデルに強く結びついており、最終的にデータセキュリティを損なう。本稿では,これらの制限を緩和するために,テストプレフィックス生成とテストオラクル生成という2つのカスケード段階を持つ,カスケード型でモデルに依存しないエンドツーエンドの単体テスト生成フレームワークCasModaTestを提案する。そして、手動で大規模なデモプールを構築し、CasModaTestに高品質なテストプレフィックスとテストオラクルの例を提供する。最後に、CasModaTestは生成されたテストプレフィックスとテストオラクルを自動的に組み立て、それらの有効性をチェックするためにコンパイルまたは実行し、必要に応じて、コンパイルや実行の段階で発生したエラーの修正を数回試みる。 CasModaTestの有効性を評価するために、広く使われているデータセット(Defects4J)上で大規模な実験を行い、2つのパフォーマンス指標を用いて、4つの最先端(SOTA)アプローチと比較する。実験の結果、CasModaTestは全てのSOTAを大幅に上回った(精度で60.62%-352.55%、焦点メソッドのカバレッジで2.83%-87.27%)。また、異なるオープンソース LLM 上で CasModaTest を実験した結果、エンドツーエンドの単体テスト生成において、CasModaTest は SOTA に対して大幅な改善(精度と焦点メソッドのカバレッジでそれぞれ 39.82%-293.96% と 9.25%-98.95%)を達成できることがわかった。

Though many machine learning (ML)-based unit test generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate only partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle semantically; and (3) are tightly bound to closed-source models, eventually damaging data security. We propose CasModaTest, a cascaded, model-agnostic, and end-to-end unit test generation framework, to alleviate the above limitations with two cascaded stages: test prefix generation and test oracle generation. Then, we manually build large-scale demo pools to provide CasModaTest with high-quality examples of test prefixes and test oracles. Finally, CasModaTest automatically assembles the generated test prefixes and test oracles and compiles or executes them to check their effectiveness, optionally followed by several attempts to fix the errors occurring in the compiling and executing phases. To evaluate the effectiveness of CasModaTest, we conduct large-scale experiments on a widely used dataset (Defects4J) and compare it with four state-of-the-art (SOTA) approaches by considering two performance measures. The experimental results indicate that CasModaTest outperforms all SOTAs with a substantial improvement (i.e., 60.62%-352.55% in terms of accuracy, 2.83%-87.27% in terms of focal method coverage). Besides, we also conduct experiments with CasModaTest on different open-source LLMs and find that CasModaTest can also achieve significant improvements over SOTAs (39.82%-293.96% and 9.25%-98.95% in terms of accuracy and focal method coverage, respectively) in end-to-end unit test generation.
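
The two cascaded stages plus the assemble, execute, and repair loop can be sketched as below. This is only an outline of the control flow described in the abstract: the generator and repair functions are trivial stubs standing in for LLM calls, the focal method is a Python toy rather than a Java method from Defects4J, and every name here is hypothetical.

```python
FOCAL = "def add(a, b):\n    return a + b\n"

def run_python(test_code, focal_source=FOCAL):
    """Execute the focal method plus the test; return (passed, error message)."""
    namespace = {}
    try:
        exec(focal_source + "\n" + test_code, namespace)
        return True, ""
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

def generate_unit_test(focal, gen_prefix, gen_oracle, run, repair, max_repairs=2):
    prefix = gen_prefix(focal)          # stage 1: test prefix generation
    oracle = gen_oracle(focal, prefix)  # stage 2: test oracle generation
    test = prefix + "\n" + oracle       # assemble the full test
    for attempt in range(max_repairs + 1):
        passed, error = run(test)
        if passed:
            return test
        if attempt < max_repairs:
            test = repair(test, error)  # optional fix attempt
    return None

# Stub "models": the prefix contains a typo that the repair step corrects.
gen_prefix = lambda focal: "result = ad(2, 3)"
gen_oracle = lambda focal, prefix: "assert result == 5"
repair = lambda test, error: test.replace("ad(", "add(") if "NameError" in error else test

final_test = generate_unit_test(FOCAL, gen_prefix, gen_oracle, run_python, repair)
```

Because the generators are passed in as plain callables, any model could back them, which mirrors the model-agnostic claim without implying anything about the authors' actual prompts or tooling.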

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# 外部励起を考慮した未知確率力学系のモデル化

Modeling Unknown Stochastic Dynamical System Subject to External Excitation ( http://arxiv.org/abs/2406.15747v1 )

ライセンス: Link先を確認

Yuan Chen, Dongbin Xiu,

(参考訳) 本稿では,時間依存励起や制御信号を受ける確率系の未知の確率力学系を学習するための数値的手法を提案する。我々の基本的な前提は、確率系の支配方程式は利用できないということである。しかし、ある既知の励起信号と対応するシステム応答からなる入出力(I/O)データの短いバーストが利用可能である。十分な量のI/Oデータが得られると、未知のダイナミクスを学習し、トレーニングデータにない任意の励起信号を受けるシステムの確率応答の正確な予測モデルを生成することができる。本手法は,(1)学習をパラメータ化形式に変換するためのトレーニングI/Oデータの局所近似,(2)未知の確率フローマップを分布に近似する生成モデル,の2つの重要な要素を有する。提案手法を詳細に提示した後, 提案手法の性能, 特に長期システム予測について, 総合的な数値例を提示する。

We present a numerical method for learning unknown nonautonomous stochastic dynamical systems, i.e., stochastic systems subject to time-dependent excitation or control signals. Our basic assumption is that the governing equations for the stochastic system are unavailable. However, short bursts of input/output (I/O) data consisting of certain known excitation signals and their corresponding system responses are available. When a sufficient amount of such I/O data is available, our method is capable of learning the unknown dynamics and producing an accurate predictive model for the stochastic responses of the system subject to arbitrary excitation signals not in the training data. Our method has two key components: (1) a local approximation of the training I/O data to transfer the learning into a parameterized form; and (2) a generative model to approximate the underlying unknown stochastic flow map in distribution. After presenting the method in detail, we present a comprehensive set of numerical examples to demonstrate the performance of the proposed method, especially for long-term system predictions.

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# TacoLM: ゲート付きアテンションを備えたコーデック言語モデルは効率的なゼロショット音声合成器である

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers ( http://arxiv.org/abs/2406.15752v1 )

ライセンス: Link先を確認

Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen,

(参考訳) ニューラルコーデック言語モデル(LM)は、ゼロショットテキスト音声合成(TTS)において強力な能力を示している。しかし、コーデックLMは、自己回帰的な性質と、テキストと音声の間の暗黙的なアライメントのため、推論速度と安定性の制限に悩まされることが多い。本研究では,これらの課題に対処するために,新しいニューラルコーデックLM,すなわちTacoLMを導入する。具体的には、TacoLMは、トレーニングと推論の効率を改善し、モデルサイズを小さくするゲート付きアテンション機構を導入している。さらに、デコーダ層ごとにゲート付きクロスアテンション層を追加し、合成音声の効率性と内容の正確性を向上させる。 Librispeechコーパスでの評価において、提案するTacoLMは、VALL-Eと比較して、90%少ないパラメータと5.2倍の高速化で、より良い単語誤り率、話者類似度、平均オピニオンスコアを達成する。デモとコードはhttps://ereboas.github.io/TacoLM/で公開されている。

Neural codec language models (LMs) have demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, codec LMs often suffer from limitations in inference speed and stability, due to their auto-regressive nature and the implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, TacoLM introduces a gated attention mechanism to improve the training and inference efficiency and reduce the model size. Meanwhile, an additional gated cross-attention layer is included for each decoder layer, which improves the efficiency and content accuracy of the synthesized speech. In the evaluation on the Librispeech corpus, the proposed TacoLM achieves a better word error rate, speaker similarity, and mean opinion score, with 90% fewer parameters and a 5.2 times speedup, compared with VALL-E. Demo and code are available at https://ereboas.github.io/TacoLM/.

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# 学習した報酬関数を最適化する危険性: 低いトレーニング誤差は低いリグレットを保証しない

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret ( http://arxiv.org/abs/2406.15753v1 )

ライセンス: Link先を確認

Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse,

(参考訳) 強化学習では、意図したタスクを捉える報酬関数を指定することが非常に難しい場合がある。報酬学習は、報酬関数を学習することでこの問題に対処することを目的としている。しかし、学習した報酬モデルは、トレーニング分布上では誤差が低くても、その後、リグレットの大きいポリシーを生成することがある。このような報酬モデルには誤差-リグレットミスマッチがあると言う。誤差-リグレットミスマッチの主な原因は、ポリシー最適化中に一般的に発生する分布シフトである。本稿では,報酬モデルの期待テスト誤差が十分に低ければ最悪ケースのリグレットが低いことが保証される一方で,任意の固定された期待テスト誤差に対して,誤差-リグレットミスマッチを許す現実的なデータ分布が存在することを数学的に示す。次に、RLHFのような手法でよく用いられるポリシー正則化手法を用いても、同様の問題が持続することを示す。我々の理論的結果は、学習した報酬モデルの品質を測定する新しい方法を開発することの重要性を強調している。

In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of developing new ways to measure the quality of learned reward models.
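
A three-armed bandit is enough to show an error-regret mismatch in miniature. In the sketch below, the learned reward is wrong only on an action that the data distribution rarely covers, so its expected error is small, yet the policy that is greedy with respect to the learned reward picks exactly that action. The numbers and function names are invented for illustration and are not taken from the paper.

```python
def expected_abs_error(true_r, learned_r, data_dist):
    """Expected absolute reward error under the data distribution."""
    return sum(p * abs(t - l) for p, t, l in zip(data_dist, true_r, learned_r))

def regret_of_greedy(true_r, learned_r):
    """True-reward regret of the policy that is greedy w.r.t. the learned reward."""
    chosen = max(range(len(learned_r)), key=learned_r.__getitem__)
    return max(true_r) - true_r[chosen]

true_reward = [1.0, 0.0, 0.0]
learned_reward = [1.0, 0.0, 2.0]   # wrong only on the rarely seen action 2
data_dist = [0.495, 0.495, 0.01]   # action 2 appears in 1% of the data

error = expected_abs_error(true_reward, learned_reward, data_dist)
regret = regret_of_greedy(true_reward, learned_reward)
```

Here the expected error is about 0.02 while the regret is the largest possible value of 1.0: optimizing against the learned reward shifts the policy onto the part of the action space where the reward model was never checked.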

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# 声道モデルのためのマルチモーダルセグメンテーション

Multimodal Segmentation for Vocal Tract Modeling ( http://arxiv.org/abs/2406.15754v1 )

ライセンス: Link先を確認

Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli,

(参考訳) 解釈可能な音声処理と言語学のための調音表現を構築するためには,声道の正確なモデリングが必要である。しかし, 声道モデリングは, 内部調音器の多くが外的モーションキャプチャー技術から除外されているため, 困難である。リアルタイム磁気共鳴イメージング(RT-MRI)は、音声中の内音節の正確な動きを計測するが、MRIの注釈付きデータセットのサイズは、時間的・計算的に高価なラベル付け法によって制限される。まず、視覚のみのセグメンテーション手法を用いて、RT-MRIビデオにディープラベリング戦略を提案する。次に、音声を用いたマルチモーダルアルゴリズムを導入し、発声器のセグメンテーションを改善する。今回我々は,MRIビデオセグメンテーションにおける声道モデリングのための新しいベンチマークを作成し,75話者RT-MRIデータセットのラベルをリリースし,声道の公的なRT-MRIデータのラベルを9倍に増やした。コードとデータセットのラベルは \url{rishiraij.github.io/multimodal-mri-avatar/} にある。

Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at \url{rishiraij.github.io/multimodal-mri-avatar/}.

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# 弱教師付きセマンティックセグメンテーションのためのきめ細かい背景表現

Fine-grained Background Representation for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2406.15755v1 )

ライセンス: Link先を確認

Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon,

(参考訳) 画像レベルのラベルから信頼できる疑似マスクを生成することは、空間情報の欠如により、弱教師付きセマンティックセグメンテーション(WSSS)タスクにおいて困難である。クラスアクティベーションマップ(CAM)ベースのソリューションは、不審な背景(BG)画素から前景(FG)オブジェクトを識別し、積分対象領域を学習する。本稿では,多様なBGセマンティクスを発見し,表現し,共起問題に対処するシンプルな背景表現(FBR)法を提案する。 BG表現のためのクラスプロトタイプやピクセルレベルの機能の使用を放棄します。代わりに、我々は、細粒度BGセマンティック情報を捕捉し、ピクセル対NROIのコントラストを実行し、紛らわしいBGピクセルを区別するために、新しいプリミティブ、負の関心領域(NROI)を開発する。また,FGの負をフライでマイニングするアクティブサンプリング戦略を提案し,地中コントラスト学習を効果的に行い,対象領域全体を活性化させる。設計の単純さと使い勝手の良さにより,提案手法は様々なモデルにシームレスに接続することができ,ベンチマーク間でWSSS設定の下で新たな最先端結果が得られる。画像レベルのラベルのみを監督として活用し,Pascal VocとMS COCOテストセットで73.2 mIoUと45.6 mIoUのセグメンテーション結果を得た。さらに,サリエンシマップを追加の監視信号(I+S)として組み込むことで,Pascal Vocテストセット上で74.9 mIoUを得ることができた。同時に、我々のFBRアプローチは、弱教師付きインスタンスセグメンテーション(WSIS)タスクにおいて有意義なパフォーマンス向上を示し、その堅牢性と多様なドメインにわたる強力な一般化能力を示している。

Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse BG semantics and address the co-occurring problems. We abandon using the class prototype or pixel-level features for BG representation. Instead, we develop a novel primitive, negative region of interest (NROI), to capture the fine-grained BG semantic information and conduct the pixel-to-NROI contrast to distinguish the confusing BG pixels. We also present an active sampling strategy to mine the FG negatives on-the-fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning to activate the entire object region. Thanks to the simplicity of design and convenience in use, our proposed method can be seamlessly plugged into various models, yielding new state-of-the-art results under various WSSS settings across benchmarks. Leveraging solely image-level (I) labels as supervision, our method achieves 73.2 mIoU and 45.6 mIoU segmentation results on Pascal Voc and MS COCO test sets, respectively. Furthermore, by incorporating saliency maps as an additional supervision signal (I+S), we attain 74.9 mIoU on Pascal Voc test set. Concurrently, our FBR approach demonstrates meaningful performance gains in weakly-supervised instance segmentation (WSIS) tasks, showcasing its robustness and strong generalization capabilities across diverse domains.

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# 量子符号の摂動安定性と誤差補正しきい値

Perturbative stability and error correction thresholds of quantum codes ( http://arxiv.org/abs/2406.15757v1 )

ライセンス: Link先を確認

Yaodong Li, Nicholas O'Dea, Vedika Khemani,

(参考訳) 位相的に順序付けられた位相は局所摂動に対して安定であり、位相的量子誤り訂正符号は局所誤差に対するしきい値を持つ。一般CSS符号と古典線形符号を復号化するための古典統計力学モデルを構築することにより、安定性の2つの概念を結合する。提案手法は,非相関ビットフリップおよび位相フリップ誤差下での補正成功確率をエンコードし,焼成障害を伴う一般化Z2格子ゲージ理論を同時に記述する。後者のクリーンな限界は、誤差が摂動XまたはZ磁場に変換されるとき、正確には対応する量子コードハミルトニアンの離散化された仮想時間パス積分である。誤差補正の考慮により、そのような一般化されたZ2格子ゲージ理論の一般次パラメータを定義し、誤差補正の成功確率によって一般に低い値となることを示す。 LDPC条件を満たすCSS符号に対して、対応する格子ゲージ理論の低温秩序相の存在を証明し、特にユークリッド空間的局所性に欠ける場合や、符号速度がゼロでない場合について述べる。さらに、これらの結果は、連続虚数時間の極限で得られた対応する摂動量子ハミルトニアンの安定相の証拠を与えると主張する。そのため、格子ゲージ理論における空間的および時間的欠陥を区別する。空間的欠陥の高エネルギーコストは「メモリ実験」の成功に対応し、基底状態間のエネルギー分割を抑制する一方、時間的欠陥の高エネルギーコストは「安定実験」の成功に対応し、局所的な励起に対するゼロではないギャップを指し示している。

Topologically-ordered phases are stable to local perturbations, and topological quantum error-correcting codes enjoy thresholds to local errors. We connect the two notions of stability by constructing classical statistical mechanics models for decoding general CSS codes and classical linear codes. Our construction encodes correction success probabilities under uncorrelated bit-flip and phase-flip errors, and simultaneously describes a generalized Z2 lattice gauge theory with quenched disorder. We observe that the clean limit of the latter is precisely the discretized imaginary time path integral of the corresponding quantum code Hamiltonian when the errors are turned into a perturbative X or Z magnetic field. Motivated by error correction considerations, we define general order parameters for all such generalized Z2 lattice gauge theories, and show that they are generally lower bounded by success probabilities of error correction. For CSS codes satisfying the LDPC condition and with a sufficiently large code distance, we prove the existence of a low temperature ordered phase of the corresponding lattice gauge theories, particularly for those lacking Euclidean spatial locality and/or when there is a nonzero code rate. We further argue that these results provide evidence to stable phases in the corresponding perturbed quantum Hamiltonians, obtained in the limit of continuous imaginary time. To do so, we distinguish space- and time-like defects in the lattice gauge theory. A high free-energy cost of space-like defects corresponds to a successful "memory experiment" and suppresses the energy splitting among the ground states, while a high free-energy cost of time-like defects corresponds to a successful "stability experiment" and points to a nonzero gap to local excitations.
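
For readers less familiar with CSS codes, the sketch below checks the two structural properties the abstract relies on: that X-type and Z-type checks commute, and that check weights are small. It uses the Steane code, whose X and Z check matrices are both the parity-check matrix of the [7,4] Hamming code. The helper names are hypothetical, and note that the LDPC condition is properly a statement about a code family (weights staying bounded as the code grows), so a single small code only illustrates the quantities involved.

```python
def css_commutes(hx, hz):
    """X- and Z-type checks commute iff Hx * Hz^T = 0 (mod 2)."""
    return all(
        sum(a * b for a, b in zip(rx, rz)) % 2 == 0
        for rx in hx for rz in hz
    )

def max_weights(h):
    """Return (max row weight, max column weight) of a binary check matrix."""
    row_w = max(sum(row) for row in h)
    col_w = max(sum(col) for col in zip(*h))
    return row_w, col_w

# Parity-check matrix of the [7,4] Hamming code.
HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
steane_ok = css_commutes(HAMMING, HAMMING)
weights = max_weights(HAMMING)
```

Each row of the check matrix corresponds to one interaction term in the associated statistical mechanics model, so bounded row and column weights translate into bounded-degree interactions even when the code has no geometric locality.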

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# EDGE-LLM:Layerwise Unified CompressionとAdaptive Layer Tuning and Votingによるエッジデバイス上での効率的な大言語モデル適応の実現

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting ( http://arxiv.org/abs/2406.15758v1 )

ライセンス: Link先を確認

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin,

(参考訳) エッジデバイスへの大規模言語モデル(LLM)の効率的な適応は、継続的かつプライバシ保護の適応と推論を必要とするアプリケーションにとって不可欠である。しかし、既存のチューニング技術は高い計算とメモリオーバーヘッドのために不足している。そこで我々はEdge-LLMと呼ばれる計算・メモリ効率の高いLLMチューニングフレームワークを導入する。具体的には,レイヤワイド統一圧縮 (LUC) 技術を用いて,レイヤワイドプルーニング空間と量子化ビット幅ポリシーの生成による計算オーバーヘッドの低減,(2)バックプロパゲーション深さの低減によるメモリオーバーヘッドの低減のための適応層チューニングと投票方式,(3)LUCが導入した不規則な計算パターンと適応層チューニングを補完するハードウェアスケジューリング戦略により,効率的な計算とデータ移動を実現する。大規模な実験では、Edge-LLMは2.92倍のスピードアップと4倍のメモリオーバーヘッド削減を実現している。私たちのコードはhttps://github.com/GATECH-EIC/Edge-LLMで利用可能です。

Efficient adaptation of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of their high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices. Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning, thereby achieving efficient computation and data movements. Extensive experiments demonstrate that Edge-LLM achieves a 2.92x speedup and a 4x memory overhead reduction as compared to vanilla tuning methods with comparable task accuracy. Our code is available at https://github.com/GATECH-EIC/Edge-LLM

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# コンセプトドリフトのための新しいベッティング機能を備えたICMアンサンブル

ICM Ensemble with Novel Betting Functions for Concept Drift ( http://arxiv.org/abs/2406.15760v1 )

ライセンス: Link先を確認

Charalambos Eliades, Harris Papadopoulos,

(参考訳) 本研究は,CD(Concept Drift)に対処する改良されたインダクティブ・コンフォーマル・マルティンゴール (ICM) アプローチを導入することで,これまでの成果を裏付けるものである。具体的には,これまでに提案したCAUTIOUSベッティング機能を強化し,複数の密度推定器を組み込んで検出能力を向上する。また、このベッティング関数と、これまでICMフレームワークで利用されていなかった2つのベース推定器(補間ヒストグラムと近接近傍密度推定器)を組み合わせる。 ICMとICMのアンサンブルの両方を用いて,これらの拡張を評価する。後者では,アンサンブルサイズが予測精度および利用可能な予測数に与える影響を総合的に検討する。評価実験の結果,提案手法は従来の手法を上回り,その多くが現代の3つの最先端技術に勝っていることがわかった。

This study builds upon our previous work by introducing a refined Inductive Conformal Martingale (ICM) approach for addressing Concept Drift (CD). Specifically, we enhance our previously proposed CAUTIOUS betting function to incorporate multiple density estimators for improving detection ability. We also combine this betting function with two base estimators that have not been previously utilized within the ICM framework: the Interpolated Histogram and Nearest Neighbor Density Estimators. We assess these extensions using both a single ICM and an ensemble of ICMs. For the latter, we conduct a comprehensive experimental investigation into the influence of the ensemble size on prediction accuracy and the number of available predictions. Our experimental results on four benchmark datasets demonstrate that the proposed approach surpasses our previous methodology in terms of performance while matching or in many cases exceeding that of three contemporary state-of-the-art techniques.
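
The mechanics of a conformal martingale can be sketched with a much simpler betting function than the one the authors propose. The code below uses the classic power betting function f(p) = epsilon * p^(epsilon - 1) as a stand-in for CAUTIOUS, and deterministic inductive p-values rather than the smoothed, randomized ones usually used in ICM; both substitutions are assumptions made to keep the example short and reproducible.

```python
from math import log

def p_value(calibration_scores, score):
    """Inductive conformal p-value of a new nonconformity score."""
    at_least = sum(1 for a in calibration_scores if a >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)

def log_power_martingale(p_values, epsilon=0.5):
    """Log of the product of bets f(p) = epsilon * p**(epsilon - 1)."""
    return sum(log(epsilon) + (epsilon - 1.0) * log(p) for p in p_values)

calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# After a drift, new examples look strange: scores above all calibration scores.
drift_p = [p_value(calibration, 1.0) for _ in range(10)]
# Without drift (toy case), new examples look very ordinary.
stable_p = [p_value(calibration, 0.05) for _ in range(10)]

log_m_drift = log_power_martingale(drift_p)
log_m_stable = log_power_martingale(stable_p)
alarm = log_m_drift > log(20.0)  # threshold 20 bounds the false-alarm rate by 5%
```

Small p-values make the martingale grow and large ones make it shrink, so a sustained run of unusual examples pushes it past the alarm threshold; a density-estimator-based betting function adapts the bet to the observed p-value distribution instead of fixing its shape in advance.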

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# ワッサーシュタイン勾配流の観点からの数値タブラルデータインプットの拡散モデルの再考

Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow ( http://arxiv.org/abs/2406.15762v1 )

ライセンス: Link先を確認

Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang,

(参考訳) 拡散モデル (DM) は欠損データ補完 (MDI: Missing Data Imputation) において注目されているが、長く見過ごされてきた2つの問題が残っている。(1) 不正確な補完: DMの生成過程が本質的にサンプルの多様化を追求することから生じる。(2) 難しいトレーニング: モデルトレーニング段階でマスク行列に必要とされる複雑な設計に由来する。数値表形式データセットの領域でこれらの懸念に対処するため、KnewImp(Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation)と呼ばれる新しい原理的アプローチを導入する。具体的には、ワッサースタイン勾配流(WGF)の枠組みに基づいて、DMベースのMDIで暗黙的に最大化されるコスト汎関数が、MDIの目的関数に多様化を促進する非負の項を加えたものと等価であることから問題(1)が生じることを、まず証明する。これに基づき、多様化を抑制する負のエントロピーを持つ新しいコスト汎関数を設計し、WGFフレームワークと再生核ヒルベルト空間の中でKnewImpアプローチを導出する。その後、KnewImpの補完手順が、同時分布に関連する別のコスト汎関数から導出できることを証明し、マスク行列の必要性を排除することで、問題(2)に自然に対処する。大規模な実験により、提案するKnewImpアプローチが既存の最先端手法を大幅に上回ることを示す。

Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but two long-neglected issues remain to be addressed: (1) inaccurate imputation, which arises from the inherently sample-diversification-pursuing generative process of DMs; and (2) difficult training, which stems from the intricate design required for the mask matrix in the model training stage. To address these concerns within the realm of numerical tabular datasets, we introduce a novel principled approach termed Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation (KnewImp). Specifically, based on the Wasserstein gradient flow (WGF) framework, we first prove that issue (1) arises because the cost functionals implicitly maximized in DM-based MDI are equivalent to the MDI objective plus diversification-promoting non-negative terms. Based on this, we then design a novel cost functional with diversification-discouraging negative entropy and derive our KnewImp approach within the WGF framework and a reproducing kernel Hilbert space. After that, we prove that the imputation procedure of KnewImp can be derived from another cost functional related to the joint distribution, eliminating the need for the mask matrix and hence naturally addressing issue (2). Extensive experiments demonstrate that our proposed KnewImp approach significantly outperforms existing state-of-the-art methods.

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# AllMatch: 半教師あり学習のためにラベルのないすべてのデータを活用する

AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning ( http://arxiv.org/abs/2406.15763v1 )

ライセンス: Link先を確認

Zhiyu Wu, Jinshi Cui,

(参考訳) 既存の半教師付き学習アルゴリズムでは、擬似ラベル付けおよび整合性制御技術を用いて、未ラベルサンプルの監視信号を導入する。しきい値に基づく擬似ラベルの本来の限界を克服するために、従来の研究では、信頼度閾値と、未ラベルデータに基づく予測によって推定されるモデルの進化的学習状態との整合を試みてきた。本稿では,分類器の重み付けにより,カテゴリ間での差分学習状態を反映し,クラス固有の適応しきい値機構を提案する。さらに、最適しきい値スキームでさえ、ラベル付けされていないサンプルを廃棄する問題を解決できないことを考えると、バイナリ分類整合性規制アプローチは、全てのラベル付けされていないサンプルに対して負のオプションから候補クラスを区別するように設計されている。以上の戦略を組み合わせることで、擬似ラベル精度の向上とラベルなしデータの100%利用率を実現する、AllMatchという新しいSSLアルゴリズムを提案する。我々は、バランスの取れた設定とバランスの取れていない設定の両方を含む、複数のベンチマークに対するアプローチを広範囲に評価した。その結果、AllMatchは既存の最先端メソッドよりも一貫して優れています。

Existing semi-supervised learning algorithms adopt pseudo-labeling and consistency regulation techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated through the predictions made on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories and consequently propose a class-specific adaptive threshold mechanism. Additionally, considering that even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, a binary classification consistency regulation approach is designed to distinguish candidate classes from negative options for all unlabeled samples. By combining the above strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
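
One way to picture a class-specific threshold driven by classifier weights is sketched below. It assumes, purely for illustration, that the L2 norm of each class's weight vector serves as a proxy for how well that class has been learned and that thresholds scale linearly with it; the abstract does not give the actual formula, and the function names are hypothetical.

```python
from math import sqrt

def class_thresholds(weights, base=0.95):
    """Scale a base confidence threshold by each class's relative weight norm."""
    norms = [sqrt(sum(w * w for w in row)) for row in weights]
    top = max(norms)
    return [base * n / top for n in norms]

def pseudo_label(probs, thresholds):
    """Return the predicted class if it clears its own threshold, else None."""
    cls = max(range(len(probs)), key=probs.__getitem__)
    return cls if probs[cls] >= thresholds[cls] else None

# Toy classifier weights: one row per class (hypothetical values).
classifier_weights = [[3.0, 4.0], [6.0, 8.0]]
thresholds = class_thresholds(classifier_weights)

kept = pseudo_label([0.6, 0.4], thresholds)     # class 0 has a low threshold
dropped = pseudo_label([0.3, 0.7], thresholds)  # class 1 needs 0.95 confidence
```

Samples that return `None` here are the ones a pure thresholding scheme would discard; the binary candidate-versus-negative consistency term described in the abstract is what lets such samples still contribute a training signal.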

翻訳日:2024-06-25 20:45:08 公開日:2024-06-22

# TP-DRSeg:Explicit Text-Prompts Assisted SAMによる糖尿病網膜症病変分画の改善

TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM ( http://arxiv.org/abs/2406.15764v1 )

License: see the linked page

Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge,

(参考訳) SAM(Segment Anything Model)のような大規模基盤モデルの最近の進歩は、様々なタスクにおいて大きな可能性を証明している。それらの進歩にもかかわらず、これらのモデルは、特に糖尿病網膜症(DR)病変の微妙な相違を認識する際に、専門的な医用画像解析における課題に直面している。本稿では,テキストプロンプされたDR病変のセグメンテーションのためにSAMをカスタマイズする新しいフレームワーク,TP-DRSegを提案する。我々の中核となる考え方は、医学的な事前知識を視覚のみのセグメンテーションネットワークに注入するために言語手がかりを活用することであり、それによって異なる基礎モデルの利点を組み合わせ、セグメンテーションの信頼性を高めることである。具体的には、医用概念認識における視覚言語モデルの可能性を解き明かすために、暗黙の医学的概念を明示的な事前知識に伝達する明示的な事前エンコーダを提案し、病変に関連する低レベル特徴を発掘するための説明可能な手がかりを提供する。さらに,マルチモーダルな特徴間の知識共有を容易にし,パラメータ効率のよい手法でフレームワークを訓練できるように,セグメンテーションプロセスに明示的な事前を注入するための事前整合型インジェクタを設計する。実験により、従来のモデルや基礎モデルよりもフレームワークの方が優れていることが示された。

Recent advances in large foundation models, such as the Segment Anything Model (SAM), have demonstrated considerable promise across various tasks. Despite their progress, these models still encounter challenges in specialized medical image analysis, especially in recognizing subtle inter-class differences in Diabetic Retinopathy (DR) lesion segmentation. In this paper, we propose a novel framework that customizes SAM for text-prompted DR lesion segmentation, termed TP-DRSeg. Our core idea involves exploiting language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. Specifically, to unleash the potential of vision-language models in the recognition of medical concepts, we propose an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion. Experimental results demonstrate the superiority of our framework over other traditional models and foundation model variants.

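Translated: 2024-06-25 20:45:08 Published: 2024-06-22

# Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration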

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration ( http://arxiv.org/abs/2406.15765v1 )

License: see the linked page

Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin,

(参考訳) 注意は、大きな言語モデル(LLM)の顕著な成果の背後にある基本的な要素である。しかし、注意機構の現在の理解、特に注意分布の確立に関する理解は限られている。意味的重要性の欠如にもかかわらず、非常に大きな注意点を受け取る最初のトークンに注意シンクの存在を探求する最近の研究から着想を得たこの研究は、この現象を深く掘り下げている。本研究の目的は,LLM内の注目シンクの存在をより深く理解し,重量微調整を必要とせず,注意分布を直接最適化することにより,LLMの達成可能な精度を高める方法を明らかにすることである。具体的には、様々な入力やタスクの推論中にLLMの注意分布を包括的に可視化することから始める。これらの視覚化に基づいて,(1)注意シンクはシーケンスの開始時だけでなく,後続の入力トークン内でも発生し,(2)すべての注意シンクがLLMの達成可能な精度に肯定的な影響を及ぼすわけではないことを初めて知る。そこで本研究では,入力適応方式で,ハエの注意分布を自動的に最適化する,トレーニング不要な注意校正手法(ACT)を提案する。広範囲にわたる実験により、ACTは異なる用途にわたる様々なLSMの精度を一貫して向上することが示された。具体的には、ACTはLlama-30Bに適用した場合、異なるデータセット間で平均7.30%の精度向上を達成する。私たちのコードはhttps://github.com/GATECH-EIC/ACTで公開されています。

Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of an attention sink at the initial token, which receives disproportionately large attention scores despite its lack of semantic importance, this work delves deeper into this phenomenon. We aim to provide a more profound understanding of the existence of attention sinks within LLMs and to uncover ways to enhance the achievable accuracy of LLMs by directly optimizing the attention distributions, without the need for weight finetuning. Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks. Based on these visualizations, to the best of our knowledge, we are the first to discover that (1) attention sinks occur not only at the start of sequences but also within later tokens of the input, and (2) not all attention sinks have a positive impact on the achievable accuracy of LLMs. Building upon our findings, we propose a training-free Attention Calibration Technique (ACT) that automatically optimizes the attention distributions on the fly during inference in an input-adaptive manner. Extensive experiments validate that ACT consistently enhances the accuracy of various LLMs across different applications. Specifically, ACT achieves an average improvement of up to 7.30% in accuracy across different datasets when applied to Llama-30B. Our code is available at https://github.com/GATECH-EIC/ACT.
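
The notion of a token that "receives disproportionately large attention" can be illustrated with a small Python sketch. This is not ACT: the detection rule (more than `factor` times the uniform share), the helper names, and the toy attention matrix are all assumptions made for illustration.

```python
def received_attention(attn):
    """Average attention each key position receives over all query rows.
    attn[i][j] is the weight query i puts on key j; each row sums to 1."""
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(n)]


def find_sinks(attn, factor=2.0):
    """Flag key positions whose average received attention exceeds `factor`
    times the uniform share 1/n. Rule and default are illustrative only."""
    n = len(attn)
    return [j for j, a in enumerate(received_attention(attn)) if a > factor / n]


# Toy causal attention map for 4 tokens (query i only attends to keys <= i).
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.3, 0.1, 0.6, 0.0],
    [0.3, 0.05, 0.6, 0.05],
]
strict = find_sinks(attn, factor=2.0)  # only the initial token qualifies
loose = find_sinks(attn, factor=1.1)   # a later token (index 2) also qualifies
```

Under a causal mask, early positions are visible to more queries, so a plain column average is biased toward them; a more careful analysis would normalize by the number of queries that can see each key.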

Translated: 2024-06-25 20:45:08 Published: 2024-06-22

# Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data

Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data ( http://arxiv.org/abs/2406.15766v1 )

License: see the linked page

Jiayi He, Jiao Chen, Qianmiao Liu, Suyan Dai, Jianhua Tang, Dongpo Liu,

(参考訳) 産業用モノのインターネット(Industrial Internet of Things, IIoT)は、相互接続されたセンサーとデバイスを統合して産業アプリケーションをサポートするが、その動的環境はデータドリフトに関連する課題を引き起こす。本稿では,資源が限られており,新たなデータ配信にモデルを効果的に適用する必要があることを考慮し,新たな生成再生機構を通じて産業ストリーミングデータによってもたらされる課題に対処する継続学習(CL)アプローチ(Distillation-based Self-Guidance:DSG)を提案する。 DSGは、知識蒸留を利用して、前回の拡散ベースジェネレータから更新したジェネレータへの知識伝達を行い、ジェネレータの安定性と再生データの品質の両方を改善し、破滅的な忘れの軽減を図る。 CWRU, DSA, WISDMデータセットの実験結果からDSGの有効性が示された。 DSGは最先端のベースラインを精度で上回り、主要なデータセットの2.9%から5.0%の改善を示す。

The Industrial Internet of Things (IIoT) integrates interconnected sensors and devices to support industrial applications, but its dynamic environments pose challenges related to data drift. Considering the limited resources and the need to effectively adapt models to new data distributions, this paper introduces a Continual Learning (CL) approach, i.e., Distillation-based Self-Guidance (DSG), to address challenges presented by industrial streaming data via a novel generative replay mechanism. DSG utilizes knowledge distillation to transfer knowledge from the previous diffusion-based generator to the updated one, improving both the stability of the generator and the quality of reproduced data, thereby enhancing the mitigation of catastrophic forgetting. Experimental results on CWRU, DSA, and WISDM datasets demonstrate the effectiveness of DSG. DSG outperforms the state-of-the-art baseline in accuracy, demonstrating improvements ranging from 2.9% to 5.0% on key datasets, showcasing its potential for practical industrial applications.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception ( http://arxiv.org/abs/2406.15768v1 )

License: see the linked page

Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang,

(参考訳) 近年,マルチモーダル大規模言語モデル (MLLM) は視覚的質問応答や常識推論といったタスクにおいて顕著な能力を示し,視覚的知覚モデルは検出やセグメンテーションといった認知タスクにおいて大きな進歩を遂げている。しかし、MLLMは主に高レベルの画像文の解釈に重点を置いており、細かな視覚的理解に苦慮している。これらの課題を克服するために,視覚知覚とマルチモーダル理解を相乗的に強化する新しいフレームワークであるMutually Reinforced Multimodal Large Language Model (MR-MLLM)を提案する。まず、視覚モデルからの詳細な視覚入力と言語モデルの言語深度を調和させ、マルチモーダル理解と視覚知覚を相乗的に強化する共有クエリ融合機構を提案する。第2に,物体検出境界ボックスなどの視覚知覚出力から新たなモダリティを取り入れ,微妙な視覚的要素を捕捉し,視覚的およびテキスト的データの理解を深める,知覚強化型クロスモーダル統合手法を提案する。さらに, 言語モデルのプロンプトに知覚情報を組み込んで, より正確なマルチモーダル解釈のために, 応答を文脈的に, 知覚的に整列させる, 革新的な知覚埋め込み型プロンプト生成機構を提案する。 MR-MLLMの様々なマルチモーダル理解および視覚知覚タスクにおいて、特にコーナーケースの視覚知覚ときめ細かな言語理解を必要とするタスクにおいて、より優れた性能を示す実験である。

In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation. However, MLLMs mainly focus on high-level image-text interpretations and struggle with fine-grained visual understanding, and vision perception models usually suffer from open-world distribution shifts due to their limited model capacity. To overcome these challenges, we propose the Mutually Reinforced Multimodal Large Language Model (MR-MLLM), a novel framework that synergistically enhances visual perception and multimodal comprehension. First, a shared query fusion mechanism is proposed to harmonize detailed visual inputs from vision models with the linguistic depth of language models, enhancing multimodal comprehension and vision perception synergistically. Second, we propose the perception-enhanced cross-modal integration method, incorporating novel modalities from vision perception outputs, like object detection bounding boxes, to capture subtle visual elements, thus enriching the understanding of both visual and textual data. In addition, an innovative perception-embedded prompt generation mechanism is proposed to embed perceptual information into the language model's prompts, aligning the responses contextually and perceptually for a more accurate multimodal interpretation. Extensive experiments demonstrate MR-MLLM's superior performance in various multimodal comprehension and vision perception tasks, particularly those requiring corner case vision perception and fine-grained language comprehension.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# HCQA @ Ego4D EgoSchema Challenge 2024

HCQA @ Ego4D EgoSchema Challenge 2024 ( http://arxiv.org/abs/2406.15771v1 )

License: see the linked page

Haoyu Zhang, Yuquan Xie, Yisen Feng, Zaijing Li, Meng Liu, Liqiang Nie,

(参考訳) 本稿では,CVPR 2024におけるEgo4D EgoSchema Challengeのチャンピオンソリューションについて述べる。強力な自我中心のキャプションモデルと質問推理モデルを深く統合するために,HCQA という,自我中心のビデオ質問回答のための階層的理解スキームを提案する。細粒度キャプション生成、コンテキスト駆動の要約、推論誘導解答の3段階で構成されている。 HCQAは、長めのビデオが与えられたとき、局所的な詳細な視覚情報と、細粒度キャプション生成とコンテキスト駆動の要約によって、大域的に要約された視覚情報をキャプチャする。次に、推論誘導解答法において、HCQAは、この階層的な情報を用いて、与えられた質問を推論し、答える。 EgoSchemaのブラインドテストセットでは、HCQAは5000人以上のキュレートされた複数の質問に対して75%の精度で回答する。私たちのコードはhttps://github.com/Hyu-Zhang/HCQA.comでリリースされます。

In this report, we present our champion solution for the Ego4D EgoSchema Challenge at CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guided Answering. Given a long-form video, HCQA captures local detailed visual information and global summarised visual information via Fine-grained Caption Generation and Context-driven Summarization, respectively. Then, in Inference-guided Answering, HCQA utilizes this hierarchical information to reason about and answer the given question. On the EgoSchema blind test set, HCQA achieves 75% accuracy in answering over 5,000 human-curated multiple-choice questions. Our code will be released at https://github.com/Hyu-Zhang/HCQA.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# ISS-Scenario: Scenario-based Testing in CARLA

ISS-Scenario: Scenario-based Testing in CARLA ( http://arxiv.org/abs/2406.15777v1 )

License: see the linked page

Renjue Li, Tianhang Qin, Cas Widdershoven,

(参考訳) 自律運転システム(ADS)の急速な発展は、将来性に満ちている。しかし、これらの約束を満たすためには、ADSはあらゆる状況において安全である必要がある。本稿では,シナリオベーステストのパラダイムにおける自律走行テストフレームワークであるISS-Scenarioを紹介する。 ISS-Scenarioは、バッチテスト、テストケース(潜在的に危険なシナリオ)の探索、自動運転車(AV)の性能評価のために設計されている。 ISS-Scenarioには、パラメタライズドデザインを備えた多様なシミュレーションシナリオライブラリが含まれている。さらにISS-Scenarioは、ランダムサンプリングと遺伝的アルゴリズムによる最適化検索という、2つのテスト手法をこのフレームワークに統合している。最後に、ISS-Scenarioは、アクシデントリプレイ機能を提供し、各テストケースのログファイルを保存することで、ADSが問題のある振る舞いを示したシナリオを再生および無効化することができる。

The rapidly evolving field of autonomous driving systems (ADSs) is full of promise. However, in order to fulfil these promises, ADSs need to be safe in all circumstances. This paper introduces ISS-Scenario, an autonomous driving testing framework in the paradigm of scenario-based testing. ISS-Scenario is designed for batch testing, exploration of test cases (e.g., potentially dangerous scenarios), and performance evaluation of autonomous vehicles (AVs). ISS-Scenario includes a diverse simulation scenario library with parametrized design. Furthermore, ISS-Scenario integrates two testing methods within the framework: random sampling and optimized search by means of a genetic algorithm. Finally, ISS-Scenario provides an accident replay feature, saving a log file for each test case which allows developers to replay and dissect scenarios where the ADS showed problematic behavior.
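
The two search strategies named above, random sampling and genetic-algorithm search over scenario parameters, can be sketched in a few lines of Python. Nothing here comes from ISS-Scenario or CARLA: the `risk` function stands in for a simulator run, and the parameter names, ranges, and GA settings are hypothetical.

```python
import random

BOUNDS = [(0.0, 40.0), (1.0, 50.0)]  # assumed ranges for (speed, gap)


def risk(params):
    """Hypothetical stand-in for one simulation run: higher means a more
    dangerous scenario. Peaks at speed=20.0, gap=5.0."""
    speed, gap = params
    return -((speed - 20.0) ** 2) - ((gap - 5.0) ** 2)


def random_scenario(rng):
    """Random sampling: draw each parameter uniformly from its range."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def mutate(params, rng, scale=2.0):
    """Gaussian perturbation, clipped back into the allowed range."""
    return [min(hi, max(lo, p + rng.gauss(0.0, scale)))
            for p, (lo, hi) in zip(params, BOUNDS)]


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def genetic_search(rng, pop_size=20, generations=30):
    population = [random_scenario(rng) for _ in range(pop_size)]
    history = []  # best risk score at the start of each generation
    for _ in range(generations):
        population.sort(key=risk, reverse=True)
        history.append(risk(population[0]))
        parents = population[: pop_size // 2]  # keep the riskiest half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # parents survive (elitism)
    return max(population, key=risk), history


best, history = genetic_search(random.Random(0))
```

Because the best half of each generation is carried over unchanged, the best score in `history` never decreases; pure random sampling corresponds to evaluating only the initial population.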

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# ObjectNLQ @ Ego4D Episodic Memory Challenge 2024

ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 ( http://arxiv.org/abs/2406.15778v1 )

License: see the linked page

Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie,

(参考訳) 本稿では,CVPR 2024におけるEgo4D Episodic Memory Benchmarkの自然言語クエリトラックとゴールステップトラックについて述べる。どちらの課題も、テキストクエリを使って長いビデオシーケンス内のアクションをローカライズする必要がある。ローカライゼーションの精度を高めるため,ビデオの時間的情報を処理するだけでなく,フレーム内の微細な物体を空間的に識別する。この目的のために,オブジェクトブランチを組み込んだ新しいアプローチであるObjectNLQを導入し,映像表現を詳細なオブジェクト情報で拡張し,グラウンド化効率を向上する。 ObjectNLQは23.15の平均R@1を達成し、自然言語クエリチャレンジでは2位、R@1, IoU=0.3で33.00を獲得し、ゴールステップチャレンジでは3位となった。私たちのコードはhttps://github.com/Yisen-Feng/ObjectNLQ.comでリリースされます。

In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatially within the frames. To this end, we introduce a novel approach, termed ObjectNLQ, which incorporates an object branch to augment the video representation with detailed object information, thereby improving grounding efficiency. ObjectNLQ achieves a mean R@1 of 23.15, ranking 2nd in the Natural Language Queries Challenge, and gains 33.00 in terms of the metric R@1, IoU=0.3, ranking 3rd in the Goal Step Challenge. Our code will be released at https://github.com/Yisen-Feng/ObjectNLQ.
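
The metric "R@1, IoU=0.3" quoted above can be illustrated as follows. This is a generic sketch of top-1 recall under a temporal IoU threshold, not the official challenge evaluation code; the segment values are made up.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two time segments given as (start, end)."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold):
    """Fraction of queries whose top-ranked segment overlaps the ground
    truth with IoU at or above the threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return hits / len(gts)


# Made-up top-1 predictions and ground-truth segments, in seconds.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 50.0)]
gts = [(12.0, 22.0), (10.0, 15.0), (35.0, 45.0)]

r1_at_03 = recall_at_1(preds, gts, 0.3)
r1_at_06 = recall_at_1(preds, gts, 0.6)
```

The second prediction does not overlap its ground truth at all, so it counts as a miss at any threshold; the third has IoU 0.5 and drops out once the threshold is raised to 0.6.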

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models ( http://arxiv.org/abs/2406.15781v1 )

License: see the linked page

Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian,

(参考訳) ビジネスプロセスの異常を検出することは、運用の成功を保証するために重要です。多くの既存手法は異常を検出するために統計周波数に依存していますが、頻繁な振る舞いが必ずしも望ましくないというわけではないことに注意する必要があります。この課題に対処するために、意味論的視点から異常を検出することは、より効果的なアプローチであることが証明されている。しかし、現在のセマンティックな異常検出方法は、トレース(すなわちプロセスインスタンス)を複数のイベントペアとして扱い、長距離依存関係を乱す。本稿では,大規模言語モデル(LLM)を用いたビジネスプロセスにおける意味異常の検出手法であるDABLを紹介する。さまざまなドメインから143,137の現実世界のプロセスモデルを収集します。これらのプロセスモデルのプレイアウトから通常のトレースを生成し、順序付けと排他的異常の両方をシミュレートすることで、Llama 2を微調整する。実験により,DABLは与えられたプロセスの一般化能力と学習能力の両方の観点から,既存の最先端のセマンティックな異常検出手法を超越していることが実証された。ユーザーはDABLを直接適用して、追加のトレーニングを必要とせずに、自身のデータセットのセマンティックな異常を検出することができる。さらに、DABLは自然言語の異常の原因を解釈する能力を提供し、検出された異常について貴重な洞察を提供する。

Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anomaly detection methods treat a trace (i.e., process instance) as multiple event pairs, disrupting long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. By generating normal traces through the playout of these process models and simulating both ordering and exclusion anomalies, we fine-tune Llama 2 using the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. Users can directly apply DABL to detect semantic anomalies in their own datasets without the need for additional training. Furthermore, DABL offers the capability to interpret the causes of anomalies in natural language, providing valuable insights into the detected anomalies.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Feedback-driven quantum reservoir computing for time-series analysis

Feedback-driven quantum reservoir computing for time-series analysis ( http://arxiv.org/abs/2406.15783v1 )

License: see the linked page

Kaito Kobayashi, Keisuke Fujii, Naoki Yamamoto,

(参考訳) 量子貯水池コンピューティング(QRC)は、非線形情報処理のための計算資源として量子システムを利用する、非常に有望な計算パラダイムである。時系列解析へのその応用は期待されているが、一般的なアプローチは測定時の量子状態の崩壊に悩まされ、時間的入力メモリが消去される。前者は時間複雑性をエスカレートし、後者はヒルベルト空間からの情報抽出を制限する。この問題に対処するため,フィードバック駆動型QRCフレームワークを提案する。この手法では、量子状態への無制限アクセスのために全ての量子ビットの射影測定を用い、測定結果はその後貯水池に送り返され、以前の入力の記憶を復元する。我々は,QRCが時系列処理において重要な要素であるフィードバックを通じて,フェードメモリ特性の取得に成功したことを実証した。特に、測定軌跡の分析では、フィードバック強度に応じて3つの異なる位相が示され、メモリ性能はカオスの端で最大化される。また、QRCの予測能力を評価し、量子スピン系から発する信号の予測性を示す。

Quantum reservoir computing (QRC) is a highly promising computational paradigm that leverages quantum systems as a computational resource for nonlinear information processing. While its application to time-series analysis is eagerly anticipated, prevailing approaches suffer from the collapse of the quantum state upon measurement, resulting in the erasure of temporal input memories. Neither repeated initializations nor weak measurements offer a fundamental solution, as the former escalates time complexity while the latter restricts the information extraction from the Hilbert space. To address this issue, we propose a feedback-driven QRC framework. This methodology employs projective measurements on all qubits for unrestricted access to the quantum state, with the measurement outcomes subsequently fed back into the reservoir to restore the memory of prior inputs. We demonstrate that our QRC successfully acquires the fading-memory property through the feedback, a critical element in time-series processing. Notably, analysis of measurement trajectories reveals three distinct phases depending on the feedback strength, with the memory performance maximized at the edge of chaos. We also evaluate the predictive capabilities of our QRC, demonstrating its suitability for forecasting signals originating from quantum spin systems.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Data Issues in Industrial AI System: A Meta-Review and Research Strategy

Data Issues in Industrial AI System: A Meta-Review and Research Strategy ( http://arxiv.org/abs/2406.15784v1 )

License: see the linked page

Xuejiao Li, Cheng Yang, Charles Møller, Jay Lee,

(参考訳) 産業4.0の時代には、人工知能(AI)は産業システムにおいてますます重要な役割を担っている。近年、さまざまな業界でAIを採用する傾向にあるが、実際のAIの採用は認識されるほど発展していない。この遅れに寄与する重要な要因は、AI実装におけるデータ問題である。これらのデータ問題にどのように対処するかは、業界と学術の両方に直面する重要な懸念事項である。データ問題に対処する最初のステップは、これらの問題をマッピングすることです。そこで本研究では,産業用AIの実装におけるデータ問題と手法のメタレビューを行う。データソースとコレクション、データアクセスとストレージ、データ統合と相互運用、データ前処理、データ処理、データセキュリティとプライバシ、AIテクノロジの採用などだ。その後、さまざまなAIアルゴリズムのデータ要求を分析する。上記の分析に基づいて、データライフサイクルのすべての段階で、データの問題を体系的に解決する方法について、データ管理フレームワークを提案する。最後に、この研究は今後の研究の方向性を強調している。そこで本研究では、既存の知識体系を充実させ、産業用AIにおけるデータの使いやすさと有用性を達成するための複雑な景観をナビゲートする専門家のためのガイドラインを提供する。

In the era of Industry 4.0, artificial intelligence (AI) is assuming an increasingly pivotal role within industrial systems. Despite the recent trend within various industries to adopt AI, the actual adoption of AI is not as developed as perceived. A significant factor contributing to this lag is the data issues in AI implementation. How to address these data issues stands as a significant concern confronting both industry and academia. To address data issues, the first step involves mapping out these issues. Therefore, this study conducts a meta-review to explore data issues and methods within the implementation of industrial AI. Seventy-two data issues are identified and categorized into various stages of the data lifecycle, including data source and collection, data access and storage, data integration and interoperation, data pre-processing, data processing, data security and privacy, and AI technology adoption. Subsequently, the study analyzes the data requirements of various AI algorithms. Building on the aforementioned analyses, it proposes a data management framework, addressing how data issues can be systematically resolved at every stage of the data lifecycle. Finally, the study highlights future research directions. In doing so, this study enriches the existing body of knowledge and provides guidelines for professionals navigating the complex landscape of achieving data usability and usefulness in industrial AI.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# What Matters in Transformers? Not All Attention is Needed

What Matters in Transformers? Not All Attention is Needed ( http://arxiv.org/abs/2406.15786v1 )

License: see the linked page

Shwai He, Guoheng Sun, Zheyu Shen, Ang Li,

(参考訳) Transformerベースの大規模言語モデル(LLM)のスケーリングは、様々なタスクで有望なパフォーマンスを示している。しかし、このスケーリングには冗長な構造も導入されており、現実のデプロイメントには課題がある。 LLMの冗長性はある程度認識されているが、MLPやアテンション層といった異なる構造における冗長性の多様性は未解明である。本研究では、類似度に基づくメトリクスを用いて、ブロック、MLP、アテンション層を含むトランスフォーマー内の異なるモジュール間の異なる冗長性について検討する。この計量は、冗長構造が入力と非常によく似た出力を生成するという前提で機能する。驚いたことに、アテンション層は他の主流アーキテクチャと区別するためにはアテンション層が不可欠であるが、多くのアテンション層が過剰に高い類似性を示し、性能を劣化させることなく安全に切断できることが判明し、メモリと計算コストの削減につながった。さらに,アテンション層とMLP層を共同でドロップする手法を提案し,性能向上と低下率の向上を実現した。 Llama-3-70Bは注目層の半分を刈っても同等の性能を維持している。我々の発見は将来のネットワークアーキテクチャ設計に貴重な洞察を与えてくれる。コードは: \url{https://github.com/Shwai-He/LLM-Drop} でリリースされる。

Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, this scaling also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different structures, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers. Our findings provide valuable insights for future network architecture design. The code will be released at: https://github.com/Shwai-He/LLM-Drop.
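
The premise that "redundant structures produce outputs highly similar to their inputs" suggests a simple score. The sketch below uses cosine similarity between a module's input and output vectors; the abstract does not name the exact similarity function, so that choice, the helper names, and the toy vectors are assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_redundancy(module_io):
    """module_io maps a module name to (input_vector, output_vector).
    Returns (names sorted from most to least redundant, score per name)."""
    scores = {name: cosine(x, y) for name, (x, y) in module_io.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy hidden states; the module names are hypothetical.
module_io = {
    "attn_3": ([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]),   # output barely changes
    "mlp_3": ([1.0, 2.0, 3.0], [3.0, -1.0, 0.5]),   # output changes a lot
}
order, scores = rank_by_redundancy(module_io)
```

In practice the vectors would be hidden states averaged over a calibration set, and the modules at the top of `order` would be the first candidates for dropping.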

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Distributionally Robust Constrained Reinforcement Learning under Strong Duality

Distributionally Robust Constrained Reinforcement Learning under Strong Duality ( http://arxiv.org/abs/2406.15788v1 )

License: see the linked page

Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue,

(参考訳) 本研究では, 環境分布の変動や制約にともなう期待報酬を最大化することを目的として, 分散ロバスト制約RL (DRC-RL) の問題について検討する。この設定は、トレーニングとテスト環境が異なる状況を捉え、安全や限られた予算によって動機付けられた制約を満たす必要がある。分散ロバストなRLと制約付きRLの分離問題に対するアルゴリズム設計への大きな進歩にもかかわらず、DRC-RLのエンドツーエンド収束保証付きアルゴリズムは存在しない。我々は,環境不確実性のクラスにおいて,最初の効率的かつ証明可能な解を可能にする,強い双対性に基づくアルゴリズム的枠組みを開発する。さらに,本フレームワークは,分散ロバストなRLと制約付きRLのそれぞれに対して適用可能であるにもかかわらず,分散ロバストなRLと制約の組合せから生じるDRC-RL固有の構造を明らかにする。最後に,提案アルゴリズムの有効性を評価するために,カーレースベンチマーク実験を行った。

We study the problem of Distributionally Robust Constrained RL (DRC-RL), where the goal is to maximize the expected reward subject to environmental distribution shifts and constraints. This setting captures situations where training and testing environments differ, and policies must satisfy constraints motivated by safety or limited budgets. Despite significant progress toward algorithm design for the separate problems of distributionally robust RL and constrained RL, there do not yet exist algorithms with end-to-end convergence guarantees for DRC-RL. We develop an algorithmic framework based on strong duality that enables the first efficient and provable solution in a class of environmental uncertainties. Further, our framework exposes an inherent structure of DRC-RL that arises from the combination of distributional robustness and constraints, which prevents a popular class of iterative methods from tractably solving DRC-RL, despite such frameworks being applicable for each of distributionally robust RL and constrained RL individually. Finally, we conduct experiments on a car racing benchmark to evaluate the effectiveness of the proposed algorithm.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Privacy Implications of Explainable AI in Data-Driven Systems

Privacy Implications of Explainable AI in Data-Driven Systems ( http://arxiv.org/abs/2406.15789v1 )

License: see the linked page

Fatima Ezzeddine,

(参考訳) 機械学習(ML)モデルは、明らかに強力であり、解釈可能性の欠如に悩まされている。透明性の欠如は、しばしばMLモデルのブラックボックスの性質と呼ばれ、信頼を損ね、その説明可能性を高める努力の必要性を喚起する。説明可能なAI(XAI)技術は、これらの複雑なモデルの内部決定プロセスを説明するためのフレームワークと方法を提供することで、この問題に対処する。対実的説明(CF)や特徴の重要性といったテクニックは、この目標を達成する上で重要な役割を担います。さらに、高品質で多様なデータが、堅牢で信頼性の高いMLアプリケーションの基礎的な要素として残っています。多くのアプリケーションにおいて、MLとXAIの説明器のトレーニングに使用されるデータは機密情報を含んでいる。このコンテキストでは、差分プライバシーなど、データ内の機密情報を保護するために、多数のプライバシ保存技術を使用することができる。その後、XAIとプライバシソリューションの対立は、その反対の目標のために現れます。 XAI技術はモデル動作の推論を提供するため、決定境界や特徴値、説明が第3のエンティティに露出した場合のディープラーニングモデルの勾配といったMLモデルに関する情報を明らかにする。攻撃者はこれらの説明を使ってプライバシー侵害攻撃を開始し、モデル抽出、推論、およびメンバーシップ攻撃を行うことができる。このジレンマは、ML意思決定の理解とプライバシ保護の間の適切な均衡を見つけるという課題を浮き彫りにしている。

Machine learning (ML) models, demonstrably powerful, suffer from a lack of interpretability. The absence of transparency, often referred to as the black box nature of ML models, undermines trust and urges the need for efforts to enhance their explainability. Explainable AI (XAI) techniques address this challenge by providing frameworks and methods to explain the internal decision-making processes of these complex models. Techniques like Counterfactual Explanations (CF) and Feature Importance play a crucial role in achieving this goal. Furthermore, high-quality and diverse data remains the foundational element for robust and trustworthy ML applications. In many applications, the data used to train ML and XAI explainers contain sensitive information. In this context, numerous privacy-preserving techniques can be employed to safeguard sensitive information in the data, such as differential privacy. Subsequently, a conflict between XAI and privacy solutions emerges due to their opposing goals. Since XAI techniques provide reasoning for the model behavior, they reveal information relative to ML models, such as their decision boundaries, the values of features, or the gradients of deep learning models when explanations are exposed to a third entity. Attackers can initiate privacy breaching attacks using these explanations, to perform model extraction, inference, and membership attacks. This dilemma underscores the challenge of finding the right equilibrium between understanding ML decision-making and safeguarding privacy.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Risk-Dominant Equilibrium in Quantum Prisoner's Dilemma

Risk-Dominant Equilibrium in Quantum Prisoner's Dilemma ( http://arxiv.org/abs/2406.15795v1 )

License: see the linked page

Ahmed S. Elgazzar,

(参考訳) ユニークなナッシュ均衡(NE)の選択は理論的な古典ゲームや量子ゲームにおいて重要である。 Eiswer-Wilkens-Lewenstein量子化スキームは、囚人のジレンマを高い絡み合いのために解決する。中絡みでは複数のNEが存在する。量子囚人のジレンマにおけるユニークなNEの選択について,ジレンマ強度パラメータの変動による検討を行った。リスク管理基準が使用される。ジレンマ強度パラメータと絡み合いの影響を強調した。絡み合いがリスク支配均衡を完全にコントロールしていることがわかった。絡み合いはリスク支配平衡における量子協調を促進し、その結果を改善する。

The choice of a unique Nash equilibrium (NE) is crucial in theoretical classical and quantum games. The Eisert-Wilkens-Lewenstein quantization scheme solves the prisoner's dilemma only for high entanglement. At medium entanglement, there are multiple NEs. We investigate the selection of a unique NE in the quantum prisoner's dilemma with variable dilemma strength parameters. The risk-dominance criterion is used. The influence of the dilemma strength parameters and entanglement is emphasized. We found that entanglement completely controls the risk-dominant equilibrium. Entanglement promotes quantum-cooperation in the risk-dominant equilibrium and thus improves its outcome.
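
For readers unfamiliar with the risk-dominance criterion, the following sketch applies the classical Harsanyi-Selten test to a 2x2 game with two pure equilibria. It covers only the classical case, not the quantum game analysed in the paper, and the stag-hunt payoffs are a textbook-style example chosen for illustration.

```python
def risk_dominant(row, col):
    """Compare the pure equilibria (0, 0) and (1, 1) of a 2x2 game.
    row[i][j] and col[i][j] are the payoffs to the row and column player
    when row plays i and column plays j. Assumes both profiles are strict
    Nash equilibria. Returns 0, 1, or None on a tie."""
    # Product of the losses each player suffers by deviating unilaterally.
    loss_00 = (row[0][0] - row[1][0]) * (col[0][0] - col[0][1])
    loss_11 = (row[1][1] - row[0][1]) * (col[1][1] - col[1][0])
    if loss_00 == loss_11:
        return None
    return 0 if loss_00 > loss_11 else 1


# Stag hunt: strategy 0 = Stag, strategy 1 = Hare.
row = [[4, 0], [3, 2]]
col = [[4, 3], [0, 2]]
winner = risk_dominant(row, col)
```

Here (Stag, Stag) gives the higher payoff, but the deviation-loss product is 1 for (Stag, Stag) against 4 for (Hare, Hare), so (Hare, Hare) is the risk-dominant equilibrium.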

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Rethinking Entity-level Unlearning for Large Language Models

Rethinking Entity-level Unlearning for Large Language Models ( http://arxiv.org/abs/2406.15796v1 )

License: see the linked page

Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin,

(参考訳) 大規模言語モデルのアンラーニングは、セキュリティとプライバシの懸念を軽減する可能性があるため、注目を集めている。現在の研究は、主にインスタンスレベルのアンラーニングに焦点を当てており、特に機密コンテンツの予め定義されたインスタンスを忘れることを目的としている。しかし、著作権保護など多くの現実のシナリオにおいて重要な、完全なエンティティ関連情報の削除を探求する上で、注目すべきギャップがまだ残っている。そこで本研究では,対象モデル内のエンティティ関連知識を完全に消去する,エンティティレベルのアンラーニングという新しいタスクを提案する。モデル内のすべてのエンティティ関連知識に実際にアクセスすることの難しさを考えると、擬似エンティティを導入するための微調整モデルを通じて、エンティティレベルの未学習シナリオをシミュレートすることから始める。次に,非学習手法のトレンドにインスパイアされたベースライン手法を開発し,その効果を詳細に比較する。大規模な実験により、現在のアンラーニングアルゴリズムは、効果的なエンティティレベルのアンラーニングを達成するのに苦労していることが明らかになった。さらに,本研究では,未学習時の事前学習において,微調整によって注入される実体関連知識が本来の実体よりも受容されやすいことを示し,事前学習された知識に近づけるために,より徹底的な擬似性注入法の必要性を強調した。

Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many real-world scenarios, such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, where the entity-related knowledge within the target model is supposed to be entirely erased. Given the challenge of practically accessing all entity-related knowledge within a model, we begin by simulating entity-level unlearning scenarios through fine-tuning models to introduce pseudo entities. Following this, we develop baseline methods inspired by trending unlearning techniques and conduct a detailed comparison of their effectiveness in this task. Extensive experiments reveal that current unlearning algorithms struggle to achieve effective entity-level unlearning. Additionally, our analyses further indicate that entity-related knowledge injected through fine-tuning is more susceptible than original entities from pre-training during unlearning, highlighting the necessity for more thorough pseudo-entity injection methods to make them closer to pre-trained knowledge.

Translated: 2024-06-25 20:35:12 Published: 2024-06-22

# Synergistic Deep Graph Clustering Network

Synergistic Deep Graph Clustering Network ( http://arxiv.org/abs/2406.15797v1 )

License: see the linked page

Benyu Wu, Shifei Ding, Xiao Xu, Lili Guo, Ling Ding, Xindong Wu,

(参考訳) グラフニューラルネットワーク(GNN)を用いて、クラスタリングのための凝集性および識別ノード表現を学習することは、ディープグラフクラスタリングにおいて有望な結果を示している。しかし,既存の手法では,表現学習と構造強化の相互関係は無視されている。本研究は,GNNが深層グラフクラスタリングの可能性を解き放つためには,埋め込みと構造を相乗的に拡張することが重要であることを示唆する。信頼性の高い構造はより凝集性の高いノード表現の獲得を促進する一方、高品質なノード表現は構造の増大を導くことができ、見返りに構造的信頼性を高めることができる。さらに、既存のGNNベースのモデルの一般化能力は比較的貧弱である。それらは高い等質性を持つグラフではうまく機能するが、低い等質性を持つグラフでは不十分に機能する。そこで我々はSynC(Syngistic Deep Graph Clustering Network)というグラフクラスタリングフレームワークを提案する。本稿では,構造拡張を導くための高品質な埋め込みを実現するために,TIGAE (Transform Input Graph Auto-Encoder) を設計する。次に、拡張グラフ上の近傍表現を再取得し、クラスタリングに親しみやすい埋め込みを取得し、自己教師付きクラスタリングを行う。特に、表現学習と構造増強は重みを共有し、モデルパラメータの数を著しく減少させる。さらに、モデルの一般化を改善するための構造微調整戦略を導入する。ベンチマークデータセットの大規模な実験により,本手法の優位性と有効性を示す。コードはGitHubとCode Oceanでリリースされている。

Employing graph neural networks (GNNs) to learn cohesive and discriminative node representations for clustering has shown promising results in deep graph clustering. However, existing methods disregard the reciprocal relationship between representation learning and structure augmentation. This study suggests that enhancing embedding and structure synergistically becomes imperative for GNNs to unleash their potential in deep graph clustering. A reliable structure promotes obtaining more cohesive node representations, while high-quality node representations can guide the augmentation of the structure, enhancing structural reliability in return. Moreover, the generalization ability of existing GNN-based models is relatively poor. While they perform well on graphs with high homogeneity, they perform poorly on graphs with low homogeneity. To this end, we propose a graph clustering framework named Synergistic Deep Graph Clustering Network (SynC). In our approach, we design a Transform Input Graph Auto-Encoder (TIGAE) to obtain high-quality embeddings for guiding structure augmentation. Then, we re-capture neighborhood representations on the augmented graph to obtain clustering-friendly embeddings and conduct self-supervised clustering. Notably, representation learning and structure augmentation share weights, significantly reducing the number of model parameters. Additionally, we introduce a structure fine-tuning strategy to improve the model's generalization. Extensive experiments on benchmark datasets demonstrate the superiority and effectiveness of our method. The code is released on GitHub and Code Ocean.

翻訳日:2024-06-25 20:35:12 公開日:2024-06-22

# スマートな特徴こそが必要なもの

Smart Feature is What You Need ( http://arxiv.org/abs/2406.15805v1 )

ライセンス: Link先を確認

Zhaoxin Hu, Keyan Ren,

(参考訳) 弱ラベルの情報不足に起因する形状ガイダンスの欠如とラベルジッタは、3次元弱教師付き物体検出の主な問題である。現在の弱教師付きモデルは、弱教師付き手法と完全教師付き手法に内在する手がかりを生かさずに、ヒューリスティックや仮定に基づく手法で弱ラベルから情報を推測することが多く、そのためデータ利用効率とモデル精度を両立する手法を探求することは困難である。これらの問題に対処するために,Multi-scale Mixed Attention (MMA)と呼ばれる新しいプラグイン型(plug-and-in)の点群特徴表現ネットワークを提案する。 MMAは、近傍内の隣接注意(adjacency attention)と、異なる密度スケールにおける差異注意(disparity attention)を利用して特徴表現ネットワークを構築する。 MMAから得られるスマートな特徴表現は、形状の傾向と物体存在領域の推定を備えており、検出ボックスの領域を制約することで、弱ラベルの情報欠落に起因する問題を緩和できる。広範な実験により、屋内の弱ラベルのシナリオでは、MMAによる点特徴の改善のみによって、完全教師付きネットワークが弱教師付きネットワークに近い性能を発揮できることが示された。同時に、MMAは廃棄物を宝に変え、もともと弱教師付き検出を妨げていたラベルジッタ問題をデータ拡張の源へと転換し、既存の弱教師付き検出手法の性能を高める。私たちのコードはhttps://github.com/hzx-9894/MMAで公開されています。

Lack of shape guidance and label jitter, both caused by the information deficiency of weak labels, are the main problems in 3D weakly-supervised object detection. Current weakly-supervised models often use heuristic or assumption-based methods to infer information from weak labels without taking advantage of the inherent clues of weakly-supervised and fully-supervised methods, which makes it difficult to find a method that combines data utilization efficiency with model accuracy. In an attempt to address these issues, we propose a novel plug-and-in point cloud feature representation network called Multi-scale Mixed Attention (MMA). MMA utilizes adjacency attention within neighborhoods and disparity attention at different density scales to build a feature representation network. The smart feature representation obtained from MMA carries shape tendency and object existence area inference, which can constrain the region of the detection boxes, thereby alleviating the problems caused by the missing information in weak labels. Extensive experiments show that in indoor weak label scenarios, the fully-supervised network can perform close to that of the weakly-supervised network merely through the improvement of point features by MMA. At the same time, MMA can turn waste into treasure, converting the label jitter problem that originally interfered with weakly-supervised detection into a source of data augmentation and strengthening the performance of existing weakly-supervised detection methods. Our code is available at https://github.com/hzx-9894/MMA.

翻訳日:2024-06-25 20:35:12 公開日:2024-06-22

# 評価とフィードバックにおけるAI活用の学生とアカデミックスタッフの理解

Understanding Student and Academic Staff Perceptions of AI Use in Assessment and Feedback ( http://arxiv.org/abs/2406.15808v1 )

ライセンス: Link先を確認

Jasper Roe, Mike Perkins, Daniel Ruelle,

(参考訳) 高等教育における人工知能(AI)と生成人工知能(GenAI)の台頭は、評価改革を必要としている。この研究は、AIとGenAIツールに関する学生とアカデミックスタッフの経験を探求し、学習と評価における現在および将来の潜在的な応用に対する親しみと安心感に焦点を当てることで、重要なギャップに対処する。オンライン調査では、ベトナムの2つの大学とシンガポールの1つの大学にまたがる35人のアカデミックスタッフと282人の学生からデータを収集し、GenAIへの親しみ、評価の採点とフィードバックにおけるその利用に対する認識、知識チェックと参加、GenAIテキスト検出の経験を調べた。記述統計と再帰的テーマ分析の結果,両群ともGenAIへの親しみは概して低かった。 GenAIのフィードバックは否定的に受け止められたが、インストラクターのフィードバックと組み合わせると、より肯定的に受け止められた。アカデミックスタッフは, 学生と比較して, GenAIテキスト検出ツールや検出結果に基づく成績調整をより受け入れていた。質的分析では、テキスト検出ツールに対する不明確な理解、GenAI検出器に関する経験のばらつき、教育評価におけるGenAIの将来的な影響に対する複雑な感情、という3つのテーマを特定した。これらの知見は、高等教育におけるGenAI対応の評価とフィードバックのための政策と実践の策定に大きな示唆を与える。

The rise of Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) in higher education necessitates assessment reform. This study addresses a critical gap by exploring student and academic staff experiences with AI and GenAI tools, focusing on their familiarity and comfort with current and potential future applications in learning and assessment. An online survey collected data from 35 academic staff and 282 students across two universities in Vietnam and one in Singapore, examining GenAI familiarity, perceptions of its use in assessment marking and feedback, knowledge checking and participation, and experiences of GenAI text detection. Descriptive statistics and reflexive thematic analysis revealed a generally low familiarity with GenAI among both groups. GenAI feedback was viewed negatively; however, it was viewed more positively when combined with instructor feedback. Academic staff were more accepting of GenAI text detection tools and grade adjustments based on detection results compared to students. Qualitative analysis identified three themes: unclear understanding of text detection tools, variability in experiences with GenAI detectors, and mixed feelings about GenAI's future impact on educational assessment. These findings have major implications regarding the development of policies and practices for GenAI-enabled assessment and feedback in higher education.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# LaMSUM: LLMを用いたユーザ生成コンテンツの抽出要約のための新しいフレームワーク

LaMSUM: A Novel Framework for Extractive Summarization of User Generated Content using LLMs ( http://arxiv.org/abs/2406.15809v1 )

ライセンス: Link先を確認

Garima Chhikara, Anurag Sharma, V. Gurucharan, Kripabandhu Ghosh, Abhijnan Chakraborty,

(参考訳) 大規模言語モデル(LLM)は、要約を含む幅広いNLPタスクにおいて、印象的なパフォーマンスを示している。LLMは本質的に抽象的な要約を生成するものであり、LLMによって抽出的な要約を得るという課題はいまだほとんど探求されていない。本研究では,このギャップを埋めるために,投票アルゴリズムを活用して,大規模なユーザ生成テキストに対しLLMで抽出要約を生成する新しいフレームワークLaMSUMを提案する。 Llama 3、Mixtral、Gemini の3つの人気のあるオープンソース LLM で評価した結果,LaMSUM は最先端の抽出要約法より優れていることがわかった。さらに,LLMが生成した出力要約の背後にある根拠を示すことも試みる。全体として、これはLLMを利用して大規模なユーザ生成テキストを抽出的に要約する初期の試みの1つであり、コミュニティにさらなる関心を喚起する可能性が高い。

Large Language Models (LLMs) have demonstrated impressive performance across a wide range of NLP tasks, including summarization. LLMs inherently produce abstractive summaries, and the task of obtaining extractive summaries through LLMs remains largely unexplored. To bridge this gap, we propose LaMSUM, a novel framework that generates extractive summaries of large user-generated text through LLMs by leveraging voting algorithms. Our evaluation on three popular open-source LLMs (Llama 3, Mixtral and Gemini) reveals that LaMSUM outperforms state-of-the-art extractive summarization methods. We further attempt to provide the rationale behind the output summary produced by LLMs. Overall, this is one of the early attempts to achieve extractive summarization for large user-generated text by utilizing LLMs, and it is likely to generate further interest in the community.
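
The abstract does not say which voting algorithms LaMSUM uses, so the following is only a minimal sketch of one plausible scheme, plurality voting over several extraction runs. The function name `plurality_select`, the ballot format (each run returns the indices of the sentences it selected), and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def plurality_select(ballots, k):
    """Pick k sentence indices by plurality vote (hypothetical helper).

    ballots: list of lists; each inner list holds the sentence indices
             that one (assumed) LLM run selected for the summary.
    k:       number of sentences to keep in the final extractive summary.
    """
    # Count each sentence at most once per ballot.
    votes = Counter(idx for ballot in ballots for idx in set(ballot))
    # Most votes first; ties are broken by the lower sentence index.
    ranked = sorted(votes, key=lambda idx: (-votes[idx], idx))
    # Return the winners in document order so the summary reads naturally.
    return sorted(ranked[:k])


# Three assumed runs over the same document, each selecting three sentences.
runs = [[0, 2, 5], [2, 5, 7], [2, 3, 5]]
summary_indices = plurality_select(runs, k=2)
print(summary_indices)
```

Because the output is always a subset of the original sentence indices, the aggregated summary stays extractive even if individual runs disagree.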

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# PointDreamer: 2Dインパインティングによる色付き点雲からのゼロショット3Dテクスチャメッシュ再構成

PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting ( http://arxiv.org/abs/2406.15811v1 )

ライセンス: Link先を確認

Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen,

(参考訳) 色付き点群からテクスチャ付きメッシュを再構築することは、3Dグラフィックスとビジョンにおいて重要だが困難な課題である。既存のほとんどの手法は、3DまたはUV空間における暗黙の関数として色を予測するため、ぼやけたテクスチャや一般化能力の欠如に悩まされる。そこで我々は,色付き点群からテクスチャ付きメッシュを再構築するための新しいフレームワークであるPointDreamerを提案する。2Dビジョンの成熟した技術と膨大なデータを活用し、2D画像のインペインティングによって、忠実さと鮮明さを向上したメッシュを生成する。具体的には、まず入力点群を2次元空間に投影してスパースなマルチビュー画像を生成し、事前訓練された2次元拡散モデルを用いて空のピクセルをインペイントする。次に,インペイントされた密な画像の色を3次元空間に逆投影して最終的なテクスチャ付きメッシュを得る,新しい非境界ファースト(Non-Border-First)戦略を設計する。このように、PointDreamerはゼロショットで動作し、追加のトレーニングを必要としない。各種の合成および実スキャンデータセットでの広範な定性的・定量的実験により、PointDreamerはLPIPSスコアで30%の改善(0.118から0.068)を達成してベースライン法を大きく上回り、SoTA性能を示した。コードは https://github.com/YuQiao0303/PointDreamer で公開されている。

Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space and therefore suffer from blurry textures or a lack of generalization capability. To address this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point clouds. It produces meshes with enhanced fidelity and clarity by 2D image inpainting, taking advantage of the mature techniques and massive data of 2D vision. Specifically, we first project the input point cloud into 2D space to generate sparse multi-view images, and then inpaint empty pixels utilizing a pre-trained 2D diffusion model. Next, we design a novel Non-Border-First strategy to unproject the colors of the inpainted dense images back to 3D space, thus obtaining the final textured mesh. In this way, PointDreamer works in a zero-shot manner, requiring no extra training. Extensive qualitative and quantitative experiments on various synthetic and real-scanned datasets show the SoTA performance of PointDreamer, which significantly outperforms baseline methods with a 30% improvement in LPIPS score (from 0.118 to 0.068). Code at: https://github.com/YuQiao0303/PointDreamer.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 固有次元相関:マルチモーダル表現における非線形接続の発見

Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations ( http://arxiv.org/abs/2406.15812v1 )

ライセンス: Link先を確認

Lorenzo Basile, Santiago Acevedo, Luca Bortolussi, Fabio Anselmi, Alex Rodriguez,

(参考訳) 機械学習手法の背後にあるメカニズムを理解するためには、データポイントを記述する機能間の接続を確立することが不可欠である。しかし、これらの相関はしばしば高次元かつ強い非線形性を示すため、標準手法による検出は困難である。本稿では、内在次元と相関の絡み合いを利用して、高次元多様体間の(潜在的に非線形な)相関を定量化する計量を提案する。まず,制御環境における合成データの検証を行い,その利点と欠点を既存手法と比較した。その後、ニューラルネットワーク表現における大規模アプリケーションに分析を拡張します。具体的には,マルチモーダルデータの潜在表現に着目し,ペアの視覚とテキストの埋め込みの間に明確な相関関係を明らかにする。その結果, 潜在多様体間の高非線形相関パターンの存在が示唆された。

To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 読むことは信じること:画像分類のための言語ボトルネックモデルの再検討

Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification ( http://arxiv.org/abs/2406.15816v1 )

ライセンス: Link先を確認

Honori Udo, Takafumi Koshinaka,

(参考訳) 我々は、画像分類のためのディープラーニングモデルの説明可能性を確保するアプローチとして、言語ボトルネックモデルを再考する。画像を言語に変換する過程で必然的に情報損失が生じるため、言語ボトルネックモデルの精度は標準的なブラックボックスモデルよりも劣ると考えられている。しかし,視覚と言語の大規模基盤モデルに基づく近年の画像キャプション生成器は,これまで現実的には不可能と考えられていた程度まで,画像を言葉で詳細かつ正確に記述する能力を有している。災害画像分類のタスクにおいて,現代の画像キャプション生成器と事前学習された言語モデルを組み合わせた言語ボトルネックモデルが,ブラックボックスモデルを上回る画像分類精度を達成できることを実験的に示す。また,言語ボトルネックモデルとブラックボックスモデルは画像から異なる特徴を抽出していると考えられ,両者を融合させることで相乗効果が得られ,さらに高い分類精度が得られることを示した。

We revisit language bottleneck models as an approach to ensuring the explainability of deep learning models for image classification. Because of inevitable information loss incurred in the step of converting images into language, the accuracy of language bottleneck models is considered to be inferior to that of standard black-box models. Recent image captioners based on large-scale foundation models of Vision and Language, however, have the ability to accurately describe images in verbal detail to a degree that was previously believed to not be realistically possible. In a task of disaster image classification, we experimentally show that a language bottleneck model that combines a modern image captioner with a pre-trained language model can achieve image classification accuracy that exceeds that of black-box models. We also demonstrate that a language bottleneck model and a black-box model may be thought to extract different features from images and that fusing the two can create a synergistic effect, resulting in even higher classification accuracy.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# ワイヤレスシステムのためのAIモデル自動選択:デジタルツインニングによるオンライン学習

Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning ( http://arxiv.org/abs/2406.15819v1 )

ライセンス: Link先を確認

Qiushuo Hou, Matteo Zecchin, Sangwoo Park, Yunlong Cai, Guanding Yu, Kaushik Chowdhury, Osvaldo Simeone,

(参考訳) O-RANのような現代の無線ネットワークアーキテクチャでは、人工知能(AI)ベースのアプリケーションは、スケジューリングや電力制御などの機能を実行するためにインテリジェントコントローラにデプロイされる。 AI"アプリ"は、ネットワーク条件、トポロジ、トラフィック統計、設計目標などのコンテキスト情報に基づいて選択される。コンテキストとAIモデルパラメータのマッピングは、現在のデータを必要としないコンテキスト情報のみを活用する自動モデル選択(AMS)マッピングを通じて、ゼロショットで理想的に行われる。本稿では,AMSマッピングのオンライン最適化のための一般的な手法を紹介する。 AMSマッピングの最適化は、さまざまなコンテキストから収集されたデータを公開する必要があるため、難しい。したがって、オンラインに実行された場合、この初期最適化フェーズは非常に時間がかかります。可能な解決策は、物理システムのデジタルツインを利用して、複数のシミュレートされたコンテキストから合成データを生成することである。しかし、デジタルツインのシミュレータが不完全なことを考えると、AMSマッピングの最適化にシミュレーションデータを直接使用すると、実際のシステムでのテストでは性能が低下する。本稿では,物理システムから収集した限られた実データを用いて,シミュレータのバイアスを補正するAMSマッピングのオンライン最適化手法を提案する。グラフニューラルネットワークに基づく電力制御アプリの実験結果から,提案手法の利点が示された。

In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameters is ideally done in a zero-shot fashion via an automatic model selection (AMS) mapping that leverages only contextual information without requiring any current data. This paper introduces a general methodology for the online optimization of AMS mappings. Optimizing an AMS mapping is challenging, as it requires exposure to data collected from many different contexts. Therefore, if carried out online, this initial optimization phase would be extremely time consuming. A possible solution is to leverage a digital twin of the physical system to generate synthetic data from multiple simulated contexts. However, given that the simulator at the digital twin is imperfect, a direct use of simulated data for the optimization of the AMS mapping would yield poor performance when tested in the real system. This paper proposes a novel method for the online optimization of AMS mapping that corrects for the bias of the simulator by means of limited real data collected from the physical system. Experimental results for a graph neural network-based power control app demonstrate the significant advantages of the proposed approach.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 非線形偏微分方程式の量子シミュレーションの一般的な枠組み

A general frame of quantum simulation for nonlinear partial differential equations ( http://arxiv.org/abs/2406.15821v1 )

ライセンス: Link先を確認

Shijun Liao,

(参考訳) 最近、Jinらは任意の線形PDEに対する量子シミュレーション手法(Schrödingerisation [1-3])を提案しており、これは多くの非ハミルトン線形PDEの求解にうまく適用されている。本稿では、任意の非線形PDEを、級数解の収束保証付きで一連の線形PDEに変換できるホモトピー解析法(HAM) [4-6] と組み合わせることにより、量子シミュレーションのSchrödingerisation手法を任意の非線形PDEに拡張する。このようにして、非線形PDEは、量子コンピュータ(その開発は将来に待たれる)を用いた量子シミュレーションによって解くことができる。簡単のため、これを「HAM-Schrödingerisation量子アルゴリズム」と呼ぶ。

Recently, Jin et al. proposed a quantum simulation technique for any linear PDE, called Schrödingerisation [1-3], which has been successfully applied to solve many non-Hamiltonian linear PDEs. In this paper, the Schrödingerisation technique of quantum simulation is extended to any nonlinear PDE by combining it with the homotopy analysis method (HAM) [4-6], which can transform any nonlinear PDE into a series of linear PDEs with a convergence guarantee for the series solution. In this way, a nonlinear PDE can be solved by quantum simulation using a quantum computer, which is yet to be developed. For simplicity, we call it "the HAM-Schrödingerisation quantum algorithm".

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# CaT-BENCH:計画における因果依存性と時間依存性のベンチマーク言語モデル

CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans ( http://arxiv.org/abs/2406.15823v1 )

ライセンス: Link先を確認

Yash Kumar Lal, Vanya Cohen, Nathanael Chambers, Niranjan Balasubramanian, Raymond Mooney,

(参考訳) 指示文やレシピなどの自然言語で書かれた計画について推論するLLMの能力を理解することは、意思決定システムにおいてLLMを確実に活用するうえで重要である。計画の基本的な側面は、ステップを実行すべき時間的順序であり、これはステップ間の因果依存性を反映している。本稿では,調理レシピの計画において,あるステップが別のステップの前または後に必ず行われなければならないかどうかを問う,ステップ順序予測のベンチマークであるCaT-Benchを紹介する。我々はこれを用いて、フロンティアLLMが因果的および時間的依存性をどの程度理解しているかを評価する。SOTA LLMの性能は期待外れであり(ゼロショットの最高でもF1はわずか0.59)、依存性があると予測しがちなバイアスがあることがわかった。これはおそらく、ステップの時間的順序をヒューリスティックとして利用しているためである。説明を求めるプロンプトと少数ショット例の使用によりパフォーマンスは向上するが、最高のF1は0.73にとどまる。さらに,説明と回答の正しさに対する人間の評価から,平均して人間はモデルの推論に同意しないことが示された。驚いたことに、回答した後に説明させるほうが通常のchain-of-thoughtプロンプトよりも優れたパフォーマンスをもたらし、また同じステップペアに関する質問間でLLMの回答に一貫性がないこともわかった。全体として,LLMがステップ間の依存性を検出する能力には大きな改善の余地があることが示された。

Understanding the abilities of LLMs to reason about natural language plans, such as instructional text and recipes, is critical to reliably using them in decision-making systems. A fundamental aspect of plans is the temporal order in which their steps need to be executed, which reflects the underlying causal dependencies between them. We introduce CaT-Bench, a benchmark of Step Order Prediction questions, which test whether a step must necessarily occur before or after another in cooking recipe plans. We use this to evaluate how well frontier LLMs understand causal and temporal dependencies. We find that SOTA LLMs are underwhelming (best zero-shot is only 0.59 in F1), and are biased towards predicting dependence more often, perhaps relying on temporal order of steps as a heuristic. While prompting for explanations and using few-shot examples improve performance, the best F1 result is only 0.73. Further, human evaluation of explanations along with answer correctness shows that, on average, humans do not agree with model reasoning. Surprisingly, we also find that explaining after answering leads to better performance than normal chain-of-thought prompting, and LLM answers are not consistent across questions about the same step pairs. Overall, results show that LLMs' ability to detect dependence between steps has significant room for improvement.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# MVOC:拡散モデルを用いたトレーニング不要なマルチビデオオブジェクト合成法

MVOC: a training-free multiple video object composition method with diffusion models ( http://arxiv.org/abs/2406.15829v1 )

ライセンス: Link先を確認

Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang,

(参考訳) ビデオ合成は、ビデオ編集のコアタスクである。拡散モデルに基づく画像合成は大きな成功を収めているが、その成果をビデオオブジェクト合成タスクに拡張することは容易ではない。合成ビデオは、対応する相互作用効果を示すだけでなく、ビデオ内のオブジェクトが動きとアイデンティティの一貫性を維持する必要があり、これは物理的に調和したビデオを合成するために必要である。この課題に対処するため,拡散モデルに基づくMultiple Video Object Composition (MVOC)法を提案する。具体的には、まず各ビデオオブジェクトに対してDDIMインバージョンを行い、対応するノイズ特徴を得る。次に、画像編集手法で各オブジェクトを結合・編集し、合成ビデオの最初のフレームを得る。最後に,画像からビデオを生成するモデルを用い,Video Object Dependence Moduleにおいて特徴と注意を注入しながらビデオを合成する。このモジュールはビデオ生成のための訓練不要な条件付きガイダンス操作であり,合成ビデオ内で互いに独立でない可能性のある様々なオブジェクト間で,特徴と注意マップを協調させることを可能にする。最終的な生成モデルは、生成されたビデオ内のオブジェクトを元のオブジェクトの動きやアイデンティティと一貫するよう制約するだけでなく、オブジェクト間の相互作用効果も導入する。広範な実験により,提案手法は既存の最先端手法よりも優れていることが示された。プロジェクトページ: https://sobeymil.github.io/mvoc.com

Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, extending that success to video object composition is not straightforward: the composited video must exhibit the corresponding interaction effects while its objects maintain motion and identity consistency, both of which are necessary to composite a physically harmonious video. To address this challenge, we propose a Multiple Video Object Composition (MVOC) method based on diffusion models. Specifically, we first perform DDIM inversion on each video object to obtain the corresponding noise features. Secondly, we combine and edit each object by image editing methods to obtain the first frame of the composited video. Finally, we use the image-to-video generation model to composite the video with feature and attention injections in the Video Object Dependence Module, which is a training-free conditional guidance operation for video generation and enables the coordination of features and attention maps between various objects that can be non-independent in the composited video. The final generative model not only constrains the objects in the generated video to be consistent with the original object motion and identity, but also introduces interaction effects between objects. Extensive experiments have demonstrated that the proposed method outperforms existing state-of-the-art approaches. Project page: https://sobeymil.github.io/mvoc.com.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# Shape2.5D: 深度と法線の推定のためのテクスチャレス表面のデータセット

Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation ( http://arxiv.org/abs/2406.15831v1 )

ライセンス: Link先を確認

Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker,

(参考訳) テクスチャのない表面の再構築は、コンピュータビジョンにおいて特有の課題をもたらす。その主な原因は、テクスチャ情報がない状況での深度と法線の推定という細かな要求に応える専用データセットが不足していることである。このギャップに対処するために設計された,新しい大規模データセットであるShape2.5Dを紹介する。 2635の3Dモデルと48のユニークなオブジェクトにわたる364kフレームからなるこのデータセットは、テクスチャレスオブジェクトの再構成のための深度マップと表面法線マップを提供する。提案データセットには、様々な照明条件や視点をシミュレートするために3Dモデリングソフトウェアでレンダリングされた合成画像が含まれる。また、深度カメラで撮影された4672フレームからなる実世界のサブセットも含まれている。修正したエンコーダ・デコーダネットワークを用いて実施した包括的なベンチマークは,RGB画像から深度と法線を頑健に推定するアルゴリズムの開発を支援するというデータセットの能力を示している。私たちのオープンソースのデータ生成パイプラインにより、将来の研究に向けてデータセットを拡張・適応させることができる。データセットは https://github.com/saifkhichi96/Shape25D で公開されている。

Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4672 frames captured with a depth camera. Our comprehensive benchmarks, performed using a modified encoder-decoder network, showcase the dataset's capability to support the development of algorithms that robustly estimate depth and normals from RGB images. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 集中集約を備えた分散型Transformerは, サンプル効率の良いマルチエージェント世界モデルである

Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models ( http://arxiv.org/abs/2406.15836v1 )

ライセンス: Link先を確認

Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li,

(参考訳) モデルフリー強化学習(RL)エージェントのワールドモデルを学ぶことは、想像力で学習ポリシーを学習することで、サンプル効率を著しく向上させることができる。しかし,MARL(Multi-Agent RL)の世界モデルの構築は,多数のエージェントから生じる集中型アーキテクチャにおけるスケーラビリティの問題や,エージェント間の依存性から生じる分散型アーキテクチャにおける非定常性の問題から,特に困難である。両課題に対処するために,拡張性のための分散ローカルダイナミクスを学習するMARLの新たな世界モデルと,すべてのエージェントからの集中型表現集約を提案する。本研究では,異なるエージェント間で複雑な局所力学をモデル化し,正確かつ一貫した長期的想像力を提供するために,表現型トランスフォーマーアーキテクチャを活用することで,離散トークン上の自己回帰シーケンスモデリング問題として動的学習を論じる。マルチエージェントシステムのためのトランスフォーマーベースの世界モデルとして,Perceiver Transformer を有効解として導入し,このコンテキスト内での表現集約を実現する。 Starcraft Multi-Agent Challenge (SMAC) の結果は、サンプル効率と全体的な性能の両方において、強力なモデルフリーアプローチと既存のモデルベース手法よりも優れていることを示している。

Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations. As the first pioneering Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation within this context. Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 量子参照フレームのフレームバンドル定式化:視点の重ね合わせから幾何学の重ね合わせへ

A frame-bundle formulation of quantum reference frames: from superposition of perspectives to superposition of geometries ( http://arxiv.org/abs/2406.15838v1 )

ライセンス: Link先を確認

Daniel A. Turolla Vanzella, Jeremy Butterfield,

(参考訳) 我々は、重力の文脈で適用されてきた量子参照フレーム(QRF)の中核的なアイデアについて、完全に幾何学的な定式化の一つの可能性を与え、その定義を座標系のような不必要な(しかし便利な)要素から解放する。私たちの定式化は2つの主要な考えに基づいている。第一に、QRFは、各時空点(すなわち事象)における観測者の(したがって測定装置の)時間と空間の認識についての不確かさを符号化する。このために、事象 $p$ における観測者は、通常のように、接空間 $T_p$ のテトラッドとしてモデル化される。したがって、事象 $p$ における QRF は、$p$ におけるテトラッド上の複素関数である。第二に、与えられた多様体上の計量は、各接空間に割り当てた基底が、指定したい計量におけるテトラッドであると定めることによって指定できる、という結果を用いる。したがって、時空、すなわち多様体+計量は、その上の「視点」の選択とともに、基底束の切断(section)によって表され、これは各点に割り当てられた基底をテトラッドとみなすものとして理解される。したがって、時空の重ね合わせは、大まかに言えば、この束の切断に対する複素振幅の割り当てとして表される。ここで、事象における基底に割り当てられた複素振幅の集まり、すなわち多様体の基底束上に定義される複素関数として定義される QRF は、これらの重ね合わせを局所的な方法で(つまり、切断全体ではなく、事象における基底に振幅を帰属させて)記述することができる。この定式化は、QRFに関する現在の考え方のいくつかの概念的側面と可能な拡張に光を当てると信じている。例えば、幾何学的な用語で考えると、文献で扱われる重力的シナリオ(線形近似を超えたもの)に適用されるQRFの考えは、恣意性のために予測力を欠いていることが明らかになり、この恣意性は物理学からのさらなる入力によってのみ解消できると我々は論じる。

We provide a possible fully geometric formulation of the core idea of quantum reference frames (QRFs) as it has been applied in the context of gravity, freeing its definition from unnecessary (though convenient) ingredients, such as coordinate systems. Our formulation is based on two main ideas. First, a QRF encodes uncertainty about what is the observer's (and, hence, the measuring apparatus's) perception of time and space at each spacetime point (i.e., event). For this, an observer at an event $p$ is modeled, as usual, as a tetrad in the tangent space $T_p$. So a QRF at an event $p$ is a complex function on the tetrads at $p$. Second, we use the result that one can specify a metric on a given manifold by stipulating that a basis one assigns at each tangent space is to be a tetrad in the metric one wants to specify. Hence a spacetime, i.e. manifold plus metric, together with a choice of "point of view" on it, is represented by a section of the bundle of bases, understood as taking the basis assigned to each point to be a tetrad. Thus a superposition of spacetimes gets represented as, roughly speaking, an assignment of complex amplitudes to sections of this bundle. A QRF, defined here as the collection of complex amplitudes assigned to bases at events--i.e., a complex function defined on the bundle of bases of the manifold--can describe, in a local way (i.e., attributing the amplitudes to bases at events instead of to whole sections), these superpositions. We believe that this formulation sheds some light on some conceptual aspects and possible extensions of current ideas about QRFs. For instance, thinking in geometric terms makes it clear that the idea of QRFs applied to the gravitational scenarios treated in the literature (beyond linear approximation) lacks predictive power due to arbitrariness which, we argue, can only be resolved by some further input from physics.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# テキストベース説明可能なAIにおける局所サロゲートモデルの精度安定度に及ぼす類似度の影響

The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI ( http://arxiv.org/abs/2406.15839v1 )

ライセンス: Link先を確認

Christopher Burger, Charles Walter, Thai Le,

(参考訳) 最近の研究は、機械学習(ML)モデルの入力に対する敵対的摂動に対して局所サロゲート法が脆弱であることを調べている。そこでは、元の入力の意味と構造が複雑なモデルの下で類似したままであるにもかかわらず、説明が操作される。多くの手法にわたって弱点が存在することが示されているが、その理由はいまだほとんど調査されていない。説明可能なAI(XAI)に対する敵対的攻撃の概念の中心にあるのは、ある説明が別の説明とどの程度異なるかを計算するために用いられる類似度尺度である。類似度尺度の選択が不適切だと、XAI手法の有効性について誤った結論を導く可能性がある。敏感すぎる尺度は脆弱性を誇張し、粗すぎる尺度は弱点を過小評価する。我々は、ケンドールのタウ、スピアマンのフットルール、Rank-biased Overlapなど、テキストベースのランク付きリスト向けに設計された様々な類似度尺度を調査し、尺度の種類や成功のしきい値の変更が、一般的な敵対的攻撃プロセスから得られる結論にどの程度影響するかを明らかにする。特定の尺度は過度に敏感であることが判明し、その結果、安定性の推定が誤ったものとなる。

Recent work has investigated the vulnerability of local surrogate methods to adversarial perturbations on a machine learning (ML) model's inputs, where the explanation is manipulated while the meaning and structure of the original input remain similar under the complex model. While weaknesses across many methods have been shown to exist, the reasons why remain little explored. Central to the concept of adversarial attacks on explainable AI (XAI) is the similarity measure used to calculate how one explanation differs from another. A poor choice of similarity measure can result in erroneous conclusions on the efficacy of an XAI method. Too sensitive a measure results in exaggerated vulnerability, while too coarse a measure understates its weakness. We investigate a variety of similarity measures designed for text-based ranked lists, including Kendall's Tau, Spearman's Footrule, and Rank-biased Overlap, to determine how substantial changes in the type of measure or threshold of success affect the conclusions generated from common adversarial attack processes. Certain measures are found to be overly sensitive, resulting in erroneous estimates of stability.
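
To make the three named measures concrete, here is a minimal sketch that computes them for two explanations given as ranked feature lists. It assumes both lists contain the same items without ties, and it uses a truncated form of Rank-biased Overlap (no extrapolation), so identical lists of length k score 1 - p^k rather than 1. The function names, the toy word lists, and p = 0.9 are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau for two rankings of the same items (no ties)."""
    pos_b = {item: rank for rank, item in enumerate(b)}
    concordant = discordant = 0
    for x, y in combinations(a, 2):  # x is ranked above y in a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def spearman_footrule(a, b):
    """Sum of absolute rank displacements (0 means identical order)."""
    pos_b = {item: rank for rank, item in enumerate(b)}
    return sum(abs(rank - pos_b[item]) for rank, item in enumerate(a))


def rbo_truncated(a, b, p=0.9):
    """Truncated rank-biased overlap: (1 - p) * sum_d p^(d-1) * overlap_d / d."""
    depth = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(a[d - 1])
        seen_b.add(b[d - 1])
        total += p ** (d - 1) * len(seen_a & seen_b) / d
    return (1 - p) * total


# Toy explanation: words ranked by importance, then two perturbed versions.
original = ["bad", "awful", "plot", "acting", "slow"]
top_swap = ["awful", "bad", "plot", "acting", "slow"]     # top two swapped
bottom_swap = ["bad", "awful", "plot", "slow", "acting"]  # bottom two swapped

for name, other in [("top swap", top_swap), ("bottom swap", bottom_swap)]:
    print(name, kendall_tau(original, other),
          spearman_footrule(original, other),
          rbo_truncated(original, other))
```

On these toy lists, Kendall's tau and the footrule treat the two swaps identically, while the top-weighted RBO penalizes the swap at the head of the list more. The choice of measure and threshold can therefore change whether a perturbation counts as a successful attack.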

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# デジタル公共財のプライバシ要件と現実

Privacy Requirements and Realities of Digital Public Goods ( http://arxiv.org/abs/2406.15842v1 )

ライセンス: Link先を確認

Geetika Gopi, Aadyaa Maddi, Omkhar Arasaratnam, Giulia Fanti,

(参考訳) 国際開発コミュニティでは、「デジタル公共財(digital public goods, DPG)」という用語が、国連(UN)の持続可能な開発目標への対応を目的としたオープンソースのデジタル製品(ソフトウェア、データセットなど)を指すために使われる。 DPGは、世界中で政府サービス(ID管理、医療登録など)の提供にますます利用されている。 DPGは機密データを扱う可能性があるため、国連はユーザプライバシをDPGの最優先の要件として定めている。 DPGのプライバシーリスクは現在、DPG標準によって部分的に管理されており、この標準にはDPGのプライバシー体制を評価するための質問を含む事前質問票が含まれている。本研究は、適切なプライバシー保護を確保するうえでの現行のDPG標準の有効性について検討する。本稿では,ユーザプライバシの保護に関するDPGからの回答を体系的に評価する。また,広く使用されている3つのDPGについて詳細なケーススタディを示し,プライバシの脅威を特定して,DPG標準に対する各DPGの回答と比較する。以上の結果から,現行のDPG標準の評価手法の限界が明らかとなった。最後に、プライバシーに関してDPG標準を強化するための予備的な勧告と提案を提示する。さらに、本研究が、エンドユーザーだけでなく、ユーザー向け技術を採用するサードパーティに対しても、プライバシーをどのように伝えるかに関するユーザブルプライバシー研究を促進することを願っている。

In the international development community, the term "digital public goods" is used to describe open-source digital products (e.g., software, datasets) that aim to address the United Nations (UN) Sustainable Development Goals. DPGs are increasingly being used to deliver government services around the world (e.g., ID management, healthcare registration). Because DPGs may handle sensitive data, the UN has established user privacy as a first-order requirement for DPGs. The privacy risks of DPGs are currently managed in part by the DPG standard, which includes a prerequisite questionnaire with questions designed to evaluate a DPG's privacy posture. This study examines the effectiveness of the current DPG standard for ensuring adequate privacy protections. We present a systematic assessment of responses from DPGs regarding their protections of users' privacy. We also present in-depth case studies from three widely-used DPGs to identify privacy threats and compare this to their responses to the DPG standard. Our findings reveal limitations in the current DPG standard's evaluation approach. We conclude by presenting preliminary recommendations and suggestions for strengthening the DPG standard as it relates to privacy. Additionally, we hope this study encourages more usable privacy research on communicating privacy, not only to end users but also third-party adopters of user-facing technologies.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 発展のユニタリ性に埋め込まれた量子幾何学--スピン共鳴と結晶バンドにおける量子振動とデフェージングとしての影響を明らかにする

Quantum geometry embedded in unitarity of evolution: revealing its impacts as quantum oscillation and dephasing in spin resonance and crystal bands ( http://arxiv.org/abs/2406.15845v1 )

ライセンス: Link先を確認

B. Q. Song, J. D. H. Smith, T. Jiang, Y. X. Yao, J. Wang,

(参考訳) 量子ホール効果は、結晶中のトポロジーを明らかにする直感的な方法を提供する。すなわち、量子化された各「ステップ」が異なるトポロジカル状態を表す。本研究では、より広い概念である量子幾何学を「可視化」するための対応物を探る。幾何学が、特定の詳細や近似に依存せず、ユニタリな時間発展の本質的な帰結として量子系にどのように現れるかを示す。これは、量子幾何学が広く適用できる可能性を示唆する。実際に、スピンおよびバンドのシナリオにおいて、振動や位相緩和(デフェージング)などの幾何学的な観測量を例示する。これらの現象は幾何学の連続性のために頑健であり、幾何学的パラメータによって調整できる。解析解と数値解の両方によって裏付けられた異常は、幾何学的視点を採用する利点を強調し、識別可能な実験的シグネチャをもたらす可能性がある。

Quantum Hall effects provide intuitive ways of revealing the topology in crystals, i.e., each quantized "step" represents a distinct topological state. Here, we seek a counterpart for "visualizing" quantum geometry, which is a broader concept. We show how geometry emerges in quantum as an intrinsic consequence of unitary evolution, independent of specific details or approximations, suggesting quantum geometry may have widespread applicability. Indeed, we exemplify geometric observables, such as oscillation, dephasing, in spin and band scenarios. These phenomena are robust owing to the continuity of geometry, and can be tuned by geometric parameters. Anomalies, supported by both analytic and numerical solutions, underscore the advantages of adopting a geometric perspective, potentially yielding distinguishable experimental signatures.

翻訳日:2024-06-25 20:25:27 公開日:2024-06-22

# 音声からテキストへの生成のための補間拡張の再検討

Revisiting Interpolation Augmentation for Speech-to-Text Generation ( http://arxiv.org/abs/2406.15846v1 )

ライセンス: Link先を確認

Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang,

(参考訳) 音声からテキストへの(S2T)生成システムは、主に大規模なラベル付きデータセットの不足のため、低リソースのシナリオでしばしば課題に直面する。新たな解決策の1つは、入力とラベルを補間して仮想的な訓練サンプルを構築することであり、この手法は他の領域でシステムの汎化性能を顕著に向上させてきた。その可能性にもかかわらず、S2Tタスクへのこの手法の適用はまだ十分に検討されていない。本稿では、いくつかの重要な問いに沿って、補間拡張の有用性を掘り下げる。その結果、補間拡張において適切な戦略を採用することで、多様なタスク、アーキテクチャ、データ規模にわたって性能が大幅に向上し、リソースが制約された環境でより頑健なS2Tシステムを実現する有望な道筋となることがわかった。

Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
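
The idea of "constructing virtual training samples by interpolating inputs and labels" can be illustrated with a minimal, generic sketch. This is not the authors' method: the abstract does not specify how speech features or target sequences are interpolated, so the sketch assumes fixed-length feature vectors and soft label vectors, and all function names (`interpolate`, `make_virtual_sample`, `sample_lambda`) and the default `alpha` are hypothetical choices made for illustration only.

```python
import random


def interpolate(a, b, lam):
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [lam * p + (1.0 - lam) * q for p, q in zip(a, b)]


def make_virtual_sample(x_a, y_a, x_b, y_b, lam):
    """Build one virtual (input, label) pair from two real samples.

    Assumption: inputs are fixed-length feature vectors and labels are
    soft label vectors (e.g. one-hot). Real S2T data has variable-length
    sequences, which this sketch does not handle.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return interpolate(x_a, x_b, lam), interpolate(y_a, y_b, lam)


def sample_lambda(rng, alpha=0.2):
    """Draw a mixing weight from Beta(alpha, alpha); alpha is an assumed default."""
    return rng.betavariate(alpha, alpha)


# Two toy samples: 3-dimensional features with one-hot labels over 2 classes.
x_a, y_a = [1.0, 1.0, 1.0], [1.0, 0.0]
x_b, y_b = [0.0, 0.0, 0.0], [0.0, 1.0]

# A fixed weight makes the result easy to check by hand.
x_mix, y_mix = make_virtual_sample(x_a, y_a, x_b, y_b, lam=0.75)
print(x_mix, y_mix)  # [0.75, 0.75, 0.75] [0.75, 0.25]

# In training, the weight would typically be drawn at random per pair.
lam = sample_lambda(random.Random(0))
```

The same weight is applied to the input and the label, so the virtual label reflects how much each real sample contributes to the virtual input. How to choose the pairs, the weight distribution, and the representation level at which to interpolate are exactly the kind of strategy questions the abstract says the paper examines.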