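Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).

The papers listed here have a publication date of 20241011.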

Title	Authors	Abstract	Publication date / Translation date
# Manydepth2: Motion-Aware Self-Supervised Monocular Depth Estimation in Dynamic Scenes ( http://arxiv.org/abs/2312.15268v3 ) License: see link	Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham	Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present Manydepth2, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a novel static reference frame. This frame is then used to build a motion-guided cost volume in collaboration with the target frame. Additionally, to enhance the accuracy and resilience of the network structure, we introduce an attention-based depth net architecture to effectively integrate information from feature maps with varying resolutions. Compared to methods with similar computational costs, Manydepth2 achieves a significant reduction of approximately five percent in root-mean-square error for self-supervised monocular depth estimation on the KITTI-2015 dataset. The code can be found at https://github.com/kaichen-z/Manydepth2	Translated: 2024-11-09 09:05:28 Published: 2024-10-11
# Manydepth2: Motion-Aware Self-Supervised Monocular Depth Estimation in Dynamic Scenes ( http://arxiv.org/abs/2312.15268v4 ) License: see link	Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham	Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present Manydepth2, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a novel static reference frame. This frame is then used to build a motion-guided cost volume in collaboration with the target frame. Additionally, to enhance the accuracy and resilience of the network structure, we introduce an attention-based depth net architecture to effectively integrate information from feature maps with varying resolutions. Compared to methods with similar computational costs, Manydepth2 achieves a significant reduction of approximately five percent in root-mean-square error for self-supervised monocular depth estimation on the KITTI-2015 dataset. The code can be found at https://github.com/kaichen-z/Manydepth2	Translated: 2024-11-09 09:05:28 Published: 2024-10-11
# Does AI help humans make better decisions? A methodological framework for experimental evaluation ( http://arxiv.org/abs/2403.12108v2 ) License: see link	Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin	The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to experimentally answer this question without additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with humans making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems -- human-alone, human-with-AI, and AI-alone. We also show when to provide a human decision maker with AI recommendations and when they should follow such recommendations. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that the risk assessment-alone decisions generally perform worse than human decisions with or without algorithmic assistance.	Translated: 2024-11-09 03:59:24 Published: 2024-10-11
# Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v2 ) License: see link	Tian Xie, Xueru Zhang	As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.	Translated: 2024-11-09 02:30:11 Published: 2024-10-11
# Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v3 ) License: see link	Tian Xie, Xueru Zhang	As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.	Translated: 2024-11-09 02:30:11 Published: 2024-10-11
# Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models ( http://arxiv.org/abs/2406.04806v3 ) License: see link	Sigmund H. Høeg, Yilun Du, Olav Egeland	Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex dexterous tasks. However, action synthesis is often slow, requiring many steps of iterative denoising, limiting the extent to which models can be used in tasks that require fast reactive policies. To sidestep this, recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis. However, distillation is computationally expensive and can hurt both the accuracy and diversity of synthesized actions. We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis, leveraging the insight that generating a partially denoised action trajectory is substantially faster than a full output action trajectory. At each observation, our approach outputs a partially denoised action trajectory with variable levels of noise corruption, where the immediate action to execute is noise-free, with subsequent actions having increasing levels of noise and uncertainty. The partially denoised action trajectory for a new observation can then be quickly generated by applying a few steps of denoising to the previously predicted noisy action trajectory (rolled over by one timestep). We illustrate the efficacy of this approach, dramatically speeding up policy synthesis while preserving performance across both simulated and real-world settings.	Translated: 2024-11-09 01:44:51 Published: 2024-10-11
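The rolling buffer described in the Streaming Diffusion Policy abstract can be pictured with a small bookkeeping sketch. It tracks noise levels only. The integer levels, the evenly spaced staircase, and the helper names `roll_and_refill` and `denoise` are illustrative assumptions, not the authors' implementation, and the learned denoising network is replaced by a simple decrement.

```python
def roll_and_refill(levels, max_level):
    """Drop the executed (noise-free) action and append a fully noisy slot."""
    if levels[0] != 0:
        raise ValueError("the first action must be noise-free before execution")
    return levels[1:] + [max_level]


def denoise(levels, steps):
    """Lower every noise level by `steps`, never below zero.

    Hypothetical stand-in for a few iterations of a learned denoiser.
    """
    return [max(0, level - steps) for level in levels]


# Immediate action is clean; later actions carry more noise (assumed spacing).
levels = [0, 2, 4, 6]
levels = roll_and_refill(levels, max_level=8)  # execute the first action, shift the rest
levels = denoise(levels, steps=2)              # a few steps restore the staircase
```

In this toy setting each new observation costs two denoising steps, whereas denoising a whole trajectory from pure noise would cost eight, which is the saving the abstract points to.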
# Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models ( http://arxiv.org/abs/2407.04482v2 ) License: see link	Vyas Raina, Mark Gales	Speech-enabled foundation models, either in the form of flexible speech recognition based systems or audio-prompted large language models (LLMs), are becoming increasingly popular. One of the interesting aspects of these models is their ability to perform tasks other than automatic speech recognition (ASR) using an appropriate prompt. For example, the OpenAI Whisper model can perform both speech transcription and speech translation. With the development of audio-prompted LLMs there is the potential for even greater control options. In this work we demonstrate that with this greater flexibility the systems can be susceptible to model-control adversarial attacks. Without any access to the model prompt it is possible to modify the behaviour of the system by appropriately changing the audio input. To illustrate this risk, we demonstrate that it is possible to prepend a short universal adversarial acoustic segment to any input speech signal to override the prompt setting of an ASR foundation model. Specifically, we successfully use a universal adversarial acoustic segment to control Whisper to always perform speech translation, despite being set to perform speech transcription. Overall, this work demonstrates a new form of adversarial attack on multi-tasking speech-enabled foundation models that needs to be considered prior to the deployment of this form of model.	Translated: 2024-11-08 23:46:45 Published: 2024-10-11
# Approximating Two-Layer ReLU Networks for Hidden State Analysis in Differential Privacy ( http://arxiv.org/abs/2407.04884v2 ) License: see link	Antti Koskela	The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. Current privacy analyses under this model, however, are limited to convex optimization problems, reducing their applicability to multi-layer neural networks, which are essential in modern deep learning applications. Additionally, the most successful applications of the hidden state privacy analyses in classification tasks have been for logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of one-hidden-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). We achieve this through a stochastic approximation of a dual formulation of the ReLU minimization problem which results in a strongly convex problem. This enables the use of existing hidden state privacy analyses, providing accurate privacy bounds also for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Our experiments on benchmark classification tasks show that NoisyCGD can achieve privacy-utility trade-offs comparable to DP-SGD applied to one-hidden-layer ReLU networks. Additionally, we provide theoretical utility bounds that highlight the speed-ups gained through the convex approximation.	Translated: 2024-11-08 23:35:45 Published: 2024-10-11
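As a rough illustration of what "noisy cyclic mini-batch gradient descent with fixed disjoint mini-batches" can look like, the sketch below partitions the data once, cycles through the same batches every epoch, clips each per-example gradient, and adds Gaussian noise to the summed gradient. The least-squares objective, the function names, and all hyperparameters are assumptions chosen for brevity. The abstract does not spell out the paper's convex dual objective or its noise calibration, so this is not the author's algorithm and carries no privacy guarantee as written.

```python
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def noisy_cgd(data, dim, epochs, batch_size, lr, clip_norm, sigma, seed=0):
    """Noisy cyclic mini-batch gradient descent on a least-squares loss (assumed objective)."""
    rng = random.Random(seed)
    # Fixed disjoint mini-batches: partition once, reuse the same order every epoch.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    w = [0.0] * dim
    for _ in range(epochs):
        for batch in batches:  # cyclic pass, no resampling
            total = [0.0] * dim
            for x, y in batch:
                pred = sum(wi * xi for wi, xi in zip(w, x))
                grad = clip([(pred - y) * xi for xi in x], clip_norm)
                total = [t + g for t, g in zip(total, grad)]
            # Gaussian noise scaled to the clipping norm, added to the summed gradient.
            noisy = [t + rng.gauss(0.0, sigma * clip_norm) for t in total]
            w = [wi - lr * n / len(batch) for wi, n in zip(w, noisy)]
    return w


# Toy data following y = 2x; with sigma=0 the method reduces to plain cyclic GD.
toy = [([1.0], 2.0), ([0.5], 1.0), ([-1.0], -2.0), ([2.0], 4.0)]
w_clean = noisy_cgd(toy, dim=1, epochs=200, batch_size=2, lr=0.1, clip_norm=100.0, sigma=0.0)
w_noisy = noisy_cgd(toy, dim=1, epochs=200, batch_size=2, lr=0.1, clip_norm=100.0, sigma=0.01)
```

The only structural difference from DP-SGD in this sketch is the batching: the batches are never resampled, which is the setting the hidden state analyses in the abstract address.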
# Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression ( http://arxiv.org/abs/2407.04965v3 ) License: see link	Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar	Increasingly, model compression techniques enable large language models (LLMs) to be deployed in real-world applications. As a result of this momentum towards local deployment, compressed LLMs will interact with a large population. Prior work on compression typically prioritizes preserving perplexity, which is directly analogous to training loss. The impact of compression methods on other critical aspects of model behavior -- particularly safety -- requires systematic assessment. To this end, we investigate the impact of model compression along four dimensions: (1) degeneration harm, i.e., bias and toxicity in generation; (2) representational harm, i.e., biases in discriminative tasks; (3) dialect bias; and (4) language modeling and downstream task performance. We examine a wide spectrum of LLM compression techniques, including unstructured pruning, semi-structured pruning, and quantization. Our analysis reveals that compression can lead to unexpected consequences. Although compression may unintentionally alleviate LLMs' degeneration harm, it can still exacerbate representational harm. Furthermore, increasing compression produces a divergent impact on different protected groups. Finally, different compression methods have drastically different safety impacts: for example, quantization mostly preserves bias while pruning degrades quickly. Our findings underscore the importance of integrating safety assessments into the development of compressed LLMs to ensure their reliability across real-world applications. Our implementation and results are available here: https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval	Translated: 2024-11-08 23:35:45 Published: 2024-10-11
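To make the difference between unstructured and semi-structured pruning concrete, here is a minimal sketch that uses weight magnitude as the pruning criterion. Magnitude-based selection, the 2:4 group pattern, and the function names are assumptions for illustration. The abstract does not say which pruning algorithms the paper evaluates.

```python
def unstructured_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of all weights."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest magnitudes anywhere in the tensor.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def n_m_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive weights."""
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
global_half = unstructured_prune(weights, sparsity=0.5)  # removes the whole second group
two_of_four = n_m_prune(weights, n=2, m=4)               # removes two weights per group
```

Both results are 50% sparse, but the unstructured variant is free to remove every small weight wherever it sits, while the 2:4 variant must keep two weights in each group of four even when all four are small.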
# SEED-Story: Multimodal Long Story Generation with Large Language Model ( http://arxiv.org/abs/2407.08683v2 ) License: see link	Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen	With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad applications. However, this task poses significant challenges, as it necessitates the comprehension of the complex interplay between texts and images, and the ability to generate long sequences of coherent, contextually relevant texts and visuals. In this work, we propose SEED-Story, a novel method that leverages a Multimodal Large Language Model (MLLM) to generate extended multimodal stories. Our model, built upon the powerful comprehension capability of MLLM, predicts text tokens as well as visual tokens, which are subsequently processed with an adapted visual de-tokenizer to produce images with consistent characters and styles. We further propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner. Additionally, we present a large-scale and high-resolution dataset named StoryStream for training our model and quantitatively evaluating the task of multimodal story generation in various aspects.	Translated: 2024-11-08 22:17:54 Published: 2024-10-11
# MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility ( http://arxiv.org/abs/2407.08725v2 ) License: see link	Wayne Wu, Honglin He, Jack He, Yiran Wang, Chenda Duan, Zhizheng Liu, Quanyi Li, Bolei Zhou	Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while robot dogs and humanoids have recently emerged in the street. Micromobility enabled by AI for short-distance travel in public urban spaces is a crucial component of the future transportation system. Ensuring the generalizability and safety of AI models maneuvering mobile machines is essential. In this work, we present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as the pilot study using MetaUrban for urban micromobility research and establish various baselines of Reinforcement Learning and Imitation Learning. We conduct extensive evaluation across mobile machines, demonstrating that heterogeneous mechanical structures significantly influence the learning and execution of AI policies. We perform a thorough ablation study, showing that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban will be made publicly available to provide research opportunities and foster safe and trustworthy embodied AI and micromobility in cities. The code and dataset will be publicly available.	Translated: 2024-11-08 22:17:54 Published: 2024-10-11
# AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization ( http://arxiv.org/abs/2407.11591v3 ) License: see link	Anum Afzal, Ribin Chalumattu, Florian Matthes, Laura Mascarell	Despite the advances in the abstractive summarization task using Large Language Models (LLMs), there is a lack of research that assesses their ability to easily adapt to different domains. We evaluate the domain adaptation abilities of a wide range of LLMs on the summarization task across various domains in both fine-tuning and in-context learning settings. We also present AdaptEval, the first domain adaptation evaluation suite. AdaptEval includes a domain benchmark and a set of metrics to facilitate the analysis of domain adaptation. Our results demonstrate that LLMs exhibit comparable performance in the in-context learning setting, regardless of their parameter scale.	Translated: 2024-11-08 21:10:26 Published: 2024-10-11
# Exploring Quantization for Efficient Pre-Training of Transformer Language Models ( http://arxiv.org/abs/2407.11722v2 ) License: see link	Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar	The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.	Translated: 2024-11-08 20:59:00 Published: 2024-10-11
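For readers unfamiliar with the term, "linear quantization" maps floating-point values onto evenly spaced integer levels through a single scale factor. The sketch below shows a symmetric, per-tensor variant. The symmetric range, the per-tensor granularity, the 8-bit default, and the function names are assumptions made for illustration. The abstract does not state which settings the paper uses for weights, activations, gradients, or optimizer states.

```python
def linear_quantize(values, num_bits=8):
    """Symmetric per-tensor linear quantization of a list of floats."""
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = linear_quantize(weights, num_bits=8)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that a quantized training recipe has to tolerate.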
# Boosting drug-disease association prediction for drug repositioning via dual-feature extraction and cross-dual-domain decoding ( http://arxiv.org/abs/2407.11812v2 ) License: see link	Enqiang Zhu, Xiang Li, Chanjuan Liu, Nikhil R. Pal	Uncovering new therapeutic uses of existing drugs, drug repositioning offers a fast and cost-effective strategy and holds considerable significance in the realm of drug discovery and development. In recent years, deep learning techniques have emerged as powerful tools in drug repositioning due to their ability to analyze large and complex datasets. However, many existing methods focus on extracting feature information from nearby nodes in the network to represent drugs and diseases, without considering the potential inter-relationships between the features of drugs and diseases, which may lead to inaccurate representations. To address this limitation, we use two features (similarity and association) to capture the potential relationships between the features of drugs and diseases, proposing a Dual-Feature Drug Repositioning Neural Network (DFDRNN) model. DFDRNN uses a self-attention mechanism to extract neighbor features and incorporates two dual-feature extraction modules: the intra-domain dual-feature extraction (IntraDDFE) module for extracting features within a single domain (drugs or diseases) and the inter-domain dual-feature extraction (InterDDFE) module for extracting features across domains. By utilizing these modules, we ensure more appropriate encoding of drugs and diseases. Additionally, a cross-dual-domain decoder is designed to predict drug-disease associations in both domains. Our proposed DFDRNN model outperforms six state-of-the-art methods on four benchmark datasets, achieving an average AUROC of 0.946 and an average AUPR of 0.597. Case studies on two diseases show that the proposed DFDRNN model can be applied in real-world scenarios, demonstrating its significant potential in drug repositioning.	Translated: 2024-11-08 20:59:00 Published: 2024-10-11
# スペクトル: 3次・量子化・FP16言語モデルに関する総合的研究 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v2 ) ライセンス: Link先を確認	Ayush Kaushal, Tejas Vaidhya, Tejas Pandey, Aaryan Bhagat, Irina Rish,	(参考訳) 後学習量子化は、LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが、残念ながら、4ビットの精度よりも大きな性能劣化に悩まされている。別のアプローチでは、圧縮されたモデルを低ビット幅(例えば、バイナリまたは3次モデル)で直接訓練する。しかし、そのようなモデルの性能、トレーニングのダイナミクス、スケーリングの傾向はまだよく分かっていない。この問題に対処するため、99Mから3.9Bパラメータを含む54の言語モデルで構成され、300BトークンでトレーニングされたSpectra LLMスイートをトレーニングし、公開リリースする。スペクトルには、FloatLMs、ポストトレーニング後の量子化QuantLMs (3, 4, 6, 8 bits)、および3次LLMs (TriLMs)が含まれる。例えば、TriLM 3.9Bは半精度FloatLM 830Mより小さいが、常識推論と知識ベンチマークでは半精度FloatLM 3.9Bと一致する。しかし、TriLM 3.9Bは6倍の大きさのモデルであるFloatLM 3.9Bと同じくらい毒性があり、ステレオタイピングである。さらに、TriLM 3.9Bは、検証分割とWebベースのコーパスの難易度でFloatLMに遅れをとっているが、LambadaやPennTreeBankのようなあまりノイズの少ないデータセットではパフォーマンスが良くなっている。低ビット幅モデルの理解を深めるため、私たちはSpectraスイートの500以上の中間チェックポイントを \href{https://github.com/NolanoOrg/SpectraSuite}{https://github.com/NolanoOrg/SpectraSuite} でリリースしています。 Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but unfortunately, it suffers from significant performance degradation below 4-bit precision. An alternative approach involves training compressed models directly at a low bitwidth (e.g., binary or ternary models). However, the performance, training dynamics, and scaling trends of such models are not yet well understood. To address this issue, we train and openly release the Spectra LLM suite consisting of 54 language models ranging from 99M to 3.9B parameters, trained on 300B tokens. Spectra includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, and 8 bits), and ternary LLMs (TriLMs) - our improved architecture for ternary language modeling, which significantly outperforms previously proposed ternary models of a given size (in bits), matching half-precision models at scale. 
For example, TriLM 3.9B is (bit-wise) smaller than the half-precision FloatLM 830M, but matches half-precision FloatLM 3.9B in commonsense reasoning and knowledge benchmarks. However, TriLM 3.9B is also as toxic and stereotyping as FloatLM 3.9B, a model six times larger in size. Additionally, TriLM 3.9B lags behind FloatLM in perplexity on validation splits and web-based corpora but performs better on less noisy datasets like Lambada and PennTreeBank. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.	翻訳日:2024-11-08 20:48:00 公開日:2024-10-11
# スペクトル: 三値・量子化・FP16言語モデルに関する総合的研究 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v3 ) ライセンス: Link先を確認	Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,	(参考訳) GPU計算能力の急速な進歩は、メモリ容量と帯域幅の増大を上回り、LLM(Large Language Model)推論のボトルネックを生み出した。後学習量子化は, LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが, 4ビット精度未満では大きな性能劣化に悩まされている。本稿では,従来の浮動小数点モデル (FloatLMs) とその学習後量子化バージョン (QuantLMs) の代替として,低ビット幅モデル,特に三値言語モデル (TriLMs) の事前学習を検討することで,これらの課題に対処する。我々は、FloatLMs、QuantLMs、TriLMsを含む複数のビット幅にまたがる最初のオープンなLLMスイートであるSpectra LLMスイートを、300Bトークンでトレーニングされた99Mから3.9Bのパラメータで紹介する。我々の総合的な評価は、TriLMがモデルサイズ(ビット)の点で優れたスケーリング挙動を提供することを示している。驚くべきことに、スケールが10億以上のパラメータでは、TriLMは様々なベンチマークで与えられたビットサイズに対して、QuantLMとFloatLMを一貫して上回っている。特に3.9BパラメータのTriLMは、FloatLM 830Mよりビットが少ないにもかかわらず、全てのベンチマークでFloatLM 3.9Bのパフォーマンスと一致している。全体として、この研究は低ビット幅言語モデルの実現可能性と拡張性に関する貴重な洞察を与え、より効率的なLLMの開発への道を開いた。低ビット幅モデルの理解を深めるため、私たちはSpectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。 Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. 
Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of model size (in bits). Surprisingly, at scales exceeding one billion parameters, TriLMs consistently outperform their QuantLM and FloatLM counterparts for a given bit size across various benchmarks. Notably, the 3.9B parameter TriLM matches the performance of the FloatLM 3.9B across all benchmarks, despite having fewer bits than FloatLM 830M. Overall, this research provides valuable insights into the feasibility and scalability of low-bitwidth language models, paving the way for the development of more efficient LLMs. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.	翻訳日:2024-11-08 20:48:00 公開日:2024-10-11
# スペクトル: 三値・量子化・FP16言語モデルに関する総合的研究 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v4 ) ライセンス: Link先を確認	Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,	(参考訳) GPU計算能力の急速な進歩は、メモリ容量と帯域幅の増大を上回り、LLM(Large Language Model)推論のボトルネックを生み出した。後学習量子化は, LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが, 4ビット精度未満では大きな性能劣化に悩まされている。本稿では,従来の浮動小数点モデル (FloatLMs) とその学習後量子化バージョン (QuantLMs) の代替として,低ビット幅モデル,特に三値言語モデル (TriLMs) の事前学習を検討することで,これらの課題に対処する。我々は、FloatLMs、QuantLMs、TriLMsを含む複数のビット幅にまたがる最初のオープンなLLMスイートであるSpectra LLMスイートを、300Bトークンでトレーニングされた99Mから3.9Bのパラメータで紹介する。我々の総合的な評価は、TriLMがモデルサイズ(ビット)の点で優れたスケーリング挙動を提供することを示している。驚くべきことに、スケールが10億以上のパラメータでは、TriLMは様々なベンチマークで与えられたビットサイズに対して、QuantLMとFloatLMを一貫して上回っている。特に3.9BパラメータのTriLMは、FloatLM 830Mよりビットが少ないにもかかわらず、全てのベンチマークでFloatLM 3.9Bのパフォーマンスと一致している。全体として、この研究は低ビット幅言語モデルの実現可能性と拡張性に関する貴重な洞察を与え、より効率的なLLMの開発への道を開いた。低ビット幅モデルの理解を深めるため、私たちはSpectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。 Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. 
Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of model size (in bits). Surprisingly, at scales exceeding one billion parameters, TriLMs consistently outperform their QuantLM and FloatLM counterparts for a given bit size across various benchmarks. Notably, the 3.9B parameter TriLM matches the performance of the FloatLM 3.9B across all benchmarks, despite having fewer bits than FloatLM 830M. Overall, this research provides valuable insights into the feasibility and scalability of low-bitwidth language models, paving the way for the development of more efficient LLMs. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.	翻訳日:2024-11-08 20:48:00 公開日:2024-10-11
# スペクトル: 三値言語モデルの大規模事前学習の驚くべき有効性について Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale ( http://arxiv.org/abs/2407.12327v5 ) ライセンス: Link先を確認	Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,	(参考訳) GPU計算能力の急速な進歩は、メモリ容量と帯域幅の増大を上回り、LLM(Large Language Model)推論のボトルネックを生み出した。後学習量子化は, LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが, 4ビット精度未満では大きな性能劣化に悩まされている。本稿では,従来の浮動小数点モデル (FloatLMs) とその学習後量子化バージョン (QuantLMs) の代替として,低ビット幅モデル,特に三値言語モデル (TriLMs) の事前学習を検討することで,これらの課題に対処する。我々は、FloatLMs、QuantLMs、TriLMsを含む複数のビット幅にまたがる最初のオープンなLLMスイートであるSpectra LLMスイートを、300Bトークンでトレーニングされた99Mから3.9Bのパラメータで紹介する。我々の総合的な評価は、TriLMがモデルサイズ(ビット)の点で優れたスケーリング挙動を提供することを示している。驚くべきことに、スケールが10億以上のパラメータでは、TriLMは様々なベンチマークで与えられたビットサイズに対して、QuantLMとFloatLMを一貫して上回っている。特に3.9BパラメータのTriLMは、FloatLM 830Mよりビットが少ないにもかかわらず、全てのベンチマークでFloatLM 3.9Bのパフォーマンスと一致している。全体として、この研究は低ビット幅言語モデルの実現可能性と拡張性に関する貴重な洞察を与え、より効率的なLLMの開発への道を開いた。低ビット幅モデルの理解を深めるため、Spectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。 Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of model size (in bits). 
Surprisingly, at scales exceeding one billion parameters, TriLMs consistently outperform their QuantLM and FloatLM counterparts for a given bit size across various benchmarks. Notably, the 3.9B parameter TriLM matches the performance of the FloatLM 3.9B across all benchmarks, despite having fewer bits than FloatLM 830M. Overall, this research provides valuable insights into the feasibility and scalability of low-bitwidth language models, paving the way for the development of more efficient LLMs. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.	翻訳日:2024-11-08 20:36:48 公開日:2024-10-11
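As a rough check on the bit-size comparison made in the Spectra abstracts above, the following Python sketch estimates weight storage for the three models they name. It assumes an idealized ternary weight costs log2(3) ≈ 1.58 bits and an FP16 weight costs 16 bits, and it ignores embeddings, scale factors, and any layers kept at higher precision. The helper `model_bits` and the constants are illustrative names, not part of the Spectra release.

```python
import math


def model_bits(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in bits (hypothetical helper).

    Ignores embeddings, quantization scales, and file metadata.
    """
    return num_params * bits_per_param


# Assumptions: an ideal ternary weight carries log2(3) bits; FP16 uses 16 bits.
TERNARY_BITS = math.log2(3)
FP16_BITS = 16.0

trilm_3_9b_bits = model_bits(3.9e9, TERNARY_BITS)
floatlm_830m_bits = model_bits(830e6, FP16_BITS)
floatlm_3_9b_bits = model_bits(3.9e9, FP16_BITS)

for name, bits in [
    ("TriLM 3.9B", trilm_3_9b_bits),
    ("FloatLM 830M", floatlm_830m_bits),
    ("FloatLM 3.9B", floatlm_3_9b_bits),
]:
    # Convert bits to gigabytes (1 GB = 8e9 bits) for readability.
    print(f"{name:>13}: {bits / 8e9:.2f} GB")
```

Under these assumptions the 3.9B TriLM needs fewer bits than the 830M FloatLM, which is consistent with the abstracts. The idealized ratio against FloatLM 3.9B comes out near ten, whereas the v2 abstract quotes a factor of six, so real checkpoints presumably include components that this sketch leaves out.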
# ロボットもマルチタスクが可能:クロスタスクロボットアクション生成のためのメモリアーキテクチャとLLMの統合 Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation ( http://arxiv.org/abs/2407.13505v2 ) ライセンス: Link先を確認	Hassan Ali, Philipp Allgeuer, Carlo Mazzola, Giulia Belgiovine, Burak Can Kaplan, Lukáš Gajdošech, Stefan Wermter,	(参考訳) 大規模言語モデル(LLM)は、LLMの常識推論をロボットの知覚と身体能力に基礎づけるために、最近ロボットアプリケーションで使用されている。ヒューマノイドロボットでは、メモリは、特に、ロボットが以前のタスク状態、環境状態、実行された動作を記憶しなければならないマルチタスク設定において、現実世界での身体性を育み、長期的な対話能力を促進する上でも重要な役割を果たす。本稿では,ロボットがタスク間を効果的に切り替えながらタスク横断的な動作を生成するために,メモリプロセスをLLMに組み込むことに取り組む。提案する2層アーキテクチャは,推論と指示追従という相補的なスキルを活用する2つのLLMと,人間の認知にインスパイアされた記憶モデルを組み合わせたものである。その結果,5つのロボットタスクのベースラインよりも性能が大幅に向上し,適応的なタスク実行のためにロボットの動作と知覚を組み合わせるうえで,LLMにメモリを統合することの可能性が示された。 Large Language Models (LLMs) have recently been used in robot applications for grounding LLM common-sense reasoning with the robot's perception and physical abilities. In humanoid robots, memory also plays a critical role in fostering real-world embodiment and facilitating long-term interactive capabilities, especially in multi-task setups where the robot must remember previous task states, environment states, and executed actions. In this paper, we address incorporating memory processes with LLMs for generating cross-task robot actions, while the robot effectively switches between tasks. Our proposed dual-layered architecture features two LLMs, utilizing their complementary skills of reasoning and following instructions, combined with a memory model inspired by human cognition. Our results show a significant improvement in performance over a baseline of five robotic tasks, demonstrating the potential of integrating memory with LLMs for combining the robot's action and perception for adaptive task execution.	翻訳日:2024-11-08 20:14:30 公開日:2024-10-11
# NeLLCom-X: 言語学習とグループコミュニケーションをシミュレートする包括的ニューラルネットワークフレームワーク NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication ( http://arxiv.org/abs/2407.13999v2 ) ライセンス: Link先を確認	Yuchen Lian, Tessa Verhoef, Arianna Bisazza,	(参考訳) 計算言語学の最近の進歩には、ランダムな記号の集合から始まる相互作用するニューラルネットワークエージェントによる人間のような言語の出現をシミュレートすることが含まれる。最近導入されたNeLLComフレームワーク(Lian et al , 2023)により、エージェントはまず人工言語を学習し、それを通信に使用することができる。このフレームワーク(NeLLCom-X)は、言語学習性、通信圧力、グループサイズ効果の相互作用を調べるために、より現実的な役割交代エージェントとグループコミュニケーションを導入することで拡張される。我々は,単語順/ケースマーキングトレードオフの出現をシミュレートした先行研究から得られた重要な知見を複製してNeLLCom-Xを検証する。次に,相互作用が言語収束とトレードオフの出現にどのように影響するかを検討する。このフレームワークは、言語進化における相互作用とグループダイナミクスの重要性を強調し、多様な言語的側面の将来のシミュレーションを促進する。 Recent advances in computational linguistics include simulating the emergence of human-like languages with interacting neural network agents, starting from sets of random symbols. The recently introduced NeLLCom framework (Lian et al., 2023) allows agents to first learn an artificial language and then use it to communicate, with the aim of studying the emergence of specific linguistics properties. We extend this framework (NeLLCom-X) by introducing more realistic role-alternating agents and group communication in order to investigate the interplay between language learnability, communication pressures, and group size effects. We validate NeLLCom-X by replicating key findings from prior research simulating the emergence of a word-order/case-marking trade-off. Next, we investigate how interaction affects linguistic convergence and emergence of the trade-off. The novel framework facilitates future simulations of diverse linguistic aspects, emphasizing the importance of interaction and group dynamics in language evolution.	翻訳日:2024-11-08 19:38:31 公開日:2024-10-11
# アートインテリジェンス:高忠実景観絵画合成のための拡散型フレームワーク Artistic Intelligence: A Diffusion-Based Framework for High-Fidelity Landscape Painting Synthesis ( http://arxiv.org/abs/2407.17229v4 ) ライセンス: Link先を確認	Wanggong Yang, Yifei Zhao,	(参考訳) 高忠実な風景画の生成は、構造と様式の両方を正確に制御する必要がある難しい課題である。本稿では,風景画の生成に特化して設計された新しい拡散モデルLPGenを提案する。 LPGenは、構造的特徴とスタイル的特徴を独立に処理し、従来の絵画技法の階層的アプローチを効果的に模倣する、分離されたクロスアテンション機構を導入している。さらに、LPGenは、風景画のレイアウトを制御するために設計されたマルチスケールエンコーダである構造制御器を提案し、美学と構成のバランスをとる。加えて、このモデルは、芸術様式ごとに分類された高解像度風景画像のキュレート済みデータセットで事前トレーニングされ、その後、詳細で一貫した出力を確保するために微調整される。 LPGenは広範な評価を通じて、構造的に正確であるだけでなく、様式的にも一貫した絵画を生成する上で優れた性能を示し、現在の最先端のモデルを上回っている。この研究はAI生成芸術を進歩させ、技術と伝統的な芸術的実践の交わりを探索するための新たな道を提供する。コード、データセット、モデルの重みは公開される予定である。 Generating high-fidelity landscape paintings remains a challenging task that requires precise control over both structure and style. In this paper, we present LPGen, a novel diffusion-based model specifically designed for landscape painting generation. LPGen introduces a decoupled cross-attention mechanism that independently processes structural and stylistic features, effectively mimicking the layered approach of traditional painting techniques. Additionally, LPGen proposes a structural controller, a multi-scale encoder designed to control the layout of landscape paintings, striking a balance between aesthetics and composition. In addition, the model is pre-trained on a curated dataset of high-resolution landscape images, categorized by distinct artistic styles, and then fine-tuned to ensure detailed and consistent output. Through extensive evaluations, LPGen demonstrates superior performance in producing paintings that are not only structurally accurate but also stylistically coherent, surpassing current state-of-the-art models. This work advances AI-generated art and offers new avenues for exploring the intersection of technology and traditional artistic practices. Our code, dataset, and model weights will be publicly available.	翻訳日:2024-11-08 15:23:20 公開日:2024-10-11
# 深層学習の数学的理論 Mathematical theory of deep learning ( http://arxiv.org/abs/2407.18384v2 ) ライセンス: Link先を確認	Philipp Petersen, Jakob Zech,	(参考訳) この本は、ディープラーニングの数学的解析の紹介を提供する。これは、深層ニューラルネットワーク理論の3つの柱である近似理論、最適化理論、統計学習理論の基本的な結果をカバーしている。本書は、数学や関連分野の学生や研究者のためのガイドとして、このトピックに関する基礎知識を読者に提供することを目的としている。一般性よりも単純さを優先し、厳密でアクセスしやすい結果を提示し、ディープラーニングを支える基本的な数学的概念を理解するのに役立つ。 This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.	翻訳日:2024-11-08 14:50:05 公開日:2024-10-11
# デノイジング・レヴィ確率モデル Denoising Lévy Probabilistic Models ( http://arxiv.org/abs/2407.18609v2 ) ライセンス: Link先を確認	Dario Shariatian, Umut Simsekli, Alain Durmus,	(参考訳) 拡散生成モデルにおけるガウス分布以外の雑音分布の探索は未解決の問題である。ガウスのケースは実験的にも理論的にも成功しており、スコアベースおよびデノイジングの定式化を統一するSDEフレームワークに適合している。近年の研究では、裾の重い(ヘビーテール)ノイズ分布がモード崩壊に対処し、クラス不均衡、ヘビーテール、または外れ値を持つデータセットを扱えることが示唆されている。 Yoon et al. (NeurIPS 2023) は Lévy-Ito モデル (LIM) を導入し、SDE フレームワークを$\alpha$-stable ノイズを持つヘビーテール SDE に拡張した。理論上のエレガンスと性能の向上にもかかわらず、LIMの複雑な数学はアクセシビリティとより広範な採用を制限する可能性がある。本研究はより単純なアプローチをとり,デノイジング拡散確率モデル(DDPM)を$\alpha$-stableノイズで拡張し,デノイジングLévy確率モデル(DLPM)を作成した。初等的な証明手法を用いることで,DLPMは最小限の変更でバニラDDPMの実行に帰着し,既存の実装を再利用できることを示す。 DLPMとLIMは異なるトレーニングアルゴリズムを持ち、ガウスの場合とは異なり、異なる後方プロセスとサンプリングアルゴリズムを持つ。実験により,DLPMは,データ分布の裾のカバレッジの向上,不均衡なデータセットの生成の改善,後方ステップの削減による計算時間の短縮を実現している。 Investigating noise distributions beyond the Gaussian in diffusion generative models is an open problem. The Gaussian case has seen success experimentally and theoretically, fitting a unified SDE framework for score-based and denoising formulations. Recent studies suggest heavy-tailed noise distributions can address mode collapse and manage datasets with class imbalance, heavy tails, or outliers. Yoon et al. (NeurIPS 2023) introduced the Lévy-Ito model (LIM), extending the SDE framework to heavy-tailed SDEs with $\alpha$-stable noise. Despite its theoretical elegance and performance gains, LIM's complex mathematics may limit its accessibility and broader adoption. This study takes a simpler approach by extending the denoising diffusion probabilistic model (DDPM) with $\alpha$-stable noise, creating the denoising Lévy probabilistic model (DLPM). Using elementary proof techniques, we show DLPM reduces to running vanilla DDPM with minimal changes, allowing existing implementations to be reused. DLPM and LIM have different training algorithms and, unlike the Gaussian case, they admit different backward processes and sampling algorithms. 
Our experiments demonstrate that DLPM achieves better coverage of the tails of the data distribution, improved generation on unbalanced datasets, and faster computation times with fewer backward steps.	翻訳日:2024-11-08 14:50:05 公開日:2024-10-11
# SOAP-RL:POMDP環境における強化学習のための逐次オプションアドバンテージプロパゲーション SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments ( http://arxiv.org/abs/2407.18913v2 ) ライセンス: Link先を確認	Shu Ishida, João F. Henriques,	(参考訳) この研究は、オプションを用いて強化学習アルゴリズムを部分観測マルコフ決定過程(POMDP)に拡張する方法を比較する。オプションの1つの見方は、時間的に拡張されたアクションであり、エージェントがポリシーのコンテキストウィンドウを越えて過去の情報を保持できるメモリとして実現することができる。オプションの割り当てはヒューリスティックスと手作りの目的を使って扱うことができるが、時間的に一貫したオプションと関連するサブポリシーを明示的な監督なしに学ぶことは困難である。 PPOEMとSOAPという2つのアルゴリズムが提案され、この問題に取り組むために詳しく研究されている。 PPOEM は (Hidden Markov Models の)フォワードバックワードアルゴリズムを適用して,オプション拡張ポリシに対する期待リターンを最適化する。しかし、この学習アプローチは、オン・ポリシーのロールアウト中に不安定である。オプションの割り当ては、エピソード全体が利用可能なオフラインシーケンスに最適化されているため、将来の軌跡を知ることなく因果ポリシーを学ぶのにも適していない。別のアプローチとして、SOAPは最適なオプション割り当てのためのポリシー勾配を評価する。これは、GAE(Generalized advantage estimation)の概念を拡張して、オプションのアドバンテージを時間を通して伝播させるものであり、オプションポリシー勾配の時間的バックプロパゲーションの実行と解析的に等価である。このオプションポリシーは、エージェントの履歴にのみ条件付けられ、将来のアクションには条件付けられない。競合するベースラインに対して評価した結果、SOAPは最も堅牢なパフォーマンスを示し、POMDPの廊下環境でオプションを正しく発見するとともに、AtariやMuJoCoなどの標準ベンチマークでもPPOEM、LSTM、Option-Criticの各ベースラインを上回った。オープンソースコードは https://github.com/shuishida/SoapRL で公開されている。 This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view of options is as temporally extended actions, which can be realized as a memory that allows the agent to retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervision is a challenge. Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem. PPOEM applies the forward-backward algorithm (for Hidden Markov Models) to optimize the expected returns for an option-augmented policy. However, this learning approach is unstable during on-policy rollouts. 
It is also unsuited for learning causal policies without the knowledge of future trajectories, since option assignments are optimized for offline sequences where the entire episode is available. As an alternative approach, SOAP evaluates the policy gradient for an optimal option assignment. It extends the concept of the generalized advantage estimation (GAE) to propagate option advantages through time, which is an analytical equivalent to performing temporal back-propagation of option policy gradients. This option policy is only conditional on the history of the agent, not future actions. Evaluated against competing baselines, SOAP exhibited the most robust performance, correctly discovering options for POMDP corridor environments, as well as on standard benchmarks including Atari and MuJoCo, outperforming PPOEM, as well as LSTM and Option-Critic baselines. The open-sourced code is available at https://github.com/shuishida/SoapRL.	翻訳日:2024-11-08 14:50:05 公開日:2024-10-11
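The SOAP abstract above builds on generalized advantage estimation (GAE). For readers unfamiliar with it, here is a minimal Python sketch of standard GAE only. It does not implement SOAP's propagation of option advantages, and the function name `gae` and its default `gamma` and `lam` values are assumptions chosen for illustration.

```python
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard generalized advantage estimation for one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With gamma = lam = 1 and zero value estimates, GAE reduces to reward-to-go.
print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting `lam=0.0` keeps only the one-step error, while `lam=1.0` recovers the full Monte Carlo return minus the value baseline; intermediate values trade bias against variance.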
# ビジョンランゲージモデルを用いたゼロショットにおけるロボティクス問題の解法 Solving Robotics Problems in Zero-Shot with Vision-Language Models ( http://arxiv.org/abs/2407.19094v3 ) ライセンス: Link先を確認	Zidan Wang, Rui Shen, Bradly Stadie,	(参考訳) ゼロショット方式でロボットの問題を解くために設計された多エージェント視覚大言語モデル(VLLM)フレームワークであるWonderful Teamを紹介した。我々の文脈では、ゼロショットとは、新しい環境において、ロボットの周囲のイメージとタスク記述をVLLMに提供し、ロボットがタスクを完了するために必要なアクションのシーケンスをVLLMが出力することを意味する。ロボット固有のデータに対するLLMの調整や、別々のビジョンエンコーダのトレーニングなど、パイプラインの微調整が必要な以前の作業とは異なり、当社のアプローチでは、慎重にエンジニアリングすることで、単一のオフザシェルフVLLMが、高レベルの計画から低レベルのロケーション抽出、アクション実行に至るまで、ロボットタスクのすべての側面を自律的に処理できることが示されています。重要なことに、GPT-4o単独で使うのに比べ、Wonderful Teamは自己修正的であり、自分自身のミスを反復的に修正できるため、長期的な課題を解決できる。我々は、VIMABenchを用いたシミュレーション環境と実世界の環境の両方において、広範な実験を通してフレームワークを検証する。私たちのシステムは、操作、ゴールリーチ、視覚的推論といった多様なタスクを、すべてゼロショットで処理できる能力を示しています。これらの結果は、この1年で視覚言語モデルは急速に進歩し、多くのロボティクス問題のバックボーンとして強く考えるべきである、という重要なポイントを浮き彫りにしている。 We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework designed to solve robotics problems in a zero-shot regime. In our context, zero-shot means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description, and the VLLM outputs the sequence of actions necessary for the robot to complete the task. Unlike prior work that requires fine-tuning parts of the pipeline -- such as adjusting an LLM on robot-specific data or training separate vision encoders -- our approach demonstrates that with careful engineering, a single off-the-shelf VLLM can autonomously handle all aspects of a robotics task, from high-level planning to low-level location extraction and action execution. Crucially, compared to using GPT-4o alone, Wonderful Team is self-corrective and capable of iteratively fixing its own mistakes, enabling it to solve challenging long-horizon tasks. We validate our framework through extensive experiments, both in simulated environments using VIMABench and in real-world settings. 
Our system showcases the ability to handle diverse tasks such as manipulation, goal-reaching, and visual reasoning -- all in a zero-shot manner. These results underscore a key point: vision-language models have progressed rapidly in the past year and should be strongly considered as a backbone for many robotics problems moving forward.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-11
# ビジョンランゲージモデルを用いたゼロショットにおけるロボティクス問題の解法 Solving Robotics Problems in Zero-Shot with Vision-Language Models ( http://arxiv.org/abs/2407.19094v4 ) ライセンス: Link先を確認	Zidan Wang, Rui Shen, Bradly Stadie,	(参考訳) ゼロショット方式でロボットの問題を解くために設計された多エージェント視覚大言語モデル(VLLM)フレームワークであるWonderful Teamを紹介した。我々の文脈では、ゼロショットとは、新しい環境において、ロボットの周囲のイメージとタスク記述をVLLMに提供し、ロボットがタスクを完了するために必要なアクションのシーケンスをVLLMが出力することを意味する。ロボット固有のデータに対するLLMの調整や、別々のビジョンエンコーダのトレーニングなど、パイプラインの微調整が必要な以前の作業とは異なり、当社のアプローチでは、慎重にエンジニアリングすることで、単一のオフザシェルフVLLMが、高レベルの計画から低レベルのロケーション抽出、アクション実行に至るまで、ロボットタスクのすべての側面を自律的に処理できることが示されています。重要なことに、GPT-4o単独で使うのに比べ、Wonderful Teamは自己修正的であり、自分自身のミスを反復的に修正できるため、長期的な課題を解決できる。我々は、VIMABenchを用いたシミュレーション環境と実世界の環境の両方において、広範な実験を通してフレームワークを検証する。私たちのシステムは、操作、ゴールリーチ、視覚的推論といった多様なタスクを、すべてゼロショットで処理できる能力を示しています。これらの結果は、この1年で視覚言語モデルは急速に進歩し、多くのロボティクス問題のバックボーンとして強く考えるべきである、という重要なポイントを浮き彫りにしている。 We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework designed to solve robotics problems in a zero-shot regime. In our context, zero-shot means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description, and the VLLM outputs the sequence of actions necessary for the robot to complete the task. Unlike prior work that requires fine-tuning parts of the pipeline -- such as adjusting an LLM on robot-specific data or training separate vision encoders -- our approach demonstrates that with careful engineering, a single off-the-shelf VLLM can autonomously handle all aspects of a robotics task, from high-level planning to low-level location extraction and action execution. Crucially, compared to using GPT-4o alone, Wonderful Team is self-corrective and capable of iteratively fixing its own mistakes, enabling it to solve challenging long-horizon tasks. We validate our framework through extensive experiments, both in simulated environments using VIMABench and in real-world settings. 
Our system showcases the ability to handle diverse tasks such as manipulation, goal-reaching, and visual reasoning -- all in a zero-shot manner. These results underscore a key point: vision-language models have progressed rapidly in the past year and should be strongly considered as a backbone for many robotics problems moving forward.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-11
# 大規模言語モデルを自動抑うつ分類のための3モードアーキテクチャに統合する Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification ( http://arxiv.org/abs/2407.19340v4 ) ライセンス: Link先を確認	Santosh V. Patapati,	(参考訳) 大うつ病性障害(Major Depressive Disorder、MDD)は、世界中の3億人に影響を及ぼす広汎な精神疾患である。本研究は, 臨床面接記録からのうつ病のバイナリ分類のための, BiLSTM に基づくトリモーダルモデルレベルの融合アーキテクチャを提案する。提案アーキテクチャでは、Mel Frequency Cepstral CoefficientsとFacial Action Unitsを組み込み、2ショット学習に基づくGPT-4モデルを用いてテキストデータを処理する。これは、このタスクのために、大規模な言語モデルをマルチモーダルアーキテクチャに組み込む最初の研究である。 DAIC-WOZ AVEC 2016 Challenge cross-validation splitとLeave-One-Subject-Out cross-validation splitにおいて優れた結果を達成し、すべてのベースラインモデルと複数の最先端モデルを上回っている。 Leave-One-Subject-Outテストでは、正解率91.01%、F1スコア85.95%、適合率80%、再現率92.86%を達成した。 Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients and Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-11
# DAIC-WOZに基づく自動抑うつ分類のための大規模言語モデルの3モードアーキテクチャへの統合 Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification on the DAIC-WOZ ( http://arxiv.org/abs/2407.19340v5 ) ライセンス: Link先を確認	Santosh V. Patapati,	(参考訳) 大うつ病性障害(Major Depressive Disorder、MDD)は、世界中の3億人に影響を及ぼす広汎な精神疾患である。本研究は, 臨床面接記録からのうつ病のバイナリ分類のための, BiLSTM に基づくトリモーダルモデルレベルの融合アーキテクチャを提案する。提案アーキテクチャでは、Mel Frequency Cepstral CoefficientsとFacial Action Unitsを組み込み、2ショット学習に基づくGPT-4モデルを用いてテキストデータを処理する。これは、このタスクのために、大規模な言語モデルをマルチモーダルアーキテクチャに組み込む最初の研究である。 DAIC-WOZ AVEC 2016 Challenge cross-validation splitとLeave-One-Subject-Out cross-validation splitにおいて優れた結果を達成し、すべてのベースラインモデルと複数の最先端モデルを上回っている。 Leave-One-Subject-Outテストでは、正解率91.01%、F1スコア85.95%、適合率80%、再現率92.86%を達成した。 Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients and Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.	翻訳日:2024-11-08 14:38:53 publicated:2024-10-11
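The Leave-One-Subject-Out figures reported in the two depression-classification abstracts above can be cross-checked against each other, since the F1-score is the harmonic mean of precision and recall. The short Python sketch below does that arithmetic; the function name `f1_score` is an illustrative choice, and only the precision and recall values come from the abstracts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (80% and 92.86%).
f1 = f1_score(0.80, 0.9286)
print(f"F1 = {f1:.4f}")  # F1 = 0.8595
```

The result agrees with the reported F1-score of 85.95%, so the three figures are mutually consistent.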
# 医療における感性推論 Sentiment Reasoning for Healthcare ( http://arxiv.org/abs/2407.21054v2 ) ライセンス: Link先を確認	Khai Le-Duc, Khai-Nguyen Nguyen, Bach Phan Tat, Duy Le, Jerry Ngo, Long Vo-Dang, Anh Totti Nguyen, Truong-Son Hy,	(参考訳) AI意思決定の透明性は、エラーによる深刻な結果のため、医療において不可欠であり、感情分析タスクにおいて、AIとユーザ間の信頼を構築する上で重要である。推論機能を組み込むことで、LLM(Large Language Models)は、より広い文脈における人間の感情を理解し、曖昧であいまいな言語を扱い、明確に述べられていない基本的な感情を推測する。本研究では,音声とテキストの両モードに対して,新たなタスクであるSentiment Reasoningを導入し,マルチモーダルなマルチタスクフレームワークとデータセットを提案する。本研究は,有理化訓練により,人文・ASR設定の感情分類におけるモデル性能が向上することを示した。また、生成した有理数は通常、人為的有理数と比較して異なる語彙を示すが、類似した意味論は維持する。すべてのコード、データ(英訳、ベトナム語)、モデルはオンラインで公開されている。 Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and this is important for building trust among AI and users in sentiment analysis task. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be explicitly stated. In this work, we introduce a new task - Sentiment Reasoning - for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Our study showed that rationale-augmented training enhances model performance in sentiment classification across both human transcript and ASR settings. Also, we found that the generated rationales typically exhibit different vocabularies compared to human-generated rationales, but maintain similar semantics. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed	翻訳日:2024-11-08 13:51:33 公開日:2024-10-11
# 医療における感性推論 Sentiment Reasoning for Healthcare ( http://arxiv.org/abs/2407.21054v3 ) ライセンス: Link先を確認	Khai-Nguyen Nguyen, Khai Le-Duc, Bach Phan Tat, Duy Le, Long Vo-Dang, Truong-Son Hy,	(参考訳) AIヘルスケアの意思決定における透明性は、AIとユーザ間の信頼を構築するために不可欠である。推論機能を組み込むことで、Large Language Models(LLM)はコンテキスト内の感情を理解し、ニュアンス付き言語を扱い、未定の感情を推測することができる。本研究では,音声とテキストの両モードに対して,新たなタスクであるSentiment Reasoningを導入し,マルチモーダルなマルチタスクフレームワークとデータセットを提案する。感性推論は感情分析における補助的タスクであり、モデルが感情ラベルの両方を予測し、入力の書き起こしに基づいてその背景にある理性を生成する。本研究は,人間に匹敵する品質のモデル予測の合理性を提供するとともに,モデル性能(精度とマクロF1)の1%向上)を合理的な微調整により向上させることにより,感性推論がモデルの透明性向上に役立つことを示す。また,ヒトとASR転写産物の有意な意味的品質の差は認められなかった。すべてのコード、データ(英訳とベトナム語)、モデルはオンラインで公開されている。 Transparency in AI healthcare decision-making is crucial for building trust among AI and users. Incorporating reasoning capabilities enables Large Language Models (LLMs) to understand emotions in context, handle nuanced language, and infer unstated sentiments. In this work, we introduce a new task -- Sentiment Reasoning -- for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis where the model predicts both the sentiment label and generates the rationale behind it based on the input transcript. Our study conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts shows that Sentiment Reasoning helps improve model transparency by providing rationale for model prediction with quality semantically comparable to humans while also improving model performance (1% increase in both accuracy and macro-F1) via rationale-augmented fine-tuning. Also, no significant difference in the semantic quality of generated rationales between human and ASR transcripts. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed.	翻訳日:2024-11-08 13:51:33 公開日:2024-10-11
# Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions ( http://arxiv.org/abs/2408.00727v3 ) License: see linked page	Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang,	The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. They can possess considerable medical knowledge, but may still hallucinate and are inflexible with respect to knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information-seeking are required. To address such an issue, we propose iterative RAG for medicine (i-MedRAG), where LLMs can iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries will be answered by a conventional RAG system and they will be further used to guide the query generation in the next iteration. 
Our experiments show the improved performance of various LLMs brought by i-MedRAG compared with conventional RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different iterations of follow-up queries and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first-of-its-kind study on incorporating follow-up queries into medical RAG. The implementation of i-MedRAG is available at https://github.com/Teddy-XiongGZ/MedRAG.	Translated: 2024-11-08 13:29:21 Published: 2024-10-11
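The iterative loop described in the i-MedRAG abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `iterative_rag` and the three callables (`generate_queries`, `answer_with_rag`, `final_answer`) are hypothetical names, and the toy stubs stand in for the LLM and the conventional RAG system.

```python
from typing import Callable, List, Tuple

# A (follow-up query, RAG answer) pair collected during the loop.
QAPair = Tuple[str, str]


def iterative_rag(
    question: str,
    generate_queries: Callable[[str, List[QAPair]], List[str]],
    answer_with_rag: Callable[[str], str],
    final_answer: Callable[[str, List[QAPair]], str],
    num_iterations: int = 2,
) -> Tuple[str, List[QAPair]]:
    """Ask follow-up queries for a fixed number of rounds, then answer.

    All three callables are hypothetical placeholders for model calls.
    """
    history: List[QAPair] = []
    for _ in range(num_iterations):
        # New queries are conditioned on every earlier query/answer pair.
        queries = generate_queries(question, history)
        if not queries:
            break
        for query in queries:
            history.append((query, answer_with_rag(query)))
    return final_answer(question, history), history


# Toy stubs so the sketch runs without any model or retriever.
def toy_generate_queries(question: str, history: List[QAPair]) -> List[str]:
    return [f"follow-up {len(history) + 1} for: {question}"]


def toy_answer_with_rag(query: str) -> str:
    return f"retrieved answer to '{query}'"


def toy_final_answer(question: str, history: List[QAPair]) -> str:
    return f"answer based on {len(history)} follow-up(s)"
```

The number of rounds and the number of queries returned per round correspond to the two scaling knobs the abstract mentions (iterations and queries per iteration).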
# Generalized Quantum Stein's Lemma and Second Law of Quantum Resource Theories ( http://arxiv.org/abs/2408.02722v2 ) License: see linked page	Masahito Hayashi, Hayata Yamasaki,	The second law of thermodynamics is the cornerstone of physics, characterizing the convertibility between thermodynamic states through a single function, entropy. Given the universal applicability of thermodynamics, a fundamental question in quantum information theory is whether an analogous second law can be formulated to characterize the convertibility of resources for quantum information processing by a single function. In 2008, a promising formulation was proposed, linking resource convertibility to the optimal performance of a variant of the quantum version of hypothesis testing. Central to this formulation was the generalized quantum Stein's lemma, which aimed to characterize this optimal performance by a measure of quantum resources, the regularized relative entropy of resource. If proven valid, the generalized quantum Stein's lemma would lead to the second law for quantum resources, with the regularized relative entropy of resource taking the role of entropy in thermodynamics. However, in 2023, a logical gap was found in the original proof of this lemma, casting doubt on the possibility of such a formulation of the second law. 
In this work, we resolve this problem by developing alternative techniques and successfully proving the generalized quantum Stein's lemma. Based on our proof, we reestablish and extend the formulation of quantum resource theories with the second law, applicable to both static resources of quantum states and a fundamental class of dynamical resources represented by classical-quantum (CQ) channels. These results resolve the fundamental problem of bridging the analogy between thermodynamics and quantum information theory.	Translated: 2024-11-08 12:55:50 Published: 2024-10-11
# Generalized Quantum Stein's Lemma and Second Law of Quantum Resource Theories ( http://arxiv.org/abs/2408.02722v3 ) License: see linked page	Masahito Hayashi, Hayata Yamasaki,	The second law of thermodynamics is the cornerstone of physics, characterizing the convertibility between thermodynamic states through a single function, entropy. Given the universal applicability of thermodynamics, a fundamental question in quantum information theory is whether an analogous second law can be formulated to characterize the convertibility of resources for quantum information processing by a single function. In 2008, a promising formulation was proposed, linking resource convertibility to the optimal performance of a variant of the quantum version of hypothesis testing. Central to this formulation was the generalized quantum Stein's lemma, which aimed to characterize this optimal performance by a measure of quantum resources, the regularized relative entropy of resource. If proven valid, the generalized quantum Stein's lemma would lead to the second law for quantum resources, with the regularized relative entropy of resource taking the role of entropy in thermodynamics. However, in 2023, a logical gap was found in the original proof of this lemma, casting doubt on the possibility of such a formulation of the second law. 
In this work, we address this problem by developing alternative techniques to successfully prove the generalized quantum Stein's lemma under a smaller set of assumptions than the original analysis. Based on our proof, we reestablish and extend the second law of quantum resource theories, applicable to both static resources of quantum states and a fundamental class of dynamical resources represented by classical-quantum (CQ) channels. These results resolve the fundamental problem of bridging the analogy between thermodynamics and quantum information theory.	Translated: 2024-11-08 12:55:50 Published: 2024-10-11
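For orientation, the statement at stake in the two entries above can be written out as follows. This is the form in which the generalized quantum Stein's lemma is commonly stated in the literature, not a formula quoted from the abstract. The symbols $\mathcal{F}_n$ (the set of free states on $n$ copies), $\beta_\epsilon$ (the optimal type II error under a type I error constraint $\epsilon$), $D$ (quantum relative entropy) and $R^\infty$ are notation introduced here, and the precise assumptions on $\mathcal{F}_n$ are deliberately left unspecified.

```latex
\beta_\epsilon\!\left(\rho^{\otimes n}\,\middle\|\,\mathcal{F}_n\right)
  := \min_{\substack{0 \le T \le I \\ \operatorname{Tr}\left[(I-T)\,\rho^{\otimes n}\right] \le \epsilon}}
     \;\max_{\sigma \in \mathcal{F}_n} \operatorname{Tr}\left[T\sigma\right],
\qquad
\lim_{n\to\infty} -\frac{1}{n}\log
  \beta_\epsilon\!\left(\rho^{\otimes n}\,\middle\|\,\mathcal{F}_n\right)
  = R^{\infty}(\rho)
  := \lim_{n\to\infty} \frac{1}{n} \min_{\sigma \in \mathcal{F}_n}
     D\!\left(\rho^{\otimes n}\,\middle\|\,\sigma\right)
\quad \text{for every } \epsilon \in (0,1).
```

Read this way, the regularized relative entropy of resource $R^\infty$ plays the role that entropy plays in thermodynamics, which is the analogy the abstract draws.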
# Underwater litter monitoring using consumer-grade aerial-aquatic speedy scanner (AASS) and deep learning based super-resolution reconstruction and detection network ( http://arxiv.org/abs/2408.03564v2 ) License: see linked page	Fan Zhao, Yongying Liu, Jiaqi Wang, Yijia Chen, Dianhan Xi, Xinlei Shao, Shigeru Tabeta, Katsunori Mizuno,	Underwater litter is widely spread across aquatic environments such as lakes, rivers, and oceans, significantly impacting natural ecosystems. Current monitoring technologies for detecting underwater litter face limitations in survey efficiency, cost, and environmental conditions, highlighting the need for efficient, consumer-grade technologies for automatic detection. This research introduces the Aerial-Aquatic Speedy Scanner (AASS) combined with Super-Resolution Reconstruction (SRR) and an improved YOLOv8 detection network. AASS enhances data acquisition efficiency over traditional methods, capturing high-quality images that accurately identify underwater waste. SRR improves image resolution by mitigating motion blur and insufficient resolution, thereby benefiting detection tasks. Specifically, the RCAN model achieved the highest mean average precision (mAP) of 78.6% for detection accuracy on reconstructed images among the tested SRR models. With a magnification factor of 4, the SRR test set shows an improved mAP compared to the conventional bicubic set. 
These results demonstrate the effectiveness of the proposed method in detecting underwater litter.	Translated: 2024-11-08 12:33:46 Published: 2024-10-11
# Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems ( http://arxiv.org/abs/2408.05992v2 ) License: see linked page	Steve Yuwono, Dorothea Schwung, Andreas Schwung,	This paper presents a novel transfer learning approach in state-based potential games (TL-SbPGs) for enhancing distributed self-optimization in manufacturing systems. The approach focuses on the practically relevant industrial setting where sharing and transferring gained knowledge among similarly behaving players improves the self-learning mechanism in large-scale systems. With TL-SbPGs, the gained knowledge can be reused by other players to optimize their policies, thereby improving the learning outcomes of the players and accelerating the learning process. To accomplish this goal, we develop transfer learning concepts and similarity criteria for players, which offer two distinct settings: (a) predefined similarities between players and (b) dynamically inferred similarities between players during training. We formally prove the applicability of the SbPG framework in transfer learning. Additionally, we introduce an efficient method to determine the optimal timing and weighting of the transfer learning procedure during the training phase. 
Through experiments on a laboratory-scale testbed, we demonstrate that TL-SbPGs significantly boost production efficiency and reduce the power consumption of the production schedules, while also outperforming native SbPGs.	Translated: 2024-11-08 11:38:16 Published: 2024-10-11
# Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words ( http://arxiv.org/abs/2408.08027v2 ) License: see linked page	Kento Nozawa, Takashi Masuko, Toru Taniguchi,	We develop a large language model (LLM) based automatic speech recognition (ASR) system that can be contextualized by providing keywords as prior information in text prompts. We adopt a decoder-only architecture and use our in-house LLM, PLaMo-100B, pre-trained from scratch using datasets dominated by Japanese and English texts, as the decoder. We adopt a pre-trained Whisper encoder as an audio encoder, and the audio embeddings from the audio encoder are projected to the text embedding space by an adapter layer and concatenated with text embeddings converted from text prompts to form inputs to the decoder. By providing keywords as prior information in the text prompts, we can contextualize our LLM-based ASR system without modifying the model architecture to transcribe ambiguous words in the input audio accurately. Experimental results demonstrate that providing keywords to the decoder can significantly improve the recognition performance of rare and ambiguous words.	Translated: 2024-11-08 07:29:14 Published: 2024-10-11
# Oja's plasticity rule overcomes several challenges of training neural networks under biological constraints ( http://arxiv.org/abs/2408.08408v2 ) License: see linked page	Navid Shervani-Tabar, Marzieh Alireza Mirhoseini, Robert Rosenbaum,	There is a large literature on the similarities and differences between biological neural circuits and deep artificial neural networks (DNNs). However, modern training of DNNs relies on several engineering tricks such as data batching, normalization, adaptive optimizers, and precise weight initialization. Despite their critical role in training DNNs, these engineering tricks are often overlooked when drawing parallels between biological and artificial networks, potentially due to a lack of evidence for their direct biological implementation. In this study, we show that Oja's plasticity rule partly overcomes the need for some engineering tricks. Specifically, under difficult, but biologically realistic learning scenarios such as online learning, deep architectures, and sub-optimal weight initialization, Oja's rule can substantially improve the performance of pure backpropagation. Our results demonstrate that simple synaptic plasticity rules can overcome challenges to learning that are typically overcome using less biologically plausible approaches when training DNNs.	Translated: 2024-11-08 07:29:14 Published: 2024-10-11
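As a point of reference, the classic form of Oja's rule for a single linear neuron can be sketched as below. This is the textbook rule, not the paper's training procedure, which the abstract does not detail. The function names, learning rate, initial weights, and toy data distribution are all assumptions made for illustration.

```python
import math
import random
from typing import List


def oja_update(w: List[float], x: List[float], lr: float) -> List[float]:
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


def train_oja(num_steps: int = 5000, lr: float = 0.01, seed: int = 0) -> List[float]:
    """Run Oja's rule online (one sample at a time) on toy 2-D data."""
    rng = random.Random(seed)
    w = [0.5, 0.5]  # arbitrary, deliberately un-normalized starting point
    for _ in range(num_steps):
        # Toy inputs: much more variance along the first axis than the second.
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.3)]
        w = oja_update(w, x, lr)
    return w


weights = train_oja()
norm = math.sqrt(sum(wi * wi for wi in weights))
```

The `- y * w` term acts as a built-in decay that keeps the weight norm bounded without a separate normalization step, while the Hebbian term `y * x` pulls the weights toward the direction of largest input variance.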
# EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics ( http://arxiv.org/abs/2408.08782v2 ) License: see linked page	Chenwei Wan, Matthieu Labeau, Chloé Clavel,	Designing emotionally intelligent conversational systems to provide comfort and advice to people experiencing distress is a compelling area of research. Recently, with advancements in large language models (LLMs), end-to-end dialogue agents without explicit strategy prediction steps have become prevalent. However, implicit strategy planning lacks transparency, and recent studies show that LLMs' inherent preference bias towards certain socio-emotional strategies hinders the delivery of high-quality emotional support. To address this challenge, we propose decoupling strategy prediction from language generation, and introduce a novel dialogue strategy prediction framework, EmoDynamiX, which models the discourse dynamics between user fine-grained emotions and system strategies using a heterogeneous graph for better performance and transparency. Experimental results on two ESC datasets show EmoDynamiX outperforms previous state-of-the-art methods by a significant margin (better proficiency and lower preference bias). Our approach also exhibits better transparency by allowing backtracing of decision making.	Translated: 2024-11-08 07:18:07 Published: 2024-10-11
# SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models ( http://arxiv.org/abs/2408.09822v3 ) License: see linked page	Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Stefanie Speidel,	Computer-assisted surgery (CAS) systems are designed to assist surgeons during procedures, thereby reducing complications and enhancing patient care. Training machine learning models for these systems requires a large corpus of annotated datasets, which is challenging to obtain in the surgical domain due to patient privacy concerns and the significant labeling effort required from doctors. Previous methods have explored unpaired image translation using generative models to create realistic surgical images from simulations. However, these approaches have struggled to produce high-quality, diverse surgical images. In this work, we introduce SurgicaL-CD, a consistency-distilled diffusion method to generate realistic surgical images with only a few sampling steps without paired data. We evaluate our approach on three datasets, assessing the generated images in terms of quality and utility as downstream training datasets. Our results demonstrate that our method outperforms GANs and diffusion-based approaches. Our code is available at https://gitlab.com/nct_tso_public/gan2diffusion.	Translated: 2024-11-08 06:55:48 Published: 2024-10-11
# SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS ( http://arxiv.org/abs/2408.10771v2 ) License: see linked page	Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai. -Doss,	While recent zero-shot multispeaker text-to-speech (TTS) models achieve impressive results, they typically rely on extensive transcribed speech datasets from numerous speakers and intricate training pipelines. Meanwhile, self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS. It was also observed that SSL features from different speakers that are linearly close share phonetic information while maintaining individual speaker identity, which enables straightforward and robust voice cloning. In this study, we introduce SSL-TTS, a lightweight and efficient zero-shot TTS framework trained on transcribed speech from a single speaker. SSL-TTS leverages SSL features and retrieval methods for simple and robust zero-shot multi-speaker synthesis. Objective and subjective evaluations show that our approach achieves performance comparable to state-of-the-art models that require significantly larger training datasets. The low training data requirements mean that SSL-TTS is well suited for the development of multi-speaker TTS systems for low-resource domains and languages. 
We also introduce an interpolation parameter which enables fine control over the output speech by blending voices. Demo samples are available at https://idiap.github.io/ssl-tts	Translated: 2024-11-08 06:33:41 Published: 2024-10-11
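The retrieval and blending idea in the SSL-TTS abstract can be illustrated with a small nearest-neighbour sketch. It assumes, without confirmation from the abstract, that each source feature frame is replaced by the mean of its k most similar target-speaker frames under cosine similarity and then linearly blended with the original frame. `knn_convert`, `lam`, and the plain-list feature format are hypothetical names and choices.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_convert(
    source_frames: List[List[float]],
    target_frames: List[List[float]],
    k: int = 4,
    lam: float = 1.0,
) -> List[List[float]]:
    """Blend each source frame with the mean of its k nearest target frames.

    lam = 1.0 uses only the retrieved target frames; lam = 0.0 returns the
    source frames unchanged. Both parameters are illustrative assumptions.
    """
    if not target_frames:
        raise ValueError("target_frames must not be empty")
    k = max(1, min(k, len(target_frames)))
    converted = []
    for frame in source_frames:
        nearest = sorted(
            target_frames,
            key=lambda t: cosine_similarity(frame, t),
            reverse=True,
        )[:k]
        mean = [sum(column) / k for column in zip(*nearest)]
        converted.append([lam * m + (1.0 - lam) * x for m, x in zip(mean, frame)])
    return converted
```

Intermediate values of `lam` give a mixture of the source and target voices, which is one plausible reading of the "interpolation parameter" the abstract mentions.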
# SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models ( http://arxiv.org/abs/2408.12114v3 ) License: see linked page	Youngjoon Yu, Sangyun Chung, Byung-Kwan Lee, Yong Man Ro,	Large-scale Vision-Language Models (LVLMs) have significantly advanced with text-aligned vision inputs. They have made remarkable progress in computer vision tasks by aligning text modality with vision inputs. There are also endeavors to incorporate multi-vision sensors beyond RGB, including thermal, depth, and medical X-ray images. However, we observe that current LVLMs view images taken from multi-vision sensors as if they were in the same RGB domain without considering the physical characteristics of multi-vision sensors. They fail to convey the fundamental multi-vision sensor information from the dataset and the corresponding contextual knowledge properly. Consequently, alignment between the information from the actual physical environment and the text is not achieved correctly, making it difficult to answer complex sensor-related questions that consider the physical environment. 
In this paper, we aim to establish a multi-vision Sensor Perception And Reasoning benchmarK called SPARK that can reduce the fundamental multi-vision sensor information gap between images and multi-vision sensors. We generated 6,248 vision-language test samples to investigate multi-vision sensory perception and multi-vision sensory reasoning on physical sensor knowledge proficiency across different formats, covering different types of sensor-related questions. We utilized these samples to assess ten leading LVLMs. The results showed that most models displayed deficiencies in multi-vision sensory reasoning to varying extents. Code and data are available at https://github.com/top-yun/SPARK	Translated: 2024-11-08 05:49:00 Published: 2024-10-11
# How Diffusion Models Learn to Factorize and Compose ( http://arxiv.org/abs/2408.13256v2 ) License: see linked page	Qiyao Liang, Ziming Liu, Mitchell Ostrow, Ila Fiete,	Diffusion models are capable of generating photo-realistic images that combine elements which likely do not appear together in the training set, demonstrating the ability to compositionally generalize. Nonetheless, the precise mechanism of compositionality and how it is acquired through training remains elusive. Inspired by cognitive neuroscientific approaches, we consider a highly reduced setting to examine whether and when diffusion models learn semantically meaningful and factorized representations of composable features. We performed extensive controlled experiments on conditional Denoising Diffusion Probabilistic Models (DDPMs) trained to generate various forms of 2D Gaussian bump images. We found that the models learn factorized but not fully continuous manifold representations for encoding continuous features of variation underlying the data. With such representations, models demonstrate superior feature compositionality but limited ability to interpolate over unseen values of a given feature. Our experimental results further demonstrate that diffusion models can attain compositionality with few compositional examples, suggesting a more efficient way to train DDPMs. 
Finally, we connect manifold formation in diffusion models to percolation theory in physics, offering insight into the sudden onset of factorized representation learning. Our thorough toy experiments thus contribute a deeper understanding of how diffusion models capture compositional structure in data.	Translated: 2024-11-08 05:26:28 Published: 2024-10-11
# Training-Free Activation Sparsity in Large Language Models ( http://arxiv.org/abs/2408.14690v2 ) License: see linked page	James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun,	Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-training on up to hundreds of billions of tokens. This paper describes TEAL, a simple training-free method that applies magnitude-based activation sparsity to hidden states throughout the entire model. TEAL achieves 40-50% model-wide sparsity with minimal performance degradation across Llama-2, Llama-3, and Mistral families, with sizes varying from 7B to 70B. We improve existing sparse kernels and demonstrate wall-clock decoding speed-ups of up to 1.53× and 1.8× at 40% and 50% model-wide sparsity. TEAL is compatible with weight quantization, enabling further efficiency gains.	Translated: 2024-11-08 04:52:58 Published: 2024-10-11
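Magnitude-based activation sparsity, as named in the TEAL abstract, can be illustrated with a small pure-Python sketch. It is not the TEAL implementation: here the threshold is picked per input vector from its own magnitudes, which is a simplifying assumption, and `magnitude_threshold`, `sparsify`, and `sparse_matvec` are hypothetical helper names.

```python
from typing import List


def magnitude_threshold(values: List[float], sparsity: float) -> float:
    """Smallest magnitude that is kept when roughly `sparsity` of entries are dropped."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if not values:
        return 0.0
    magnitudes = sorted(abs(v) for v in values)
    cut = min(int(len(magnitudes) * sparsity), len(magnitudes) - 1)
    return magnitudes[cut]


def sparsify(values: List[float], sparsity: float) -> List[float]:
    """Zero every entry whose magnitude falls below the threshold."""
    threshold = magnitude_threshold(values, sparsity)
    return [v if abs(v) >= threshold else 0.0 for v in values]


def sparse_matvec(weight: List[List[float]], x: List[float]) -> List[float]:
    """Compute weight @ x while skipping columns whose activation is zero."""
    active = [j for j, xj in enumerate(x) if xj != 0.0]
    return [sum(row[j] * x[j] for j in active) for row in weight]


hidden = [0.1, -2.0, 0.05, 1.5]
sparse_hidden = sparsify(hidden, 0.5)
output = sparse_matvec([[1.0, 2.0, 3.0, 4.0]], sparse_hidden)
```

Skipping the zeroed columns is where the savings come from: the corresponding weight columns never need to be read, which matches the abstract's point about reduced compute and memory movement in matrix multiplications.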
# Generative Verifiers: Reward Modeling as Next-Token Prediction ( http://arxiv.org/abs/2408.15240v2 ) License: see linked page	Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal,	Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the best one is selected. While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. Compared to standard verifiers, such generative verifiers (GenRM) can benefit from several advantages of LLMs: they integrate seamlessly with instruction tuning, enable chain-of-thought reasoning, and can utilize additional test-time compute via majority voting for better verification. We demonstrate that GenRM outperforms discriminative verifiers, DPO verifiers, and LLM-as-a-Judge, resulting in a 16-40% improvement in the number of problems solved with Best-of-N on algorithmic and math reasoning tasks. 
Furthermore, we find that training GenRM with synthetic verification rationales is sufficient to pick out subtle errors on math problems. Finally, we demonstrate that generative verifiers scale favorably with model size and inference-time compute.	Translated: 2024-11-08 04:41:58 Published: 2024-10-11
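The Best-of-N selection step described in the abstract can be sketched in a few lines. The verifier here is a hypothetical callable returning a score in [0, 1], standing in for something like the probability a generative verifier assigns to a "correct" verdict. Averaging several verifier calls is one simple way to spend extra test-time compute and is an assumption of this sketch, not the paper's exact voting scheme.

```python
from typing import Callable, List


def best_of_n(
    candidates: List[str],
    verifier_score: Callable[[str], float],
    num_votes: int = 1,
) -> str:
    """Return the candidate with the highest average verifier score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    if num_votes < 1:
        raise ValueError("num_votes must be at least 1")

    def average_score(candidate: str) -> float:
        # With a sampling-based verifier, repeated calls would differ;
        # averaging them reduces the variance of the final score.
        return sum(verifier_score(candidate) for _ in range(num_votes)) / num_votes

    return max(candidates, key=average_score)


def toy_verifier(solution: str) -> float:
    """Toy stand-in for a learned verifier: favors the arithmetically correct answer."""
    return 0.9 if solution.endswith("= 4") else 0.2


chosen = best_of_n(["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"], toy_verifier, num_votes=3)
```

In a real setup the candidates would be N sampled solutions from the LLM, and the quality of the final answer depends entirely on how well the verifier's scores track correctness.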
# Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction ( http://arxiv.org/abs/2408.16180v2 ) License: see linked page	Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara,	With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances. We also introduce a new multi-pass augmented generative error correction (MPA GER) by integrating multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merging them. To the best of our knowledge, this is the first investigation of the use of LLMs for Japanese GER, which involves second-pass language modeling on the output transcriptions generated by the ASR system (e.g., N-best hypotheses). Our experiments demonstrated that the proposed methods improve ASR quality and generalization on both SPREDS-U1-ja and CSJ data.	Translated: 2024-11-08 04:19:50 Published: 2024-10-11
# FlowRetrieval:Few-Shot Imitation LearningのためのFlow-Guided Data Retrieval FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning ( http://arxiv.org/abs/2408.16944v2 ) ライセンス: Link先を確認	Li-Heng Lin, Yuchen Cui, Amber Xie, Tianyu Hua, Dorsa Sadigh,	(参考訳) Few-shot模倣学習は、与えられた下流タスクにポリシーを効率的に適応させるために、少数のタスク固有のデモンストレーションのみに依存する。検索ベースの手法は,関連する過去の経験を検索して,ポリシー学習時に対象データを拡張できると期待されている。しかし、既存のデータ検索手法は2つの極端に分かれる。すなわち、視覚的に類似したシーンでの正確な行動が事前データに存在することに依存する(これは仮定として非現実的である)か、あるいはタスクの高レベルな言語記述のセマンティックな類似性に基づいて検索する(これは、ポリシー学習に関連するデータの検索においてより重要な要素となることが多い、タスク間で共有される低レベルの行動や動きについて十分な情報を与えない可能性がある)。本研究では,大量のクロスタスクデータにおける動きの類似性を利用して,目標タスクのFew-shot模倣学習を改善する方法について検討する。私たちのキーとなる洞察は、動きの類似したデータには、アクションとオブジェクトの相互作用の影響についての豊富な情報があり、Few-shot適応の際に活用できるということだ。本稿では,事前データから目標タスクに類似した動きを抽出すると同時に,そのようなデータから最大限の利益を得られるポリシーの学習を導くために,オプティカルフロー表現を利用するFlowRetrievalを提案する。その結果、FlowRetrievalは、シミュレーションおよび実世界のドメインにわたって先行手法を大きく上回り、最良の検索ベースの先行手法よりも平均27%高い成功率を達成することがわかった。実機のFranka EmikaロボットによるPen-in-Cupタスクにおいて、FlowRetrievalは、すべての事前データと対象データから学習するベースライン模倣学習手法の3.7倍の性能を達成する。 Webサイト: https://flow-retrieval.github.io Few-shot imitation learning relies on only a small amount of task-specific demonstrations to efficiently adapt a policy for a given downstream tasks. Retrieval-based methods come with a promise of retrieving relevant past experiences to augment this target data when learning policies. However, existing data retrieval methods fall under two extremes: they either rely on the existence of exact behaviors with visually similar scenes in the prior data, which is impractical to assume; or they retrieve based on semantic similarity of high-level language descriptions of the task, which might not be that informative about the shared low-level behaviors or motions across tasks that is often a more important factor for retrieving relevant data for policy learning. In this work, we investigate how we can leverage motion similarity in the vast amount of cross-task data to improve few-shot imitation learning of the target task. 
Our key insight is that motion-similar data carries rich information about the effects of actions and object interactions that can be leveraged during few-shot adaptation. We propose FlowRetrieval, an approach that leverages optical flow representations for both extracting similar motions to target tasks from prior data, and for guiding learning of a policy that can maximally benefit from such data. Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains, achieving on average 27% higher success rate than the best retrieval-based prior method. In the Pen-in-Cup task with a real Franka Emika robot, FlowRetrieval achieves 3.7x the performance of the baseline imitation learning technique that learns from all prior and target data. Website: https://flow-retrieval.github.io	翻訳日:2024-11-08 04:08:49 公開日:2024-10-11
# 適応型大規模言語モデルにおける安全性層 - LLMセキュリティの鍵 Safety Layers in Aligned Large Language Models: The Key to LLM Security ( http://arxiv.org/abs/2408.17003v2 ) ライセンス: Link先を確認	Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li,	(参考訳) LLMは安全で、悪意のある質問を認識し、拒否することができる。しかし、そのようなセキュリティ維持における内部パラメータの役割はまだよく理解されておらず、さらに、悪意のないバックドアや通常のデータで微調整された場合、これらのモデルはセキュリティの劣化に対して脆弱である可能性がある。これらの課題に対処するため、我々の研究は、パラメータレベルでLLMをアライメントする際のセキュリティのメカニズムを明らかにし、モデルの中心にある小さな連続した層を特定します。まず、モデルの内部層内の入力ベクトルの変動を分析することにより、これらの安全層の存在を確かめる。さらに、オーバーリジェクション現象とパラメータスケーリング分析を利用して、安全層を正確に特定する。これらの知見に基づいて, 安全部分パラメータ細調整法(SPPFT)を提案する。提案手法は, 完全微調整と比較して, 性能を保ち, 計算資源の削減を図りながら, LLMの安全性を著しく維持できることを示す。 Aligned LLMs are secure, capable of recognizing and refusing to answer malicious questions. However, the role of internal parameters in maintaining such security is not well understood yet, further these models can be vulnerable to security degradation when fine-tuned with non-malicious backdoor or normal data. To address these challenges, our work uncovers the mechanism behind security in aligned LLMs at the parameter level, identifying a small set of contiguous layers in the middle of the model that are crucial for distinguishing malicious queries from normal ones, referred to as "safety layers". We first confirm the existence of these safety layers by analyzing variations in input vectors within the model's internal layers. Additionally, we leverage the over-rejection phenomenon and parameters scaling analysis to precisely locate the safety layers. Building on these findings, we propose a novel fine-tuning approach, Safely Partial-Parameter Fine-Tuning (SPPFT), that fixes the gradient of the safety layers during fine-tuning to address the security degradation. Our experiments demonstrate that the proposed approach can significantly preserve LLM security while maintaining performance and reducing computational resources compared to full fine-tuning.	翻訳日:2024-11-08 04:08:49 公開日:2024-10-11
# 適応型大規模言語モデルにおける安全性層 - LLMセキュリティの鍵 Safety Layers in Aligned Large Language Models: The Key to LLM Security ( http://arxiv.org/abs/2408.17003v3 ) ライセンス: Link先を確認	Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li,	(参考訳) LLMは安全で、悪意のある質問を認識し、拒否することができる。しかし、そのようなセキュリティ維持における内部パラメータの役割はまだよく理解されておらず、さらに、悪意のないバックドアや通常のデータで微調整された場合、これらのモデルはセキュリティの劣化に対して脆弱である可能性がある。これらの課題に対処するため、我々の研究は、パラメータレベルでLLMをアライメントする際のセキュリティのメカニズムを明らかにし、モデルの中心にある小さな連続した層を特定します。まず、モデルの内部層内の入力ベクトルの変動を分析することにより、これらの安全層の存在を確かめる。さらに、オーバーリジェクション現象とパラメータスケーリング分析を利用して、安全層を正確に特定する。これらの知見に基づいて, 安全部分パラメータ細調整法(SPPFT)を提案する。提案手法は, 完全微調整と比較して, 性能を保ち, 計算資源の削減を図りながら, LLMの安全性を著しく維持できることを示す。 Aligned LLMs are secure, capable of recognizing and refusing to answer malicious questions. However, the role of internal parameters in maintaining such security is not well understood yet, further these models can be vulnerable to security degradation when fine-tuned with non-malicious backdoor or normal data. To address these challenges, our work uncovers the mechanism behind security in aligned LLMs at the parameter level, identifying a small set of contiguous layers in the middle of the model that are crucial for distinguishing malicious queries from normal ones, referred to as "safety layers". We first confirm the existence of these safety layers by analyzing variations in input vectors within the model's internal layers. Additionally, we leverage the over-rejection phenomenon and parameters scaling analysis to precisely locate the safety layers. Building on these findings, we propose a novel fine-tuning approach, Safely Partial-Parameter Fine-Tuning (SPPFT), that fixes the gradient of the safety layers during fine-tuning to address the security degradation. Our experiments demonstrate that the proposed approach can significantly preserve LLM security while maintaining performance and reducing computational resources compared to full fine-tuning.	翻訳日:2024-11-08 04:08:49 公開日:2024-10-11
# LLMはグラフも幻覚する:構造的視点から LLMs hallucinate graphs too: a structural perspective ( http://arxiv.org/abs/2409.00159v2 ) ライセンス: Link先を確認	Erwan Le Merrer, Gilles Tredan,	(参考訳) LLMが幻覚を起こす、すなわち誤った情報を事実として返すことは知られている。本稿では,これらの幻覚をグラフという構造化された形で研究する可能性を紹介する。この文脈における幻覚とは、文献でよく知られたグラフ(例えば、Karate club, Les Misérables, graph atlas)をプロンプトで尋ねられたときの誤った出力である。これらの幻覚グラフは、ある文の事実的正確性(あるいはその欠如)よりもはるかにリッチであるという利点がある。そこで本稿では、このようなリッチな幻覚がLLMの出力の特徴づけに利用できると主張する。我々の最初の貢献は、主要な現代のLLMにおけるトポロジカルな幻覚の多様性を観察することである。2つ目の貢献は、このような幻覚の振幅に対する指標の提案である。すなわち、グラフアトラス集合内のいくつかのグラフからの平均グラフ編集距離であるGraph Atlas Distanceである。我々はこの指標を、ランキングを得るために1万倍のプロンプトを利用する幻覚ランキングであるHallucination Leaderboardと比較する。 It is known that LLMs do hallucinate, that is, they return incorrect information as facts. In this paper, we introduce the possibility to study these hallucinations under a structured form: graphs. Hallucinations in this context are incorrect outputs when prompted for well known graphs from the literature (e.g. Karate club, Les Misérables, graph atlas). These hallucinated graphs have the advantage of being much richer than the factual accuracy -- or not -- of a statement; this paper thus argues that such rich hallucinations can be used to characterize the outputs of LLMs. Our first contribution observes the diversity of topological hallucinations from major modern LLMs. Our second contribution is the proposal of a metric for the amplitude of such hallucinations: the Graph Atlas Distance, that is the average graph edit distance from several graphs in the graph atlas set. We compare this metric to the Hallucination Leaderboard, a hallucination rank that leverages 10,000 times more prompts to obtain its ranking.	翻訳日:2024-11-08 03:57:28 公開日:2024-10-11
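The abstract above describes the Graph Atlas Distance as an average graph edit distance to several reference graphs. The following is only a rough sketch of that idea, not the authors' implementation. It assumes all graphs share the same node count and counts edge insertions and deletions only (a full graph edit distance would also price node edits). The three reference graphs are hypothetical stand-ins, not actual graph atlas entries, and the brute-force search over relabelings is practical only for very small graphs.

```python
from itertools import permutations

def edge_edit_distance(n, edges_a, edges_b):
    """Fewest edge insertions/deletions turning graph A into graph B,
    minimised over all relabelings of the n nodes (brute force)."""
    target = {frozenset(e) for e in edges_b}
    best = len(edges_a) + len(edges_b)  # upper bound: delete all, insert all
    for perm in permutations(range(n)):
        relabeled = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        best = min(best, len(relabeled ^ target))  # symmetric difference
    return best

def mean_distance(n, candidate, references):
    """Average edit distance from one candidate graph to each reference."""
    return sum(edge_edit_distance(n, candidate, r) for r in references) / len(references)

# Hypothetical 4-node reference graphs (stand-ins, not real atlas entries).
PATH = [(0, 1), (1, 2), (2, 3)]
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]
STAR = [(0, 1), (0, 2), (0, 3)]

# A "hallucinated" answer: a path graph returned under a different labeling.
hallucinated = [(0, 2), (2, 1), (1, 3)]
score = mean_distance(4, hallucinated, [PATH, CYCLE, STAR])
print(score)
```

Because the distance is minimised over relabelings, an answer that is correct up to node naming scores zero against its reference, which is the behavior one would want when comparing model outputs structurally.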
# FMRFT:Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking ( http://arxiv.org/abs/2409.01148v2 ) ライセンス: Link先を確認	Mingyuan Yao, Yukang Huo, Qingbin Tian, Jiayin Zhao, Xiao Liu, Ruifeng Wang, Lin Xue, Haihua Wang,	(参考訳) 深層学習技術を用いた魚の追跡によって, 病気や飢餓による異常魚の行動の早期検出が可能であり, 工業用養殖にとって重要な意味を持つ。しかし、水中での反射や、高い類似性、刺激による急激な水泳、相互閉塞などのいくつかの理由により、魚のマルチターゲット追跡に困難が生じる。これらの課題に対処するため、本稿では、複雑なマルチシナリオ・スタージョン追跡データセットを確立し、リアルタイムなエンドツーエンド魚追跡ソリューションであるFMRFTモデルを提案する。このモデルには、マルチフレームの時間記憶と特徴抽出を容易にするMIMアーキテクチャが組み込まれており、複数の魚をフレーム間で追跡することの難しさに対処している。さらに、QTSI(Query Time Sequence Intersection)モジュールを備えたFMRFTモデルは、隠蔽されたオブジェクトを効果的に管理し、RT-DETRの優れた機能相互作用と事前フレーム処理機能を用いて冗長なトラッキングフレームを削減する。この組み合わせは魚追跡の精度と安定性を著しく向上させる。データセット上でトレーニングおよびテストが行われ、IDF1スコアは90.3%、MOTA精度は94.3%である。実験結果から,FMRFTモデルは魚の群集における相似性と相互排除の課題に効果的に対処し,工場の農業環境における正確な追跡を可能にすることが示唆された。 Early detection of abnormal fish behavior caused by disease or hunger can be achieved through fish tracking using deep learning techniques, which holds significant value for industrial aquaculture. However, underwater reflections and some reasons with fish, such as the high similarity, rapid swimming caused by stimuli and mutual occlusion bring challenges to multi-target tracking of fish. To address these challenges, this paper establishes a complex multi-scenario sturgeon tracking dataset and introduces the FMRFT model, a real-time end-to-end fish tracking solution. The model incorporates the low video memory consumption Mamba In Mamba (MIM) architecture, which facilitates multi-frame temporal memory and feature extraction, thereby addressing the challenges to track multiple fish across frames. Additionally, the FMRFT model with the Query Time Sequence Intersection (QTSI) module effectively manages occluded objects and reduces redundant tracking frames using the superior feature interaction and prior frame processing capabilities of RT-DETR. 
This combination significantly enhances the accuracy and stability of fish tracking. Trained and tested on the dataset, the model achieves an IDF1 score of 90.3% and a MOTA accuracy of 94.3%. Experimental results show that the proposed FMRFT model effectively addresses the challenges of high similarity and mutual occlusion in fish populations, enabling accurate tracking in factory farming environments.	翻訳日:2024-11-08 03:35:26 公開日:2024-10-11
# 確率量子化を用いた高次元データのロバストクラスタリング Robust Clustering on High-Dimensional Data with Stochastic Quantization ( http://arxiv.org/abs/2409.02066v3 ) ライセンス: Link先を確認	Anton Kozyriev, Vladimir Norkin,	(参考訳) 本稿では,従来のベクトル量子化アルゴリズム,特にK-Meansとその変種K-Means++の限界に対処し,SQアルゴリズムを高次元教師なし・半教師付き学習タスクのスケーラブルな代替手段として検討する。従来のクラスタリングアルゴリズムは、計算中の非効率なメモリ利用に悩まされることが多く、すべてのデータサンプルをメモリにロードする必要があるため、大規模なデータセットでは実用的ではない。 Mini-Batch K-Meansのような変種は、メモリ使用量の削減によってこの問題を部分的に緩和するが、クラスタリング問題の非凸性に起因する堅牢な理論的収束保証は欠如している。対照的に、確率量子化アルゴリズムは強力な理論的収束保証を提供し、クラスタリングタスクの堅牢な代替となる。本研究では,ラベル付きデータとラベル付きデータの様々な比率でモデル精度を比較し,部分ラベル付きデータを用いた画像分類問題に対して,アルゴリズムの計算効率と迅速な収束性を実証する。高次元化の課題に対処するため,我々は,Stochastic Quantizationアルゴリズムと従来の量子化アルゴリズムの両アルゴリズムの効率を比較する基盤となる,潜時空間の低次元表現に画像をエンコードするトリプレットネットワークを用いた。さらに,適応学習率による修正を導入することにより,アルゴリズムの収束速度を向上させる。 This paper addresses the limitations of conventional vector quantization algorithms, particularly K-Means and its variant K-Means++, and investigates the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning tasks. Traditional clustering algorithms often suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as Mini-Batch K-Means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the Stochastic Quantization algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data, comparing model accuracy across various ratios of labeled to unlabeled data. 
To address the challenge of high dimensionality, we employ a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of both the Stochastic Quantization algorithm and traditional quantization algorithms. Furthermore, we enhance the algorithm's convergence speed by introducing modifications with an adaptive learning rate.	翻訳日:2024-11-07 23:56:04 公開日:2024-10-11
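The memory argument in the abstract above (samples can be processed one at a time instead of being loaded all at once) can be illustrated with a generic stochastic-gradient quantization update in the style of online k-means. This is a sketch under assumptions and not necessarily the exact Stochastic Quantization algorithm studied in the paper: the one-dimensional data, the fixed learning rate `lr`, and the initial centers are all illustrative choices.

```python
import random

def stochastic_quantize(samples, centers, lr=0.05):
    """Streaming updates on the objective E[min_k (x - c_k)^2 / 2]:
    for each sample, only the nearest center takes a gradient step."""
    centers = list(centers)
    for x in samples:  # one sample at a time; no full dataset in memory
        k = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
        centers[k] += lr * (x - centers[k])  # step toward the sample
    return centers

# Synthetic stream: two well-separated 1-D clusters around 1.0 and 9.0.
rng = random.Random(0)
stream = (rng.gauss(mu, 0.1) for _ in range(1000) for mu in (1.0, 9.0))
centers = stochastic_quantize(stream, centers=[0.0, 10.0])
print(centers)
```

Passing a generator rather than a list keeps the memory footprint constant in the number of samples. An adaptive learning rate, as mentioned in the abstract, would replace the fixed `lr`.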
# LibMOON: PyTorchのグラディエントベースの多目的最適化ライブラリ LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch ( http://arxiv.org/abs/2409.02969v3 ) ライセンス: Link先を確認	Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Yifan Chen, Han Zhao, Qingfu Zhang,	(参考訳) マルチ目的最適化問題(MOP)は、機械学習、マルチタスク学習、公正性や堅牢性制約下での学習などにおいて広く用いられている。複数の目的関数をスカラー目的関数に還元する代わりに、MOPは、数千/数百万のパラメータを持つモデルよりも複数の目的関数を同時に最適化することを含む、いわゆるパレート最適性(Pareto optimality)あるいはパレート集合学習(Pareto set learning)を最適化することを目指している。 MOPの既存のベンチマークライブラリは、主に進化的アルゴリズムに焦点を当てており、そのほとんどはゼロ階数/メタヒューリスティックな手法であり、目的からの高階情報を効果的に利用せず、数千/数百万のパラメータを持つ大規模モデルにスケールできない。本稿では,このギャップを考慮し,最先端の勾配法をサポートする初の多目的最適化ライブラリであるLibMOONを紹介する。 Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with thousands / millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order / meta-heuristic methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with thousands / millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.	翻訳日:2024-11-07 23:34:03 公開日:2024-10-11
# FairQuant: ディープニューラルネットワークの公平性の認証と定量化 FairQuant: Certifying and Quantifying Fairness of Deep Neural Networks ( http://arxiv.org/abs/2409.03220v2 ) ライセンス: Link先を確認	Brian Hyeongseok Kim, Jingbo Wang, Chao Wang,	(参考訳) 本稿では,ディープニューラルネットワーク(DNN)の個人公平性を形式的に認証し,定量化する手法を提案する。個人公平性は、法的に保護された属性(例えば、性別や人種)を除いて同一である2人の個人が同じ扱いを受けることを保証する。このような保証を提供する既存技術は存在するが、DNNのサイズや入力次元が大きくなるにつれてスケーラビリティや精度の欠如に悩まされる傾向がある。本手法は, DNNの記号区間に基づく解析に抽象化を適用し, 続いて公平性の性質に導かれる反復的な精緻化を行うことにより, この制限を克服する。さらに,本手法は,DNNが公平かどうかを判定するだけでなく,分類出力が証明可能に公平である個人の割合を計算することによって,記号区間に基づく解析を従来の定性的認証から定量的認証へと引き上げる。提案手法を実装し,4つのよく使われる公平性研究データセットで訓練したディープニューラルネットワーク上で評価を行った。実験結果から,本手法は最先端技術よりも精度が高いだけでなく,数桁高速であることがわかった。 We propose a method for formally certifying and quantifying individual fairness of deep neural networks (DNN). Individual fairness guarantees that any two individuals who are identical except for a legally protected attribute (e.g., gender or race) receive the same treatment. While there are existing techniques that provide such a guarantee, they tend to suffer from lack of scalability or accuracy as the size and input dimension of the DNN increase. Our method overcomes this limitation by applying abstraction to a symbolic interval based analysis of the DNN followed by iterative refinement guided by the fairness property. Furthermore, our method lifts the symbolic interval based analysis from conventional qualitative certification to quantitative certification, by computing the percentage of individuals whose classification outputs are provably fair, instead of merely deciding if the DNN is fair. We have implemented our method and evaluated it on deep neural networks trained on four popular fairness research datasets. The experimental results show that our method is not only more accurate than state-of-the-art techniques but also several orders-of-magnitude faster.	翻訳日:2024-11-07 23:23:02 公開日:2024-10-11
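To make the quantity in the abstract above concrete, the sketch below computes the fraction of individuals whose output does not change when only the protected attribute is varied. It does so by exhaustive enumeration over a tiny discrete input space, which is an assumption made for illustration. It is not the symbolic interval analysis with abstraction and refinement that the paper proposes for DNNs, and `toy_model`, the attribute names, and the domains are hypothetical.

```python
from itertools import product

def fair_fraction(classifier, domains, protected):
    """Fraction of individuals (assignments to the non-protected attributes)
    whose output is identical for every value of the protected attribute."""
    others = [name for name in domains if name != protected]
    fair = total = 0
    for values in product(*(domains[name] for name in others)):
        person = dict(zip(others, values))
        outputs = {classifier({**person, protected: p}) for p in domains[protected]}
        fair += len(outputs) == 1  # fair if the output never changes
        total += 1
    return fair / total

def toy_model(x):
    # Hypothetical, deliberately biased rule: group 1 gets a one-point bonus.
    return x["income"] + (1 if x["group"] == 1 else 0) >= 3

domains = {"income": range(5), "group": (0, 1)}
fraction = fair_fraction(toy_model, domains, protected="group")
print(fraction)
```

In this toy setting only the individual with `income == 2` receives different outcomes across groups, so four of the five individuals are treated fairly. Enumeration does not scale to real networks, which is the gap a certified analysis is meant to close.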
# エントロピー駆動型エンタングルメント鍛造 Entropy-driven entanglement forging ( http://arxiv.org/abs/2409.04510v2 ) ライセンス: Link先を確認	Axel Pérez-Obiol, Sergi Masot-Llima, Antonio M. Romero, Javier Menéndez, Arnau Rios, Artur García-Sáez, Bruno Juliá-Díaz,	(参考訳) 変動量子アルゴリズムを用いた物理系シミュレーションは、よく研究されているアプローチであるが、量子ビット数と回路深さの要求により、現在のデバイスで実装することは困難である。システムにおける知識,すなわちサブシステムのエントロピー,エンタングルメント構造,あるいは特定の対称性の制限が,エンタングルメント鍛造によるこれらのアルゴリズムのコスト削減にどの程度有効かを示す。そのため、核シェルモデルで原子核${}^{28}$Neおよび${}^{60}$Tiと同様に、パラメトリズドホッピング項を持つフェルミ・ハッバード一次元鎖をシミュレートする。適応型変分量子固有解法を用いて、量子回路に必要な量子ビットの最大数(最大4分の1)と2量子ビットゲートの量(桁数以上)の両方において、大幅な減少が認められる。提案手法は, エントロピー駆動型エンタングルメント鍛造法を用いて, ノイズの多い中間規模量子デバイスの限界に量子シミュレーションを適応させることが可能である。 Simulating physical systems with variational quantum algorithms is a well-studied approach, but it is challenging to implement in current devices due to demands in qubit number and circuit depth. We show how limited knowledge of the system, namely the entropy of its subsystems, its entanglement structure or certain symmetries, can be used to reduce the cost of these algorithms with entanglement forging. To do so, we simulate a Fermi-Hubbard one-dimensional chain with a parametrized hopping term, as well as atomic nuclei ${}^{28}$Ne and ${}^{60}$Ti with the nuclear shell model. Using an adaptive variational quantum eigensolver we find significant reductions in both the maximum number of qubits (up to one fourth) and the amount of two-qubit gates (over an order of magnitude) required in the quantum circuits. Our findings indicate that our method, entropy-driven entanglement forging, can be used to adjust quantum simulations to the limitations of noisy intermediate-scale quantum devices.	翻訳日:2024-11-07 23:00:54 公開日:2024-10-11
# 言語モデルにおける真理と政治的バイアスの関係について On the Relationship between Truth and Political Bias in Language Models ( http://arxiv.org/abs/2409.05283v2 ) ライセンス: Link先を確認	Suyash Fulay, William Brannon, Shrestha Mohanty, Cassandra Overney, Elinor Poole-Dayan, Deb Roy, Jad Kabbara,	(参考訳) 言語モデルアライメントの研究は、モデルが有用で害のないだけでなく、真実で偏見のないものであることを保証するためにしばしば試みる。しかし、これらの目的を同時に最適化することは、ある側面の改善が他の側面にどのように影響するかを曖昧にする可能性がある。本研究では、言語モデルアライメントと政治科学の両方に不可欠な2つの概念(真理性と政治的偏見)の関係を分析することに焦点を当てる。我々は、様々な人気真実性データセットの報酬モデルを訓練し、その後、彼らの政治的偏見を評価する。以上の結果から,これらのデータセットの真正性に対する報酬モデルの最適化は,政治的偏見を左右する傾向にあることが明らかとなった。また、既存のオープンソース報酬モデル(つまり、標準的な人間の嗜好データセットでトレーニングされたモデル)も、同様のバイアスを示しており、より大きなモデルではバイアスが大きくなっていることもわかりました。これらの結果は、真理を表わすために使用されるデータセット、真理と政治的に偏見のないモデルに整合する潜在的な制限、真理と政治の関係について言語モデルが捉えるものについて重要な疑問を提起する。 Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political bias. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e., those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions about the datasets used to represent truthfulness, potential limitations of aligning models to be both truthful and politically unbiased, and what language models capture about the relationship between truth and politics.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-11
# Mpox Narrative on Instagram: 感情、ヘイトスピーチ、不安分析のためのMpox上のInstagram投稿のラベル付き多言語データセット Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis ( http://arxiv.org/abs/2409.05292v4 ) ライセンス: Link先を確認	Nirmalya Thakur,	(参考訳) 世界は現在mpoxのアウトブレイクを経験しており、WHOはこれを国際的に懸念される公衆衛生上の緊急事態と宣言している。ソーシャルメディアマイニングに関するこれまでの研究で、mpoxのアウトブレイクに関するInstagram投稿のデータセットの開発に焦点を当てたものはなかった。本研究は, この研究ギャップに対処し, この分野に2つの科学的貢献を行うことを目的としている。まず、2022年7月23日から2024年9月5日までに公開されたmpoxに関する60,127件のInstagram投稿の多言語データセットを示す。https://dx.doi.org/10.21227/7fvc-y093で公開されているこのデータセットには、52言語のmpoxに関するInstagram投稿が含まれている。これらの投稿のそれぞれについて、投稿ID、投稿の説明文、公開日、言語、翻訳版(英語への翻訳にはGoogle Translate APIを使用)が、データセット内の個別の属性として提示される。このデータセットを開発した後、感情分析、ヘイトスピーチ検出、不安やストレスの検出を行った。このプロセスでは各投稿を次のように分類した。 (i)恐怖、驚き、喜び、悲しみ、怒り、嫌悪、中立という感情クラスのいずれか、 (ii)ヘイトあり、またはヘイトなし、 (iii)不安・ストレス検出あり、または不安・ストレス検出なし。これらの結果はデータセット内の個別の属性として示される。次に、感情分析、ヘイトスピーチ分析、不安やストレス分析の結果について述べる。恐怖、驚き、喜び、悲しみ、怒り、嫌悪、中立の各感情クラスの割合は、それぞれ27.95%、2.57%、8.69%、5.94%、2.69%、1.53%、50.64%であった。ヘイトスピーチの検出に関しては、95.75%の投稿にはヘイトが含まれておらず、残りの4.25%にはヘイトが含まれていた。最後に、投稿の72.05%は不安/ストレスを示しておらず、残りの27.95%は何らかの不安/ストレスを表していた。 The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. No prior work related to social media mining has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper aims to address this research gap and makes two scientific contributions to this field. First, it presents a multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. 
After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were performed. This process included classifying each post into (i) one of the sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no anxiety/stress detected. These results are presented as separate attributes in the dataset. Second, this paper presents the results of performing sentiment analysis, hate speech analysis, and anxiety or stress analysis. The variation of the sentiment classes - fear, surprise, joy, sadness, anger, disgust, and neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and 50.64%, respectively. In terms of hate speech detection, 95.75% of the posts did not contain hate and the remaining 4.25% of the posts contained hate. Finally, 72.05% of the posts did not indicate any anxiety/stress, and the remaining 27.95% of the posts represented some form of anxiety/stress.	翻訳日:2024-11-07 22:38:45 公開日:2024-10-11
# 自動走行におけるトレーサブル動作仕様へのオントロジー的アプローチ An Ontology-based Approach Towards Traceable Behavior Specifications in Automated Driving ( http://arxiv.org/abs/2409.06607v2 ) ライセンス: Link先を確認	Nayel Fabian Salem, Marcus Nolte, Veronica Haber, Till Menzel, Hans Steege, Robert Graubohm, Markus Maurer,	(参考訳) 自動走行システムを備えた公共交通機関の車両には、様々な期待が寄せられている: その他の面において、その行動は安全であり、道路の規則に適合し、利用者に移動性を提供するべきである。開発者は、例えば、システム設計時の要件の観点から、この振る舞いを指定する責任を負います。この記事で論じるとおり、この仕様は常に前提とトレードオフの必要性を伴います。その結果、そのような振舞い仕様の不足が生じ、安全でないシステムの振舞いに繋がる可能性がある。仕様の不備の特定を支援するには、要件とそれぞれの前提を明確にする必要がある。本稿では,自動走行システム搭載車両の動作を特定するためのオントロジーに基づく手法として,セマンティックノーム行動解析を提案する。オントロジーを用いて、対象とする運用環境の特定動作を正式に表現し、特定動作と対処するステークホルダーのニーズの間のトレーサビリティを確立する。さらに,ドイツの法律文脈におけるセマンティックノルム行動分析の適用例を2つのシナリオで説明し,その結果を評価した。評価の結果,行動仕様における仮定の明示的な文書化は,仕様の不備の特定と治療の両立を支えていることが明らかとなった。そこで本論文は,自動走行におけるオントロジーに基づく行動仕様を容易にするための要件,用語,およびそれに対応する方法論を提供する。 Vehicles in public traffic that are equipped with Automated Driving Systems are subject to a number of expectations: Among other aspects, their behavior should be safe, conforming to the rules of the road and provide mobility to their users. This poses challenges for the developers of such systems: Developers are responsible for specifying this behavior, for example, in terms of requirements at system design time. As we will discuss in the article, this specification always involves the need for assumptions and trade-offs. As a result, insufficiencies in such a behavior specification can occur that can potentially lead to unsafe system behavior. In order to support the identification of specification insufficiencies, requirements and respective assumptions need to be made explicit. In this article, we propose the Semantic Norm Behavior Analysis as an ontology-based approach to specify the behavior for an Automated Driving System equipped vehicle. 
We use ontologies to formally represent specified behavior for a targeted operational environment, and to establish traceability between specified behavior and the addressed stakeholder needs. Furthermore, we illustrate the application of the Semantic Norm Behavior Analysis in a German legal context with two example scenarios and evaluate our results. Our evaluation shows that the explicit documentation of assumptions in the behavior specification supports both the identification of specification insufficiencies and their treatment. Therefore, this article provides requirements, terminology and an according methodology to facilitate ontology-based behavior specifications in automated driving.	翻訳日:2024-11-07 22:05:05 公開日:2024-10-11
# 複素数による自動微分に関するチュートリアル A tutorial on automatic differentiation with complex numbers ( http://arxiv.org/abs/2409.06752v2 ) ライセンス: Link先を確認	Nicholas Krämer,	(参考訳) 自動微分は至る所で使われているが、複素数演算においてどのように機能するかについては、「$\mathbb{C}^d$における微分」$\cong$「$\mathbb{R}^{2d}$における微分」と述べる程度か、せいぜいWirtinger calculusへの浅い言及があるのみで、文書化は最小限にとどまっている。残念なことに、等価性 $\mathbb{C}^d \cong \mathbb{R}^{2d}$ は、高価な線形代数関数や微分方程式シミュレータを「通して」微分することを避ける場合など、カスタム勾配規則を導出する必要が生じるとすぐに不十分になる。このような文書の欠如に対処するため、この記事では、複素数による前方および逆モードの自動微分を調査し、正則性やコーシー-リーマン方程式を明示的に避けながら(これらは制約が強すぎる)、Wirtinger微分、修正された連鎖律、異なる勾配の規約などのトピックを扱う。正確には、複素解析や微分幾何学に頼らずに、ほぼ線形代数のみを用いて、ヤコビアン-ベクトル積とベクトル-ヤコビアン積の複素版を導出し、説明し、実装する。このチュートリアルは、ユーザにも開発者にも、カスタムの勾配伝播規則を実装する際に複素数値を真剣に扱うよう呼びかけるものであり、その方法を本稿で説明する。 Automatic differentiation is everywhere, but there exists only minimal documentation of how it works in complex arithmetic beyond stating "derivatives in $\mathbb{C}^d$" $\cong$ "derivatives in $\mathbb{R}^{2d}$" and, at best, shallow references to Wirtinger calculus. Unfortunately, the equivalence $\mathbb{C}^d \cong \mathbb{R}^{2d}$ becomes insufficient as soon as we need to derive custom gradient rules, e.g., to avoid differentiating "through" expensive linear algebra functions or differential equation simulators. To combat such a lack of documentation, this article surveys forward- and reverse-mode automatic differentiation with complex numbers, covering topics such as Wirtinger derivatives, a modified chain rule, and different gradient conventions while explicitly avoiding holomorphicity and the Cauchy--Riemann equations (which would be far too restrictive). To be precise, we will derive, explain, and implement a complex version of Jacobian-vector and vector-Jacobian products almost entirely with linear algebra without relying on complex analysis or differential geometry. This tutorial is a call to action, for users and developers alike, to take complex values seriously when implementing custom gradient propagation rules -- the manuscript explains how.	翻訳日:2024-11-07 22:05:05 公開日:2024-10-11
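As a small illustration of the Wirtinger derivatives mentioned in the abstract above, the sketch below estimates them numerically from the standard definitions, with z = x + iy: ∂f/∂z = ½(∂f/∂x − i ∂f/∂y) and ∂f/∂z̄ = ½(∂f/∂x + i ∂f/∂y). This is a finite-difference check written for this listing, not the tutorial's Jacobian-vector or vector-Jacobian implementation. The helper name `wirtinger` and the step size `h` are assumptions.

```python
def wirtinger(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dz_bar) for f: C -> C."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # vary the real part
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # vary the imaginary part
    d_z = 0.5 * (df_dx - 1j * df_dy)
    d_zbar = 0.5 * (df_dx + 1j * df_dy)
    return d_z, d_zbar

z0 = 1.5 - 2.0j

# Non-holomorphic example: f(z) = |z|^2 has df/dz = conj(z) and df/dz_bar = z.
d_z, d_zbar = wirtinger(lambda z: abs(z) ** 2, z0)

# Holomorphic example: f(z) = z^2 has df/dz = 2z and df/dz_bar = 0.
h_z, h_zbar = wirtinger(lambda z: z ** 2, z0)

print(d_z, d_zbar, h_z, h_zbar)
```

The first example is the interesting one for custom gradient rules: |z|² has no complex derivative in the holomorphic sense, yet both Wirtinger derivatives are well defined, and which of the two (or which conjugate) a library reports as "the gradient" is a matter of convention.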
# 能動触覚物体認識・姿勢推定・形状転移学習のためのベイズ的枠組み A Bayesian Framework for Active Tactile Object Recognition, Pose Estimation and Shape Transfer Learning ( http://arxiv.org/abs/2409.06912v3 ) ライセンス: Link先を確認	Haodong Zheng, Andrei Jalba, Raymond H. Cuijpers, Wijnand IJsselsteijn, Sanne Schoenmakers,	(参考訳) 人間は能動的な触覚によって世界を探索し、理解することができるので、ロボットにも同様の能力が望まれる。本稿では, 能動的な触覚物体認識, 姿勢推定, 形状転移学習の課題に対処し, カスタマイズされた粒子フィルタ (PF) とガウス過程陰関数曲面 (GPIS) を統一されたベイズフレームワークで組み合わせる。新しい触覚入力が得られると、カスタマイズされたPFは、物体の新規性を追跡しながら、物体クラスと物体姿勢の同時分布を更新する。新規物体が特定されると、GPISを使ってその形状を再構築する。GPISの事前分布をPFによる最大事後確率(MAP)推定に基づかせることで、既知の形状に関する知識を転移し、新規形状を学習することができる。グローバルな形状推定に基づく探索手続きを提案し, 能動的なデータ取得を導くとともに, 十分な情報が得られた時点で探索を終了する。シミュレーション実験を通じて,提案フレームワークは,既知物体のクラスと姿勢の推定、および新規形状の学習において,その有効性と効率を実証した。さらに、以前に学習した形状を確実に認識することができる。 As humans can explore and understand the world through active touch, similar capability is desired for robots. In this paper, we address the problem of active tactile object recognition, pose estimation and shape transfer learning, where a customized particle filter (PF) and Gaussian process implicit surface (GPIS) is combined in a unified Bayesian framework. Upon new tactile input, the customized PF updates the joint distribution of the object class and object pose while tracking the novelty of the object. Once a novel object is identified, its shape will be reconstructed using GPIS. By grounding the prior of the GPIS with the maximum-a-posteriori (MAP) estimation from the PF, the knowledge about known shapes can be transferred to learn novel shapes. An exploration procedure based on global shape estimation is proposed to guide active data acquisition and terminate the exploration upon sufficient information. Through experiments in simulation, the proposed framework demonstrated its effectiveness and efficiency in estimating object class and pose for known objects and learning novel shapes. Furthermore, it can recognize previously learned shapes reliably.	翻訳日:2024-11-07 22:05:05 公開日:2024-10-11
# xTED:拡散に基づく軌道編集によるクロスドメイン適応 xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing ( http://arxiv.org/abs/2409.08687v2 ) ライセンス: Link先を確認	Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi Zhang, Jianming Hu, Xianyuan Zhan,	(参考訳) 異なるドメインから事前に収集されたデータを再利用することは、ターゲットドメインで不十分なデータを持つが、他のドメインで比較的豊富である意思決定タスクにとって魅力的な解決策である。既存のドメイン間政策伝達手法は主に、ドメイン/タスク固有の差別者、表現、ポリシーなどの政策学習を促進するために、ドメインの対応や修正を学習することを目的としている。この設計哲学は、しばしば重いモデルアーキテクチャやタスク/ドメイン固有のモデリングをもたらし、柔軟性を欠いている。複雑な下流のドメイン間ポリシー転送モデルに頼るのではなく、データレベルでドメインギャップを直接ブリッジできるだろうか? 本研究では,クロスドメイントラジェクトリ適応のために特別に設計された拡散モデルを用いたクロスドメイントラジェクトリ・EDiting (xTED) フレームワークを提案する。提案するモデルアーキテクチャは,対象データ内の動的パターンだけでなく,状態,行動,報酬間の複雑な依存関係を効果的にキャプチャする。事前訓練された拡散を先行として利用することにより、元の意味情報を保存しながら、ソースドメインの軌跡を対象のドメインプロパティにマッチするように変換することができる。このプロセスは、基礎となるドメインギャップを暗黙的に修正し、ソースデータの状態リアリズムと動的信頼性を高め、様々な下流ポリシー学習手法で柔軟な組み入れを可能にする。その単純さにもかかわらず、xTEDは広範なシミュレーションや実ロボット実験において優れた性能を示している。 Reusing pre-collected data from different domains is an appealing solution for decision-making tasks that have insufficient data in the target domain but are relatively abundant in other related domains. Existing cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning, such as learning domain/task-specific discriminators, representations, or policies. This design philosophy often results in heavy model architectures or task/domain-specific modeling, lacking flexibility. This reality makes us wonder: can we directly bridge the domain gaps universally at the data level, instead of relying on complex downstream cross-domain policy transfer models? In this study, we propose the Cross-Domain Trajectory EDiting (xTED) framework that employs a specially designed diffusion model for cross-domain trajectory adaptation. Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data. 
By utilizing the pre-trained diffusion as a prior, source domain trajectories can be transformed to match with target domain properties while preserving original semantic information. This process implicitly corrects underlying domain gaps, enhancing state realism and dynamics reliability in the source data, and allowing flexible incorporation with various downstream policy learning methods. Despite its simplicity, xTED demonstrates superior performance in extensive simulation and real-robot experiments.	翻訳日:2024-11-07 21:09:04 公開日:2024-10-11
# RAGにおけるLLMの信頼性測定と向上 : 接地属性と拒否の学習を通して Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse ( http://arxiv.org/abs/2409.11242v2 ) ライセンス: Link先を確認	Maojia Song, Shang Hong Sim, Rishabh Bhardwaj, Hai Leong Chieu, Navonil Majumder, Soujanya Poria,	(参考訳) LLMは、検索拡張生成(RAG)システムの不可欠なコンポーネントである。エンド・ツー・エンドのRAGシステムの全体的な品質評価には多くの研究が焦点を当てているが、RAGタスクに対するLLMの適切性の理解にはギャップがある。本稿では,RAGフレームワークにおけるLLMの信頼性を評価する総合的尺度であるTrust-Scoreを紹介する。この結果から,テキスト内学習などの様々なプロンプト手法では,Trust-Scoreで測定したRAGタスクにLLMを効果的に適応できないことがわかった。そこで本研究では,Trust-Score性能向上のためのLLM調整手法であるTrust-Alignを提案する。本手法で調整したLLaMA-3ファミリーは,ASQA(+14.0),QAMPARI(+28.9),ELI5(+13.7)において,同規模のオープンソースLLMを著しく上回っている。また、LLaMAシリーズ(1bから8b)、Qwen-2.5シリーズ(0.5bから7b)、Phi3.5(3.8b)など、様々なオープンウェイトモデルにおけるTrust-Alignの有効性を示す。ソースコードは https://anonymous.4open.science/r/trust-align で公開している。 LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. The LLaMA-3 family, aligned using our method, significantly outperforms open-source LLMs of similar sizes on ASQA (up 14.0), QAMPARI (up 28.9), and ELI5 (up 13.7). We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://anonymous.4open.science/r/trust-align	翻訳日:2024-11-07 20:13:03 公開日:2024-10-11
# ウェーブレット強調画像圧縮のためのウィンドウベースチャネルアテンション Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v1 ) ライセンス: Link先を確認	Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,	(参考訳) Learned Image Compression (lic)モデルは従来のコーデックよりも高速な速度歪み性能を実現している。既存のlicモデルは、基本ブロックとしてCNN、Transformer、Mixed CNN-Transformerを使用している。しかし、ウィンドウの傾きの変化によって制限され、Swin-Transformerベースのlicは受容野の限られた成長を示し、画像内の大きな物体をモデル化する能力に影響を与える。この問題に対処するため,ウィンドウパーティションをチャネルアテンションに組み込んで大きな受容場を取得し,よりグローバルな情報を取得する。チャネルアテンションは局所的な情報学習を妨げるため、トランスフォーマーコーデックの既存のアテンションメカニズムを空間的なアテンションに拡張して複数の受容場を確立することが重要である。また、離散ウェーブレット変換をSCH(Spatial-Channel Hybrid)フレームワークに組み込んで、効率的な周波数依存性のダウンサンプリングを行い、受容場を拡大する。実験の結果,VTM-23.1と比較して,4つの標準データセットに対してBDレートが18.54%,23.98%,22.33%,24.71%削減された。 Learned Image Compression (LIC) models have achieved superior rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects in the image. To address this issue, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to the space-channel attention to establish multiple receptive fields, being able to capture global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and further enlarging receptive fields. 
Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.	翻訳日:2024-11-07 03:44:25 公開日:2024-10-11
# ウェーブレット強調画像圧縮のためのウィンドウベースチャネルアテンション Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v2 ) ライセンス: Link先を確認	Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,	(参考訳) Learned Image Compression (lic)モデルは従来のコーデックよりも高速な速度歪み性能を実現している。既存のlicモデルは、基本ブロックとしてCNN、Transformer、Mixed CNN-Transformerを使用している。しかし、ウィンドウの傾きの変化によって制限され、Swin-Transformerベースのlicは受容野の限られた成長を示し、画像圧縮のために大きなオブジェクトをモデル化する能力に影響を及ぼす。この問題に対処し、性能を向上させるために、初めてウィンドウ分割をチャネルアテンションに組み込んで、大きな受容場を取得し、より多くのグローバル情報を取得する。チャネルアテンションは局所的な情報学習を妨げるため、トランスフォーマーコーデックの既存のアテンションメカニズムを空間的なアテンションに拡張して複数の受容場を確立することが重要である。また、離散ウェーブレット変換をSCH(Spatial-Channel Hybrid)フレームワークに組み込んで、効率的な周波数依存性のダウンサンプリングを行い、受容場を拡大する。実験の結果,VTM-23.1と比較して,4つの標準データセットに対してBDレートが18.54%,23.98%,22.33%,24.71%削減された。 Learned Image Compression (LIC) models have achieved superior rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects for image compression. To address this issue and improve the performance, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to the space-channel attention to establish multiple receptive fields, being able to capture global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and further enlarging receptive fields. 
Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.	翻訳日:2024-11-07 03:44:25 公開日:2024-10-11
# ウェーブレット強調画像圧縮のためのウィンドウベースチャネルアテンション Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v3 ) ライセンス: Link先を確認	Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,	(参考訳) Learned Image Compression (lic)モデルは従来のコーデックよりも高速な速度歪み性能を実現している。既存のlicモデルは、基本ブロックとしてCNN、Transformer、Mixed CNN-Transformerを使用している。しかし、ウィンドウの傾きの変化によって制限され、Swin-Transformerベースのlicは受容野の限られた成長を示し、画像圧縮のために大きなオブジェクトをモデル化する能力に影響を及ぼす。この問題に対処し、性能を向上させるために、初めてウィンドウ分割をチャネルアテンションに組み込んで、大きな受容場を取得し、より多くのグローバル情報を取得する。チャネルアテンションは局所的な情報学習を妨げるため、トランスフォーマーコーデックの既存のアテンションメカニズムを空間的なアテンションに拡張して複数の受容場を確立することが重要である。また、離散ウェーブレット変換をSCH(Spatial-Channel Hybrid)フレームワークに組み込んで、効率的な周波数依存性のダウンサンプリングを行い、受容場を拡大する。実験の結果,VTM-23.1と比較して,4つの標準データセットに対してBDレートが18.54%,23.98%,22.33%,24.71%削減された。 Learned Image Compression (LIC) models have achieved superior rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects for image compression. To address this issue and improve the performance, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to the space-channel attention to establish multiple receptive fields, being able to capture global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and further enlarging receptive fields. 
Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.	翻訳日:2024-11-07 03:44:25 公開日:2024-10-11
# ウェーブレット強調画像圧縮のためのウィンドウベースチャネルアテンション Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v4 ) ライセンス: Link先を確認	Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,	(参考訳) Learned Image Compression (lic)モデルは従来のコーデックよりも高速な速度歪み性能を実現している。既存のlicモデルは、基本ブロックとしてCNN、Transformer、Mixed CNN-Transformerを使用している。しかし、ウィンドウの傾きの変化によって制限され、Swin-Transformerベースのlicは受容野の限られた成長を示し、画像圧縮のために大きなオブジェクトをモデル化する能力に影響を及ぼす。この問題に対処し、性能を向上させるために、初めてウィンドウ分割をチャネルアテンションに組み込んで、大きな受容場を取得し、より多くのグローバル情報を取得する。チャネルアテンションは局所的な情報学習を妨げるため、トランスフォーマーコーデックの既存のアテンションメカニズムを空間的なアテンションに拡張して複数の受容場を確立することが重要である。また、離散ウェーブレット変換をSCH(Spatial-Channel Hybrid)フレームワークに組み込んで、効率的な周波数依存性のダウンサンプリングを行い、受容場を拡大する。実験の結果,VTM-23.1と比較して,4つの標準データセットに対してBDレートが18.54%,23.98%,22.33%,24.71%削減された。 Learned Image Compression (LIC) models have achieved superior rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects for image compression. To address this issue and improve the performance, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to the space-channel attention to establish multiple receptive fields, being able to capture global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and further enlarging receptive fields. 
Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.	翻訳日:2024-11-07 03:44:25 公開日:2024-10-11
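The abstract above mentions using the discrete wavelet transform for frequency-dependent down-sampling. A minimal sketch of that idea follows, assuming a single-level 2D Haar transform; the abstract does not say which wavelet the authors use, and `haar_dwt2` is a hypothetical helper written for illustration only. Each 2x2 block of the input becomes one low-frequency coefficient and three detail coefficients, so every sub-band has half the height and half the width of the input.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a list-of-lists with even height and width.

    Returns four sub-bands (ll, lh, hl, hh), each half the input size per axis.
    The orthonormal scaling (division by 2) preserves the sum of squares.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top-left, top-right
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)  # local average (low-pass on both axes)
            row_lh.append((a - b + c - d) / 2)  # left column minus right column
            row_hl.append((a + b - c - d) / 2)  # top row minus bottom row
            row_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


# A 2x2 input collapses to one coefficient per sub-band.
print(haar_dwt2([[1, 2], [3, 4]]))
```

In a learned codec the sub-bands would presumably be stacked along the channel axis before further processing, but that wiring is not described in the abstract and is not shown here.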
# ファウショット学習のための特徴生成器 A Feature Generator for Few-Shot Learning ( http://arxiv.org/abs/2409.14141v1 ) ライセンス: Link先を確認	Heethanjan Kanagalingam, Thenukan Pathmanathan, Navaneethan Ketheeswaran, Mokeeshan Vathanakumar, Mohamed Afham, Ranga Rodrigo,	(参考訳) FSL(Few-shot Learning)は、ラベル付きデータに制限のある新しいオブジェクトやクラスをモデルが認識できるようにすることを目的としている。新しいデータポイントを合成して限られたデータセットを増やす機能ジェネレータが、この課題に対する有望な解決策として登場した。本稿では,FSLタスクの埋め込みプロセスの改善における特徴発生器の有効性について検討する。本稿では,クラスごとの画像不足による不正確な埋め込みの問題に対処するため,クラスレベルのテキスト記述から視覚的特徴を生成する特徴生成器を提案する。生成元を分類器の損失、識別器の損失、生成した特徴と真のクラス埋め込みの間の距離損失の組み合わせで訓練することにより、正確な同クラス特徴の生成を保証し、全体的な特徴表現を強化する。提案手法は1ショットで10%,5ショットで約5%の精度でベースラインモデルを上回った。さらに、この論文では、ビジュアルオンリーとビジュアル+テキストジェネレータの両方がテストされている。 Few-shot learning (FSL) aims to enable models to recognize novel objects or classes with limited labelled data. Feature generators, which synthesize new data points to augment limited datasets, have emerged as a promising solution to this challenge. This paper investigates the effectiveness of feature generators in enhancing the embedding process for FSL tasks. To address the issue of inaccurate embeddings due to the scarcity of images per class, we introduce a feature generator that creates visual features from class-level textual descriptions. By training the generator with a combination of classifier loss, discriminator loss, and distance loss between the generated features and true class embeddings, we ensure the generation of accurate same-class features and enhance the overall feature representation. Our results show a significant improvement in accuracy over baseline methods, with our approach outperforming the baseline model by 10% in 1-shot and around 5% in 5-shot approaches. Additionally, both visual-only and visual + textual generators have also been tested in this paper.	翻訳日:2024-11-07 03:22:12 公開日:2024-10-11
# ファウショット学習のための特徴生成器 A Feature Generator for Few-Shot Learning ( http://arxiv.org/abs/2409.14141v2 ) ライセンス: Link先を確認	Heethanjan Kanagalingam, Thenukan Pathmanathan, Navaneethan Ketheeswaran, Mokeeshan Vathanakumar, Mohamed Afham, Ranga Rodrigo,	(参考訳) FSL(Few-shot Learning)は、ラベル付きデータに制限のある新しいオブジェクトやクラスをモデルが認識できるようにすることを目的としている。新しいデータポイントを合成して限られたデータセットを増やす機能ジェネレータが、この課題に対する有望な解決策として登場した。本稿では,FSLタスクの埋め込みプロセスの改善における特徴発生器の有効性について検討する。本稿では,クラスごとの画像不足による不正確な埋め込みの問題に対処するため,クラスレベルのテキスト記述から視覚的特徴を生成する特徴生成器を提案する。生成元を分類器の損失、識別器の損失、生成した特徴と真のクラス埋め込みの間の距離損失の組み合わせで訓練することにより、正確な同クラス特徴の生成を保証し、全体的な特徴表現を強化する。提案手法は1ショットで10%,5ショットで約5%の精度でベースラインモデルを上回った。さらに、この論文では、ビジュアルオンリーとビジュアル+テキストジェネレータの両方がテストされている。コードはhttps://github.com/heethanjan/Feature-Generator-for-FSLで公開されている。 Few-shot learning (FSL) aims to enable models to recognize novel objects or classes with limited labelled data. Feature generators, which synthesize new data points to augment limited datasets, have emerged as a promising solution to this challenge. This paper investigates the effectiveness of feature generators in enhancing the embedding process for FSL tasks. To address the issue of inaccurate embeddings due to the scarcity of images per class, we introduce a feature generator that creates visual features from class-level textual descriptions. By training the generator with a combination of classifier loss, discriminator loss, and distance loss between the generated features and true class embeddings, we ensure the generation of accurate same-class features and enhance the overall feature representation. Our results show a significant improvement in accuracy over baseline methods, with our approach outperforming the baseline model by 10% in 1-shot and around 5% in 5-shot approaches. Additionally, both visual-only and visual + textual generators have also been tested in this paper. The code is publicly available at https://github.com/heethanjan/Feature-Generator-for-FSL.	翻訳日:2024-11-07 03:22:12 公開日:2024-10-11
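The abstract above says the generator is trained with a combination of a classifier loss, a discriminator loss, and a distance loss between generated features and true class embeddings. One way to write such an objective is sketched below. The weighted-sum form, the weights $\lambda$, the Euclidean distance, and the symbols $G$, $t_c$, and $e_c$ (generator, text description of class $c$, true embedding of class $c$) are all assumptions made for illustration; the abstract does not give the exact formula.

```latex
\mathcal{L}_{G}
  = \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{cls}}
  + \lambda_{\mathrm{disc}}\,\mathcal{L}_{\mathrm{disc}}
  + \lambda_{\mathrm{dist}}\,\bigl\lVert G(t_c) - e_c \bigr\rVert_2
```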
# 回転する表面符号と回転しない表面符号の等式誤差率の比較 Compare the Pair: Rotated vs. Unrotated Surface Codes at Equal Logical Error Rates ( http://arxiv.org/abs/2409.14765v1 ) ライセンス: Link先を確認	Anthony Ryan O'Rourke, Simon Devitt,	(参考訳) 現実的な量子コンピュータは、リソース効率のよいエラー訂正コードを必要とする。回転曲面符号は、回転しない曲面符号の約半分の量子ビットを用いて、同じ誤差補正距離を持つ論理量子ビットを生成する。しかし、距離の代わりに、より有用な量子ビット保存計量は論理的誤り抑制に基づいている。本研究では,各符号の量子ビット数を比較し,回路レベルの雑音下での等価な論理誤差率を求める。我々は、スタビライザシミュレータStimと非相関な最小整合デコーダPyMatching 2を用いて、有効なCNOT順序でメモリ実験回路のモンテカルロサンプリングを行う。高奇数および符号距離に対する論理的誤り率と物理的誤り率の両面的なスケーリングについて明らかにする。ローテーションされたコードは、非ローテーションコードで使用されるキュービット数$\sim74\%$を使用して、論理誤差率$p_L = 10^{-12}$を演算物理誤差率$p=10^{-3}$で達成している。この比率は、すべての有用な論理的誤り率に対して$p=10^{-3}$の2つの係数の物理的誤り率に対して$\sim75\%$のままである。本研究は, 回転符号による量子ビット保存を明確化し, 表面符号の将来の実装に使用する数値的正当性を提供する。 Practical quantum computers will require resource-efficient error-correcting codes. The rotated surface code uses approximately half the number of qubits as the unrotated surface code to create a logical qubit with the same error-correcting distance. However, instead of distance, a more useful qubit-saving metric would be based on logical error suppression. In this work we compare the number of qubits used by each code to achieve equal logical error rates under circuit-level noise. We perform Monte Carlo sampling of memory experiment circuits with all valid CNOT orders, using the stabiliser simulator Stim and the uncorrelated minimum-weight perfect-matching decoder PyMatching 2. We clarify the well-below-threshold scaling of logical to physical error rates for high odd and even code distances. We find that the rotated code uses $\sim74\%$ the number of qubits used by the unrotated code to achieve a logical error rate of $p_L = 10^{-12}$ at the operational physical error rate of $p=10^{-3}$. The ratio remains $\sim75\%$ for physical error rates within a factor of two of $p=10^{-3}$ for all useful logical error rates. 
Our work clarifies the qubit savings provided by the rotated code and provides numerical justification for its use in future implementations of the surface code.	翻訳日:2024-11-06 21:12:18 公開日:2024-10-11
# 回転する表面符号と回転しない表面符号の等しい論理誤り率での比較 Compare the Pair: Rotated vs. Unrotated Surface Codes at Equal Logical Error Rates ( http://arxiv.org/abs/2409.14765v2 ) ライセンス: Link先を確認	Anthony Ryan O'Rourke, Simon Devitt,	(参考訳) 実用的な量子コンピュータは、リソース効率のよい誤り訂正符号を必要とする。回転表面符号は、回転しない表面符号の約半分の量子ビット数で、同じ誤り訂正距離を持つ論理量子ビットを生成する。しかし、距離の代わりに論理誤り率に基づく指標のほうが、量子ビット節約の尺度としてより有用である。本研究では、回路レベルのノイズ下で、両符号について高い奇数および偶数の距離における、しきい値を十分に下回る領域での論理誤り率と物理誤り率のスケーリングを求め、等しい論理誤り率を達成するために各符号が使用する量子ビット数を比較する。我々は、スタビライザシミュレータStimと非相関の最小重み完全マッチングデコーダPyMatching 2を用いて、すべての有効なCNOT順序でメモリ実験回路のモンテカルロサンプリングを行う。演算時の物理誤り率$p=10^{-3}$で論理誤り率$p_L = 10^{-12}$を達成するために、回転符号は、ノイズモデルに応じて、回転しない符号が使用する量子ビット数の$74 - 75\%$を使用する。この比率は、$p=10^{-3}$の2倍以内の物理誤り率において、すべての有用な論理誤り率に対して$\approx75\%$のままである。我々の研究は、表面符号の低$p_L$スケーリングを求め、回転表面符号による量子ビットの節約を明確化し、将来の表面符号の実装におけるその使用の数値的正当性を提供する。 Practical quantum computers will require resource-efficient error-correcting codes. The rotated surface code uses approximately half the number of qubits as the unrotated surface code to create a logical qubit with the same error-correcting distance. However, instead of distance, a more useful qubit-saving metric would be based on logical error rates. In this work we find the well-below-threshold scaling of logical to physical error rates under circuit-level noise for both codes at high odd and even distances, then compare the number of qubits used by each code to achieve equal logical error rates. We perform Monte Carlo sampling of memory experiment circuits with all valid CNOT orders, using the stabiliser simulator Stim and the uncorrelated minimum-weight perfect-matching decoder PyMatching 2. We find that the rotated code uses $74 - 75\%$ the number of qubits used by the unrotated code, depending on the noise model, to achieve a logical error rate of $p_L = 10^{-12}$ at the operational physical error rate of $p=10^{-3}$. The ratio remains $\approx75\%$ for physical error rates within a factor of two of $p=10^{-3}$ for all useful logical error rates. 
Our work finds the low-$p_L$ scaling of the surface code and clarifies the qubit savings provided by the rotated surface code, providing numerical justification for its use in future implementations of the surface code.	翻訳日:2024-11-06 21:12:18 公開日:2024-10-11
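The "approximately half the number of qubits" statement in the abstract above can be checked with the standard qubit counts for a single distance-$d$ patch. The sketch below assumes one measurement qubit per stabiliser, which gives $2d^2 - 1$ qubits for the rotated code and $(2d - 1)^2$ for the unrotated code. It compares the codes at equal distance only. The 74 to 75% figure in the abstract refers to equal logical error rates, where the two codes need different distances, and is not reproduced here. The function names are illustrative.

```python
def rotated_qubits(d):
    """Total qubits in a distance-d rotated surface code patch:
    d*d data qubits plus d*d - 1 measurement qubits."""
    return 2 * d * d - 1


def unrotated_qubits(d):
    """Total qubits in a distance-d unrotated surface code patch:
    d*d + (d-1)*(d-1) data qubits plus 2*d*(d-1) measurement qubits."""
    return (2 * d - 1) ** 2


for d in (3, 5, 11, 25):
    ratio = rotated_qubits(d) / unrotated_qubits(d)
    print(f"d={d}: rotated={rotated_qubits(d)}, "
          f"unrotated={unrotated_qubits(d)}, ratio={ratio:.3f}")
```

At $d = 3$ the counts are 17 and 25, and the ratio approaches one half as $d$ grows.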
# Bagging Regularized M-estimatorの精密漸近 Precise Asymptotics of Bagging Regularized M-estimators ( http://arxiv.org/abs/2409.15252v2 ) ライセンス: Link先を確認	Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec,	(参考訳) 我々は,アンサンブル推定器の正方形予測リスクを,正規化M-推定器(subagging,subsample bootstrap aggregating,subsample bootstrap aggregating,subsample bootstrap aggregating,subsample bootstrap aggregating,subsample bootstrap aggregating,subsample bootstrap aggregating)を用いて評価し,そのリスクに対する一貫した推定器を構築する。具体的には、M \ge 1$ 正規化 M-推定器の不均一なコレクションを、それぞれ(おそらく異なる)サブサンプルサイズ、凸微分可能損失、凸正則化器で訓練する。サンプルサイズが$n$、フィーチャーサイズが$p$、サブサンプルサイズが$k_m$ for $m \in [M]$で、固定制限比が$n/p$、$k_m/n$です。我々の分析の鍵となるのは、重なり合う部分サンプル上の推定器と残留誤差の相関関係の合同漸近挙動に関する新しい結果である。独立な利害関係では、非アンサンブル設定($M = 1$)における自由度に関連するトレース汎函数の収束も確立し、それまで知られていた平方損失とリッジのケースを拡張し、ラッソ正則化器(英語版)(lasso regularizers)を拡大する。共通損失、正規化子、サブサンプルサイズで訓練された均質アンサンブルに特化すると、リスク評価はアンサンブルとサブサンプルサイズ$(M,k)$による暗黙の正規化効果にいくつかの光を放つ。アンサンブルサイズが$M$の場合、サブサンプルサイズを最適に調整すると、サンプル単位のモノトニックリスクが生じる。フルアンサンブル推定器 ($M \to \infty$ の場合) に対して、最適部分サンプルサイズ $k^\star$ は、明示正規化が消えるとき、過度にパラメータ化された状態 $(k^\star \le \min\{n,p\})$ に属する傾向にある。最後に、サブサンプルサイズ、アンサンブルサイズ、正規化のジョイント最適化は、(サブゲージなしで)全データでのみレギュラーライザ最適化を著しく上回る。 We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. 
Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge, lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).	翻訳日:2024-11-06 20:16:59 公開日:2024-10-11
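The subagging procedure described in the abstract above can be illustrated on a toy problem. The sketch below assumes a homogeneous ensemble of one-dimensional ridge estimators (squared loss, ridge penalty, scalar feature), each fitted on a subsample of size `k` drawn without replacement and then averaged. The paper covers far more general losses and regularizers; `ridge_1d` and `subagged_ridge` are hypothetical names chosen for this example.

```python
import random


def ridge_1d(xs, ys, lam):
    """Minimiser over scalar b of sum((y - b*x)**2) + lam * b**2 (requires lam > 0)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def subagged_ridge(xs, ys, lam, k, M, seed=0):
    """Average of M ridge estimates, each fitted on k points sampled without replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(M):
        idx = rng.sample(range(n), k)  # one subsample of size k
        estimates.append(ridge_1d([xs[i] for i in idx], [ys[i] for i in idx], lam))
    return sum(estimates) / M


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(ridge_1d(xs, ys, lam=30.0))                  # full-data ridge estimate
print(subagged_ridge(xs, ys, lam=30.0, k=2, M=50)) # ensemble on subsamples of size 2
```

With `k` equal to the sample size every subsample is the full data set, so the ensemble reduces to the ordinary ridge estimate. Smaller `k` changes the effective amount of regularization, which is the effect the abstract analyses.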
# エージェント能力評価のための確率的手法の解析 Analyzing Probabilistic Methods for Evaluating Agent Capabilities ( http://arxiv.org/abs/2409.16125v2 ) ライセンス: Link先を確認	Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer,	(参考訳) AIシステムからのリスクを軽減するためには、その能力を正確に評価する必要があります。これは、稀にしか表示されない場合に特に困難である。 Phuongらは、与えられたタスクを完了したAIエージェントの確率をよりよく推定することを目的とした2つの方法を提案する。マイルストーン法はタスクをサブタスクに分解し、全体の成功率の推定を改善する。これらの手法をモンテカルロ推定器として解析したところ、両者ともモンテカルロサンプリングに比べて分散を効果的に減少させるが、バイアスももたらされることが判明した。実験結果から,本手法は実世界の多くの課題に対する真解率を過小評価する。専門家のベスト・オブ・N法は、本質的に欠陥のある再重み付け因子に起因する全てのタスクに対してさらに深刻な過小評価を示す。困難なタスクにおけるAIエージェントの能力推定の精度を高めるため、今後の研究はモンテカルロ推定器の豊富な文献を活用するべきであると提案する。 To mitigate risks from AI systems, we need to assess their capabilities accurately. This is especially difficult in cases where capabilities are only rarely displayed. Phuong et al. propose two methods that aim to obtain better estimates of the probability of an AI agent successfully completing a given task. The milestone method decomposes tasks into subtasks, aiming to improve overall success rate estimation, while the expert best-of-N method leverages human guidance as a proxy for the model's independent performance. Our analysis of these methods as Monte Carlo estimators reveals that while both effectively reduce variance compared to naive Monte Carlo sampling, they also introduce bias. Experimental results demonstrate that the milestone method underestimates true solve rates for many real-world tasks due to its constraining assumptions. The expert best-of-N method exhibits even more severe underestimation across all tasks, attributed to an inherently flawed re-weighting factor. To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo Estimators.	翻訳日:2024-11-06 17:52:35 公開日:2024-10-11
# エージェント能力評価のための確率的手法の解析 Analyzing Probabilistic Methods for Evaluating Agent Capabilities ( http://arxiv.org/abs/2409.16125v3 ) ライセンス: Link先を確認	Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer,	(参考訳) AIシステムからのリスクを軽減するためには、その能力を正確に評価する必要があります。これは、稀にしか表示されない場合に特に困難である。 Phuongらは、与えられたタスクを完了したAIエージェントの確率をよりよく推定することを目的とした2つの方法を提案する。マイルストーン法はタスクをサブタスクに分解し、全体の成功率の推定を改善する。これらの手法をモンテカルロ推定器として解析したところ、両者ともモンテカルロサンプリングに比べて分散を効果的に減少させるが、バイアスももたらされることが判明した。実験結果から,本手法は実世界の多くの課題に対する真解率を過小評価する。専門家のベスト・オブ・N法は、本質的に欠陥のある再重み付け因子に起因する全てのタスクに対してさらに深刻な過小評価を示す。困難なタスクにおけるAIエージェントの能力推定の精度を高めるため、今後の研究はモンテカルロ推定器の豊富な文献を活用するべきであると提案する。 To mitigate risks from AI systems, we need to assess their capabilities accurately. This is especially difficult in cases where capabilities are only rarely displayed. Phuong et al. propose two methods that aim to obtain better estimates of the probability of an AI agent successfully completing a given task. The milestone method decomposes tasks into subtasks, aiming to improve overall success rate estimation, while the expert best-of-N method leverages human guidance as a proxy for the model's independent performance. Our analysis of these methods as Monte Carlo estimators reveals that while both effectively reduce variance compared to naive Monte Carlo sampling, they also introduce bias. Experimental results demonstrate that the milestone method underestimates true solve rates for many real-world tasks due to its constraining assumptions. The expert best-of-N method exhibits even more severe underestimation across all tasks, attributed to an inherently flawed re-weighting factor. To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo Estimators.	翻訳日:2024-11-06 17:52:35 公開日:2024-10-11
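To make the contrast in the abstract above concrete, the sketch below compares a naive Monte Carlo estimate of a task's success rate with a milestone-style estimate formed as a product of per-stage success rates. This is a simplified reading of the abstract, not the estimator defined by Phuong et al.; the function names and the data are hypothetical.

```python
from math import prod


def naive_estimate(run_outcomes):
    """Fraction of end-to-end runs that succeeded (1 = success, 0 = failure)."""
    return sum(run_outcomes) / len(run_outcomes)


def milestone_estimate(stage_outcomes):
    """Product of per-stage success rates.

    stage_outcomes[i] holds the 0/1 results of attempts at stage i, each started
    from a state in which all earlier stages are already complete
    (an assumption of this sketch).
    """
    return prod(sum(stage) / len(stage) for stage in stage_outcomes)


# Hypothetical data for a rare capability: ten end-to-end runs, none succeed,
# while the two stages succeed in 2/10 and 1/10 of their separate attempts.
end_to_end = [0] * 10
stages = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(naive_estimate(end_to_end))
print(milestone_estimate(stages))
```

With few samples the naive estimate is often exactly zero for rare successes, while the product of stage rates stays positive. The abstract reports that this kind of variance reduction comes with bias, which the sketch does not attempt to quantify.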
# 忘れ、無視、近視:オンライン連続学習における重要な課題の再考 Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning ( http://arxiv.org/abs/2409.19245v1 ) ライセンス: Link先を確認	Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shaoyuan Li, Songcan Chen,	(参考訳) オンライン連続学習では、一定の無限のデータストリームからモデルを学習する必要がある。この分野では大きな努力がなされているが、その多くは、より重い訓練負荷を犠牲にして、より優れた分類能力を達成するために、破滅的な忘れる問題を緩和することに焦点を当てていた。彼らは、例えば、高速なデータストリーム環境では、データは遅いモデルに対応するために停止しない、という現実のシナリオを見落としていた。本稿では,モデルが時間単位内で処理できるトレーニングサンプルの最大数として定義されるモデルのスループットが,同様に重要であることを強調する。モデルがどれだけのデータを利用できるかを直接制限し、現在のメソッドに挑戦的なジレンマを提示する。モデルの無知: OCLの単一パスの性質は、制約付きトレーニング時間とストレージ容量内で効果的な特徴を学習し、効果的な学習とモデルのスループットのトレードオフにつながる。これらの課題に対処するため、我々は、最小時間で効率的なグローバルな識別的特徴学習を容易にする非スパース分類法進化フレームワーク(NsCE)を提案する。 NsCEは、非スパースな最大分離正規化と、事前訓練されたモデルの助けを借りて、ターゲットとなる体験再生技術を統合し、新しいグローバルな差別的特徴の迅速な獲得を可能にしている。 Online continual learning requires the models to learn from constant, endless streams of data. While significant efforts have been made in this field, most were focused on mitigating the catastrophic forgetting issue to achieve better classification ability, at the cost of a much heavier training workload. They overlooked that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that model throughput -- defined as the maximum number of training samples that a model can process within a unit of time -- is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. 
With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting: Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput; Model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and excessively sparse classifier, resulting in the gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features.	翻訳日:2024-11-06 00:18:22 公開日:2024-10-11
# 忘れ、無視、近視:オンライン連続学習における重要な課題の再考 Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning ( http://arxiv.org/abs/2409.19245v2 ) ライセンス: Link先を確認	Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shao-yuan Li, Songcan Chen,	(参考訳) オンライン連続学習では、一定の無限のデータストリームからモデルを学習する必要がある。この分野では大きな努力がなされているが、その多くは、より重い訓練負荷を犠牲にして、より優れた分類能力を達成するために、破滅的な忘れる問題を緩和することに焦点を当てていた。彼らは、例えば、高速なデータストリーム環境では、データは遅いモデルに対応するために停止しない、という現実のシナリオを見落としていた。本稿では,モデルが時間単位内で処理できるトレーニングサンプルの最大数として定義されるモデルのスループットが,同様に重要であることを強調する。モデルがどれだけのデータを利用できるかを直接制限し、現在のメソッドに挑戦的なジレンマを提示する。モデルの無知: OCLの単一パスの性質は、制約付きトレーニング時間とストレージ容量内で効果的な特徴を学習し、効果的な学習とモデルのスループットのトレードオフにつながる。これらの課題に対処するため、我々は、最小時間で効率的なグローバルな識別的特徴学習を容易にする非スパース分類法進化フレームワーク(NsCE)を提案する。 NsCEは、非スパースな最大分離正規化と、事前訓練されたモデルの助けを借りて、ターゲットとなる体験再生技術を統合し、新しいグローバルな差別的特徴の迅速な獲得を可能にしている。 Online continual learning requires the models to learn from constant, endless streams of data. While significant efforts have been made in this field, most were focused on mitigating the catastrophic forgetting issue to achieve better classification ability, at the cost of a much heavier training workload. They overlooked that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that model throughput -- defined as the maximum number of training samples that a model can process within a unit of time -- is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. 
With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting: Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput; Model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and excessively sparse classifier, resulting in the gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features.	翻訳日:2024-11-06 00:18:22 公開日:2024-10-11
# 従来型の予測コンペティションにおけるヘッジと近似的真実性 Hedging and Approximate Truthfulness in Traditional Forecasting Competitions ( http://arxiv.org/abs/2409.19477v1 ) ライセンス: Link先を確認	Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo,	(参考訳) 予測コンペティションにおいて、従来のメカニズムは、各イベントの結果に対して各競技者の予測を採点し、合計得点が最も高い競技者が勝利する。この従来のメカニズムがインセンティブの問題を抱えうることはよく知られているが、イベントの数が増えるにつれて競技者はおおむね真実を報告するようになる、というのが通説である。しかし、これまでのところ、この従来のメカニズムの形式的な分析は文献に存在しない。本稿はそのような分析を初めて行う。まず、「長期的な真実性」という通説が誤りであることを示す。任意の数のイベントであっても、最良の予測者はヘッジへのインセンティブを持ちうる。すなわち、より穏健な信念を報告することで勝利確率を高められる。しかし肯定的な面として、2人の競技者が、相手の相対的な質とイベントの結果について十分な不確実性を持つ場合には、近似的に真実を報告することを示す。これは実際に起こりうるケースである。 In forecasting competitions, the traditional mechanism scores the predictions of each contestant against the outcome of each event, and the contestant with the highest total score wins. While it is well-known that this traditional mechanism can suffer from incentive issues, it is folklore that contestants will still be roughly truthful as the number of events grows. Yet thus far the literature lacks a formal analysis of this traditional mechanism. This paper gives the first such analysis. We first demonstrate that the "long-run truthfulness" folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice.	翻訳日:2024-11-05 22:57:44 公開日:2024-10-11
# 伝統放送コンペティションにおけるヘッジと近似真理 Hedging and Approximate Truthfulness in Traditional Forecasting Competitions ( http://arxiv.org/abs/2409.19477v2 ) ライセンス: Link先を確認	Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo,	(参考訳) 予測競技において、従来のメカニズムは、各イベントの結果に対して各競技者の予測をスコアし、最高得点の競技者が勝利する。この伝統メカニズムがインセンティブの問題に悩まされることはよく知られているが、イベントの数が増加するにつれて、競技者が大まかに真実であることは民間伝承である。しかし、これまでのところこの文学は、この伝統的なメカニズムの形式的な分析を欠いている。本稿はそのような分析を初めて行う。任意の数のイベントであっても、最高の予測者はヘッジへのインセンティブを持ち、より穏健な信念を報告し、勝利確率を高めることができる。しかし, 正の面から, 2人の競技者が, 相手の相対的品質と事象の結果に十分な不確実性がある場合には, ほぼ真相を呈することを示す。 In forecasting competitions, the traditional mechanism scores the predictions of each contestant against the outcome of each event, and the contestant with the highest total score wins. While it is well-known that this traditional mechanism can suffer from incentive issues, it is folklore that contestants will still be roughly truthful as the number of events grows. Yet thus far the literature lacks a formal analysis of this traditional mechanism. This paper gives the first such analysis. We first demonstrate that the ''long-run truthfulness'' folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice.	翻訳日:2024-11-05 22:57:44 公開日:2024-10-11
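A minimal Python sketch of the traditional mechanism described in the abstract above. The quadratic scoring rule, the function names, and the example reports are assumptions made for illustration; the abstract does not say which scoring rule the paper analyzes. The sketch totals each contestant's scores over all events, declares the highest total the winner, and computes a contestant's exact win probability by enumerating the outcomes of independent binary events.

```python
from itertools import product


def quadratic_score(p, outcome):
    """Quadratic (Brier-style) score for one binary event; higher is better.

    Assumption: the abstract does not name a scoring rule.
    """
    return 1.0 - (outcome - p) ** 2


def total_scores(reports, outcomes):
    """Sum each contestant's per-event scores. reports[i][j] is contestant i's
    reported probability that event j occurs."""
    return [
        sum(quadratic_score(p, y) for p, y in zip(report, outcomes))
        for report in reports
    ]


def winner(reports, outcomes):
    """Index of the contestant with the highest total score.

    Ties go to the lowest index (an arbitrary choice for this sketch).
    """
    scores = total_scores(reports, outcomes)
    return max(range(len(scores)), key=scores.__getitem__)


def win_probability(reports, true_probs, contestant):
    """Exact probability that `contestant` wins, assuming independent binary
    events where event j occurs with probability true_probs[j]."""
    total = 0.0
    for outcomes in product([0, 1], repeat=len(true_probs)):
        prob = 1.0
        for q, y in zip(true_probs, outcomes):
            prob *= q if y == 1 else 1.0 - q
        if winner(reports, outcomes) == contestant:
            total += prob
    return total


if __name__ == "__main__":
    # Hypothetical two-contestant, two-event example.
    reports = [[0.9, 0.2], [0.6, 0.5]]
    print(total_scores(reports, (1, 0)))
    print(win_probability(reports, [0.9, 0.2], contestant=0))
```

Comparing `win_probability` for a truthful report against a shaded one is one way to explore, on small cases, the hedging incentive the abstract refers to.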
# HealthQ: 医療会話におけるLCM鎖の問合せ機能について HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations ( http://arxiv.org/abs/2409.19487v1 ) ライセンス: Link先を確認	Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani,	(参考訳) デジタル医療において、大言語モデル(LLM)は主に質問応答能力を高め、患者との相互作用を改善するために利用されてきた。しかし、効果的な患者ケアは、関連する質問に答えることで、積極的に情報を収集できるLCMチェーンを必要とする。本稿では,LLMヘルスケアチェーンの問合せ能力を評価するための新しいフレームワークであるHealthQを提案する。提案手法は,レトリーバル拡張生成 (RAG) や思考の連鎖 (CoT) ,反射的連鎖など複数のLCM連鎖を実装し,その関連性や情報性を評価するためのLCM判定器を導入した。 HealthQを検証するために、我々は、Recall-Oriented Understudy for Gisting Evaluation (ROUGE) や Named Entity Recognition (NER) ベースのセット比較のような従来の自然言語処理(NLP)メトリクスを使用し、公衆医療用ノートデータセットであるChatDoctor と MTS-Dialog から2つのカスタムデータセットを構築した。医療会話におけるLSMの質問能力に関する初の総合的研究を行い、新しいデータセット生成パイプラインを開発し、詳細な評価手法を提案する。 In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions. However, effective patient care necessitates LLM chains that can actively gather information by posing relevant questions. This paper presents HealthQ, a novel framework designed to evaluate the questioning capabilities of LLM healthcare chains. We implemented several LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, and introduced an LLM judge to assess the relevance and informativeness of the generated questions. To validate HealthQ, we employed traditional Natural Language Processing (NLP) metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Named Entity Recognition (NER)-based set comparison, and constructed two custom datasets from public medical note datasets, ChatDoctor and MTS-Dialog. Our contributions are threefold: we provide the first comprehensive study on the questioning capabilities of LLMs in healthcare conversations, develop a novel dataset generation pipeline, and propose a detailed evaluation methodology.	翻訳日:2024-11-05 22:57:44 公開日:2024-10-11
# HealthQ: 医療会話におけるLCM鎖の問合せ機能について HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations ( http://arxiv.org/abs/2409.19487v2 ) ライセンス: Link先を確認	Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani,	(参考訳) デジタル医療において、大言語モデル(LLM)は主に質問応答能力を高め、患者との相互作用を改善するために利用されてきた。しかし、効果的な患者ケアは、関連する質問に答えることで、積極的に情報を収集できるLCMチェーンを必要とする。本稿では,LLMヘルスケアチェーンの問合せ能力を評価するための新しいフレームワークであるHealthQを提案する。提案手法は,レトリーバル拡張生成 (RAG) や思考の連鎖 (CoT) ,反射的連鎖など複数のLCM連鎖を実装し,その関連性や情報性を評価するためのLCM判定器を導入した。 HealthQを検証するために、我々は、Recall-Oriented Understudy for Gisting Evaluation (ROUGE) や Named Entity Recognition (NER) ベースのセット比較のような従来の自然言語処理(NLP)メトリクスを使用し、公衆医療用ノートデータセットであるChatDoctor と MTS-Dialog から2つのカスタムデータセットを構築した。医療会話におけるLSMの質問能力に関する初の総合的研究を行い、新しいデータセット生成パイプラインを開発し、詳細な評価手法を提案する。 In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions. However, effective patient care necessitates LLM chains that can actively gather information by posing relevant questions. This paper presents HealthQ, a novel framework designed to evaluate the questioning capabilities of LLM healthcare chains. We implemented several LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, and introduced an LLM judge to assess the relevance and informativeness of the generated questions. To validate HealthQ, we employed traditional Natural Language Processing (NLP) metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Named Entity Recognition (NER)-based set comparison, and constructed two custom datasets from public medical note datasets, ChatDoctor and MTS-Dialog. Our contributions are threefold: we provide the first comprehensive study on the questioning capabilities of LLMs in healthcare conversations, develop a novel dataset generation pipeline, and propose a detailed evaluation methodology.	翻訳日:2024-11-05 22:57:44 公開日:2024-10-11
# ドメイン被覆強化によるLLMのフェデレーション・インストラクション・チューニング Federated Instruction Tuning of LLMs with Domain Coverage Augmentation ( http://arxiv.org/abs/2409.20135v3 ) ライセンス: Link先を確認	Zezhou Wang, Yaxin Du, Zhuzhong Qian, Siheng Chen,	(参考訳) Federated Domain-specific Instruction Tuning (FedDIT)は、限られたクロスクライアントなプライベートデータとサーバサイドの公開データを使って命令拡張を行い、最終的に特定のドメイン内のモデルパフォーマンスを向上する。現在まで、FedDITに影響を与える要因は不明確であり、既存の命令拡張手法は主に分散環境を考慮せずに集中的な設定に焦点を当てている。実験の結果,データ不均一性ではなく,クロスクライアントなドメインカバレッジがFedDITのモデル性能を駆動していることが判明した。そこで本研究では,クライアントセンターの選択と検索に基づく拡張により,ドメインカバレッジを最適化するFedDCAを提案する。クライアント側の計算効率とシステムのスケーラビリティのために、FedDCAの変種であるFedDCA$^*$はサーバ側の特徴アライメントを備えた異種エンコーダを利用する。 4つの異なる領域(コード、医療、財務、数学)にわたる大規模な実験は、両方の方法の有効性を裏付けるものである。さらに,多量の公開データを用いたメモリ抽出攻撃に対するプライバシ保護について検討した。その結果,公開データの量とプライバシ保護能力との間に有意な相関は認められなかった。しかし、微調整ラウンドの増加に伴い、プライバシー漏洩のリスクは減少または収束する。 Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data together with server-side public data for instruction augmentation, ultimately boosting model performance within specific domains. To date, the factors affecting FedDIT remain unclear, and existing instruction augmentation methods primarily focus on the centralized setting without considering distributed environments. Our experiments reveal that the cross-client domain coverage, rather than data heterogeneity, drives model performance in FedDIT. In response, we propose FedDCA, which optimizes domain coverage through greedy client center selection and retrieval-based augmentation. For client-side computational efficiency and system scalability, FedDCA$^*$, the variant of FedDCA, utilizes heterogeneous encoders with server-side feature alignment. Extensive experiments across four distinct domains (code, medical, financial, and mathematical) substantiate the effectiveness of both methods. Additionally, we investigate privacy preservation against memory extraction attacks utilizing various amounts of public data. Results show that there is no significant correlation between the volume of public data and the privacy-preserving capability. However, as the fine-tuning rounds increase, the risk of privacy leakage reduces or converges.	翻訳日:2024-11-05 16:08:18 公開日:2024-10-11
# ドメイン被覆強化によるLLMのフェデレーション・インストラクション・チューニング Federated Instruction Tuning of LLMs with Domain Coverage Augmentation ( http://arxiv.org/abs/2409.20135v4 ) ライセンス: Link先を確認	Zezhou Wang, Yaxin Du, Zhuzhong Qian, Siheng Chen,	(参考訳) Federated Domain-specific Instruction Tuning (FedDIT)は、限られたクロスクライアントなプライベートデータとサーバサイドの公開データを使って命令拡張を行い、最終的に特定のドメイン内のモデルパフォーマンスを向上する。現在まで、FedDITに影響を与える要因は不明確であり、既存の命令拡張手法は主に分散環境を考慮せずに集中的な設定に焦点を当てている。実験の結果,データ不均一性ではなく,クロスクライアントなドメインカバレッジがFedDITのモデル性能を駆動していることが判明した。そこで本研究では,クライアントセンターの選択と検索に基づく拡張により,ドメインカバレッジを最適化するFedDCAを提案する。クライアント側の計算効率とシステムのスケーラビリティのために、FedDCAの変種であるFedDCA$^*$はサーバ側の特徴アライメントを備えた異種エンコーダを利用する。 4つの異なる領域(コード、医療、財務、数学)にわたる大規模な実験は、両方の方法の有効性を裏付けるものである。さらに,多量の公開データを用いたメモリ抽出攻撃に対するプライバシ保護について検討した。その結果,公開データの量とプライバシ保護能力との間に有意な相関は認められなかった。しかし、微調整ラウンドの増加に伴い、プライバシー漏洩のリスクは減少または収束する。 Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data together with server-side public data for instruction augmentation, ultimately boosting model performance within specific domains. To date, the factors affecting FedDIT remain unclear, and existing instruction augmentation methods primarily focus on the centralized setting without considering distributed environments. Our experiments reveal that the cross-client domain coverage, rather than data heterogeneity, drives model performance in FedDIT. In response, we propose FedDCA, which optimizes domain coverage through greedy client center selection and retrieval-based augmentation. For client-side computational efficiency and system scalability, FedDCA$^*$, the variant of FedDCA, utilizes heterogeneous encoders with server-side feature alignment. Extensive experiments across four distinct domains (code, medical, financial, and mathematical) substantiate the effectiveness of both methods. Additionally, we investigate privacy preservation against memory extraction attacks utilizing various amounts of public data. Results show that there is no significant correlation between the volume of public data and the privacy-preserving capability. However, as the fine-tuning rounds increase, the risk of privacy leakage reduces or converges.	翻訳日:2024-11-05 15:58:31 公開日:2024-10-11
# プロンプトを超えて: 大規模言語モデルの動的会話ベンチマーク Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models ( http://arxiv.org/abs/2409.20222v2 ) ライセンス: Link先を確認	David Castillo-Bolado, Joseph Davidson, Finlay Gray, Marek Rosa,	(参考訳) 本稿では,対話エージェントに対する動的ベンチマークシステムを導入し,その性能をシミュレーションし,ユーザ$\leftrightarrow$agentインタラクションによって評価する。インタラクションはユーザとエージェント間の会話であり、複数のタスクが導入され、同時に実行される。タスクをインターリーブするために定期的にコンテキストスイッチを行い、エージェントの長期記憶、継続的な学習、情報統合機能を評価する現実的なテストシナリオを構築します。プロプライエタリおよびオープンソースのLarge-Language Modelsの結果、LLMは一般的にシングルタスクのインタラクションでうまく機能するが、インターリーブされると同じタスクで苦労する。特に、LTMシステムで補足された短いコンテキストのLLMは、より大きなコンテキストを持つものよりもパフォーマンスが良い。我々のベンチマークは、これまでのベンチマークでは捉えられなかったような、より自然な相互作用に対応するLLMには、他にも課題があることを示唆している。 We introduce a dynamic benchmarking system for conversational agents that evaluates their performance through a single, simulated, and lengthy user$\leftrightarrow$agent interaction. The interaction is a conversation between the user and agent, where multiple tasks are introduced and then undertaken concurrently. We context switch regularly to interleave the tasks, which constructs a realistic testing scenario in which we assess the Long-Term Memory, Continual Learning, and Information Integration capabilities of the agents. Results from both proprietary and open-source Large-Language Models show that LLMs in general perform well on single-task interactions, but they struggle on the same tasks when they are interleaved. Notably, short-context LLMs supplemented with an LTM system perform as well as or better than those with larger contexts. Our benchmark suggests that there are other challenges for LLMs responding to more natural interactions that contemporary benchmarks have heretofore not been able to capture.	翻訳日:2024-11-05 15:58:31 公開日:2024-10-11
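The interleaving of tasks through regular context switches, as described in the abstract above, can be sketched as a simple round-robin schedule. This is an illustrative assumption only: the function name `interleave_tasks`, the `turns_per_switch` parameter, and the toy task scripts are hypothetical and are not taken from the benchmark's code.

```python
from itertools import zip_longest


def interleave_tasks(tasks, turns_per_switch=1):
    """Build one conversation schedule from several scripted tasks.

    tasks: dict mapping a task name to its ordered list of user turns.
    turns_per_switch: how many turns of a task run before a context switch.
    Returns a list of (task_name, turn) pairs in conversation order.
    """
    # Cut each task script into chunks of `turns_per_switch` turns.
    chunked = {
        name: [turns[i:i + turns_per_switch]
               for i in range(0, len(turns), turns_per_switch)]
        for name, turns in tasks.items()
    }
    schedule = []
    # Visit the tasks round-robin; tasks that finish early drop out.
    for round_chunks in zip_longest(*chunked.values()):
        for name, chunk in zip(chunked, round_chunks):
            if chunk is not None:
                schedule.extend((name, turn) for turn in chunk)
    return schedule


if __name__ == "__main__":
    # Hypothetical task scripts.
    tasks = {
        "shopping": ["add milk", "add eggs", "what is on my list?"],
        "trivia": ["remember: the code is 42", "what was the code?"],
    }
    for name, turn in interleave_tasks(tasks):
        print(f"[{name}] {turn}")
```

Running the agent once per task in isolation and once on the interleaved schedule gives the single-task versus interleaved comparison the abstract describes.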
# 逆絵画:絵画の過程を再構築する Inverse Painting: Reconstructing The Painting Process ( http://arxiv.org/abs/2409.20556v2 ) ライセンス: Link先を確認	Bowei Chen, Yifan Wang, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz,	(参考訳) 入力絵が与えられた場合、どのように塗られたかのタイムラプス映像を再構成する。我々はこれを自己回帰画像生成問題として定式化し、初期空白の「キャンバス」を反復的に更新する。モデルは、多くのペイントビデオのトレーニングによって、実際のアーティストから学習する。本手法では,テキストと領域理解を取り入れて絵画の「指示」を定義し,新しい拡散型レンダラーでキャンバスを更新する。この方法は、訓練された限られたアクリル様式の絵画を外挿し、幅広い芸術様式やジャンルのもっともらしい結果を示す。 Given an input painting, we reconstruct a time-lapse video of how it may have been painted. We formulate this as an autoregressive image generation problem, in which an initially blank "canvas" is iteratively updated. The model learns from real artists by training on many painting videos. Our approach incorporates text and region understanding to define a set of painting "instructions" and updates the canvas with a novel diffusion-based renderer. The method extrapolates beyond the limited, acrylic style paintings on which it has been trained, showing plausible results for a wide range of artistic styles and genres.	翻訳日:2024-11-05 15:38:59 公開日:2024-10-11
# 人間の意味軌道に対する転移可能な教師なし外れ値検出フレームワーク Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories ( http://arxiv.org/abs/2410.00054v1 ) ライセンス: Link先を確認	Zheng Zhang, Hossein Amiri, Dazhou Yu, Yuntong Hu, Liang Zhao, Andreas Zufle,	(参考訳) セマンティックトラジェクトリは、旅行目的や場所活動などのテキスト情報で時空間データを豊かにするものであり、医療、社会保障、都市計画に不可欠な不適切な行動を特定するための鍵となる。従来の外れ値検出は、ドメイン知識を必要とし、目に見えない外れ値を特定する能力を制限するヒューリスティックなルールに依存している。さらに、空間的、時間的、テキスト的次元にわたるマルチモーダルデータを共同で検討できる包括的なアプローチが欠如している。ドメインに依存しないモデルの必要性に対処するため,TOD4TrajフレームワークのTransferable Outlier Detection for Human Semantic Trajectories(TOD4Traj)を提案する。対照的な学習モジュールは、時間的および集団間の定期的なモビリティパターンを特定するために、さらにプロポーズされ、個々の一貫性とグループの多数派パターンに基づいて、アウトレーヤを共同で検出することができる。実験の結果,TOD4Trajは既存のモデルよりも優れた性能を示し,その有効性と適応性を示した。 Semantic trajectories, which enrich spatial-temporal data with textual information such as trip purposes or location activities, are key for identifying outlier behaviors critical to healthcare, social security, and urban planning. Traditional outlier detection relies on heuristic rules, which requires domain knowledge and limits its ability to identify unseen outliers. In addition, a comprehensive approach that can jointly consider multi-modal data across spatial, temporal, and textual dimensions is lacking. Addressing the need for a domain-agnostic model, we propose the Transferable Outlier Detection for Human Semantic Trajectories (TOD4Traj) framework. TOD4Traj first introduces a modality feature unification module to align diverse data feature representations, enabling the integration of multi-modal information and enhancing transferability across different datasets. A contrastive learning module is further proposed for identifying regular mobility patterns both temporally and across populations, allowing for a joint detection of outliers based on individual consistency and group majority patterns. Our experimental results have shown TOD4Traj's superior performance over existing models, demonstrating its effectiveness and adaptability in detecting human trajectory outliers across various datasets.	翻訳日:2024-11-05 15:19:28 公開日:2024-10-11
# 人間の意味軌道に対する転移可能な教師なし外れ値検出フレームワーク Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories ( http://arxiv.org/abs/2410.00054v2 ) ライセンス: Link先を確認	Zheng Zhang, Hossein Amiri, Dazhou Yu, Yuntong Hu, Liang Zhao, Andreas Zufle,	(参考訳) セマンティックトラジェクトリは、旅行目的や場所活動などのテキスト情報で時空間データを豊かにするものであり、医療、社会保障、都市計画に不可欠な不適切な行動を特定するための鍵となる。従来の外れ値検出は、ドメイン知識を必要とし、目に見えない外れ値を特定する能力を制限するヒューリスティックなルールに依存している。さらに、空間的、時間的、テキスト的次元にわたるマルチモーダルデータを共同で検討できる包括的なアプローチが欠如している。ドメインに依存しないモデルの必要性に対処するため,TOD4TrajフレームワークのTransferable Outlier Detection for Human Semantic Trajectories(TOD4Traj)を提案する。対照的な学習モジュールは、時間的および集団間の定期的なモビリティパターンを特定するために、さらにプロポーズされ、個々の一貫性とグループの多数派パターンに基づいて、アウトレーヤを共同で検出することができる。実験の結果,TOD4Trajは既存のモデルよりも優れた性能を示し,その有効性と適応性を示した。 Semantic trajectories, which enrich spatial-temporal data with textual information such as trip purposes or location activities, are key for identifying outlier behaviors critical to healthcare, social security, and urban planning. Traditional outlier detection relies on heuristic rules, which requires domain knowledge and limits its ability to identify unseen outliers. In addition, a comprehensive approach that can jointly consider multi-modal data across spatial, temporal, and textual dimensions is lacking. Addressing the need for a domain-agnostic model, we propose the Transferable Outlier Detection for Human Semantic Trajectories (TOD4Traj) framework. TOD4Traj first introduces a modality feature unification module to align diverse data feature representations, enabling the integration of multi-modal information and enhancing transferability across different datasets. A contrastive learning module is further proposed for identifying regular mobility patterns both temporally and across populations, allowing for a joint detection of outliers based on individual consistency and group majority patterns. Our experimental results have shown TOD4Traj's superior performance over existing models, demonstrating its effectiveness and adaptability in detecting human trajectory outliers across various datasets.	翻訳日:2024-11-05 15:19:28 公開日:2024-10-11
# Scheherazade: LLMにおけるChain-of-Thought Math ReasoningとChain-of-Problemsの評価 Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems ( http://arxiv.org/abs/2410.00151v1 ) ライセンス: Link先を確認	Stephen Miner, Yoshiki Takashima, Simeng Han, Ferhat Erata, Timos Antonopoulos, Ruzica Piskac, Scott J Shapiro,	(参考訳) ベンチマークは、Large Language Models (LLMs) の数学推論能力の進歩を測定するために重要である。しかし、GSM8Kのような既存の広く使われているベンチマークは、複数の最先端LLMが94%以上の精度を達成するため、あまり役に立たない。より厳しいベンチマークが提案されているが、その作成は手作業で行われ、高価であることが多い。本稿では,数理推論問題を論理的に連鎖させることにより,挑戦的な数理推論ベンチマークを自動生成するScheherazadeを提案する。本稿では,前鎖法と後鎖法という2つの異なる連鎖法を提案する。 GSM8KにScheherazadeを適用し、GSM8K-Scheherazadeを作成し、3つのフロンティアLLMとOpenAIのo1-previewを評価する。その結果,フロンティアモデルの性能低下はわずか数問の連鎖で急激に低下するが,予備評価では,最大5問の逆連鎖が継続することが示された。加えて、他のモデルはすべて、問題が逆向きにチェーンされている場合、パフォーマンスが悪くなるが、o1-previewは逆向きにチェーンされたベンチマークでパフォーマンスが良くなる。データセットとコードを公開します。 Benchmarks are critical for measuring progress of math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade on GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously at only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.	翻訳日:2024-11-05 14:40:28 公開日:2024-10-11
# Scheherazade: LLMにおけるChain-of-Thought Math ReasoningとChain-of-Problemsの評価 Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems ( http://arxiv.org/abs/2410.00151v2 ) ライセンス: Link先を確認	Stephen Miner, Yoshiki Takashima, Simeng Han, Ferhat Erata, Timos Antonopoulos, Ruzica Piskac, Scott J Shapiro,	(参考訳) ベンチマークは、Large Language Models (LLMs) の数学推論能力の進歩を測定するために重要である。しかし、GSM8Kのような既存の広く使われているベンチマークは、複数の最先端LLMが94%以上の精度を達成するため、あまり役に立たない。より厳しいベンチマークが提案されているが、その作成は手作業で行われ、高価であることが多い。本稿では,数理推論問題を論理的に連鎖させることにより,挑戦的な数理推論ベンチマークを自動生成するScheherazadeを提案する。本稿では,前鎖法と後鎖法という2つの異なる連鎖法を提案する。 GSM8KにScheherazadeを適用し、GSM8K-Scheherazadeを作成し、3つのフロンティアLLMとOpenAIのo1-previewを評価する。その結果,フロンティアモデルの性能低下はわずか数問の連鎖で急激に低下するが,予備評価では,最大5問の逆連鎖が継続することが示された。加えて、他のモデルはすべて、問題が逆向きにチェーンされている場合、パフォーマンスが悪くなるが、o1-previewは逆向きにチェーンされたベンチマークでパフォーマンスが良くなる。データセットとコードを公開します。 Benchmarks are critical for measuring progress of math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade on GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously at only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.	翻訳日:2024-11-05 14:40:28 公開日:2024-10-11
# Scheherazade: LLMにおけるChain-of-Thought Math ReasoningとChain-of-Problemsの評価 Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems ( http://arxiv.org/abs/2410.00151v3 ) ライセンス: Link先を確認	Stephen Miner, Yoshiki Takashima, Simeng Han, Ferhat Erata, Timos Antonopoulos, Ruzica Piskac, Scott J Shapiro,	(参考訳) ベンチマークは、Large Language Models (LLMs) の数学推論能力の進歩を測定するために重要である。しかし、GSM8Kのような既存の広く使われているベンチマークは、複数の最先端LLMが94%以上の精度を達成するため、あまり役に立たない。より厳しいベンチマークが提案されているが、その作成は手作業で行われ、高価であることが多い。本稿では,数理推論問題を論理的に連鎖させることにより,挑戦的な数理推論ベンチマークを自動生成するScheherazadeを提案する。本稿では,前鎖法と後鎖法という2つの異なる連鎖法を提案する。 GSM8KにScheherazadeを適用し、GSM8K-Scheherazadeを作成し、3つのフロンティアLLMとOpenAIのo1-previewを評価する。その結果,フロンティアモデルの性能低下はわずか数問の連鎖で急激に低下するが,予備評価では,最大5問の逆連鎖が継続することが示された。加えて、他のモデルはすべて、問題が逆向きにチェーンされている場合、パフォーマンスが悪くなるが、o1-previewは逆向きにチェーンされたベンチマークでパフォーマンスが良くなる。データセットとコードを公開します。 Benchmarks are critical for measuring progress of math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade on GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously at only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.	翻訳日:2024-11-05 14:40:28 公開日:2024-10-11
# 猫は猫だ(犬じゃない!):因果解析と埋め込み最適化によるテキスト・画像エンコーダの情報混合の解明 A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization ( http://arxiv.org/abs/2410.00321v1 ) ライセンス: Link先を確認	Chieh-Yun Chen, Li-Wu Tsao, Chiang Tseng, Hong-Han Shuai,	(参考訳) 本稿では,テキスト・ツー・イメージ拡散モデル(T2I)のテキストエンコーダにおける因果的方法の影響を分析する。それまでの作業は、デノナイジングプロセスを通じて問題に対処することに集中してきた。しかしながら、テキストの埋め込みがT2Iモデルにどのように貢献するか、特に複数のオブジェクトを生成する場合についての研究は行われていない。本稿では,テキスト埋め込みの包括的分析について述べる。一テキストの埋め込みが生成された画像にどのように貢献するか及び二情報が失われた理由及び第一項の対象に偏りがあること。そこで本研究では, 安定拡散における情報収支の90.05%向上を図り, トレーニング不要な簡易かつ効果的なテキスト埋め込みバランス最適化手法を提案する。さらに,従来の手法よりも高精度に情報損失を定量化し,人的評価と81%の精度で情報損失を評価できる新しい自動評価指標を提案する。この測定基準は、CLIPのテキストイメージの類似性のような現在の分散スコアの制限に対処するため、オブジェクトの存在と精度を効果的に測定する。 This paper analyzes the impact of causal manner in the text encoder of text-to-image (T2I) diffusion models, which can lead to information bias and loss. Previous works have focused on addressing the issues through the denoising process. However, there is no research discussing how text embedding contributes to T2I models, especially when generating more than one object. In this paper, we share a comprehensive analysis of text embedding: i) how text embedding contributes to the generated images and ii) why information gets lost and biases towards the first-mentioned object. Accordingly, we propose a simple but effective text embedding balance optimization method, which is training-free, with an improvement of 90.05% on information balance in stable diffusion. Furthermore, we propose a new automatic evaluation metric that quantifies information loss more accurately than existing methods, achieving 81% concordance with human assessments. This metric effectively measures the presence and accuracy of objects, addressing the limitations of current distribution scores like CLIP's text-image similarities.	翻訳日:2024-11-05 06:26:14 公開日:2024-10-11
# 猫は猫だ(犬じゃない!):因果解析と埋め込み最適化によるテキスト・画像エンコーダの情報混合の解明 A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization ( http://arxiv.org/abs/2410.00321v2 ) ライセンス: Link先を確認	Chieh-Yun Chen, Li-Wu Tsao, Chiang Tseng, Hong-Han Shuai,	(参考訳) 本稿では,テキスト・ツー・イメージ拡散モデル(T2I)のテキストエンコーダにおける因果的方法の影響を分析する。それまでの作業は、デノナイジングプロセスを通じて問題に対処することに集中してきた。しかしながら、テキストの埋め込みがT2Iモデルにどのように貢献するか、特に複数のオブジェクトを生成する場合についての研究は行われていない。本稿では,テキスト埋め込みの包括的分析について述べる。一テキストの埋め込みが生成された画像にどのように貢献するか及び二情報が失われた理由及び第一項の対象に偏りがあること。そこで本研究では, 安定拡散における情報収支の90.05%向上を図り, トレーニング不要な簡易かつ効果的なテキスト埋め込みバランス最適化手法を提案する。さらに,従来の手法よりも高精度に情報損失を定量化し,人的評価と81%の精度で情報損失を評価できる新しい自動評価指標を提案する。この測定基準は、CLIPのテキストイメージの類似性のような現在の分散スコアの制限に対処するため、オブジェクトの存在と精度を効果的に測定する。 This paper analyzes the impact of causal manner in the text encoder of text-to-image (T2I) diffusion models, which can lead to information bias and loss. Previous works have focused on addressing the issues through the denoising process. However, there is no research discussing how text embedding contributes to T2I models, especially when generating more than one object. In this paper, we share a comprehensive analysis of text embedding: i) how text embedding contributes to the generated images and ii) why information gets lost and biases towards the first-mentioned object. Accordingly, we propose a simple but effective text embedding balance optimization method, which is training-free, with an improvement of 90.05% on information balance in stable diffusion. Furthermore, we propose a new automatic evaluation metric that quantifies information loss more accurately than existing methods, achieving 81% concordance with human assessments. This metric effectively measures the presence and accuracy of objects, addressing the limitations of current distribution scores like CLIP's text-image similarities.	翻訳日:2024-11-05 06:26:14 公開日:2024-10-11
# 猫は猫だ(犬じゃない!):因果解析と埋め込み最適化によるテキスト・画像エンコーダの情報混合の解明 A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization ( http://arxiv.org/abs/2410.00321v3 ) ライセンス: Link先を確認	Chieh-Yun Chen, Chiang Tseng, Li-Wu Tsao, Hong-Han Shuai,	(参考訳) 本稿では,テキスト・ツー・イメージ拡散モデル(T2I)のテキストエンコーダにおける因果的方法の影響を分析する。それまでの作業は、デノナイジングプロセスを通じて問題に対処することに集中してきた。しかしながら、テキストの埋め込みがT2Iモデルにどのように貢献するか、特に複数のオブジェクトを生成する場合についての研究は行われていない。本稿では,テキスト埋め込みの包括的分析について述べる。一テキストの埋め込みが生成された画像にどのように貢献するか及び二情報が失われた理由及び第一項の対象に偏りがあること。そこで本研究では, 安定拡散における情報収支の90.05%向上を図り, トレーニング不要な簡易かつ効果的なテキスト埋め込みバランス最適化手法を提案する。さらに,従来の手法よりも高精度に情報損失を定量化し,人的評価と81%の精度で情報損失を評価できる新しい自動評価指標を提案する。この測定基準は、CLIPのテキストイメージの類似性のような現在の分散スコアの制限に対処するため、オブジェクトの存在と精度を効果的に測定する。 This paper analyzes the impact of causal manner in the text encoder of text-to-image (T2I) diffusion models, which can lead to information bias and loss. Previous works have focused on addressing the issues through the denoising process. However, there is no research discussing how text embedding contributes to T2I models, especially when generating more than one object. In this paper, we share a comprehensive analysis of text embedding: i) how text embedding contributes to the generated images and ii) why information gets lost and biases towards the first-mentioned object. Accordingly, we propose a simple but effective text embedding balance optimization method, which is training-free, with an improvement of 90.05% on information balance in stable diffusion. Furthermore, we propose a new automatic evaluation metric that quantifies information loss more accurately than existing methods, achieving 81% concordance with human assessments. This metric effectively measures the presence and accuracy of objects, addressing the limitations of current distribution scores like CLIP's text-image similarities.	翻訳日:2024-11-05 06:26:14 公開日:2024-10-11
# 猫は猫だ(犬じゃない!):因果解析と埋め込み最適化によるテキスト・画像エンコーダの情報混合の解明 A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization ( http://arxiv.org/abs/2410.00321v4 ) ライセンス: Link先を確認	Chieh-Yun Chen, Chiang Tseng, Li-Wu Tsao, Hong-Han Shuai,	(参考訳) 本稿では,テキスト・ツー・イメージ拡散モデル(T2I)のテキストエンコーダにおける因果的方法の影響を分析する。それまでの作業は、デノナイジングプロセスを通じて問題に対処することに集中してきた。しかしながら、テキストの埋め込みがT2Iモデルにどのように貢献するか、特に複数のオブジェクトを生成する場合についての研究は行われていない。本稿では,テキスト埋め込みの包括的分析について述べる。一テキストの埋め込みが生成された画像にどのように貢献するか及び二情報が失われた理由及び第一項の対象に偏りがあること。そこで本研究では, 安定拡散における情報収支の90.05%向上を図り, トレーニング不要な簡易かつ効果的なテキスト埋め込みバランス最適化手法を提案する。さらに,従来の手法よりも高精度に情報損失を定量化し,人的評価と81%の精度で情報損失を評価できる新しい自動評価指標を提案する。この測定基準は、CLIPのテキストイメージの類似性のような現在の分散スコアの制限に対処するため、オブジェクトの存在と精度を効果的に測定する。 This paper analyzes the impact of causal manner in the text encoder of text-to-image (T2I) diffusion models, which can lead to information bias and loss. Previous works have focused on addressing the issues through the denoising process. However, there is no research discussing how text embedding contributes to T2I models, especially when generating more than one object. In this paper, we share a comprehensive analysis of text embedding: i) how text embedding contributes to the generated images and ii) why information gets lost and biases towards the first-mentioned object. Accordingly, we propose a simple but effective text embedding balance optimization method, which is training-free, with an improvement of 90.05% on information balance in stable diffusion. Furthermore, we propose a new automatic evaluation metric that quantifies information loss more accurately than existing methods, achieving 81% concordance with human assessments. This metric effectively measures the presence and accuracy of objects, addressing the limitations of current distribution scores like CLIP's text-image similarities.	翻訳日:2024-11-05 06:26:14 公開日:2024-10-11
# Webエージェントにおける自己修復のためのマルチモーダルオートバリデーション Multimodal Auto Validation For Self-Refinement in Web Agents ( http://arxiv.org/abs/2410.00689v1 ) ライセンス: Link先を確認	Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish Jagmohan,	(参考訳) 私たちの世界がデジタル化するにつれ、複雑で単調なタスクを自動化できるWebエージェントがワークフローの合理化に欠かせないものになりつつある。本稿では,マルチモーダル検証と自己補充によるWebエージェントの性能向上手法を提案する。本稿では,Webエージェントの自動検証のための階層構造が,最先端のAgent-E Webオートメーションフレームワークを基盤として,様々なモダリティ(テキスト,ビジョン)の包括的研究を行う。我々はまた、Webエージェントがワークフローの失敗を検出し、自己修正することを可能にする自動バリケータを開発し、Web自動化のための自己修正機構も導入した。その結果,Agent-E(SOTA Webエージェント)の最先端性能が向上し,WebVoyagerベンチマークのサブセットでタスク補完率が76.2%から81.24%に向上した。本稿では,複雑な実世界のシナリオにおいて,より信頼性の高いディジタルアシスタントを実現する方法について述べる。 As our world digitizes, web agents that can automate complex and monotonous tasks are becoming essential in streamlining workflows. This paper introduces an approach to improving web agent performance through multi-modal validation and self-refinement. We present a comprehensive study of different modalities (text, vision) and the effect of hierarchy for the automatic validation of web agents, building upon the state-of-the-art Agent-E web automation framework. We also introduce a self-refinement mechanism for web automation, using the developed auto-validator, that enables web agents to detect and self-correct workflow failures. Our results show significant gains over the prior state-of-the-art performance of Agent-E (a SOTA web agent), boosting task-completion rates from 76.2% to 81.24% on a subset of the WebVoyager benchmark. The approach presented in this paper paves the way for more reliable digital assistants in complex, real-world scenarios.	翻訳日:2024-11-05 04:25:20 公開日:2024-10-11
# Multimodal Auto Validation For Self-Refinement in Web Agents ( http://arxiv.org/abs/2410.00689v2 ) License: see link	Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish Jagmohan,	As our world digitizes, web agents that can automate complex and monotonous tasks are becoming essential in streamlining workflows. This paper introduces an approach to improving web agent performance through multi-modal validation and self-refinement. We present a comprehensive study of different modalities (text, vision) and the effect of hierarchy for the automatic validation of web agents, building upon the state-of-the-art Agent-E web automation framework. We also introduce a self-refinement mechanism for web automation, using the developed auto-validator, that enables web agents to detect and self-correct workflow failures. Our results show significant gains over the prior state-of-the-art performance of Agent-E (a SOTA web agent), boosting task-completion rates from 76.2% to 81.24% on a subset of the WebVoyager benchmark. The approach presented in this paper paves the way for more reliable digital assistants in complex, real-world scenarios.	Translated: 2024-11-05 04:25:20 Published: 2024-10-11
# Revisiting Hierarchical Text Classification: Inference and Metrics ( http://arxiv.org/abs/2410.01305v1 ) License: see link	Roman Plaud, Matthieu Labeau, Antoine Saillenfest, Thomas Bonald,	Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem and evaluate it as such. We instead propose to evaluate models with specifically designed hierarchical metrics, and we demonstrate the intricacy of the choice of metric and of the prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at https://github.com/RomanPlaud/revisitingHTC.	Translated: 2024-11-04 21:59:16 Published: 2024-10-11
# Revisiting Hierarchical Text Classification: Inference and Metrics ( http://arxiv.org/abs/2410.01305v2 ) License: see link	Roman Plaud, Matthieu Labeau, Antoine Saillenfest, Thomas Bonald,	Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem and evaluate it as such. We instead propose to evaluate models with specifically designed hierarchical metrics, and we demonstrate the intricacy of the choice of metric and of the prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at https://github.com/RomanPlaud/revisitingHTC.	Translated: 2024-11-04 21:59:16 Published: 2024-10-11
# KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models ( http://arxiv.org/abs/2410.01595v1 ) License: see link	Pouyan Navard, Amin Karimi Monsefi, Mengxi Zhou, Wei-Lun Chao, Alper Yilmaz, Rajiv Ramnath,	Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but these models often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. These mechanisms ensure that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists. This maintains control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.	Translated: 2024-11-04 16:44:34 Published: 2024-10-11
# KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models ( http://arxiv.org/abs/2410.01595v2 ) License: see link	Pouyan Navard, Amin Karimi Monsefi, Mengxi Zhou, Wei-Lun Chao, Alper Yilmaz, Rajiv Ramnath,	Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but these models often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. These mechanisms ensure that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists. This maintains control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.	Translated: 2024-11-04 16:44:34 Published: 2024-10-11
# Interpretable Contrastive Monte Carlo Tree Search Reasoning ( http://arxiv.org/abs/2410.01707v1 ) License: see link	Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen,	We propose SC-MCTS, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation comes from three observations: (1) previous MCTS LLM reasoning works often overlooked its biggest drawback, slower speed compared to CoT; (2) previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation studies of its components from a reasoning-interpretability perspective; (3) the reward model is the most crucial component in MCTS, yet previous work has rarely conducted in-depth study or improvement of MCTS's reward models. Thus, we conducted extensive ablation studies and quantitative analysis on the components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable reward model based on the principle of contrastive decoding and (ii) achieved an average speed improvement of 51.9% per node using speculative decoding. Additionally, (iii) we improved the UCT node selection strategy and backpropagation used in previous works, resulting in significant performance improvement. We outperformed o1-mini by an average of 17.4% on the Blocksworld multi-step reasoning dataset using Llama-3.1-70B with SC-MCTS.	Translated: 2024-11-04 15:53:34 Published: 2024-10-11
# Interpretable Contrastive Monte Carlo Tree Search Reasoning ( http://arxiv.org/abs/2410.01707v2 ) License: see link	Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen,	We propose SC-MCTS, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation comes from three observations: (1) previous MCTS LLM reasoning works often overlooked its biggest drawback, slower speed compared to CoT; (2) previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation studies of its components from a reasoning-interpretability perspective; (3) the reward model is the most crucial component in MCTS, yet previous work has rarely conducted in-depth study or improvement of MCTS's reward models. Thus, we conducted extensive ablation studies and quantitative analysis on the components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable reward model based on the principle of contrastive decoding and (ii) achieved an average speed improvement of 51.9% per node using speculative decoding. Additionally, (iii) we improved the UCT node selection strategy and backpropagation used in previous works, resulting in significant performance improvement. We outperformed o1-mini by an average of 17.4% on the Blocksworld multi-step reasoning dataset using Llama-3.1-70B with SC-MCTS. Our code is available at https://github.com/zitian-gao/SC-MCTS.	Translated: 2024-11-04 15:53:34 Published: 2024-10-11
# Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG ( http://arxiv.org/abs/2410.02825v1 ) License: see link	Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert,	This paper presents new methods that have the potential to improve privacy process efficiency with LLMs and RAG. To reduce hallucination, we continually pre-train the base LLM with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances model performance in handling privacy-related queries (metrics as much as doubled compared to an out-of-the-box LLM) by grounding responses in factual information, which reduces inaccuracies.	Translated: 2024-11-03 05:34:38 Published: 2024-10-11
# Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG ( http://arxiv.org/abs/2410.02825v2 ) License: see link	Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert,	This paper presents new methods that have the potential to improve privacy process efficiency with LLMs and RAG. To reduce hallucination, we continually pre-train the base LLM with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances model performance in handling privacy-related queries (metrics as much as doubled compared to an out-of-the-box LLM) by grounding responses in factual information, which reduces inaccuracies.	Translated: 2024-11-03 05:34:38 Published: 2024-10-11
# CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v1 ) License: see link	Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,	The study of cause and effect is of the utmost importance in many branches of science, and also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This paper proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time-series data. The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. The method is validated first on randomly generated synthetic models and then on a well-known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub: https://github.com/lcastri/causalflow.	Translated: 2024-11-03 05:24:53 Published: 2024-10-11
# CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v2 ) License: see link	Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,	The study of cause and effect is of the utmost importance in many branches of science, and also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This paper proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time-series data. The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. The method is validated first on randomly generated synthetic models and then on a well-known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub: https://github.com/lcastri/causalflow.	Translated: 2024-11-03 05:24:53 Published: 2024-10-11
# CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v3 ) License: see link	Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,	The study of cause and effect is of the utmost importance in many branches of science, and also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This paper proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time-series data. The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. The method is validated first on randomly generated synthetic models and then on a well-known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub: https://github.com/lcastri/causalflow.	Translated: 2024-11-03 05:24:53 Published: 2024-10-11
# Efficient tomography of microwave photonic cluster states ( http://arxiv.org/abs/2410.03345v1 ) License: see link	Yoshiki Sunada, Shingo Kono, Jesper Ilves, Takanori Sugiyama, Yasunari Suzuki, Tsuyoshi Okubo, Shuhei Tamate, Yutaka Tabuchi, Yasunobu Nakamura,	Entanglement among a large number of qubits is a crucial resource for many quantum algorithms. Such many-body states have been efficiently generated by entangling a chain of itinerant photonic qubits in the optical or microwave domain. However, it has remained challenging to fully characterize the generated many-body states by experimentally reconstructing their exponentially large density matrices. Here, we develop an efficient tomography method based on the matrix-product-operator formalism and demonstrate it on a cluster state of up to 35 microwave photonic qubits by reconstructing its $2^{35} \times 2^{35}$ density matrix. The full characterization enables us to detect a performance degradation of our photon source that occurs only when generating a large cluster state. This tomography method is generally applicable to various physical realizations of entangled qubits and provides an efficient benchmarking method for guiding the development of high-fidelity sources of entangled photons.	Translated: 2024-11-02 22:58:37 Published: 2024-10-11
# Efficient tomography of microwave photonic cluster states ( http://arxiv.org/abs/2410.03345v2 ) License: see link	Yoshiki Sunada, Shingo Kono, Jesper Ilves, Takanori Sugiyama, Yasunari Suzuki, Tsuyoshi Okubo, Shuhei Tamate, Yutaka Tabuchi, Yasunobu Nakamura,	Entanglement among a large number of qubits is a crucial resource for many quantum algorithms. Such many-body states have been efficiently generated by entangling a chain of itinerant photonic qubits in the optical or microwave domain. However, it has remained challenging to fully characterize the generated many-body states by experimentally reconstructing their exponentially large density matrices. Here, we develop an efficient tomography method based on the matrix-product-operator formalism and demonstrate it on a cluster state of up to 35 microwave photonic qubits by reconstructing its $2^{35} \times 2^{35}$ density matrix. The full characterization enables us to detect a performance degradation of our photon source that occurs only when generating a large cluster state. This tomography method is generally applicable to various physical realizations of entangled qubits and provides an efficient benchmarking method for guiding the development of high-fidelity sources of entangled photons.	Translated: 2024-11-02 22:58:37 Published: 2024-10-11
# Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models ( http://arxiv.org/abs/2410.03659v1 ) License: see link	Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen,	Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs. However, these models are prone to parametric knowledge conflicts, which arise from inconsistencies in the knowledge represented by their vision and language components. In this paper, we formally define the problem of cross-modality parametric knowledge conflict and present a systematic approach to detect, interpret, and mitigate such conflicts. We introduce a pipeline that identifies conflicts between visual and textual answers, showing a persistently high conflict rate across modalities in recent LVLMs regardless of model size. We further investigate how these conflicts interfere with the inference process and propose a contrastive metric to discern the conflicting samples from the others. Building on these insights, we develop a novel dynamic contrastive decoding method that removes undesirable logits inferred from the less confident modality components based on answer confidence. For models that do not provide logits, we also introduce two prompt-based strategies to mitigate the conflicts. Our methods achieve promising improvements in accuracy on both the ViQuAE and InfoSeek datasets. Specifically, using LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.	Translated: 2024-11-02 20:58:02 Published: 2024-10-11
# Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models ( http://arxiv.org/abs/2410.03659v2 ) License: see link	Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen,	Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs. However, these models are prone to parametric knowledge conflicts, which arise from inconsistencies in the knowledge represented by their vision and language components. In this paper, we formally define the problem of cross-modality parametric knowledge conflict and present a systematic approach to detect, interpret, and mitigate such conflicts. We introduce a pipeline that identifies conflicts between visual and textual answers, showing a persistently high conflict rate across modalities in recent LVLMs regardless of model size. We further investigate how these conflicts interfere with the inference process and propose a contrastive metric to discern the conflicting samples from the others. Building on these insights, we develop a novel dynamic contrastive decoding method that removes undesirable logits inferred from the less confident modality components based on answer confidence. For models that do not provide logits, we also introduce two prompt-based strategies to mitigate the conflicts. Our methods achieve promising improvements in accuracy on both the ViQuAE and InfoSeek datasets. Specifically, using LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.	Translated: 2024-11-02 20:48:16 Published: 2024-10-11
# Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data ( http://arxiv.org/abs/2410.03705v1 ) License: see link	A. Yarkın Yıldız, Asli Kalayci,	Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Near-precise decisions based on a correct diagnosis can affect a patient's life itself, and a misclassification may result in catastrophe. Several traditional machine learning (ML) methods, such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to their superior performance, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative for supporting successful medical decision-making in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms, in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power than DL models, making them the optimal methodology in terms of high performance and low complexity.	Translated: 2024-11-02 20:38:12 Published: 2024-10-11
# タブラルデータを用いた診断におけるグラディエントブースティング決定木 Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data ( http://arxiv.org/abs/2410.03705v2 ) ライセンス: Link先を確認	A. Yarkın Yıldız, Asli Kalayci,	(参考訳) 医学的診断は、正確な分類と治療の提供の観点から、医療分野において重要な課題である。正しい診断に基づいて、ほぼ正確な決定を下すことは、患者の生活そのものに影響を与え、正しく分類されていない場合、大惨事を引き起こす可能性がある。サポートベクタマシン(SVM)やロジスティックレグレッション、TabNetやTabTransformerといった最先端の表層深層学習(DL)メソッドなど、従来の機械学習(ML)が提案され、表層医学データセット上で使用されている。さらに、性能の向上、計算コストの低減、タスクの最適化の容易化などにより、近年ではアンサンブル法が使われている。それらは、いくつかの診断タスクにおいて、医療上の意思決定プロセスの成功という観点で、強力な代替手段を提供する。本研究では,XGBoost,CatBoost,LightGBMに着目し,アンサンブル手法,特にグラフデータを用いた医学分類タスクにおけるGBDTアルゴリズムの利点について検討した。実験では、GBDTメソッドが従来のMLやディープニューラルネットワークアーキテクチャよりも優れており、いくつかのベンチマーク表診断データセットよりも平均ランクが高いことが示されている。さらに、DLモデルに比べて計算能力ははるかに少なく、高い性能と低い複雑さの観点から最適な方法論を作成する。 Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Having near-precise decisions based on correct diagnosis can affect a patient's life itself, and may extremely result in a catastrophe if not classified correctly. Several traditional machine learning (ML), such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to the superior performances, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. 
The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.	翻訳日:2024-11-02 20:38:12 公開日:2024-10-11
# 教師なしの人間選好学習 Unsupervised Human Preference Learning ( http://arxiv.org/abs/2410.03731v2 ) ライセンス: Link先を確認	Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Dilek Hakkani Tur,	(参考訳) 大規模言語モデルは、印象的な推論能力を示すが、個々のユーザの好み情報がないため、パーソナライズされたコンテンツの提供に苦慮している。文脈内学習やパラメータ効率のよい微調整といった既存の手法は、個人の所有する小さな個人データセットを考えると、人間の嗜好の複雑さを捉えるには不十分である。本稿では,より大規模で訓練済みのモデルを指導する自然言語規則を生成するために,小パラメータモデルを選好エージェントとして活用し,効率的なパーソナライズを実現する手法を提案する。提案手法では, より大規模な基礎モデルの出力を誘導し, 大規模モデルの広範な知識と能力を活用しながら, 個人の好みに合わせたコンテンツを生成する。重要なのは、このパーソナライゼーションは、大きなモデルを微調整する必要がないことだ。メールや記事のデータセットによる実験結果から,本手法がベースラインのパーソナライズ手法を著しく上回ることを示した。基礎モデルをデータと計算効率のよい方法で個別の好みに適応させることにより、我々のアプローチは高度にパーソナライズされた言語モデルアプリケーションへの道を開く。 Large language models demonstrate impressive reasoning abilities but struggle to provide personalized content due to their lack of individual user preference information. Existing methods, such as in-context learning and parameter-efficient fine-tuning, fall short in capturing the complexity of human preferences, especially given the small, personal datasets individuals possess. In this paper, we propose a novel approach utilizing small parameter models as preference agents to generate natural language rules that guide a larger, pre-trained model, enabling efficient personalization. Our method involves a small, local "steering wheel" model that directs the outputs of a much larger foundation model, producing content tailored to an individual's preferences while leveraging the extensive knowledge and capabilities of the large model. Importantly, this personalization is achieved without the need to fine-tune the large model. Experimental results on email and article datasets, demonstrate that our technique significantly outperforms baseline personalization methods. By allowing foundation models to adapt to individual preferences in a data and compute-efficient manner, our approach paves the way for highly personalized language model applications.	翻訳日:2024-11-02 20:18:28 公開日:2024-10-11
# 教師なしの人間選好学習 Unsupervised Human Preference Learning ( http://arxiv.org/abs/2410.03731v3 ) ライセンス: Link先を確認	Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Dilek Hakkani-Tür,	(参考訳) 大規模言語モデルは、印象的な推論能力を示すが、個々のユーザの好み情報がないため、パーソナライズされたコンテンツの提供に苦慮している。文脈内学習やパラメータ効率のよい微調整といった既存の手法は、個人の所有する小さな個人データセットを考えると、人間の嗜好の複雑さを捉えるには不十分である。本稿では,より大規模で訓練済みのモデルを指導する自然言語規則を生成するために,小パラメータモデルを選好エージェントとして活用し,効率的なパーソナライズを実現する手法を提案する。提案手法では, より大規模な基礎モデルの出力を誘導し, 大規模モデルの広範な知識と能力を活用しながら, 個人の好みに合わせたコンテンツを生成する。重要なのは、このパーソナライゼーションは、大きなモデルを微調整する必要がないことだ。メールや記事のデータセットによる実験結果から,本手法がベースラインのパーソナライズ手法を著しく上回ることを示した。基礎モデルをデータと計算効率のよい方法で個別の好みに適応させることにより、我々のアプローチは高度にパーソナライズされた言語モデルアプリケーションへの道を開く。 Large language models demonstrate impressive reasoning abilities but struggle to provide personalized content due to their lack of individual user preference information. Existing methods, such as in-context learning and parameter-efficient fine-tuning, fall short in capturing the complexity of human preferences, especially given the small, personal datasets individuals possess. In this paper, we propose a novel approach utilizing small parameter models as preference agents to generate natural language rules that guide a larger, pre-trained model, enabling efficient personalization. Our method involves a small, local "steering wheel" model that directs the outputs of a much larger foundation model, producing content tailored to an individual's preferences while leveraging the extensive knowledge and capabilities of the large model. Importantly, this personalization is achieved without the need to fine-tune the large model. Experimental results on email and article datasets, demonstrate that our technique significantly outperforms baseline personalization methods. By allowing foundation models to adapt to individual preferences in a data and compute-efficient manner, our approach paves the way for highly personalized language model applications.	翻訳日:2024-11-02 20:18:28 公開日:2024-10-11
# 対向ロバスト性のための脳誘発正則化器 A Brain-Inspired Regularizer for Adversarial Robustness ( http://arxiv.org/abs/2410.03952v1 ) ライセンス: Link先を確認	Elie Attias, Cengiz Pehlevan, Dina Obeid,	(参考訳) 畳み込みニューラルネットワーク(CNN)は多くの視覚的タスクに優れるが、人間の目には知覚できないわずかな入力摂動に敏感であり、しばしばタスクの失敗をもたらす。近年の研究では、脳に似た表現を促進する正則化器を用いたCNNのトレーニングが、ニューラル記録を用いて、モデルロバスト性を改善することが示されている。しかしながら、ニューラルネットワークの使用要件は、これらの方法の有用性を厳しく制限する。ニューラル記録を必要とせずに、ニューラルレギュレータの計算機能を模倣する正規化器を開発することは可能か? 本研究では、Li et al (2019) で導入された神経正則化器を検査し、その基礎となる強度を抽出する。正規化器は神経表現類似性を用いており、画素類似性とも相関している。この発見に触発され,オリジナルの本質を保ちながら画像画素の類似性を用いて計算される新たな正則化器を導入し,ニューラル記録の必要性を排除した。我々の正規化方法が示す。 1) モデルロバスト性は, 各種データセットに対するブラックボックス攻撃の範囲に大きく向上した。 2) 計算コストが低く、元のデータセットにのみ依存する。我々の研究は、生物学的に動機付けられた損失関数が人工ニューラルネットワークの性能向上にどのように役立つかを探る。 Convolutional Neural Networks (CNNs) excel in many visual tasks, but they tend to be sensitive to slight input perturbations that are imperceptible to the human eye, often resulting in task failures. Recent studies indicate that training CNNs with regularizers that promote brain-like representations, using neural recordings, can improve model robustness. However, the requirement to use neural data severely restricts the utility of these methods. Is it possible to develop regularizers that mimic the computational function of neural regularizers without the need for neural recordings, thereby expanding the usability and effectiveness of these techniques? In this work, we inspect a neural regularizer introduced in Li et al. (2019) to extract its underlying strength. The regularizer uses neural representational similarities, which we find also correlate with pixel similarities. Motivated by this finding, we introduce a new regularizer that retains the essence of the original but is computed using image pixel similarities, eliminating the need for neural recordings. 
We show that our regularization method 1) significantly increases model robustness to a range of black box attacks on various datasets and 2) is computationally inexpensive and relies only on original datasets. Our work explores how biologically motivated loss functions can be used to drive the performance of artificial neural networks.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-11
# 対向ロバスト性のための脳誘発正則化器 A Brain-Inspired Regularizer for Adversarial Robustness ( http://arxiv.org/abs/2410.03952v2 ) ライセンス: Link先を確認	Elie Attias, Cengiz Pehlevan, Dina Obeid,	(参考訳) 畳み込みニューラルネットワーク(CNN)は多くの視覚的タスクに優れるが、人間の目には知覚できないわずかな入力摂動に敏感であり、しばしばタスクの失敗をもたらす。近年の研究では、脳に似た表現を促進する正則化器を用いたCNNのトレーニングが、ニューラル記録を用いて、モデルロバスト性を改善することが示されている。しかしながら、ニューラルネットワークの使用要件は、これらの方法の有用性を厳しく制限する。ニューラル記録を必要とせずに、ニューラルレギュレータの計算機能を模倣する正規化器を開発することは可能か? 本研究では、Li et al (2019) で導入された神経正則化器を検査し、その基礎となる強度を抽出する。正規化器は神経表現類似性を用いており、画素類似性とも相関している。この発見に触発され,オリジナルの本質を保ちながら画像画素の類似性を用いて計算される新たな正則化器を導入し,ニューラル記録の必要性を排除した。我々の正規化方法が示す。 1) モデルロバスト性は, 各種データセットに対するブラックボックス攻撃の範囲に大きく向上した。 2) 計算コストが低く、元のデータセットにのみ依存する。我々の研究は、生物学的に動機付けられた損失関数が人工ニューラルネットワークの性能向上にどのように役立つかを探る。 Convolutional Neural Networks (CNNs) excel in many visual tasks, but they tend to be sensitive to slight input perturbations that are imperceptible to the human eye, often resulting in task failures. Recent studies indicate that training CNNs with regularizers that promote brain-like representations, using neural recordings, can improve model robustness. However, the requirement to use neural data severely restricts the utility of these methods. Is it possible to develop regularizers that mimic the computational function of neural regularizers without the need for neural recordings, thereby expanding the usability and effectiveness of these techniques? In this work, we inspect a neural regularizer introduced in Li et al. (2019) to extract its underlying strength. The regularizer uses neural representational similarities, which we find also correlate with pixel similarities. Motivated by this finding, we introduce a new regularizer that retains the essence of the original but is computed using image pixel similarities, eliminating the need for neural recordings. 
We show that our regularization method 1) significantly increases model robustness to a range of black box attacks on various datasets and 2) is computationally inexpensive and relies only on original datasets. Our work explores how biologically motivated loss functions can be used to drive the performance of artificial neural networks.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-11
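The idea in the entry above, a regularizer computed from image pixel similarities rather than neural recordings, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the loss, so the cosine similarity, the squared-gap penalty, and the names `cosine` and `similarity_penalty` are all hypothetical choices, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_penalty(pixels, features):
    """Mean squared gap between pairwise pixel and feature similarities.

    pixels[i]   : flattened pixel vector of image i
    features[i] : the model's representation of image i
    Both the similarity measure and the penalty form are assumptions;
    at least two images are required.
    """
    gaps = []
    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            target = cosine(pixels[i], pixels[j])      # pixel-space similarity
            actual = cosine(features[i], features[j])  # representation similarity
            gaps.append((actual - target) ** 2)
    return sum(gaps) / len(gaps)
```

In training, a term like this would be added to the task loss with a weighting coefficient, so that images that look alike in pixel space are pushed toward similar internal representations.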
# EIP-4844の180日後:小さなロールアップで溶剤ジレンマを共有するか? 180 Days After EIP-4844: Will Blob Sharing Solve Dilemma for Small Rollups? ( http://arxiv.org/abs/2410.04111v1 ) ライセンス: Link先を確認	Suhyeon Lee,	(参考訳) EIP-4844によるブロブの導入により、Ethereum上のロールアップに対するデータアベイラビリティ(DA)コストが大幅に削減された。しかし、128KBのブロブの固定サイズのため、データスループットの低いロールアップはジレンマに直面している。複数のロールアップがひとつのブロブを共有するブロブ共有は、この問題の解決策として提案されている。本稿では,EIP-4844の実装から約6ヶ月後に収集した実世界データに基づくブロブ共有の有効性について検討する。簡単なブロブ共有形式を用いてコスト変化をシミュレートすることにより、ブロブ共有が小規模ロールアップのコストとDAサービス品質を大幅に改善し、ジレンマを効果的に解消できることを実証する。特に, ブロブシェアリングによるブロブベース手数料の平滑化効果に起因して, 多くのロールアップにおいてUSDのコスト削減が90%を超えることが観察された。 The introduction of blobs through EIP-4844 has significantly reduced the Data Availability (DA) costs for rollups on Ethereum. However, due to the fixed size of blobs at 128 KB, rollups with low data throughput face a dilemma: they either use blobs inefficiently or decrease the frequency of DA submissions. Blob sharing, where multiple rollups share a single blob, has been proposed as a solution to this problem. This paper examines the effectiveness of blob sharing based on real-world data collected approximately six months after the implementation of EIP-4844. By simulating cost changes using a simple blob sharing format, we demonstrate that blob sharing can substantially improve the costs and DA service quality for small rollups, effectively resolving their dilemma. Notably, we observed cost reductions in USD exceeding 90% for most of the rollups when they cooperate, attributable to the smoothing effect of the blob base fee achieved through blob sharing.	翻訳日:2024-11-02 14:01:04 公開日:2024-10-11
# EIP-4844の180日後:小さなロールアップで溶剤ジレンマを共有するか? 180 Days After EIP-4844: Will Blob Sharing Solve Dilemma for Small Rollups? ( http://arxiv.org/abs/2410.04111v2 ) ライセンス: Link先を確認	Suhyeon Lee,	(参考訳) EIP-4844によるブロブの導入により、Ethereum上のロールアップに対するデータアベイラビリティ(DA)コストが大幅に削減された。しかし、128KBのブロブの固定サイズのため、データスループットの低いロールアップはジレンマに直面している。複数のロールアップがひとつのブロブを共有するブロブ共有は、この問題の解決策として提案されている。本稿では,EIP-4844の実装から約6ヶ月後に収集した実世界データに基づくブロブ共有の有効性について検討する。簡単なブロブ共有形式を用いてコスト変化をシミュレートすることにより、ブロブ共有が小規模ロールアップのコストとDAサービス品質を大幅に改善し、ジレンマを効果的に解消できることを実証する。特に, ブロブシェアリングによるブロブベース手数料の平滑化効果に起因して, 多くのロールアップにおいてUSDのコスト削減が85%を超えることが確認された。 The introduction of blobs through EIP-4844 has significantly reduced the Data Availability (DA) costs for rollups on Ethereum. However, due to the fixed size of blobs at 128 KB, rollups with low data throughput face a dilemma: they either use blobs inefficiently or decrease the frequency of DA submissions. Blob sharing, where multiple rollups share a single blob, has been proposed as a solution to this problem. This paper examines the effectiveness of blob sharing based on real-world data collected approximately six months after the implementation of EIP-4844. By simulating cost changes using a simple blob sharing format, we demonstrate that blob sharing can substantially improve the costs and DA service quality for small rollups, effectively resolving their dilemma. Notably, we observed cost reductions in USD exceeding 85% for most of the rollups when they cooperate, attributable to the smoothing effect of the blob base fee achieved through blob sharing.	翻訳日:2024-11-02 14:01:04 公開日:2024-10-11
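To make the dilemma in the two entries above concrete, here is a minimal sketch of why a fixed 128 KB blob is costly for a low-throughput rollup and how sharing changes the per-rollup bill. The proportional-by-bytes split and the function names are assumptions for illustration, not the paper's blob sharing format, and the sketch ignores the blob base fee smoothing effect that the abstract credits for most of the savings.

```python
BLOB_SIZE_BYTES = 128 * 1024  # fixed blob size stated in the abstract
GAS_PER_BLOB = 2 ** 17        # blob gas charged per blob under EIP-4844


def blobs_needed(data_bytes):
    """Number of whole blobs required to carry data_bytes (ceiling division)."""
    return -(-data_bytes // BLOB_SIZE_BYTES)


def solo_cost(data_bytes, blob_base_fee_wei):
    """Cost in wei when one rollup posts alone and pays for whole blobs."""
    return blobs_needed(data_bytes) * GAS_PER_BLOB * blob_base_fee_wei


def shared_costs(data_bytes_by_rollup, blob_base_fee_wei):
    """Split the cost of jointly filled blobs in proportion to bytes used.

    The proportional rule is a hypothetical choice for this sketch.
    """
    total_bytes = sum(data_bytes_by_rollup.values())
    total_cost = blobs_needed(total_bytes) * GAS_PER_BLOB * blob_base_fee_wei
    return {
        name: total_cost * size / total_bytes
        for name, size in data_bytes_by_rollup.items()
    }
```

With three rollups posting 32 KB, 32 KB and 64 KB, each would pay for a full blob when posting alone, whereas together they fill exactly one blob and split its cost.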
# ブラックボックスと外見ガラス:ディープネットワークにおけるマルチレベル対称性,反射面,凸最適化 Black Boxes and Looking Glasses: Multilevel Symmetries, Reflection Planes, and Convex Optimization in Deep Networks ( http://arxiv.org/abs/2410.04279v1 ) ライセンス: Link先を確認	Emi Zeger, Mert Pilanci,	(参考訳) 絶対値アクティベーションと任意の入力次元を持つディープニューラルネットワーク(DNN)のトレーニングは,幾何代数を用いて表現された新しい特徴を持つ等価凸ラッソ問題として定式化可能であることを示す。この定式化は、ニューラルネットワークの対称性をコードする幾何学的構造を明らかにする。 DNNの等価なラッソ形式を用いて、我々は、ディープネットワークと浅層ネットワークの根本的な区別を正式に証明する:ディープネットワークは本来、それらの適合する関数において対称構造を好んでおり、より深いディープネットワークは、マルチレベル対称性、すなわち対称性内での対称性を可能にする。さらに、ラッソの特徴は、訓練点を越えて反射される超平面への距離を表す。これらの反射超平面は、トレーニングデータによって分散され、最適な重みベクトルに直交する。大規模言語モデルによる埋め込みを用いた学習ネットワークにおける理論支援と理論的予測特徴の実証実験 We show that training deep neural networks (DNNs) with absolute value activation and arbitrary input dimension can be formulated as equivalent convex Lasso problems with novel features expressed using geometric algebra. This formulation reveals geometric structures encoding symmetry in neural networks. Using the equivalent Lasso form of DNNs, we formally prove a fundamental distinction between deep and shallow networks: deep networks inherently favor symmetric structures in their fitted functions, with greater depth enabling multilevel symmetries, i.e., symmetries within symmetries. Moreover, Lasso features represent distances to hyperplanes that are reflected across training points. These reflection hyperplanes are spanned by training data and are orthogonal to optimal weight vectors. Numerical experiments support theory and demonstrate theoretically predicted features when training networks using embeddings generated by Large Language Models.	翻訳日:2024-11-02 08:39:47 公開日:2024-10-11
# ブラックボックスと外見ガラス:ディープネットワークにおけるマルチレベル対称性,反射面,凸最適化 Black Boxes and Looking Glasses: Multilevel Symmetries, Reflection Planes, and Convex Optimization in Deep Networks ( http://arxiv.org/abs/2410.04279v2 ) ライセンス: Link先を確認	Emi Zeger, Mert Pilanci,	(参考訳) 絶対値アクティベーションと任意の入力次元を持つディープニューラルネットワーク(DNN)のトレーニングは,幾何代数を用いて表現された新しい特徴を持つ等価凸ラッソ問題として定式化可能であることを示す。この定式化は、ニューラルネットワークの対称性をコードする幾何学的構造を明らかにする。 DNNの等価なラッソ形式を用いて、我々は、ディープネットワークと浅層ネットワークの根本的な区別を正式に証明する:ディープネットワークは本来、それらの適合する関数において対称構造を好んでおり、より深いディープネットワークは、マルチレベル対称性、すなわち対称性内での対称性を可能にする。さらに、ラッソの特徴は、訓練点を越えて反射される超平面への距離を表す。これらの反射超平面は、トレーニングデータによって分散され、最適な重みベクトルに直交する。大規模言語モデルによる埋め込みを用いた学習ネットワークにおける理論支援と理論的予測特徴の実証実験 We show that training deep neural networks (DNNs) with absolute value activation and arbitrary input dimension can be formulated as equivalent convex Lasso problems with novel features expressed using geometric algebra. This formulation reveals geometric structures encoding symmetry in neural networks. Using the equivalent Lasso form of DNNs, we formally prove a fundamental distinction between deep and shallow networks: deep networks inherently favor symmetric structures in their fitted functions, with greater depth enabling multilevel symmetries, i.e., symmetries within symmetries. Moreover, Lasso features represent distances to hyperplanes that are reflected across training points. These reflection hyperplanes are spanned by training data and are orthogonal to optimal weight vectors. Numerical experiments support theory and demonstrate theoretically predicted features when training networks using embeddings generated by Large Language Models.	翻訳日:2024-11-02 08:39:47 公開日:2024-10-11
# 画像生成のための地域プリミティブの分散化 Disentangling Regional Primitives for Image Generation ( http://arxiv.org/abs/2410.04421v1 ) ライセンス: Link先を確認	Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang,	(参考訳) 本稿では,画像生成のためのニューラルネットワークの内部表現構造を説明する手法を提案する。具体的には、ニューラルネットワークの中間層の特徴からプリミティブな特徴成分を分離し、各特徴成分が特定の画像領域を生成するためにのみ使用されることを保証する。このようにして、画像全体の生成は、様々なプリエンコードされた原始的地域パターンの重ね合わせと見なすことができ、それぞれが特徴成分によって生成される。特徴成分は、ニューラルネットワークによって符号化された異なる画像領域を生成する要求の間のOR関係として表現できる。したがって、Harsanyi相互作用を拡張してそのようなOR相互作用を表現し、特徴成分をアンタングルする。実験では、各特徴成分と特定の画像領域の生成との明確な対応を示す。 This paper presents a method to explain the internal representation structure of a neural network for image generation. Specifically, our method disentangles primitive feature components from the intermediate-layer feature of the neural network, which ensures that each feature component is exclusively used to generate a specific set of image regions. In this way, the generation of the entire image can be considered as the superposition of different pre-encoded primitive regional patterns, each being generated by a feature component. We find that the feature component can be represented as an OR relationship between the demands for generating different image regions, which is encoded by the neural network. Therefore, we extend the Harsanyi interaction to represent such an OR interaction to disentangle the feature component. Experiments show a clear correspondence between each feature component and the generation of specific image regions.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-11
# 画像生成のための地域プリミティブの分散化 Disentangling Regional Primitives for Image Generation ( http://arxiv.org/abs/2410.04421v2 ) ライセンス: Link先を確認	Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang,	(参考訳) 本稿では,画像生成のためのニューラルネットワークの内部表現構造を説明する手法を提案する。具体的には、ニューラルネットワークの中間層の特徴からプリミティブな特徴成分を分離し、各特徴成分が特定の画像領域を生成するためにのみ使用されることを保証する。このようにして、画像全体の生成は、様々なプリエンコードされた原始的地域パターンの重ね合わせと見なすことができ、それぞれが特徴成分によって生成される。特徴成分は、ニューラルネットワークによって符号化された異なる画像領域を生成する要求の間のOR関係として表現できる。したがって、Harsanyi相互作用を拡張してそのようなOR相互作用を表現し、特徴成分をアンタングルする。実験では、各特徴成分と特定の画像領域の生成との明確な対応を示す。 This paper presents a method to explain the internal representation structure of a neural network for image generation. Specifically, our method disentangles primitive feature components from the intermediate-layer feature of the neural network, which ensures that each feature component is exclusively used to generate a specific set of image regions. In this way, the generation of the entire image can be considered as the superposition of different pre-encoded primitive regional patterns, each being generated by a feature component. We find that the feature component can be represented as an OR relationship between the demands for generating different image regions, which is encoded by the neural network. Therefore, we extend the Harsanyi interaction to represent such an OR interaction to disentangle the feature component. Experiments show a clear correspondence between each feature component and the generation of specific image regions.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-11
# ニューラルネットワークにおける冗長計算ブロックの検出と近似 Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v1 ) ライセンス: Link先を確認	Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,	(参考訳) ディープニューラルネットワークはしばしば、異なるモデルとそれぞれの層の両方で、同様の内部表現を学習する。ネットワーク間の類似性は、モデルの縫合やマージといった技術を可能にする一方で、ネットワーク内の類似性は、より効率的なアーキテクチャを設計する新たな機会を提供する。本稿では、様々なニューラルネットワークアーキテクチャにおいて、これらの内部的類似性の出現について検討し、その類似性パターンが使用するデータセットから独立して現れることを示す。冗長ブロックを検出するための単純なメトリックであるBlock Redundancyを導入し、将来のアーキテクチャ最適化手法の基礎を提供する。これに基づいて,より単純な変換を用いて1つ以上の冗長な計算ブロックを特定し,近似する一般的なフレームワークである冗長ブロック近似(RBA)を提案する。 2つの表現間の変換 $\mathcal{T}$ がクローズド形式で効率的に計算できることを示し、ネットワークから冗長ブロックを置き換えるのに十分である。 RBAは、優れたパフォーマンスを維持しながら、モデルパラメータと時間の複雑さを減らす。我々は,事前学習された基礎モデルとデータセットを用いて,視覚領域における分類タスクの検証を行った。 Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed-form, and it is enough to replace the redundant blocks from the network. RBA reduces model parameters and time complexity while maintaining good performance. 
We validate our method on classification tasks in the vision domain using a variety of pretrained foundational models and datasets.	翻訳日:2024-11-02 01:07:35 公開日:2024-10-11
# ニューラルネットワークにおける冗長計算ブロックの検出と近似 Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v2 ) ライセンス: Link先を確認	Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,	(参考訳) ディープニューラルネットワークはしばしば、異なるモデルとそれぞれの層の両方で、同様の内部表現を学習する。ネットワーク間の類似性は、モデルの縫合やマージといった技術を可能にする一方で、ネットワーク内の類似性は、より効率的なアーキテクチャを設計する新たな機会を提供する。本稿では、様々なニューラルネットワークアーキテクチャにおいて、これらの内部的類似性の出現について検討し、その類似性パターンが使用するデータセットから独立して現れることを示す。冗長ブロックを検出するための単純なメトリックであるBlock Redundancyを導入し、将来のアーキテクチャ最適化手法の基礎を提供する。これに基づいて,より単純な変換を用いて1つ以上の冗長な計算ブロックを特定し,近似する一般的なフレームワークである冗長ブロック近似(RBA)を提案する。 2つの表現間の変換 $\mathcal{T}$ がクローズド形式で効率的に計算できることを示し、ネットワークから冗長ブロックを置き換えるのに十分である。 RBAは、優れたパフォーマンスを維持しながら、モデルパラメータと時間の複雑さを減らす。我々は,事前学習された基礎モデルとデータセットを用いて,視覚領域における分類タスクの検証を行った。 Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed-form, and it is enough to replace the redundant blocks from the network. RBA reduces model parameters and time complexity while maintaining good performance. 
We validate our method on classification tasks in the vision domain using a variety of pretrained foundational models and datasets.	翻訳日:2024-11-02 01:07:35 公開日:2024-10-11
# ニューラルネットワークにおける冗長計算ブロックの検出と近似 Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v3 ) ライセンス: Link先を確認	Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,	(参考訳) ディープニューラルネットワークはしばしば、異なるモデルとそれぞれの層の両方で、同様の内部表現を学習する。ネットワーク間の類似性は、モデルの縫合やマージといった技術を可能にする一方で、ネットワーク内の類似性は、より効率的なアーキテクチャを設計する新たな機会を提供する。本稿では、様々なニューラルネットワークアーキテクチャにおいて、これらの内部的類似性の出現について検討し、その類似性パターンが使用するデータセットから独立して現れることを示す。冗長ブロックを検出するための単純なメトリックであるBlock Redundancyを導入し、将来のアーキテクチャ最適化手法の基礎を提供する。これに基づいて,より単純な変換を用いて1つ以上の冗長な計算ブロックを特定し,近似する一般的なフレームワークである冗長ブロック近似(RBA)を提案する。 2つの表現間の変換 $\mathcal{T}$ がクローズド形式で効率的に計算できることを示し、ネットワークから冗長ブロックを置き換えるのに十分である。 RBAは、優れたパフォーマンスを維持しながら、モデルパラメータと時間の複雑さを減らす。我々は,事前学習された基礎モデルとデータセットを用いて,視覚領域における分類タスクの検証を行った。 Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed-form, and it is enough to replace the redundant blocks from the network. RBA reduces model parameters and time complexity while maintaining good performance. 
We validate our method on classification tasks in the vision domain using a variety of pretrained foundational models and datasets.	翻訳日:2024-11-02 01:07:35 公開日:2024-10-11
# ニューラルネットワークにおける冗長計算ブロックの検出と近似 Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v4 ) ライセンス: Link先を確認	Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,	(参考訳) ディープニューラルネットワークはしばしば、異なるモデルとそれぞれの層の両方で、同様の内部表現を学習する。ネットワーク間の類似性は、モデルの縫合やマージといった技術を可能にする一方で、ネットワーク内の類似性は、より効率的なアーキテクチャを設計する新たな機会を提供する。本稿では、様々なニューラルネットワークアーキテクチャにおいて、これらの内部的類似性の出現について検討し、その類似性パターンが使用するデータセットから独立して現れることを示す。冗長ブロックを検出するための単純なメトリックであるBlock Redundancyを導入し、将来のアーキテクチャ最適化手法の基礎を提供する。これに基づいて,より単純な変換を用いて1つ以上の冗長な計算ブロックを特定し,近似する一般的なフレームワークである冗長ブロック近似(RBA)を提案する。 2つの表現間の変換 $\mathcal{T}$ がクローズド形式で効率的に計算できることを示し、ネットワークから冗長ブロックを置き換えるのに十分である。 RBAは、優れたパフォーマンスを維持しながら、モデルパラメータと時間の複雑さを減らす。我々は,事前学習された基礎モデルとデータセットを用いて,視覚領域における分類タスクの検証を行った。 Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed-form, and it is enough to replace the redundant blocks from the network. RBA reduces model parameters and time complexity while maintaining good performance. 
We validate our method on classification tasks in the vision domain using a variety of pretrained foundational models and datasets.	翻訳日:2024-11-02 01:07:35 公開日:2024-10-11
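As a hedged illustration of what a closed-form transformation between two representations could look like: the abstract above does not say which closed form is used, so the following assumes an ordinary least-squares fit of a linear map. Here $X$ (input representation of the redundant block) and $Y$ (its output representation) are hypothetical symbols, with one sample per row.

```latex
\mathcal{T}^{\star}
  = \arg\min_{\mathcal{T}} \, \lVert X\mathcal{T} - Y \rVert_F^{2}
  = (X^{\top}X)^{-1} X^{\top} Y
```

The second equality requires $X^{\top}X$ to be invertible; otherwise the pseudo-inverse solution $X^{+}Y$ plays the same role. Under this assumption, the redundant blocks would be replaced at inference time by a single matrix multiplication with $\mathcal{T}^{\star}$.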
# Agnostic Smoothed Online Learning Agnostic Smoothed Online Learning ( http://arxiv.org/abs/2410.05124v1 ) ライセンス: Link先を確認	Moïse Blanchard,	(参考訳) 統計学習における古典的な結果は、2つの極端なデータ生成モデルを考えるのが一般的である。これらのモデルのギャップを埋めるために、最近の研究は滑らかなフレームワークを導入し、各イテレーションにおいて、ある固定基底測度$\mu$に対して$\sigma^{-1}$で束縛された密度を持つような分布から敵がインスタンスを生成する。このフレームワークは、$\sigma$ の値に依存する i.i.d. と逆の場合を補間する。古典的オンライン予測問題において、スムーズなオンライン学習の先行結果は、学習者に対して、PAC学習や一貫性文学における標準設定と対照的に、基本測度$\mu$が知られているという間違いなく強い仮定に依存している。基本測度が未知であり、値が任意である一般的な不可知問題を考える。この方向に沿って、Blockらは、経験的リスク最小化は、明確に定義された仮定の下で、サブリニアな後悔であることを示した。本稿では,再帰的被覆に基づくR-Coverアルゴリズムを提案する。分類に関して、R-Cover は VC 次元 $d$ を持つ函数クラスに対して、適応的後悔 $\tilde O(\sqrt{dT/\sigma})$ を持つことを証明している。回帰に関して、R-Cover は多項式脂肪散乱次元成長を持つ関数類に対して、サブ線形な後悔を持つことを確かめる。 Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. 
We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.	翻訳日:2024-11-02 00:08:45 公開日:2024-10-11
# Agnostic Smoothed Online Learning Agnostic Smoothed Online Learning ( http://arxiv.org/abs/2410.05124v2 ) ライセンス: Link先を確認	Moïse Blanchard,	(参考訳) 統計学習における古典的な結果は、2つの極端なデータ生成モデルを考えるのが一般的である。これらのモデルのギャップを埋めるために、最近の研究は滑らかなフレームワークを導入し、各イテレーションにおいて、ある固定基底測度$\mu$に対して$\sigma^{-1}$で束縛された密度を持つような分布から敵がインスタンスを生成する。このフレームワークは、$\sigma$ の値に依存する i.i.d. と逆の場合を補間する。古典的オンライン予測問題において、スムーズなオンライン学習の先行結果は、学習者に対して、PAC学習や一貫性文学における標準設定と対照的に、基本測度$\mu$が知られているという間違いなく強い仮定に依存している。基本測度が未知であり、値が任意である一般的な不可知問題を考える。この方向に沿って、Blockらは、経験的リスク最小化は、明確に定義された仮定の下で、サブリニアな後悔であることを示した。本稿では,再帰的被覆に基づくR-Coverアルゴリズムを提案する。分類に関して、R-Cover は VC 次元 $d$ を持つ函数クラスに対して、適応的後悔 $\tilde O(\sqrt{dT/\sigma})$ を持つことを証明している。回帰に関して、R-Cover は多項式脂肪散乱次元成長を持つ関数類に対して、サブ線形な後悔を持つことを確かめる。 Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. 
We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.	翻訳日:2024-11-02 00:08:45 公開日:2024-10-11
# VLM2Vec:大規模マルチモーダル埋め込みタスクのためのビジョンランゲージモデルの訓練 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks ( http://arxiv.org/abs/2410.05160v1 ) ライセンス: Link先を確認	Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen,	(参考訳) 埋め込みモデルは、セマンティックな類似性、情報検索、クラスタリングなど、さまざまな下流タスクを可能にする上で重要である。近年,タスク(例えばMTEB)をまたいで一般化可能なユニバーサルテキスト埋め込みモデルの開発への関心が高まっている。しかし, 汎用マルチモーダル埋め込みモデルの学習の進展は, その重要性にもかかわらず比較的遅かった。本研究では,幅広い下流タスクを扱える普遍的な埋め込み構築の可能性を探究する。 1 MMEB(Massive Multimodal Embedding Benchmark)は、4 つのメタタスク(分類、視覚的質問応答、マルチモーダル検索、視覚的グラウンド)と36 つのデータセット(20 のトレーニングと16 の評価データセットを含む)と、2 の VLM2Vec (Vision-Language Model -> Vector) を含む。 CLIPやBLIPのような以前のモデルとは異なり、VLM2Vecは画像とテキストの組み合わせを処理してタスク命令に基づいた固定次元ベクトルを生成することができる。我々は,Phi-3.5-V上に一連のVLM2Vecモデルを構築し,MMEBの評価分割に基づいて評価する。以上の結果から,MMEBにおける既存マルチモーダル埋め込みモデルとアウト・オブ・ディストリビューションデータセットの双方において,モデルが10%から20%の絶対的な平均的改善を達成できることが示唆された。 Embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering. Recently, there has been a surge of interest in developing universal text embedding models that can generalize across tasks (e.g., MTEB). However, progress in learning universal multimodal embedding models has been relatively slow despite their importance. In this work, we aim to explore the potential for building universal embeddings capable of handling a wide range of downstream tasks. Our contributions are twofold: (1) MMEB (Massive Multimodal Embedding Benchmark), which covers 4 meta-tasks (i.e. classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training and 16 evaluation datasets, and (2) VLM2Vec (Vision-Language Model -> Vector), a contrastive training framework that converts any state-of-the-art vision-language model into an embedding model via training on MMEB. 
Unlike previous models such as CLIP and BLIP, VLM2Vec can process any combination of images and text to generate a fixed-dimensional vector based on task instructions. We build a series of VLM2Vec models on Phi-3.5-V and evaluate them on MMEB's evaluation split. Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models on both in-distribution and out-of-distribution datasets in MMEB.	翻訳日:2024-11-01 23:58:57 公開日:2024-10-11
# VLM2Vec:大規模マルチモーダル埋め込みタスクのためのビジョンランゲージモデルの訓練 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks ( http://arxiv.org/abs/2410.05160v2 ) ライセンス: Link先を確認	Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen,	(参考訳) 埋め込みモデルは、セマンティックな類似性、情報検索、クラスタリングなど、さまざまな下流タスクを可能にする上で重要である。近年,タスク(例えばMTEB)をまたいで一般化可能なユニバーサルテキスト埋め込みモデルの開発への関心が高まっている。しかし, 汎用マルチモーダル埋め込みモデルの学習の進展は, その重要性にもかかわらず比較的遅かった。本研究では,幅広い下流タスクを扱える普遍的な埋め込み構築の可能性を探究する。 1 MMEB(Massive Multimodal Embedding Benchmark)は、4 つのメタタスク(分類、視覚的質問応答、マルチモーダル検索、視覚的グラウンド)と36 つのデータセット(20 のトレーニングと16 の評価データセットを含む)と、2 の VLM2Vec (Vision-Language Model -> Vector) を含む。 CLIPやBLIPのような以前のモデルとは異なり、VLM2Vecは画像とテキストの組み合わせを処理してタスク命令に基づいた固定次元ベクトルを生成することができる。我々は,Phi-3.5-V上に一連のVLM2Vecモデルを構築し,MMEBの評価分割に基づいて評価する。以上の結果から,VLM2Vecは,MMEBにおける既存のマルチモーダル埋め込みモデルよりも10%から20%の絶対的な平均的改善を実現していることがわかった。 Embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering. Recently, there has been a surge of interest in developing universal text embedding models that can generalize across tasks (e.g., MTEB). However, progress in learning universal multimodal embedding models has been relatively slow despite their importance. In this work, we aim to explore the potential for building universal embeddings capable of handling a wide range of downstream tasks. Our contributions are twofold: (1) MMEB (Massive Multimodal Embedding Benchmark), which covers 4 meta-tasks (i.e. classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training and 16 evaluation datasets, and (2) VLM2Vec (Vision-Language Model -> Vector), a contrastive training framework that converts any state-of-the-art vision-language model into an embedding model via training on MMEB. 
Unlike previous models such as CLIP and BLIP, VLM2Vec can process any combination of images and text to generate a fixed-dimensional vector based on task instructions. We build a series of VLM2Vec models on Phi-3.5-V and evaluate them on MMEB's evaluation split. Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models on both in-distribution and out-of-distribution datasets in MMEB.	翻訳日:2024-11-01 23:58:57 公開日:2024-10-11
# マルチモーダル連続学習の最近の進歩:包括的調査 Recent Advances of Multimodal Continual Learning: A Comprehensive Survey ( http://arxiv.org/abs/2410.05352v1 ) ライセンス: Link先を確認	Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King,	(参考訳) 継続学習(CL)は、機械学習モデルに新しいデータから継続的に学習する権限を付与することを目的としている。機械学習モデルは、小規模から大規模に事前訓練されたアーキテクチャへと進化し、また、非モーダルデータからマルチモーダルデータへのサポートから、近年、マルチモーダル連続学習(MMCL)手法が出現している。 MMCLの最大の課題は、単純で単調なCLメソッドの積み重ねを超えることである。本研究はMMCLに関する総合的な調査である。本研究は,MMCL手法の構造的分類だけでなく,基本的な背景知識とMMCL設定を提供する。我々は,既存のMMCLメソッドを,正規化ベース,アーキテクチャベース,リプレイベース,プロンプトベースという4つのカテゴリに分類し,その方法論を説明し,重要なイノベーションを強調した。さらに,この分野でのさらなる研究を促進するため,オープンなMMCLデータセットとベンチマークを要約し,今後の研究・開発に向けたいくつかの今後の方向性について論じる。関連するMMCL論文やオープンリソースをインデックスするGitHubリポジトリも、https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learningで公開しています。 Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As machine learning models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary challenge of MMCL is that it goes beyond a simple stacking of unimodal CL methods, as such straightforward approaches often yield unsatisfactory performance. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize existing MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, and discuss several promising future directions for investigation and development. 
We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.	翻訳日:2024-11-01 19:17:28 公開日:2024-10-11
# マルチモーダル連続学習の最近の進歩:包括的調査 Recent Advances of Multimodal Continual Learning: A Comprehensive Survey ( http://arxiv.org/abs/2410.05352v2 ) ライセンス: Link先を確認	Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King,	(参考訳) 継続学習(CL)は、機械学習モデルに新しいデータから継続的に学習する権限を付与することを目的としている。機械学習モデルは、小規模から大規模に事前訓練されたアーキテクチャへと進化し、また、非モーダルデータからマルチモーダルデータへのサポートから、近年、マルチモーダル連続学習(MMCL)手法が出現している。 MMCLの最大の課題は、単純で単調なCLメソッドの積み重ねを超えることである。本研究はMMCLに関する総合的な調査である。本研究は,MMCL手法の構造的分類だけでなく,基本的な背景知識とMMCL設定を提供する。我々は,既存のMMCLメソッドを,正規化ベース,アーキテクチャベース,リプレイベース,プロンプトベースという4つのカテゴリに分類し,その方法論を説明し,重要なイノベーションを強調した。さらに,この分野でのさらなる研究を促進するため,オープンなMMCLデータセットとベンチマークを要約し,今後の研究・開発に向けたいくつかの今後の方向性について論じる。関連するMMCL論文やオープンリソースをインデックスするGitHubリポジトリも、https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learningで公開しています。 Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As machine learning models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary challenge of MMCL is that it goes beyond a simple stacking of unimodal CL methods, as such straightforward approaches often yield unsatisfactory performance. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize existing MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, and discuss several promising future directions for investigation and development. 
We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.	翻訳日:2024-11-01 19:07:22 公開日:2024-10-11
# 基底自由点展開機能工学のための暗黙的に学習されたニューラルフェーズ関数 Implicitly Learned Neural Phase Functions for Basis-Free Point Spread Function Engineering ( http://arxiv.org/abs/2410.05413v1 ) ライセンス: Link先を確認	Aleksey Valouev, Rachel Chan,	(参考訳) ポイントスプレッド機能(PSF)技術は、ニューラルネットワーク、蛍光顕微鏡、バイオフォトニクスなど、計算画像における光の焦点を正確に制御するために欠かせない。 PSF は位相関数のフーリエ変換の大きさから導かれ、PSF (PSF Engineering) に与えられた位相関数の構成が不適切な逆問題となる。従来のPSF工学手法は物理基底関数に依存しており、撮像タスクに必要なPSFの範囲で一般化する能力を制限する。本稿では, 位相関数の品質において, 既存の画素ワイズ最適化手法を著しく上回る暗黙のニューラル表現を活用する新しい手法を提案する。 Point spread function (PSF) engineering is vital for precisely controlling the focus of light in computational imaging, with applications in neural imaging, fluorescence microscopy, and biophotonics. The PSF is derived from the magnitude of the Fourier transform of a phase function, making the construction of the phase function given the PSF (PSF engineering) an ill-posed inverse problem. Traditional PSF engineering methods rely on physical basis functions, limiting their ability to generalize across the range of PSFs required for imaging tasks. We introduce a novel approach leveraging implicit neural representations that significantly outperforms existing pixel-wise optimization methods in phase function quality.	翻訳日:2024-11-01 18:57:16 公開日:2024-10-11
# 基底自由点展開機能工学のための暗黙的に学習されたニューラルフェーズ関数 Implicitly Learned Neural Phase Functions for Basis-Free Point Spread Function Engineering ( http://arxiv.org/abs/2410.05413v2 ) ライセンス: Link先を確認	Aleksey Valouev,	(参考訳) ポイントスプレッド機能(PSF)技術は、ニューラルネットワーク、蛍光顕微鏡、バイオフォトニクスなど、計算画像における光の焦点を正確に制御するために欠かせない。 PSF は位相関数のフーリエ変換の大きさから導かれ、PSF (PSF Engineering) に与えられた位相関数の構成が不適切な逆問題となる。従来のPSF工学手法は物理基底関数に依存しており、撮像タスクに必要なPSFの範囲で一般化する能力を制限する。本稿では, 位相関数の品質において, 既存の画素ワイズ最適化手法を著しく上回る暗黙のニューラル表現を活用する新しい手法を提案する。 Point spread function (PSF) engineering is vital for precisely controlling the focus of light in computational imaging, with applications in neural imaging, fluorescence microscopy, and biophotonics. The PSF is derived from the magnitude of the Fourier transform of a phase function, making the construction of the phase function given the PSF (PSF engineering) an ill-posed inverse problem. Traditional PSF engineering methods rely on physical basis functions, limiting their ability to generalize across the range of PSFs required for imaging tasks. We introduce a novel approach leveraging implicit neural representations that significantly outperforms existing pixel-wise optimization methods in phase function quality.	翻訳日:2024-11-01 18:57:16 公開日:2024-10-11
# ニューラルアーキテクチャサーチを用いたマルチスペクトル衛星画像からのアクティブ火災検知のための分類器の設計 Designing a Classifier for Active Fire Detection from Multispectral Satellite Imagery Using Neural Architecture Search ( http://arxiv.org/abs/2410.05425v1 ) ライセンス: Link先を確認	Amber Cassimon, Phil Reiter, Siegfried Mercelis, Kevin Mets,	(参考訳) 本稿では、強化学習に基づくニューラルアーキテクチャサーチ(NAS)エージェントを用いて、小型ニューラルネットワークを設計し、マルチスペクトル衛星画像上でアクティブな火災検知を行う。具体的には、単一マルチスペクトル画素が火災の一部であるかどうかを判断できるニューラルネットワークを設計することを目的としており、センサデータのオンボード処理を容易にするために、電力予算の限られた低地球軌道(LEO)ナノサテライトの制約内で行うことを目的としている。強化学習を利用するには報酬関数が必要である。我々は、この報酬関数を、純粋にアーキテクチャの特徴からINT8精度への量子化に続いて、特定のアーキテクチャによって得られたF1スコアを予測する回帰モデルの形で提供する。このモデルは、ニューラルネットワークアーキテクチャのランダムなサンプルを収集し、これらのアーキテクチャをトレーニングし、それらの分類性能統計を収集して訓練される。 F1スコア以外にも、設計モデルのサイズを制限し、ナノサテライトプラットフォームが課すリソース制約に適合するように、トレーニング可能なパラメータの総数を報酬関数に含めています。最後に、最高のニューラルネットワークをGoogle Coral Micro Dev Boardにデプロイし、推論レイテンシと消費電力を評価しました。このニューラルネットワークは1,716のトレーニング可能なパラメータで構成され、推論に平均984 $\mu$sを要し、推論の実行に約800mWを消費する。これらの結果から,我々の強化学習に基づくNASアプローチは,未解決の新たな問題に適用できることが示唆された。 This paper showcases the use of a reinforcement learning-based Neural Architecture Search (NAS) agent to design a small neural network to perform active fire detection on multispectral satellite imagery. Specifically, we aim to design a neural network that can determine if a single multispectral pixel is a part of a fire, and do so within the constraints of a Low Earth Orbit (LEO) nanosatellite with a limited power budget, to facilitate on-board processing of sensor data. In order to use reinforcement learning, a reward function is needed. We supply this reward function in the shape of a regression model that predicts the F1 score obtained by a particular architecture, following quantization to INT8 precision, from purely architectural features. This model is trained by collecting a random sample of neural network architectures, training these architectures, and collecting their classification performance statistics. 
Besides the F1 score, we also include the total number of trainable parameters in our reward function to limit the size of the designed model and ensure it fits within the resource constraints imposed by nanosatellite platforms. Finally, we deployed the best neural network to the Google Coral Micro Dev Board and evaluated its inference latency and power consumption. This neural network consists of 1,716 trainable parameters, takes on average 984 $\mu$s per inference, and consumes around 800 mW to perform inference. These results show that our reinforcement learning-based NAS approach can be successfully applied to novel problems not tackled before.	翻訳日:2024-11-01 18:47:31 公開日:2024-10-11
# ニューラルアーキテクチャサーチを用いたマルチスペクトル衛星画像からのアクティブ火災検知のための分類器の設計 Designing a Classifier for Active Fire Detection from Multispectral Satellite Imagery Using Neural Architecture Search ( http://arxiv.org/abs/2410.05425v2 ) ライセンス: Link先を確認	Amber Cassimon, Phil Reiter, Siegfried Mercelis, Kevin Mets,	(参考訳) 本稿では、強化学習に基づくニューラルアーキテクチャサーチ(NAS)エージェントを用いて、小型ニューラルネットワークを設計し、マルチスペクトル衛星画像上でアクティブな火災検知を行う。具体的には、単一マルチスペクトル画素が火災の一部であるかどうかを判断できるニューラルネットワークを設計することを目的としており、センサデータのオンボード処理を容易にするために、電力予算の限られた低地球軌道(LEO)ナノサテライトの制約内で行うことを目的としている。強化学習を利用するには報酬関数が必要である。我々は、この報酬関数を、純粋にアーキテクチャの特徴からINT8精度への量子化に続いて、特定のアーキテクチャによって得られたF1スコアを予測する回帰モデルの形で提供する。このモデルは、ニューラルネットワークアーキテクチャのランダムなサンプルを収集し、これらのアーキテクチャをトレーニングし、それらの分類性能統計を収集して訓練される。 F1スコア以外にも、設計モデルのサイズを制限し、ナノサテライトプラットフォームが課すリソース制約に適合するように、トレーニング可能なパラメータの総数を報酬関数に含めています。最後に、最高のニューラルネットワークをGoogle Coral Micro Dev Boardにデプロイし、推論レイテンシと消費電力を評価しました。このニューラルネットワークは1,716のトレーニング可能なパラメータで構成され、推論に平均984 $\mu$sを要し、推論の実行に約800mWを消費する。これらの結果から,我々の強化学習に基づくNASアプローチは,未解決の新たな問題に適用できることが示唆された。 This paper showcases the use of a reinforcement learning-based Neural Architecture Search (NAS) agent to design a small neural network to perform active fire detection on multispectral satellite imagery. Specifically, we aim to design a neural network that can determine if a single multispectral pixel is a part of a fire, and do so within the constraints of a Low Earth Orbit (LEO) nanosatellite with a limited power budget, to facilitate on-board processing of sensor data. In order to use reinforcement learning, a reward function is needed. We supply this reward function in the shape of a regression model that predicts the F1 score obtained by a particular architecture, following quantization to INT8 precision, from purely architectural features. This model is trained by collecting a random sample of neural network architectures, training these architectures, and collecting their classification performance statistics. 
Besides the F1 score, we also include the total number of trainable parameters in our reward function to limit the size of the designed model and ensure it fits within the resource constraints imposed by nanosatellite platforms. Finally, we deployed the best neural network to the Google Coral Micro Dev Board and evaluated its inference latency and power consumption. This neural network consists of 1,716 trainable parameters, takes on average 984 $\mu$s per inference, and consumes around 800 mW to perform inference. These results show that our reinforcement learning-based NAS approach can be successfully applied to novel problems not tackled before.	翻訳日:2024-11-01 18:47:31 公開日:2024-10-11
# PH-Dropout:ビュー合成のための実用的な認識論的不確実性定量化 PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis ( http://arxiv.org/abs/2410.05468v1 ) ライセンス: Link先を確認	Chuanhao Sun, Thanos Triantafyllou, Anthos Makris, Maja Drmač, Kai Xu, Luo Mai, Mahesh K. Marina,	(参考訳) Neural Radiance Fields (NeRF) と Gaussian Splatting (GS) を用いたビュー合成は、実世界のシナリオのレンダリングにおいて顕著な忠実さを示した。しかし, ビュー合成における正確かつ効率的な認識論的不確実性定量化(UQ)の実用的な手法は欠けている。既存のNeRFのアプローチでは、大きな計算オーバーヘッド(例: 「10x のトレーニング時間の増加」や「10x の繰り返しトレーニング」)を導入するか、特定の不確実性条件やモデルに制限される。特に、GSモデルは包括的な認識論的UQに対する体系的なアプローチを欠いている。この機能は、ニューラルビュー合成の堅牢性とスケーラビリティを改善し、アクティブなモデル更新、エラー推定、不確実性に基づいたスケーラブルなアンサンブルモデリングを可能にするために重要である。本稿では,関数近似の観点からNeRFとGSに基づく手法を再検討し,3次元表現学習における重要な違いと接続を同定する。これらの知見に基づいて, 事前学習済みのNeRFおよびGSモデルに直接適用できる, 認識論的不確実性推定のための初のリアルタイムかつ正確な手法であるPH-Dropout (Post hoc Dropout) を導入する。広範な評価により,理論的知見が検証され,PH-Dropoutの有効性が示された。 View synthesis using Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) has demonstrated impressive fidelity in rendering real-world scenarios. However, practical methods for accurate and efficient epistemic Uncertainty Quantification (UQ) in view synthesis are lacking. Existing approaches for NeRF either introduce significant computational overhead (e.g., "10x increase in training time" or "10x repeated training") or are limited to specific uncertainty conditions or models. Notably, GS models lack any systematic approach for comprehensive epistemic UQ. This capability is crucial for improving the robustness and scalability of neural view synthesis, enabling active model updates, error estimation, and scalable ensemble modeling based on uncertainty. In this paper, we revisit NeRF and GS-based methods from a function approximation perspective, identifying key differences and connections in 3D representation learning. Building on these insights, we introduce PH-Dropout (Post hoc Dropout), the first real-time and accurate method for epistemic uncertainty estimation that operates directly on pre-trained NeRF and GS models. 
Extensive evaluations validate our theoretical findings and demonstrate the effectiveness of PH-Dropout.	翻訳日:2024-11-01 18:28:00 公開日:2024-10-11
# PH-Dropout:ビュー合成のための実用的な認識論的不確実性定量化 PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis ( http://arxiv.org/abs/2410.05468v2 ) ライセンス: Link先を確認	Chuanhao Sun, Thanos Triantafyllou, Anthos Makris, Maja Drmač, Kai Xu, Luo Mai, Mahesh K. Marina,	(参考訳) Neural Radiance Fields (NeRF) と Gaussian Splatting (GS) を用いたビュー合成は、実世界のシナリオのレンダリングにおいて顕著な忠実さを示した。しかし, ビュー合成における正確かつ効率的な認識論的不確実性定量化(UQ)の実用的な手法は欠けている。既存のNeRFのアプローチでは、大きな計算オーバーヘッド(例: 「10x のトレーニング時間の増加」や「10x の繰り返しトレーニング」)を導入するか、特定の不確実性条件やモデルに制限される。特に、GSモデルは包括的な認識論的UQに対する体系的なアプローチを欠いている。この機能は、ニューラルビュー合成の堅牢性とスケーラビリティを改善し、アクティブなモデル更新、エラー推定、不確実性に基づいたスケーラブルなアンサンブルモデリングを可能にするために重要である。本稿では,関数近似の観点からNeRFとGSに基づく手法を再検討し,3次元表現学習における重要な違いと接続を同定する。これらの知見に基づいて, 事前学習済みのNeRFおよびGSモデルに直接適用できる, 認識論的不確実性推定のための初のリアルタイムかつ正確な手法であるPH-Dropout (Post hoc Dropout) を導入する。広範な評価により,理論的知見が検証され,PH-Dropoutの有効性が示された。 View synthesis using Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) has demonstrated impressive fidelity in rendering real-world scenarios. However, practical methods for accurate and efficient epistemic Uncertainty Quantification (UQ) in view synthesis are lacking. Existing approaches for NeRF either introduce significant computational overhead (e.g., "10x increase in training time" or "10x repeated training") or are limited to specific uncertainty conditions or models. Notably, GS models lack any systematic approach for comprehensive epistemic UQ. This capability is crucial for improving the robustness and scalability of neural view synthesis, enabling active model updates, error estimation, and scalable ensemble modeling based on uncertainty. In this paper, we revisit NeRF and GS-based methods from a function approximation perspective, identifying key differences and connections in 3D representation learning. Building on these insights, we introduce PH-Dropout (Post hoc Dropout), the first real-time and accurate method for epistemic uncertainty estimation that operates directly on pre-trained NeRF and GS models. 
Extensive evaluations validate our theoretical findings and demonstrate the effectiveness of PH-Dropout.	翻訳日:2024-11-01 18:28:00 公開日:2024-10-11
# T2V-Turbo-v2:データ・リワード・条件付き誘導設計によるビデオ生成後モデルの強化 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design ( http://arxiv.org/abs/2410.05677v1 ) ライセンス: Link先を確認	Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang,	(参考訳) 本稿では,事前学習したT2Vモデルから高機能な一貫性モデルを蒸留することにより,後学習段階における拡散型テキスト・ツー・ビデオ(T2V)モデルの改善に焦点をあてる。提案手法であるT2V-Turbo-v2は, 高品質なトレーニングデータ, 報酬モデルフィードバック, 条件付きガイダンスなど, 各種監視信号の整合蒸留プロセスへの統合により, 大幅な高度化を実現する。包括的アブレーション研究を通じて、特定の学習目標に対するデータセットの調整の重要性と、視覚的品質とテキスト・ビデオのアライメントを向上させるための多様な報酬モデルからの学習の有効性を強調した。さらに,教師のODEソルバを増強する効果的なエネルギー関数の設計に焦点を当てた,条件付き指導戦略の広大な設計空間を強調した。トレーニングデータセットからモーションガイダンスを抽出し、ODEソルバに組み込むことで、VBenchとT2V-CompBenchのモーション関連指標の改善により、生成されたビデオのモーション品質を改善する効果を示す。実証的に、我々のT2V-Turbo-v2は、Gen-3やKlingといったプロプライエタリシステムを上回る85.13のスコアで、VBenchに新たな最先端結果を確立する。 In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2, introduces a significant advancement by integrating various supervision signals, including high-quality training data, reward model feedback, and conditional guidance, into the consistency distillation process. Through comprehensive ablation studies, we highlight the crucial importance of tailoring datasets to specific learning objectives and the effectiveness of learning from diverse reward models for enhancing both the visual quality and text-video alignment. Additionally, we highlight the vast design space of conditional guidance strategies, which centers on designing an effective energy function to augment the teacher ODE solver. 
We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver, showcasing its effectiveness in improving the motion quality of the generated videos with the improved motion-related metrics from VBench and T2V-CompBench. Empirically, our T2V-Turbo-v2 establishes a new state-of-the-art result on VBench, with a Total score of 85.13, surpassing proprietary systems such as Gen-3 and Kling.	翻訳日:2024-11-01 17:09:37 公開日:2024-10-11
# T2V-Turbo-v2:データ・リワード・条件付き誘導設計によるビデオ生成後モデルの強化 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design ( http://arxiv.org/abs/2410.05677v2 ) ライセンス: Link先を確認	Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang,	(参考訳) 本稿では,事前学習したT2Vモデルから高機能な一貫性モデルを蒸留することにより,後学習段階における拡散型テキスト・ツー・ビデオ(T2V)モデルの改善に焦点をあてる。提案手法であるT2V-Turbo-v2は, 高品質なトレーニングデータ, 報酬モデルフィードバック, 条件付きガイダンスなど, 各種監視信号の整合蒸留プロセスへの統合により, 大幅な高度化を実現する。包括的アブレーション研究を通じて、特定の学習目標に対するデータセットの調整の重要性と、視覚的品質とテキスト・ビデオのアライメントを向上させるための多様な報酬モデルからの学習の有効性を強調した。さらに,教師のODEソルバを増強する効果的なエネルギー関数の設計に焦点を当てた,条件付き指導戦略の広大な設計空間を強調した。トレーニングデータセットからモーションガイダンスを抽出し、ODEソルバに組み込むことで、VBenchとT2V-CompBenchのモーション関連指標の改善により、生成されたビデオのモーション品質を改善する効果を示す。実証的に、我々のT2V-Turbo-v2は、Gen-3やKlingといったプロプライエタリシステムを上回る85.13のスコアで、VBenchに新たな最先端結果を確立する。 In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2, introduces a significant advancement by integrating various supervision signals, including high-quality training data, reward model feedback, and conditional guidance, into the consistency distillation process. Through comprehensive ablation studies, we highlight the crucial importance of tailoring datasets to specific learning objectives and the effectiveness of learning from diverse reward models for enhancing both the visual quality and text-video alignment. Additionally, we highlight the vast design space of conditional guidance strategies, which centers on designing an effective energy function to augment the teacher ODE solver. 
We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver, showcasing its effectiveness in improving the motion quality of the generated videos with the improved motion-related metrics from VBench and T2V-CompBench. Empirically, our T2V-Turbo-v2 establishes a new state-of-the-art result on VBench, with a Total score of 85.13, surpassing proprietary systems such as Gen-3 and Kling.	翻訳日:2024-11-01 17:09:37 公開日:2024-10-11
# CASA:高能率インクリメンタル物体検出のための視覚言語モデルにおけるクラス非依存的共有属性 CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection ( http://arxiv.org/abs/2410.05804v1 ) ライセンス: Link先を確認	Mingyi Guo, Yuyang Liu, Zongying Lin, Peixi Peng, Yonghong Tian,	(参考訳) インクリメンタルオブジェクト検出(IOD)は、シーケンシャルデータにおけるバックグラウンドカテゴリが以前学習されたクラスや将来のクラスを含む場合、バックグラウンドシフトによって問題となる。 CLIPのようなビジョン言語基盤モデルにインスパイアされたこれらのモデルは、事前トレーニング中に広範な画像とテキストのペアデータから共有属性をキャプチャする。本稿では,視覚言語基礎モデルの属性をインクリメンタルオブジェクト検出に活用する手法を提案する。本手法は,クラス非依存の共有属性ベース(CASA)を構築し,インクリメンタルクラス間の共通意味情報をキャプチャする。具体的には、大規模言語モデルを用いて、候補となるテキスト属性を生成し、現在のトレーニングデータに基づいて最も関連性の高い属性を選択し、それらの意味を属性割り当て行列に記録する。その後のタスクでは、保持された属性を凍結し、残りの候補を引き続き選択し、属性割り当て行列を更新する。さらに, OWL-ViTをベースラインとして, 事前学習した基礎モデルのパラメータを保存する。 IODのスケーラビリティと適応性を大幅に向上させるため,パラメータ効率の微調整によりパラメータ記憶に0.7%しか加えていない。 COCOデータセット上での2相および多相の大規模実験により,提案手法の最先端性能が実証された。 Incremental object detection (IOD) is challenged by background shift, where background categories in sequential data may include previously learned or future classes. Inspired by the vision-language foundation models such as CLIP, these models capture shared attributes from extensive image-text paired data during pre-training. We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection. Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes. Specifically, we utilize large language models to generate candidate textual attributes and select the most relevant ones based on current training data, recording their significance in an attribute assignment matrix. For subsequent tasks, we freeze the retained attributes and continue selecting from the remaining candidates while updating the attribute assignment matrix accordingly. Furthermore, we employ OWL-ViT as our baseline, preserving the original parameters of the pre-trained foundation model. 
Our method adds only 0.7% to parameter storage through parameter-efficient fine-tuning to significantly enhance the scalability and adaptability of IOD. Extensive two-phase and multi-phase experiments on the COCO dataset demonstrate the state-of-the-art performance of our proposed method.	翻訳日:2024-11-01 12:39:56 公開日:2024-10-11
# CASA:高能率インクリメンタル物体検出のための視覚言語モデルにおけるクラス非依存的共有属性 CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection ( http://arxiv.org/abs/2410.05804v2 ) ライセンス: Link先を確認	Mingyi Guo, Yuyang Liu, Zongying Lin, Peixi Peng, Yonghong Tian,	(参考訳) インクリメンタルオブジェクト検出(IOD)は、シーケンシャルデータにおけるバックグラウンドカテゴリが以前学習されたクラスや将来のクラスを含む場合、バックグラウンドシフトによって問題となる。 CLIPのようなビジョン言語基盤モデルにインスパイアされたこれらのモデルは、事前トレーニング中に広範な画像とテキストのペアデータから共有属性をキャプチャする。本稿では,視覚言語基礎モデルの属性をインクリメンタルオブジェクト検出に活用する手法を提案する。本手法は,クラス非依存の共有属性ベース(CASA)を構築し,インクリメンタルクラス間の共通意味情報をキャプチャする。具体的には、大規模言語モデルを用いて、候補となるテキスト属性を生成し、現在のトレーニングデータに基づいて最も関連性の高い属性を選択し、それらの意味を属性割り当て行列に記録する。その後のタスクでは、保持された属性を凍結し、残りの候補を引き続き選択し、属性割り当て行列を更新する。さらに, OWL-ViTをベースラインとして, 事前学習した基礎モデルのパラメータを保存する。 IODのスケーラビリティと適応性を大幅に向上させるため,パラメータ効率の微調整によりパラメータ記憶に0.7%しか加えていない。 COCOデータセット上での2相および多相の大規模実験により,提案手法の最先端性能が実証された。 Incremental object detection (IOD) is challenged by background shift, where background categories in sequential data may include previously learned or future classes. Inspired by the vision-language foundation models such as CLIP, these models capture shared attributes from extensive image-text paired data during pre-training. We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection. Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes. Specifically, we utilize large language models to generate candidate textual attributes and select the most relevant ones based on current training data, recording their significance in an attribute assignment matrix. For subsequent tasks, we freeze the retained attributes and continue selecting from the remaining candidates while updating the attribute assignment matrix accordingly. Furthermore, we employ OWL-ViT as our baseline, preserving the original parameters of the pre-trained foundation model. 
Our method adds only 0.7% to parameter storage through parameter-efficient fine-tuning to significantly enhance the scalability and adaptability of IOD. Extensive two-phase and multi-phase experiments on the COCO dataset demonstrate the state-of-the-art performance of our proposed method.	翻訳日:2024-11-01 12:39:56 公開日:2024-10-11
# Aria: オープンなマルチモーダルなNative Mixture-of-Expertsモデル Aria: An Open Multimodal Native Mixture-of-Experts Model ( http://arxiv.org/abs/2410.05993v1 ) ライセンス: Link先を確認	Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li,	(参考訳) 情報は多様である。マルチモーダルネイティブAIモデルは、現実世界の情報を統合し、包括的な理解を提供するために不可欠である。プロプライエタリなマルチモーダルネイティブモデルが存在するが、オープン性の欠如は、適応だけでなく、採用の障害となる。このギャップを埋めるために、オープンなマルチモーダルネイティブモデルであるAriaを紹介します。 Ariaは3.9Bと3.5Bのアクティベートパラメータをそれぞれ視覚トークンとテキストトークンに混合したエキスパートモデルである。 Pixtral-12BとLlama3.2-11Bを上回り、様々なマルチモーダルタスクにおける最高のプロプライエタリモデルと競合する。言語理解,マルチモーダル理解,長いコンテキストウィンドウ,命令フォローなどにおいて,Ariaを4段階のパイプラインに従ってゼロからトレーニングする。私たちは、Ariaの実際のアプリケーションへの導入と適応を容易にするコードベースとともに、モデルの重みをオープンソースにしています。 Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles for adoptions, let alone adaptations. To fill this gap, we introduce Aria, an open multimodal native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. Aria is a mixture-of-expert model with 3.9B and 3.5B activated parameters per visual token and text token, respectively. It outperforms Pixtral-12B and Llama3.2-11B, and is competitive against the best proprietary models on various multimodal tasks. We pre-train Aria from scratch following a 4-stage pipeline, which progressively equips the model with strong capabilities in language understanding, multimodal understanding, long context window, and instruction following. We open-source the model weights along with a codebase that facilitates easy adoptions and adaptations of Aria in real-world applications.	翻訳日:2024-11-01 11:50:19 公開日:2024-10-11
# Aria: オープンなマルチモーダルなNative Mixture-of-Expertsモデル Aria: An Open Multimodal Native Mixture-of-Experts Model ( http://arxiv.org/abs/2410.05993v2 ) ライセンス: Link先を確認	Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li,	(参考訳) 情報は多様である。マルチモーダルネイティブAIモデルは、現実世界の情報を統合し、包括的な理解を提供するために不可欠である。プロプライエタリなマルチモーダルネイティブモデルが存在するが、オープン性の欠如は、適応だけでなく、採用の障害となる。このギャップを埋めるために、オープンなマルチモーダルネイティブモデルであるAriaを紹介します。 Ariaは3.9Bと3.5Bのアクティベートパラメータをそれぞれ視覚トークンとテキストトークンに混合したエキスパートモデルである。 Pixtral-12BとLlama3.2-11Bを上回り、様々なマルチモーダルタスクにおける最高のプロプライエタリモデルと競合する。言語理解,マルチモーダル理解,長いコンテキストウィンドウ,命令フォローなどにおいて,Ariaを4段階のパイプラインに従ってゼロからトレーニングする。私たちは、Ariaの実際のアプリケーションへの導入と適応を容易にするコードベースとともに、モデルの重みをオープンソースにしています。 Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles for adoptions, let alone adaptations. To fill this gap, we introduce Aria, an open multimodal native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. Aria is a mixture-of-expert model with 3.9B and 3.5B activated parameters per visual token and text token, respectively. It outperforms Pixtral-12B and Llama3.2-11B, and is competitive against the best proprietary models on various multimodal tasks. We pre-train Aria from scratch following a 4-stage pipeline, which progressively equips the model with strong capabilities in language understanding, multimodal understanding, long context window, and instruction following. We open-source the model weights along with a codebase that facilitates easy adoptions and adaptations of Aria in real-world applications.	翻訳日:2024-11-01 11:50:19 公開日:2024-10-11
# 未知リンク関数を持つ一般化スパース付加モデル Generalized Sparse Additive Model with Unknown Link Function ( http://arxiv.org/abs/2410.06012v1 ) ライセンス: Link先を確認	Peipei Yuan, Xinge You, Hong Chen, Xuelin Zhang, Qinmu Peng,	(参考訳) 一般化加法モデル(GAM)は高次元データ解析に成功している。しかし、既存のほとんどのメソッドは、リンク関数、コンポーネント関数、変数相互作用を同時に見積もることはできない。この問題を軽減するために,未知リンク関数 (GSAMUL) を持つ一般化スパース付加モデル(一般スパース付加モデル)を提案し,B-スプラインベースと未知リンク関数を多層パーセプトロン (MLP) ネットワークで推定する。さらに$\ell_{2,1}$-norm正規化器は変数選択に使用される。提案したGSAMULは、可変選択と隠れ相互作用の両方を実現することができる。この推定を二段階最適化問題に統合し、データをトレーニングセットと検証セットに分割する。理論的には、近似手順の収束に関する保証を提供する。応用において、合成および実世界のデータセットの実験的評価は、提案手法の有効性を一貫して検証する。 Generalized additive models (GAM) have been successfully applied to high dimensional data analysis. However, most existing methods cannot simultaneously estimate the link function, the component functions and the variable interaction. To alleviate this problem, we propose a new sparse additive model, named generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated by B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. Furthermore, $\ell_{2,1}$-norm regularizer is used for variable selection. The proposed GSAMUL can realize both variable selection and hidden interaction. We integrate this estimation into a bilevel optimization problem, where the data is split into training set and validation set. In theory, we provide the guarantees about the convergence of the approximate procedure. In applications, experimental evaluations on both synthetic and real world data sets consistently validate the effectiveness of the proposed approach.	翻訳日:2024-11-01 11:40:34 公開日:2024-10-11
# 未知リンク関数を持つ一般化スパース付加モデル Generalized Sparse Additive Model with Unknown Link Function ( http://arxiv.org/abs/2410.06012v2 ) ライセンス: Link先を確認	Peipei Yuan, Xinge You, Hong Chen, Xuelin Zhang, Qinmu Peng,	(参考訳) 一般化加法モデル(GAM)は高次元データ解析に成功している。しかし、既存のほとんどのメソッドは、リンク関数、コンポーネント関数、変数相互作用を同時に見積もることはできない。この問題を軽減するために,未知リンク関数 (GSAMUL) を持つ一般化スパース付加モデル(一般スパース付加モデル)を提案し,B-スプラインベースと未知リンク関数を多層パーセプトロン (MLP) ネットワークで推定する。さらに$\ell_{2,1}$-norm正規化器は変数選択に使用される。提案したGSAMULは、可変選択と隠れ相互作用の両方を実現することができる。この推定を二段階最適化問題に統合し、データをトレーニングセットと検証セットに分割する。理論的には、近似手順の収束に関する保証を提供する。応用において、合成および実世界のデータセットの実験的評価は、提案手法の有効性を一貫して検証する。 Generalized additive models (GAM) have been successfully applied to high dimensional data analysis. However, most existing methods cannot simultaneously estimate the link function, the component functions and the variable interaction. To alleviate this problem, we propose a new sparse additive model, named generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated by B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. Furthermore, $\ell_{2,1}$-norm regularizer is used for variable selection. The proposed GSAMUL can realize both variable selection and hidden interaction. We integrate this estimation into a bilevel optimization problem, where the data is split into training set and validation set. In theory, we provide the guarantees about the convergence of the approximate procedure. In applications, experimental evaluations on both synthetic and real world data sets consistently validate the effectiveness of the proposed approach.	翻訳日:2024-11-01 11:40:34 公開日:2024-10-11
# ブロック誘起符号生成逆数ネットワーク(BISGAN) : GANを用いた信号スポーフィングとその評価 Block Induced Signature Generative Adversarial Network (BISGAN): Signature Spoofing Using GANs and Their Evaluation ( http://arxiv.org/abs/2410.06041v1 ) ライセンス: Link先を確認	Haadia Amjad, Kilian Goeller, Steffen Seitz, Carsten Knoll, Naseer Bajwa, Muhammad Imran Malik, Ronald Tetzlaff,	(参考訳) ディープラーニングはバイオメトリックスにおいて、効率的な識別と検証システムを開発するために積極的に利用されている。手書き署名は認証目的のための生体データの一般的なサブセットである。 GAN(Generative Adversarial Network)は、オリジナルおよびフォージされたシグネチャから学習し、フォージされたシグネチャを生成する。ほとんどのGAN技術は、識別器である強力なシグネチャ検証器を生成するが、ジェネレータモデルによって生成される偽造品の品質をより重視する必要がある。この研究は、署名検証システムのベンチマークを達成するために、偽造サンプルを生成するジェネレータを作成することに重点を置いている。 Inceptionモデルのようなブロックを注入したCycleGANを生成元とし,SigCNNモデルの変種を基本判別器として使用する。私たちは、署名スプーフィングで80%から100%の成功をもたらす新しいテクニックでモデルをトレーニングします。さらに、生成された偽造物の良さを測るカスタム評価手法を作成する。本研究は,バイオメトリックデータ生成と評価の理解を深めるため,データ品質を汚すジェネレータ指向のGANアーキテクチャを提唱する。 Deep learning is actively being used in biometrics to develop efficient identification and verification systems. Handwritten signatures are a common subset of biometric data for authentication purposes. Generative adversarial networks (GANs) learn from original and forged signatures to generate forged signatures. While most GAN techniques create a strong signature verifier, which is the discriminator, there is a need to focus more on the quality of forgeries generated by the generator model. This work focuses on creating a generator that produces forged samples that achieve a benchmark in spoofing signature verification systems. We use CycleGANs infused with Inception model-like blocks with attention heads as the generator and a variation of the SigCNN model as the base Discriminator. We train our model with a new technique that results in 80% to 100% success in signature spoofing. Additionally, we create a custom evaluation technique to act as a goodness measure of the generated forgeries. 
Our work advocates generator-focused GAN architectures for spoofing data quality that aid in a better understanding of biometric data generation and evaluation.	翻訳日:2024-11-01 11:30:40 公開日:2024-10-11
# ブロック誘起符号生成逆数ネットワーク(BISGAN) : GANを用いた信号スポーフィングとその評価 Block Induced Signature Generative Adversarial Network (BISGAN): Signature Spoofing Using GANs and Their Evaluation ( http://arxiv.org/abs/2410.06041v2 ) ライセンス: Link先を確認	Haadia Amjad, Kilian Goeller, Steffen Seitz, Carsten Knoll, Naseer Bajwa, Ronald Tetzlaff, Muhammad Imran Malik,	(参考訳) ディープラーニングはバイオメトリックスにおいて、効率的な識別と検証システムを開発するために積極的に利用されている。手書き署名は認証目的のための生体データの一般的なサブセットである。 GAN(Generative Adversarial Network)は、オリジナルおよびフォージされたシグネチャから学習し、フォージされたシグネチャを生成する。ほとんどのGAN技術は、識別器である強力なシグネチャ検証器を生成するが、ジェネレータモデルによって生成される偽造品の品質をより重視する必要がある。この研究は、署名検証システムのベンチマークを達成するために、偽造サンプルを生成するジェネレータを作成することに重点を置いている。 Inceptionモデルのようなブロックを注入したCycleGANを生成元とし,SigCNNモデルの変種を基本判別器として使用する。私たちは、署名スプーフィングで80%から100%の成功をもたらす新しいテクニックでモデルをトレーニングします。さらに、生成された偽造物の良さを測るカスタム評価手法を作成する。本研究は,バイオメトリックデータ生成と評価の理解を深めるため,データ品質を汚すジェネレータ指向のGANアーキテクチャを提唱する。 Deep learning is actively being used in biometrics to develop efficient identification and verification systems. Handwritten signatures are a common subset of biometric data for authentication purposes. Generative adversarial networks (GANs) learn from original and forged signatures to generate forged signatures. While most GAN techniques create a strong signature verifier, which is the discriminator, there is a need to focus more on the quality of forgeries generated by the generator model. This work focuses on creating a generator that produces forged samples that achieve a benchmark in spoofing signature verification systems. We use CycleGANs infused with Inception model-like blocks with attention heads as the generator and a variation of the SigCNN model as the base Discriminator. We train our model with a new technique that results in 80% to 100% success in signature spoofing. Additionally, we create a custom evaluation technique to act as a goodness measure of the generated forgeries. 
Our work advocates generator-focused GAN architectures for spoofing data quality that aid in a better understanding of biometric data generation and evaluation.	翻訳日:2024-11-01 11:30:40 公開日:2024-10-11
# Auto-Evolve: 自己推論フレームワークによる大規模言語モデルのパフォーマンス向上 Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework ( http://arxiv.org/abs/2410.06328v1 ) ライセンス: Link先を確認	Krishna Aswani, Huilin Lu, Pranav Patankar, Priya Dhalwani, Iris Tan, Jayant Ganeshmohan, Simon Lacasse,	(参考訳) CoT(Chain-of-Thought)やSelf-Discoverといったプロンプトエンジニアリング戦略の最近の進歩は、Large Language Models(LLMs)の推論能力を改善する大きな可能性を示している。しかし、これらの最先端(SOTA)のプロンプト戦略は、人間の問題解決へのアプローチをシミュレートすることを目的とした、"think step by step" や "break down this problem" のような静的な推論モジュールの単一または固定セットに依存している。この制約は、多様な問題に効果的に取り組む際のモデルの柔軟性を制限する。本稿では,LLMが動的推論モジュールと下流動作計画の自己生成を可能にする新しいフレームワークであるAuto-Evolveを紹介する。我々は、Claude 2.0、Claude 3 Sonnet、Mistral Large、GPT 4による難易度の高いBigBench-Hard(BBH)データセットのAuto-Evolveを評価する。 Auto-EvolveはCoTを最大10.4%、そしてこれら4つのモデルで平均7%上回っている。私たちのフレームワークには2つのイノベーションがあります。 a) Auto-Evolveは、人間の推論パラダイムと整合しながら、タスク毎の推論モジュールを動的に生成することにより、事前定義されたテンプレートの必要性を排除します。 b) LLMへの指示ガイダンスを段階的に洗練し, 1ステップで行うよりも平均2.8%の性能向上に寄与する反復改良部品を導入する。 Recent advancements in prompt engineering strategies, such as Chain-of-Thought (CoT) and Self-Discover, have demonstrated significant potential in improving the reasoning abilities of Large Language Models (LLMs). However, these state-of-the-art (SOTA) prompting strategies rely on single or fixed set of static seed reasoning modules like "think step by step" or "break down this problem" intended to simulate human approach to problem-solving. This constraint limits the flexibility of models in tackling diverse problems effectively. In this paper, we introduce Auto-Evolve, a novel framework that enables LLMs to self-create dynamic reasoning modules and downstream action plan, resulting in significant improvements over current SOTA methods. We evaluate Auto-Evolve on the challenging BigBench-Hard (BBH) dataset with Claude 2.0, Claude 3 Sonnet, Mistral Large, and GPT 4, where it consistently outperforms the SOTA prompt strategies. 
Auto-Evolve outperforms CoT by up to 10.4% and on an average by 7% across these four models. Our framework introduces two innovations: a) Auto-Evolve dynamically generates reasoning modules for each task while aligning with human reasoning paradigm, thus eliminating the need for predefined templates. b) We introduce an iterative refinement component, that incrementally refines instruction guidance for LLMs and helps boost performance by average 2.8% compared to doing it in a single step.	翻訳日:2024-11-01 06:29:16 公開日:2024-10-11
# Auto-Evolve: 自己推論フレームワークによる大規模言語モデルのパフォーマンス向上 Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework ( http://arxiv.org/abs/2410.06328v2 ) ライセンス: Link先を確認	Krishna Aswani, Huilin Lu, Pranav Patankar, Priya Dhalwani, Iris Tan, Jayant Ganeshmohan, Simon Lacasse,	(参考訳) CoT(Chain-of-Thought)やSelf-Discoverといったプロンプトエンジニアリング戦略の最近の進歩は、Large Language Models(LLMs)の推論能力を改善する大きな可能性を示している。しかし、これらの最先端(SOTA)のプロンプト戦略は、人間の問題解決へのアプローチをシミュレートすることを目的とした、「ステップバイステップ」や「この問題を分解する」といった静的な推論モジュールの単一あるいは固定的なセットに依存している。この制約は、多様な問題に効果的に取り組む際のモデルの柔軟性を制限する。本稿では,LLMが動的推論モジュールと下流動作計画の自己生成を可能にする新しいフレームワークであるAuto-Evolveを紹介する。我々は、Claude 2.0、Claude 3 Sonnet、Mistral Large、GPT 4による難易度の高いBigBench-Hard(BBH)データセットのAuto-Evolveを評価する。 Auto-EvolveはCoTを最大10.4%上回り、4つのモデルで平均7%上回っている。私たちのフレームワークには2つのイノベーションがあります。 a) Auto-Evolveは、人間の推論パラダイムと整合しながら、タスク毎の推論モジュールを動的に生成することにより、事前定義されたテンプレートの必要性を排除します。 b) LLMへの指示ガイダンスを段階的に洗練し, 1ステップで行うよりも平均2.8%向上する反復改良部品を導入する。 Recent advancements in prompt engineering strategies, such as Chain-of-Thought (CoT) and Self-Discover, have demonstrated significant potential in improving the reasoning abilities of Large Language Models (LLMs). However, these state-of-the-art (SOTA) prompting strategies rely on single or fixed set of static seed reasoning modules like "think step by step" or "break down this problem" intended to simulate human approach to problem-solving. This constraint limits the flexibility of models in tackling diverse problems effectively. In this paper, we introduce Auto-Evolve, a novel framework that enables LLMs to self-create dynamic reasoning modules and downstream action plan, resulting in significant improvements over current SOTA methods. We evaluate Auto-Evolve on the challenging BigBench-Hard (BBH) dataset with Claude 2.0, Claude 3 Sonnet, Mistral Large, and GPT 4, where it consistently outperforms the SOTA prompt strategies. Auto-Evolve outperforms CoT by up to 10.4% and on an average by 7% across these four models. 
Our framework introduces two innovations: a) Auto-Evolve dynamically generates reasoning modules for each task while aligning with human reasoning paradigm, thus eliminating the need for predefined templates. b) We introduce an iterative refinement component, that incrementally refines instruction guidance for LLMs and helps boost performance by average 2.8% compared to doing it in a single step.	翻訳日:2024-11-01 06:29:16 公開日:2024-10-11
# TopoTune : 一般化された組合せ複雑ニューラルネットワークのためのフレームワーク TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks ( http://arxiv.org/abs/2410.06530v1 ) ライセンス: Link先を確認	Mathilde Papillon, Guillermo Bernárdez, Claudio Battiloro, Nina Miolane,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ領域の対称性を保存する方法で、リレーショナルデータセット、処理ノード、エッジ機能からの学習に優れています。しかし、多くの複雑なシステム、例えば生物学やソーシャルネットワークは、より自然に高階位相空間で表されるマルチウェイの複雑な相互作用を生み出している。トポロジカルディープラーニング(TDL)の新たな分野は、これらの高次構造を適応し活用することを目指している。比較的一般的な TDL モデルである Combinatorial Complex Neural Networks (CCNN) は、GNN よりも表現力が高く、パフォーマンスも優れていることが示されている。しかし、グラフ深層学習のエコシステムとは違って、TDLは新しいアーキテクチャを簡単に定義するための原則的で標準化されたフレームワークがなく、アクセシビリティと適用性を制限する。この問題に対処するために,汎用CCNN(Generalized CCNNs,GCCNs)を導入する。これは,任意の(グラフ)ニューラルネットワークをTDLモデルに体系的に変換するために使用できる,新しい単純かつ強力なTDLモデルのファミリーである。 GCCN が CCNN を一般化・サブスメートすることを証明する一方で,多様な GCCN のクラスに対する広範な実験により,これらのアーキテクチャは CCNN との整合性や性能を保ちながら,モデルの複雑さを少なく抑えることが示されている。 TDLを加速し、民主化するために、私たちは、実践者が前例のない柔軟性と容易さでGCCNを定義し、構築し、訓練できる軽量ソフトウェアであるTopoTuneを紹介します。 Graph Neural Networks (GNNs) excel in learning from relational datasets, processing node and edge features in a way that preserves the symmetries of the graph domain. However, many complex systems--such as biological or social networks--involve multiway complex interactions that are more naturally represented by higher-order topological spaces. The emerging field of Topological Deep Learning (TDL) aims to accommodate and leverage these higher-order structures. Combinatorial Complex Neural Networks (CCNNs), fairly general TDL models, have been shown to be more expressive and better performing than GNNs. However, differently from the graph deep learning ecosystem, TDL lacks a principled and standardized framework for easily defining new architectures, restricting its accessibility and applicability. 
To address this issue, we introduce Generalized CCNNs (GCCNs), a novel simple yet powerful family of TDL models that can be used to systematically transform any (graph) neural network into its TDL counterpart. We prove that GCCNs generalize and subsume CCNNs, while extensive experiments on a diverse class of GCCNs show that these architectures consistently match or outperform CCNNs, often with less model complexity. In an effort to accelerate and democratize TDL, we introduce TopoTune, a lightweight software that allows practitioners to define, build, and train GCCNs with unprecedented flexibility and ease.	翻訳日:2024-11-01 05:09:09 公開日:2024-10-11
# TopoTune : 一般化された組合せ複雑ニューラルネットワークのためのフレームワーク TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks ( http://arxiv.org/abs/2410.06530v2 ) ライセンス: Link先を確認	Mathilde Papillon, Guillermo Bernárdez, Claudio Battiloro, Nina Miolane,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ領域の対称性を保存する方法で、リレーショナルデータセット、処理ノード、エッジ機能からの学習に優れています。しかし、多くの複雑なシステム、例えば生物学やソーシャルネットワークは、より自然に高階位相空間で表されるマルチウェイの複雑な相互作用を生み出している。トポロジカルディープラーニング(TDL)の新たな分野は、これらの高次構造を適応し活用することを目指している。比較的一般的な TDL モデルである Combinatorial Complex Neural Networks (CCNN) は、GNN よりも表現力が高く、パフォーマンスも優れていることが示されている。しかし、グラフ深層学習のエコシステムとは違って、TDLは新しいアーキテクチャを簡単に定義するための原則的で標準化されたフレームワークがなく、アクセシビリティと適用性を制限する。この問題に対処するために,汎用CCNN(Generalized CCNNs,GCCNs)を導入する。これは,任意の(グラフ)ニューラルネットワークをTDLモデルに体系的に変換するために使用できる,新しい単純かつ強力なTDLモデルのファミリーである。 GCCN が CCNN を一般化・サブスメートすることを証明する一方で,多様な GCCN のクラスに対する広範な実験により,これらのアーキテクチャは CCNN との整合性や性能を保ちながら,モデルの複雑さを少なく抑えることが示されている。 TDLを加速し、民主化するために、私たちは、実践者が前例のない柔軟性と容易さでGCCNを定義し、構築し、訓練できる軽量ソフトウェアであるTopoTuneを紹介します。 Graph Neural Networks (GNNs) excel in learning from relational datasets, processing node and edge features in a way that preserves the symmetries of the graph domain. However, many complex systems--such as biological or social networks--involve multiway complex interactions that are more naturally represented by higher-order topological spaces. The emerging field of Topological Deep Learning (TDL) aims to accommodate and leverage these higher-order structures. Combinatorial Complex Neural Networks (CCNNs), fairly general TDL models, have been shown to be more expressive and better performing than GNNs. However, differently from the graph deep learning ecosystem, TDL lacks a principled and standardized framework for easily defining new architectures, restricting its accessibility and applicability. 
To address this issue, we introduce Generalized CCNNs (GCCNs), a novel simple yet powerful family of TDL models that can be used to systematically transform any (graph) neural network into its TDL counterpart. We prove that GCCNs generalize and subsume CCNNs, while extensive experiments on a diverse class of GCCNs show that these architectures consistently match or outperform CCNNs, often with less model complexity. In an effort to accelerate and democratize TDL, we introduce TopoTune, a lightweight software that allows practitioners to define, build, and train GCCNs with unprecedented flexibility and ease.	翻訳日:2024-11-01 05:09:09 公開日:2024-10-11
# Chip-Tuning: 言語モデルが言う前に分類する Chip-Tuning: Classify Before Language Models Say ( http://arxiv.org/abs/2410.06541v1 ) ライセンス: Link先を確認	Fangwei Zhu, Dian Li, Jiajun Huang, Gang Liu, Hui Wang, Zhifang Sui,	(参考訳) 大規模言語モデル(LLM)の性能の急激な発展は、モデルサイズがエスカレーションされ、モデルトレーニングと推論のコストが増大する。従来の研究では、LLMの特定の層が冗長性を示し、これらの層を取り除くことで、モデルの性能がわずかに損なわれることが判明した。本稿では, LLMの層冗長性を説明するために, 探索手法を採用し, 探索型分類器を用いて言語モデルを効果的に解析できることを実証する。分類問題に特化した簡易かつ効果的な構造化プルーニングフレームワークであるチップチューニングを提案する。チップチューニングは、LLMの異なる層にチップという名前の小さなプロブリング分類器を取り付け、バックボーンモデルが凍結されたチップを訓練する。分類用チップを選択した後、付加層に後続するすべての層は、限界性能損失で除去できる。各種LLMおよびデータセットによる実験結果から,チップチューニングは従来の最先端のベースラインよりも精度とプルーニング比の両方で有意に優れ,プルーニング比が最大50%に達することが示された。また、チップチューニングはマルチモーダルモデルに適用でき、モデル微調整と組み合わせることで、優れた互換性が証明できる。 The rapid development in the performance of large language models (LLMs) is accompanied by the escalation of model size, leading to the increasing cost of model training and inference. Previous research has discovered that certain layers in LLMs exhibit redundancy, and removing these layers brings only marginal loss in model performance. In this paper, we adopt the probing technique to explain the layer redundancy in LLMs and demonstrate that language models can be effectively pruned with probing classifiers. We propose chip-tuning, a simple and effective structured pruning framework specialized for classification problems. Chip-tuning attaches tiny probing classifiers named chips to different layers of LLMs, and trains chips with the backbone model frozen. After selecting a chip for classification, all layers subsequent to the attached layer could be removed with marginal performance loss. Experimental results on various LLMs and datasets demonstrate that chip-tuning significantly outperforms previous state-of-the-art baselines in both accuracy and pruning ratio, achieving a pruning ratio of up to 50%. We also find that chip-tuning could be applied on multimodal models, and could be combined with model finetuning, proving its excellent compatibility.	
翻訳日:2024-11-01 05:09:09 公開日:2024-10-11
# Chip-Tuning: 言語モデルが言う前に分類する Chip-Tuning: Classify Before Language Models Say ( http://arxiv.org/abs/2410.06541v2 ) ライセンス: Link先を確認	Fangwei Zhu, Dian Li, Jiajun Huang, Gang Liu, Hui Wang, Zhifang Sui,	(参考訳) 大規模言語モデル(LLM)の性能の急激な発展は、モデルサイズがエスカレーションされ、モデルトレーニングと推論のコストが増大する。従来の研究では、LLMの特定の層が冗長性を示し、これらの層を取り除くことで、モデルの性能がわずかに損なわれることが判明した。本稿では, LLMの層冗長性を説明するために, 探索手法を採用し, 探索型分類器を用いて言語モデルを効果的に解析できることを実証する。分類問題に特化した簡易かつ効果的な構造化プルーニングフレームワークであるチップチューニングを提案する。チップチューニングは、LLMの異なる層にチップという名前の小さなプロブリング分類器を取り付け、バックボーンモデルが凍結されたチップを訓練する。分類用チップを選択した後、付加層に後続するすべての層は、限界性能損失で除去できる。各種LLMおよびデータセットによる実験結果から,チップチューニングは従来の最先端のベースラインよりも精度とプルーニング比の両方で有意に優れ,プルーニング比が最大50%に達することが示された。また、チップチューニングはマルチモーダルモデルに適用でき、モデル微調整と組み合わせることで、優れた互換性が証明できる。 The rapid development in the performance of large language models (LLMs) is accompanied by the escalation of model size, leading to the increasing cost of model training and inference. Previous research has discovered that certain layers in LLMs exhibit redundancy, and removing these layers brings only marginal loss in model performance. In this paper, we adopt the probing technique to explain the layer redundancy in LLMs and demonstrate that language models can be effectively pruned with probing classifiers. We propose chip-tuning, a simple and effective structured pruning framework specialized for classification problems. Chip-tuning attaches tiny probing classifiers named chips to different layers of LLMs, and trains chips with the backbone model frozen. After selecting a chip for classification, all layers subsequent to the attached layer could be removed with marginal performance loss. Experimental results on various LLMs and datasets demonstrate that chip-tuning significantly outperforms previous state-of-the-art baselines in both accuracy and pruning ratio, achieving a pruning ratio of up to 50%. We also find that chip-tuning could be applied on multimodal models, and could be combined with model finetuning, proving its excellent compatibility.	
翻訳日:2024-11-01 05:09:09 公開日:2024-10-11
# マルチラウンド優先最適化を用いた細部・高精度ビデオキャプションのためのマルチモーダルLLMの強化 Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization ( http://arxiv.org/abs/2410.06682v1 ) ライセンス: Link先を確認	Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang,	(参考訳) ビデオには豊富な情報が含まれており、自然言語で詳細かつ正確な記述を生成することが、ビデオ理解の重要な側面である。本稿では,指向性優先最適化 (DPO) によるビデオ(音声付き)キャプションの強化を目的とした,低ランク適応 (LoRA) を備えた高度オーディオ視覚大言語モデル (LLM) である video-SALMONN 2 を提案する。 DPOを用いて最適化されたビデオ記述の完全性と精度を評価するための新しい指標を提案する。さらに,DPO参照モデルを定期的に更新し,各トレーニングラウンド(1000ステップ)後のパラメータ更新のプロキシとしてLoRAモジュールをマージ,再初期化し,正解(ground-truth)ビデオキャプションからのガイダンスを取り入れてプロセスの安定化を図る,新しいマルチラウンドDPO(mrDPO)アプローチを導入する。mrDPOによる非キャプション能力の破滅的忘却の可能性に対処するため,mrDPO学習モデルによって生成されたキャプションを教師付きラベルとして使用し,pre-DPO LLMを微調整するrebirth tuningを提案する。実験の結果,mrDPOはvideo-SALMONN 2のキャプション精度を著しく向上させ,グローバルおよびローカルのエラー率をそれぞれ40%と20%削減し,反復率を35%低下させることがわかった。わずか70億のパラメータしか持たない最終的なvideo-SALMONN 2モデルは、ビデオキャプションタスクにおいてGPT-4oやGemini-1.5-Proといった主要なモデルを上回り、同時に、広く使われているビデオ質問応答ベンチマークにおいて、同規模のモデルの中で最先端に匹敵する性能を維持している。採択され次第、コード、モデルチェックポイント、トレーニングとテストデータをリリースします。デモは https://video-salmonn-2.github.io で公開されている。 Videos contain a wealth of information, and generating detailed and accurate descriptions in natural language is a key aspect of video understanding. In this paper, we present video-SALMONN 2, an advanced audio-visual large language model (LLM) with low-rank adaptation (LoRA) designed for enhanced video (with paired audio) captioning through directed preference optimization (DPO). We propose new metrics to evaluate the completeness and accuracy of video descriptions, which are optimized using DPO. To further improve training, we introduce a novel multi-round DPO (mrDPO) approach, which involves periodically updating the DPO reference model, merging and re-initializing the LoRA module as a proxy for parameter updates after each training round (1,000 steps), and incorporating guidance from ground-truth video captions to stabilize the process. 
To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels. Experiments show that mrDPO significantly enhances video-SALMONN 2's captioning accuracy, reducing global and local error rates by 40% and 20%, respectively, while decreasing the repetition rate by 35%. The final video-SALMONN 2 model, with just 7 billion parameters, surpasses leading models such as GPT-4o and Gemini-1.5-Pro in video captioning tasks, while maintaining competitive performance to the state-of-the-art on widely used video question-answering benchmark among models of similar size. Upon acceptance, we will release the code, model checkpoints, and training and test data. Demos are available at https://video-salmonn-2.github.io.	翻訳日:2024-11-01 04:19:50 公開日:2024-10-11
# マルチラウンド優先最適化を用いた細部・高精度ビデオキャプションのためのマルチモーダルLLMの強化 Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization ( http://arxiv.org/abs/2410.06682v2 ) ライセンス: Link先を確認	Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang,	(参考訳) ビデオには豊富な情報が含まれており、自然言語で詳細かつ正確な記述を生成することが、ビデオ理解の重要な側面である。本稿では,指向性優先最適化 (DPO) によるビデオ(音声付き)キャプションの強化を目的とした,低ランク適応 (LoRA) を備えた高度オーディオ視覚大言語モデル (LLM) である video-SALMONN 2 を提案する。 DPOを用いて最適化されたビデオ記述の完全性と精度を評価するための新しい指標を提案する。さらに,DPO参照モデルを定期的に更新し,各トレーニングラウンド(1000ステップ)後のパラメータ更新のプロキシとしてLoRAモジュールをマージ,再初期化し,正解(ground-truth)ビデオキャプションからのガイダンスを取り入れてプロセスの安定化を図る,新しいマルチラウンドDPO(mrDPO)アプローチを導入する。mrDPOによる非キャプション能力の破滅的忘却の可能性に対処するため,mrDPO学習モデルによって生成されたキャプションを教師付きラベルとして使用し,pre-DPO LLMを微調整するrebirth tuningを提案する。実験の結果,mrDPOはvideo-SALMONN 2のキャプション精度を著しく向上させ,グローバルおよびローカルのエラー率をそれぞれ40%と20%削減し,反復率を35%低下させることがわかった。わずか70億のパラメータしか持たない最終的なvideo-SALMONN 2モデルは、ビデオキャプションタスクにおいてGPT-4oやGemini-1.5-Proといった主要なモデルを上回り、同時に、広く使われているビデオ質問応答ベンチマークにおいて、同規模のモデルの中で最先端に匹敵する性能を維持している。採択され次第、コード、モデルチェックポイント、トレーニングとテストデータをリリースします。デモは https://video-salmonn-2.github.io で公開されている。 Videos contain a wealth of information, and generating detailed and accurate descriptions in natural language is a key aspect of video understanding. In this paper, we present video-SALMONN 2, an advanced audio-visual large language model (LLM) with low-rank adaptation (LoRA) designed for enhanced video (with paired audio) captioning through directed preference optimization (DPO). We propose new metrics to evaluate the completeness and accuracy of video descriptions, which are optimized using DPO. To further improve training, we introduce a novel multi-round DPO (mrDPO) approach, which involves periodically updating the DPO reference model, merging and re-initializing the LoRA module as a proxy for parameter updates after each training round (1,000 steps), and incorporating guidance from ground-truth video captions to stabilize the process. 
To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels. Experiments show that mrDPO significantly enhances video-SALMONN 2's captioning accuracy, reducing global and local error rates by 40% and 20%, respectively, while decreasing the repetition rate by 35%. The final video-SALMONN 2 model, with just 7 billion parameters, surpasses leading models such as GPT-4o and Gemini-1.5-Pro in video captioning tasks, while maintaining competitive performance to the state-of-the-art on widely used video question-answering benchmark among models of similar size. Upon acceptance, we will release the code, model checkpoints, and training and test data. Demos are available at https://video-salmonn-2.github.io.	翻訳日:2024-11-01 04:19:50 公開日:2024-10-11
# パラメトリックPDEのためのニューラルソルバーの学習と物理インフォームド法 Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods ( http://arxiv.org/abs/2410.06820v1 ) ライセンス: Link先を確認	Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, Patrick Gallinari,	(参考訳) 物理インフォームド深層学習は、大きな解空間を探索し、多くの反復を必要とし、不安定な訓練につながるような偏微分方程式(PDE)を解く複雑さのために、最適化の課題に直面していることが多い。これらの課題は、特に損失関数の微分項によって生じる最適化問題の条件が悪くなることから生じる。これらの問題に対処するために、データに基づいて訓練された物理インフォームド反復アルゴリズムを用いてPDEの解法を学ぶことを提案する。提案手法は,各PDEインスタンスに自動的に適応し,最適化プロセスの大幅な高速化と安定化を実現し,物理認識モデルの高速収束を可能にする勾配降下アルゴリズムの条件付けを学習する。さらに,従来の物理インフォームド手法は1つのPDEインスタンスを解くが,本手法はパラメトリックPDEに対処する。具体的には, 物理損失勾配をPDEパラメータと統合し, 係数, 初期条件, 境界条件を含むPDEパラメータの分布を解く。提案手法の有効性を,複数のデータセット上での実験実験により実証し,トレーニングとテスト時間最適化性能を比較した。 Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem, caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach addresses parametric PDEs. Specifically, our method integrates the physical loss gradient with the PDE parameters to solve over a distribution of PDE parameters, including coefficients, initial conditions, or boundary conditions. 
We demonstrate the effectiveness of our method through empirical experiments on multiple datasets, comparing training and test-time optimization performance.	翻訳日:2024-11-01 03:21:00 公開日:2024-10-11
# パラメトリックPDEのためのニューラルソルバーの学習と物理インフォームド法 Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods ( http://arxiv.org/abs/2410.06820v2 ) ライセンス: Link先を確認	Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, Patrick Gallinari,	(参考訳) 物理インフォームド深層学習は、大きな解空間を探索し、多くの反復を必要とし、不安定な訓練につながるような偏微分方程式(PDE)を解く複雑さのために、最適化の課題に直面していることが多い。これらの課題は、特に損失関数の微分項によって生じる最適化問題の条件が悪くなることから生じる。これらの問題に対処するために、データに基づいて訓練された物理インフォームド反復アルゴリズムを用いてPDEの解法を学ぶことを提案する。提案手法は,各PDEインスタンスに自動的に適応し,最適化プロセスの大幅な高速化と安定化を実現し,物理認識モデルの高速収束を可能にする勾配降下アルゴリズムの条件付けを学習する。さらに,従来の物理インフォームド手法は1つのPDEインスタンスを解くが,本手法はパラメトリックPDEに対処する。具体的には, 物理損失勾配をPDEパラメータと統合し, 係数, 初期条件, 境界条件を含むPDEパラメータの分布を解く。提案手法の有効性を,複数のデータセット上での実験実験により実証し,トレーニングとテスト時間最適化性能を比較した。 Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem, caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach addresses parametric PDEs. Specifically, our method integrates the physical loss gradient with the PDE parameters to solve over a distribution of PDE parameters, including coefficients, initial conditions, or boundary conditions. 
We demonstrate the effectiveness of our method through empirical experiments on multiple datasets, comparing training and test-time optimization performance.	翻訳日:2024-11-01 03:21:00 公開日:2024-10-11
# 誰がウェブブラウザを使うのか? アメリカにおけるブラウザのフィンガープリントにおける人口統計学の役割 How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users ( http://arxiv.org/abs/2410.06954v1 ) ライセンス: Link先を確認	Alex Berke, Badih Ghazi, Enrico Bacis, Pritish Kamath, Ravi Kumar, Robin Lassonde, Pasin Manurangsi, Umar Syed,	(参考訳) ブラウザのフィンガープリントは、ユーザーのデバイスから属性を収集してユニークな「指紋」を作成することで、クッキーなしでもウェブ上のユーザーを特定し、追跡するために使用することができる。この技術と結果として生じるプライバシーリスクは10年以上にわたって研究されてきた。しかし、先行研究ではデータが公開されていないため、さらなる研究は限られている。さらに、先行研究のデータにはユーザーの人口統計が欠如していた。ここでは、さらなる研究を可能にするための第一種データセットを提供する。これには、ユーザの人口統計と調査回答によるブラウザ属性が含まれ、8,400人の米国研究参加者からインフォームドコンセントで収集された。このデータセットを用いて、人口集団間で指紋認証のリスクがどのように異なるかを示す。例えば、低所得のユーザはリスクが高く、ユーザーの年齢が上がるにつれて、どちらも指紋認証や実際の指紋認証のリスクに気を遣う傾向にある。さらに, 性別, 年齢, 所得水準, 人種などのユーザ人口層を, 指紋認証によく使用されるブラウザ属性から推定し, このリスクに最も寄与するブラウザ属性を特定する。また,オープンな研究のためにブラウザデータを共有する可能性について,12,461人の参加者から回答を得て,今後のデータ収集にどのような影響があるかを調べる実験を行った。女性参加者は、私たちが収集したブラウザデータを表示するように、ブラウザデータをシェアする傾向が著しく低かった。全体として、指紋認証のリスクを評価し、ユーザプライバシを改善することを目的として、現在進行中の作業において、ユーザ人口統計学が重要な役割を担っていることを示す。私たちが提供しているデータセットとデータ収集ツールは、この研究で対処されていない研究の質問をさらに研究するために使用できます。 Browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and resulting privacy risks have been studied for over a decade. Yet further research is limited because prior studies used data not publicly available. Additionally, data in prior studies lacked user demographics. Here we provide a first-of-its-kind dataset to enable further research. It includes browser attributes with users' demographics and survey responses, collected with informed consent from 8,400 US study participants. We use this dataset to demonstrate how fingerprinting risks differ across demographic groups. 
For example, we find lower income users are more at risk, and find that as users' age increases, they are both more likely to be concerned about fingerprinting and at real risk of fingerprinting. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes most contribute to this risk. Our data collection process also conducted an experiment to study what impacts users' likelihood to share browser data for open research, in order to inform future data collection efforts, with responses from 12,461 total participants. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect. Overall, we show the important role of user demographics in the ongoing work that intends to assess fingerprinting risks and improve user privacy, with findings to inform future privacy enhancing browser developments. The dataset and data collection tool we provide can be used to further study research questions not addressed in this work.	翻訳日:2024-10-31 23:27:23 公開日:2024-10-11
# 誰がウェブブラウザを使うのか? アメリカにおけるブラウザのフィンガープリントにおける人口統計学の役割 How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users ( http://arxiv.org/abs/2410.06954v2 ) ライセンス: Link先を確認	Alex Berke, Enrico Bacis, Badih Ghazi, Pritish Kamath, Ravi Kumar, Robin Lassonde, Pasin Manurangsi, Umar Syed,	(参考訳) ブラウザのフィンガープリントは、ユーザーのデバイスから属性を収集してユニークな「指紋」を作成することで、クッキーなしでもウェブ上のユーザーを特定し、追跡するために使用することができる。この技術と結果として生じるプライバシーリスクは10年以上にわたって研究されてきた。しかし、先行研究ではデータが公開されていないため、さらなる研究は限られている。さらに、先行研究のデータにはユーザーの人口統計が欠如していた。ここでは、さらなる研究を可能にするための第一種データセットを提供する。これには、ユーザの人口統計と調査回答によるブラウザ属性が含まれ、8,400人の米国研究参加者からインフォームドコンセントで収集された。このデータセットを用いて、人口集団間で指紋認証のリスクがどのように異なるかを示す。例えば、低所得のユーザはリスクが高く、ユーザーの年齢が上がるにつれて、どちらも指紋認証や実際の指紋認証のリスクに気を遣う傾向にある。さらに, 性別, 年齢, 所得水準, 人種などのユーザ人口層を, 指紋認証によく使用されるブラウザ属性から推定し, このリスクに最も寄与するブラウザ属性を特定する。また,オープンな研究のためにブラウザデータを共有する可能性について,12,461人の参加者から回答を得て,今後のデータ収集にどのような影響があるかを調べる実験を行った。女性参加者は、私たちが収集したブラウザデータを表示するように、ブラウザデータをシェアする傾向が著しく低かった。全体として、指紋認証のリスクを評価し、ユーザプライバシを改善することを目的として、現在進行中の作業において、ユーザ人口統計学が重要な役割を担っていることを示す。私たちが提供しているデータセットとデータ収集ツールは、この研究で対処されていない研究の質問をさらに研究するために使用できます。 Browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and resulting privacy risks have been studied for over a decade. Yet further research is limited because prior studies used data not publicly available. Additionally, data in prior studies lacked user demographics. Here we provide a first-of-its-kind dataset to enable further research. It includes browser attributes with users' demographics and survey responses, collected with informed consent from 8,400 US study participants. We use this dataset to demonstrate how fingerprinting risks differ across demographic groups. 
For example, we find lower income users are more at risk, and find that as users' age increases, they are both more likely to be concerned about fingerprinting and at real risk of fingerprinting. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes most contribute to this risk. Our data collection process also conducted an experiment to study what impacts users' likelihood to share browser data for open research, in order to inform future data collection efforts, with responses from 12,461 total participants. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect. Overall, we show the important role of user demographics in the ongoing work that intends to assess fingerprinting risks and improve user privacy, with findings to inform future privacy enhancing browser developments. The dataset and data collection tool we provide can be used to further study research questions not addressed in this work.	翻訳日:2024-10-31 23:27:23 公開日:2024-10-11
# ELMO:アップサンプリングによるリアルタイムLiDARモーションキャプチャ ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling ( http://arxiv.org/abs/2410.06963v1 ) ライセンス: Link先を確認	Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee,	(参考訳) 本稿では,単一LiDARセンサ用に設計されたリアルタイムアップサンプリングモーションキャプチャフレームワークELMOを紹介する。 ELMOは、条件付き自己回帰変換器ベースのアップサンプリングモーションジェネレータとしてモデル化され、20fpsのLiDARポイントクラウドシーケンスから60fpsのモーションキャプチャを実現する。 ELMOの鍵となる特徴は、自己注意機構と、運動と点雲のための慎重に設計された埋め込みモジュールとの結合であり、運動の質を著しく高めていることである。高精度なモーションキャプチャを実現するため,単一フレーム点雲からユーザの骨格オフセットを予測可能な1回限りのスケルトンキャリブレーションモデルを開発した。さらに,LiDARシミュレータを用いた新しいデータ拡張手法を導入し,グローバルなルート追跡を強化し,環境理解を向上させる。提案手法の有効性を示すため,ELMOと画像ベースと点クラウドベースのモーションキャプチャにおける最先端の手法を比較した。設計原則を検証するために、さらにアブレーション研究を行います。 ELMOの高速推論時間はリアルタイムアプリケーションに適しており、ライブストリーミングとインタラクティブなゲームシナリオを備えたデモビデオで例示されています。さらに,20人の被験者がさまざまな動作を行う高品質なLiDARモキャップ同期データセットの提供も行い,今後の研究に有用な資料となる。データセットと評価コードは https://movin3d.github.io/ELMO_SIGASIA2024/ で公開されている。 This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. 
ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/	翻訳日:2024-10-31 23:17:38 公開日:2024-10-11
# ELMO:アップサンプリングによるリアルタイムLiDARモーションキャプチャ ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling ( http://arxiv.org/abs/2410.06963v2 ) ライセンス: Link先を確認	Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee,	(参考訳) 本稿では,単一LiDARセンサ用に設計されたリアルタイムアップサンプリングモーションキャプチャフレームワークELMOを紹介する。 ELMOは、条件付き自己回帰変換器ベースのアップサンプリングモーションジェネレータとしてモデル化され、20fpsのLiDARポイントクラウドシーケンスから60fpsのモーションキャプチャを実現する。 ELMOの鍵となる特徴は、自己注意機構と、運動と点雲のための慎重に設計された埋め込みモジュールとの結合であり、運動の質を著しく高めていることである。高精度なモーションキャプチャを実現するため,単一フレーム点雲からユーザの骨格オフセットを予測可能な1回限りのスケルトンキャリブレーションモデルを開発した。さらに,LiDARシミュレータを用いた新しいデータ拡張手法を導入し,グローバルなルート追跡を強化し,環境理解を向上させる。提案手法の有効性を示すため,ELMOと画像ベースと点クラウドベースのモーションキャプチャにおける最先端の手法を比較した。設計原則を検証するために、さらにアブレーション研究を行います。 ELMOの高速推論時間はリアルタイムアプリケーションに適しており、ライブストリーミングとインタラクティブなゲームシナリオを備えたデモビデオで例示されています。さらに,20人の被験者がさまざまな動作を行う高品質なLiDARモキャップ同期データセットの提供も行い,今後の研究に有用な資料となる。データセットと評価コードは https://movin3d.github.io/ELMO_SIGASIA2024/ で公開されている。 This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. 
ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/	翻訳日:2024-10-31 23:17:38 公開日:2024-10-11
# Bridge the Points:グラフベースのFew-shotセグメンテーション Bridge the Points: Graph-based Few-shot Segment Anything Semantically ( http://arxiv.org/abs/2410.06964v1 ) ライセンス: Link先を確認	Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei,	(参考訳) 近年の大規模事前訓練技術の進歩により、視覚基盤モデルの能力が大幅に向上し、特に、点とボックスのプロンプトに基づいて正確なマスクを生成できるSegment Anything Model(SAM)が注目されている。近年の研究では、SAMをFew-shot Semantic Segmentation (FSS)に拡張し、SAMベースの自動セマンティックセグメンテーションのためのプロンプト生成に焦点を当てている。しかし、これらの手法は適切なプロンプトの選択に苦慮し、異なるシナリオに対して特定のハイパーパラメータ設定が必要であり、SAMの過剰使用によるワンショット推論時間が長くなるため、効率が低下し、自動化能力が制限される。これらの問題に対処するため,グラフ解析に基づく簡易かつ効果的な手法を提案する。特に、Positive-Negative Alignmentモジュールは、マスクを生成するためのポイントプロンプトを動的に選択し、とりわけ背景コンテキストを負の参照として活用する。その後のポイント・マスク・クラスタリングモジュールは、ポイント上のマスクカバレッジに基づいて、マスクと選択されたポイントの粒度を有向グラフとして整列する。これらの点は、有向グラフの弱連結成分を効率的な方法で分解し、異なる自然クラスターを構成することによって集約される。最後に、グラフベースの粒度アライメントの恩恵を受けたポジティブゲーティングとオーバーシューティングゲーティングにより、高信頼マスクを集約し、最終的な予測のために偽陽性マスクをフィルタリングし、追加のハイパーパラメータと冗長マスクの生成を減らす。標準FSS、ワンショット部分セグメンテーション、クロスドメインFSSデータセットの広範な実験分析は、提案手法の有効性と効率を検証し、COCO-20iでは58.7%、LVIS-92iでは35.2%のmIoUで最先端のジェネラリストモデルを上回った。コードは https://andyzaq.github.io/GF-SAM/ で公開されている。 The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. 
In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. Another subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by decomposing the weakly connected components of the directed graph in an efficient manner, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregate high-confident masks and filter out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validate the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available in https://andyzaq.github.io/GF-SAM/.	翻訳日:2024-10-31 23:17:38 公開日:2024-10-11
# Bridge the Points:グラフベースのFew-shotセグメンテーション Bridge the Points: Graph-based Few-shot Segment Anything Semantically ( http://arxiv.org/abs/2410.06964v2 ) ライセンス: Link先を確認	Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei,	(参考訳) 近年の大規模事前訓練技術の進歩により、視覚基盤モデルの能力が大幅に向上し、特に、点とボックスのプロンプトに基づいて正確なマスクを生成できるSegment Anything Model(SAM)が注目されている。近年の研究では、SAMをFew-shot Semantic Segmentation (FSS)に拡張し、SAMベースの自動セマンティックセグメンテーションのためのプロンプト生成に焦点を当てている。しかし、これらの手法は適切なプロンプトの選択に苦慮し、異なるシナリオに対して特定のハイパーパラメータ設定が必要であり、SAMの過剰使用によるワンショット推論時間が長くなるため、効率が低下し、自動化能力が制限される。これらの問題に対処するため,グラフ解析に基づく簡易かつ効果的な手法を提案する。特に、Positive-Negative Alignmentモジュールは、マスクを生成するためのポイントプロンプトを動的に選択し、とりわけ背景コンテキストを負の参照として活用する。その後のポイント・マスク・クラスタリングモジュールは、ポイント上のマスクカバレッジに基づいて、マスクと選択されたポイントの粒度を有向グラフとして整列する。これらの点は、有向グラフの弱連結成分を効率的な方法で分解し、異なる自然クラスターを構成することによって集約される。最後に、グラフベースの粒度アライメントの恩恵を受けたポジティブゲーティングとオーバーシューティングゲーティングにより、高信頼マスクを集約し、最終的な予測のために偽陽性マスクをフィルタリングし、追加のハイパーパラメータと冗長マスクの生成を減らす。標準FSS、ワンショット部分セグメンテーション、クロスドメインFSSデータセットの広範な実験分析は、提案手法の有効性と効率を検証し、COCO-20iでは58.7%、LVIS-92iでは35.2%のmIoUで最先端のジェネラリストモデルを上回った。コードは https://andyzaq.github.io/GF-SAM/ で公開されている。 The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. 
In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. Another subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by decomposing the weakly connected components of the directed graph in an efficient manner, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregate high-confident masks and filter out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validate the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available in https://andyzaq.github.io/GF-SAM/.	翻訳日:2024-10-31 23:17:38 公開日:2024-10-11
# 整流拡散:整流フローに直線性は必要ない Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow ( http://arxiv.org/abs/2410.07303v1 ) ライセンス: Link先を確認	Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, Hongsheng Li,	(参考訳) 拡散モデルは、視覚生成を大幅に改善したが、生成ODEを解くという計算集約的な性質のため、生成速度の遅さによって妨げられている。広く認識されている解決策である整流フローは、ODEパスを直線化することで生成速度を向上させる。主な構成要素は以下のとおりである。 1) フローマッチングの拡散形式を用いる。 2) $\boldsymbol v$-predictionを採用する。 3) 整流(リフロー)を行う。本稿では,整流の成功は主に,事前学習した拡散モデルを用いて一致したノイズとサンプルのペアを取得し,そのペアで再学習することにあると主張する。これに基づけば,構成要素 1)と 2)は不要である。さらに, 直線性は整流に不可欠な訓練目標ではなく, フローマッチングモデルにおける特定の事例であることも強調する。より重要なトレーニングターゲットは、一階近似ODEパスを達成することであり、これはDDPMやSub-VPのようなモデルでは本質的に湾曲している。この知見に基づいて、フローマッチングモデルに制限されるのではなく、より広い範囲の拡散モデルを含むように、設計空間と整流の応用範囲を一般化するRectified Diffusionを提案する。Stable Diffusion v1-5とStable Diffusion XLについて検証した。本手法は,整流フローに基づく先行研究(例えばInstaFlow)のトレーニング手順を大幅に単純化するだけでなく,より低いトレーニングコストで優れたパフォーマンスを実現する。私たちのコードは https://github.com/G-U-N/Rectified-Diffusion で公開されています。 Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs. Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path. Its key components include: 1) using the diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. Based on this, components 1) and 2) are unnecessary. Furthermore, we highlight that straightness is not an essential training target for rectification; rather, it is a specific case of flow-matching models. The more critical training target is to achieve a first-order approximate ODE path, which is inherently curved for models like DDPM and Sub-VP. 
Building on this insight, we propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models, rather than being restricted to flow-matching models. We validate our method on Stable Diffusion v1-5 and Stable Diffusion XL. Our method not only greatly simplifies the training procedure of rectified flow-based previous works (e.g., InstaFlow) but also achieves superior performance with even lower training cost. Our code is available at https://github.com/G-U-N/Rectified-Diffusion.	翻訳日:2024-10-31 21:06:44 公開日:2024-10-11
# 整流拡散:整流フローに直線性は必要ない Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow ( http://arxiv.org/abs/2410.07303v2 ) ライセンス: Link先を確認	Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, Hongsheng Li,	(参考訳) 拡散モデルは、視覚生成を大幅に改善したが、生成ODEを解くという計算集約的な性質のため、生成速度の遅さによって妨げられている。広く認識されている解決策である整流フローは、ODEパスを直線化することで生成速度を向上させる。主な構成要素は以下のとおりである。 1) フローマッチングの拡散形式を用いる。 2) $\boldsymbol v$-predictionを採用する。 3) 整流(リフロー)を行う。本稿では,整流の成功は主に,事前学習した拡散モデルを用いて一致したノイズとサンプルのペアを取得し,そのペアで再学習することにあると主張する。これに基づけば,構成要素 1)と 2)は不要である。さらに, 直線性は整流に不可欠な訓練目標ではなく, フローマッチングモデルにおける特定の事例であることも強調する。より重要なトレーニングターゲットは、一階近似ODEパスを達成することであり、これはDDPMやSub-VPのようなモデルでは本質的に湾曲している。この知見に基づいて、フローマッチングモデルに制限されるのではなく、より広い範囲の拡散モデルを含むように、設計空間と整流の応用範囲を一般化するRectified Diffusionを提案する。Stable Diffusion v1-5とStable Diffusion XLについて検証した。本手法は,整流フローに基づく先行研究(例えばInstaFlow)のトレーニング手順を大幅に単純化するだけでなく,より低いトレーニングコストで優れたパフォーマンスを実現する。私たちのコードは https://github.com/G-U-N/Rectified-Diffusion で公開されています。 Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs. Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path. Its key components include: 1) using the diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. Based on this, components 1) and 2) are unnecessary. Furthermore, we highlight that straightness is not an essential training target for rectification; rather, it is a specific case of flow-matching models. The more critical training target is to achieve a first-order approximate ODE path, which is inherently curved for models like DDPM and Sub-VP. 
Building on this insight, we propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models, rather than being restricted to flow-matching models. We validate our method on Stable Diffusion v1-5 and Stable Diffusion XL. Our method not only greatly simplifies the training procedure of rectified flow-based previous works (e.g., InstaFlow) but also achieves superior performance with even lower training cost. Our code is available at https://github.com/G-U-N/Rectified-Diffusion.	翻訳日:2024-10-31 21:06:44 公開日:2024-10-11
# DA-Code:大規模言語モデルのためのエージェントデータサイエンスコード生成ベンチマーク DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ( http://arxiv.org/abs/2410.07331v1 ) ライセンス: Link先を確認	Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu,	(参考訳) 本稿では,エージェントベースのデータサイエンスタスク上でのLLMの評価に特化して設計されたコード生成ベンチマークであるDA-Codeを紹介する。このベンチマークは3つの中核的な要素を備える。まず、DA-Code内のタスクは本質的に困難で、従来のコード生成タスクとは一線を画し、グラウンディングと計画において高度なコーディングスキルが要求されます。次に、DA-Codeの例は、すべて実データと多種多様なデータに基づいており、幅広い複雑なデータラングリングと分析タスクをカバーしている。第三に、これらの課題を解決するためには、複雑なデータサイエンスプログラミング言語を使用し、複雑なデータ処理を実行し、答えを導出する必要がある。私たちは、実世界のデータ分析シナリオと整合し、スケーラブルな、制御可能で実行可能な環境にベンチマークをセットアップしました。アノテーターは評価スイートを慎重に設計し、評価の精度と堅牢性を確保する。我々はDA-Agentベースラインを開発する。実験によると、ベースラインは他の既存のフレームワークよりも優れているが、現在の最高のLLMを使用しても、わずか30.5%の精度しか得られず、改善の余地は十分にある。ベンチマークは https://da-code-bench.github.io で公開しています。 We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. 
We release our benchmark at https://da-code-bench.github.io.	翻訳日:2024-10-31 20:56:57 公開日:2024-10-11
# DA-Code:大規模言語モデルのためのエージェントデータサイエンスコード生成ベンチマーク DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ( http://arxiv.org/abs/2410.07331v2 ) ライセンス: Link先を確認	Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu,	(参考訳) 本稿では,エージェントベースのデータサイエンスタスク上でのLLMの評価に特化して設計されたコード生成ベンチマークであるDA-Codeを紹介する。このベンチマークは3つの中核的な要素を備える。まず、DA-Code内のタスクは本質的に困難で、従来のコード生成タスクとは一線を画し、グラウンディングと計画において高度なコーディングスキルが要求されます。次に、DA-Codeの例は、すべて実データと多種多様なデータに基づいており、幅広い複雑なデータラングリングと分析タスクをカバーしている。第三に、これらの課題を解決するためには、複雑なデータサイエンスプログラミング言語を使用し、複雑なデータ処理を実行し、答えを導出する必要がある。私たちは、実世界のデータ分析シナリオと整合し、スケーラブルな、制御可能で実行可能な環境にベンチマークをセットアップしました。アノテーターは評価スイートを慎重に設計し、評価の精度と堅牢性を確保する。我々はDA-Agentベースラインを開発する。実験によると、ベースラインは他の既存のフレームワークよりも優れているが、現在の最高のLLMを使用しても、わずか30.5%の精度しか得られず、改善の余地は十分にある。ベンチマークは https://da-code-bench.github.io で公開しています。 We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. 
We release our benchmark at https://da-code-bench.github.io.	翻訳日:2024-10-31 20:56:57 公開日:2024-10-11
# レッドウッド混交林におけるNeRF加速生態モニタリング NeRF-Accelerated Ecological Monitoring in Mixed-Evergreen Redwood Forest ( http://arxiv.org/abs/2410.07418v1 ) ライセンス: Link先を確認	Adam Korycki, Cory Yeaton, Gregory S. Gilbert, Colleen Josephson, Steve McGuire,	(参考訳) 森林マッピングは、森林環境の動態を理解するために必要な重要な観測データを提供する。特に、胸高直径(DBH)は、森林バイオマスと二酸化炭素(CO$_2$)の隔離を推定するために用いられる指標である。手作業による森林マッピングは労働集約的かつ時間を要するものであり、大規模な地図作成のボトルネックとなっている。自動マッピングは、通常点雲の形で得られる、高密度な森林の再構成に頼っている。地上レーザースキャン(TLS)と移動レーザースキャン(MLS)は高価なLiDARセンシングを用いて点雲を生成し、樹木の直径の推定に成功している。ニューラルレイディアンスフィールド(NeRF)は、入力ビューのスパースセットでニューラルネットワークをトレーニングすることで、フォトリアリスティックで視覚に基づく再構成を可能にする新興技術である。本稿では,常緑混交のレッドウッド林における幹径推定を目的として,MLSとNeRFによる森林再構成の比較を行った。さらに,凸包モデリングを用いた改良DBH推定法を提案する。このアプローチを用いて1.68cmのRMSEを達成し、標準的な円柱モデリング手法を一貫して上回った。コードコントリビューションとフォレストデータセットは https://github.com/harelab-ucsc/RedwoodNeRF で無償公開しています。 Forest mapping provides critical observational data needed to understand the dynamics of forest environments. Notably, tree diameter at breast height (DBH) is a metric used to estimate forest biomass and carbon dioxide (CO$_2$) sequestration. Manual methods of forest mapping are labor intensive and time consuming, a bottleneck for large-scale mapping efforts. Automated mapping relies on acquiring dense forest reconstructions, typically in the form of point clouds. Terrestrial laser scanning (TLS) and mobile laser scanning (MLS) generate point clouds using expensive LiDAR sensing, and have been used successfully to estimate tree diameter. Neural radiance fields (NeRFs) are an emergent technology enabling photorealistic, vision-based reconstruction by training a neural network on a sparse set of input views. In this paper, we present a comparison of MLS and NeRF forest reconstructions for the purpose of trunk diameter estimation in a mixed-evergreen Redwood forest. In addition, we propose an improved DBH-estimation method using convex-hull modeling. 
Using this approach, we achieved 1.68 cm RMSE, which consistently outperformed standard cylinder modeling approaches. Our code contributions and forest datasets are freely available at https://github.com/harelab-ucsc/RedwoodNeRF.	翻訳日:2024-10-31 20:37:14 公開日:2024-10-11
# レッドウッド混交林におけるNeRF加速生態モニタリング NeRF-Accelerated Ecological Monitoring in Mixed-Evergreen Redwood Forest ( http://arxiv.org/abs/2410.07418v2 ) ライセンス: Link先を確認	Adam Korycki, Cory Yeaton, Gregory S. Gilbert, Colleen Josephson, Steve McGuire,	(参考訳) 森林マッピングは、森林環境の動態を理解するために必要な重要な観測データを提供する。特に、胸高直径(DBH)は、森林のバイオマスと二酸化炭素の隔離を推定するために用いられる指標である。手作業による森林マッピングは労働集約的かつ時間を要するものであり、大規模な地図作成のボトルネックとなっている。自動マッピングは、通常点雲の形で得られる、高密度な森林の再構成に頼っている。地上レーザースキャン(TLS)と移動レーザースキャン(MLS)は高価なLiDARセンシングを用いて点雲を生成し、樹木の直径の推定に成功している。ニューラルレイディアンスフィールド(NeRF)は、入力ビューのスパースセットでニューラルネットワークをトレーニングすることで、フォトリアリスティックで視覚に基づく再構成を可能にする新興技術である。本稿では,常緑混交のレッドウッド林における幹径推定を目的として,MLSとNeRFによる森林再構成の比較を行った。さらに,凸包モデリングを用いた改良DBH推定法を提案する。このアプローチを用いて1.68cmのRMSEを達成し、標準的な円柱モデリング手法を一貫して上回った。コードコントリビューションとフォレストデータセットは https://github.com/harelab-ucsc/RedwoodNeRF で無償公開しています。 Forest mapping provides critical observational data needed to understand the dynamics of forest environments. Notably, tree diameter at breast height (DBH) is a metric used to estimate forest biomass and carbon dioxide sequestration. Manual methods of forest mapping are labor intensive and time consuming, a bottleneck for large-scale mapping efforts. Automated mapping relies on acquiring dense forest reconstructions, typically in the form of point clouds. Terrestrial laser scanning (TLS) and mobile laser scanning (MLS) generate point clouds using expensive LiDAR sensing, and have been used successfully to estimate tree diameter. Neural radiance fields (NeRFs) are an emergent technology enabling photorealistic, vision-based reconstruction by training a neural network on a sparse set of input views. In this paper, we present a comparison of MLS and NeRF forest reconstructions for the purpose of trunk diameter estimation in a mixed-evergreen Redwood forest. In addition, we propose an improved DBH-estimation method using convex-hull modeling. Using this approach, we achieved 1.68 cm RMSE, which consistently outperformed standard cylinder modeling approaches. 
Our code contributions and forest datasets are freely available at https://github.com/harelab-ucsc/RedwoodNeRF.	翻訳日:2024-10-31 20:37:14 公開日:2024-10-11
# 暗黙的ネットワークの族に対する一般化境界 A Generalization Bound for a Family of Implicit Networks ( http://arxiv.org/abs/2410.07427v1 ) ライセンス: Link先を確認	Samy Wu Fung, Benjamin Berkels,	(参考訳) 暗黙的ネットワーク(Implicit Network)は、パラメータ化された演算子の固定点によって出力が定義されるニューラルネットワークのクラスである。これらは自然言語処理や画像処理をはじめとする多くのアプリケーションで成功を収めてきた。経験的には多くの成功を収めているが、その一般化に関する理論的な研究はまだ十分に進んでいない。本研究では、パラメータ化された縮小的な固定点演算子によって定義される暗黙的ネットワークの大規模な族を考える。これらのアーキテクチャのラデマッハ複雑度に対する被覆数の議論に基づいて、このクラスに対する一般化境界を示す。 Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications, including natural language processing and image processing. While they have found abundant empirical success, theoretical work on their generalization is still under-explored. In this work, we consider a large family of implicit networks defined by parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.	翻訳日:2024-10-31 20:37:14 公開日:2024-10-11
# 暗黙的ネットワークの族に対する一般化境界 A Generalization Bound for a Family of Implicit Networks ( http://arxiv.org/abs/2410.07427v2 ) ライセンス: Link先を確認	Samy Wu Fung, Benjamin Berkels,	(参考訳) 暗黙的ネットワーク(Implicit Network)は、パラメータ化された演算子の固定点によって出力が定義されるニューラルネットワークのクラスである。これらは自然言語処理や画像処理をはじめとする多くのアプリケーションで成功を収めてきた。経験的には多くの成功を収めているが、その一般化に関する理論的な研究はまだ十分に進んでいない。本研究では、パラメータ化された縮小的な固定点演算子によって定義される暗黙的ネットワークの大規模な族を考える。これらのアーキテクチャのラデマッハ複雑度に対する被覆数の議論に基づいて、このクラスに対する一般化境界を示す。 Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications, including natural language processing and image processing. While they have found abundant empirical success, theoretical work on their generalization is still under-explored. In this work, we consider a large family of implicit networks defined by parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.	翻訳日:2024-10-31 20:37:14 公開日:2024-10-11
# Self-Supervised Learning for Real-World Object Detection: a Survey ( http://arxiv.org/abs/2410.07442v1 ) License: see link	Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle	Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN- and ViT-based architectures. Specifically, our benchmark is performed on the widely used COCO dataset, as well as on a specialized real-world dataset focused on vehicle detection in infrared remote sensing imagery. We also assess the impact of pre-training on custom domain-specific datasets, highlighting how certain SSL strategies are better suited for handling uncurated data. Our findings highlight that instance discrimination methods perform well with CNN-based encoders, while MIM methods are better suited for ViT-based architectures and custom dataset pre-training. This survey provides a practical guide for selecting optimal SSL strategies, taking into account factors such as backbone architecture, object size, and custom pre-training requirements. Ultimately, we show that choosing an appropriate SSL pre-training strategy, along with a suitable encoder, significantly enhances performance in real-world object detection, particularly for small object detection in frugal settings.	Translated: 2024-10-31 17:06:37 Published: 2024-10-11
# Self-Supervised Learning for Real-World Object Detection: a Survey ( http://arxiv.org/abs/2410.07442v2 ) License: see link	Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle	Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN- and ViT-based architectures. Specifically, our benchmark is performed on the widely used COCO dataset, as well as on a specialized real-world dataset focused on vehicle detection in infrared remote sensing imagery. We also assess the impact of pre-training on custom domain-specific datasets, highlighting how certain SSL strategies are better suited for handling uncurated data. Our findings highlight that instance discrimination methods perform well with CNN-based encoders, while MIM methods are better suited for ViT-based architectures and custom dataset pre-training. This survey provides a practical guide for selecting optimal SSL strategies, taking into account factors such as backbone architecture, object size, and custom pre-training requirements. Ultimately, we show that choosing an appropriate SSL pre-training strategy, along with a suitable encoder, significantly enhances performance in real-world object detection, particularly for small object detection in frugal settings.	Translated: 2024-10-31 17:06:37 Published: 2024-10-11
# SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection ( http://arxiv.org/abs/2410.07471v1 ) License: see link	Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen	Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker via bilevel optimization to up-rank safe and high-quality fine-tuning data and down-rank unsafe or low-quality data. Models trained with SEAL demonstrate superior quality over multiple baselines, with win-rate increases of 8.5% and 9.7% over random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub: https://github.com/hanshen95/SEAL	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
# SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection ( http://arxiv.org/abs/2410.07471v2 ) License: see link	Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen	Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker via bilevel optimization to up-rank safe and high-quality fine-tuning data and down-rank unsafe or low-quality data. Models trained with SEAL demonstrate superior quality over multiple baselines, with win-rate increases of 8.5% and 9.7% over random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub: https://github.com/hanshen95/SEAL	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
# Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations ( http://arxiv.org/abs/2410.07476v1 ) License: see link	Wilson Wu, Louis Jaburi, Jacob Drori, Jason Gross	A recent line of work in mechanistic interpretability has focused on reverse-engineering the computation performed by neural networks trained on the binary operation of finite groups. We investigate the internals of one-hidden-layer neural networks trained on this task, revealing previously unidentified structure and producing a more complete description of such models that unifies the explanations of previous works. Notably, these models approximate equivariance in each input argument. We verify that our explanation applies to a large fraction of networks trained on this task by translating it into a compact proof of model performance, a quantitative evaluation of model understanding. In particular, our explanation yields a guarantee of model accuracy that runs in 30% of the time of brute force and gives an accuracy bound of at least 95% for 45% of the models we trained. We were unable to obtain nontrivial non-vacuous accuracy bounds using only explanations from previous works.	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
# Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations ( http://arxiv.org/abs/2410.07476v2 ) License: see link	Wilson Wu, Louis Jaburi, Jacob Drori, Jason Gross	A recent line of work in mechanistic interpretability has focused on reverse-engineering the computation performed by neural networks trained on the binary operation of finite groups. We investigate the internals of one-hidden-layer neural networks trained on this task, revealing previously unidentified structure and producing a more complete description of such models that unifies the explanations of previous works. Notably, these models approximate equivariance in each input argument. We verify that our explanation applies to a large fraction of networks trained on this task by translating it into a compact proof of model performance, a quantitative evaluation of model understanding. In particular, our explanation yields a guarantee of model accuracy that runs in 30% of the time of brute force and gives an accuracy bound of at least 95% for 45% of the models we trained. We were unable to obtain nontrivial non-vacuous accuracy bounds using only explanations from previous works.	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
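The "brute force" baseline mentioned in the abstract above can be pictured as evaluating a model on every input pair of the group. The sketch below does this for the cyclic group Z_n with a hypothetical `toy_model`; both the choice of group and the model are assumptions for illustration, and the compact proofs described in the paper are not reproduced here.

```python
def brute_force_accuracy(model, n):
    """Fraction of all n * n input pairs on which model(a, b) equals (a + b) mod n."""
    correct = sum(
        1 for a in range(n) for b in range(n) if model(a, b) == (a + b) % n
    )
    return correct / (n * n)

N = 7  # hypothetical group order

def toy_model(a, b):
    """Hypothetical stand-in for a trained network: wrong on exactly one input pair."""
    if a == 3 and b == 3:
        return 0  # the true answer is (3 + 3) % 7 == 6
    return (a + b) % N

accuracy = brute_force_accuracy(toy_model, N)
print(f"{accuracy:.4f}")
```

The cost of this check grows with the square of the group order, which is why a proof that certifies accuracy in a fraction of that time is a meaningful measure of how much the explanation compresses the model's behaviour.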
# WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents ( http://arxiv.org/abs/2410.07484v1 ) License: see link	Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang	Can large language models (LLMs) directly serve as powerful world models for model-based agents? While gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment, and that such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than verbose buffered trajectories in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs in replanning time and in the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. In ALFWorld, its success rate surges to a new record high of 95% after only 6 iterations.	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
# WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents ( http://arxiv.org/abs/2410.07484v2 ) License: see link	Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang	Can large language models (LLMs) directly serve as powerful world models for model-based agents? While gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment, and that such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than verbose buffered trajectories in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs in replanning time and in the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. In ALFWorld, its success rate surges to a new record high of 95% after only 6 iterations.	Translated: 2024-10-31 16:56:23 Published: 2024-10-11
# Certifying the quantumness of a single nuclear spin qudit through its uniform precession ( http://arxiv.org/abs/2410.07641v1 ) License: see link	Arjen Vaartjes, Martin Nurizzo, Lin Htoo Zaw, Benjamin Wilhelm, Xi Yu, Danielle Holmes, Daniel Schwienbacher, Anders Kringhøj, Mark R. van Blankenstein, Alexander M. Jakob, Fay E. Hudson, Kohei M. Itoh, Riley J. Murray, Robin Blume-Kohout, Namit Anand, Andrew S. Dzurak, David N. Jamieson, Valerio Scarani, Andrea Morello	Spin precession is a textbook example of dynamics of a quantum system that exactly mimics its classical counterpart. Here we challenge this view by certifying the quantumness of exotic states of a nuclear spin through its uniform precession. The key to this result is measuring the positivity, instead of the expectation value, of the $x$-projection of the precessing spin, and using a spin > 1/2 qudit that is not restricted to semi-classical spin coherent states. The experiment is performed on a single spin-7/2 $^{123}$Sb nucleus, implanted in a silicon nanoelectronic device, amenable to high-fidelity preparation, control, and projective single-shot readout. Using Schrödinger cat states and other bespoke states of the nucleus, we violate the classical bound by 19 standard deviations, proving that no classical probability distribution can explain the statistics of this spin precession, and highlighting our ability to prepare quantum resource states with high fidelity in a single atomic-scale qudit.	Translated: 2024-10-31 15:46:26 Published: 2024-10-11
# Certifying the quantumness of a nuclear spin qudit through its uniform precession ( http://arxiv.org/abs/2410.07641v2 ) License: see link	Arjen Vaartjes, Martin Nurizzo, Lin Htoo Zaw, Benjamin Wilhelm, Xi Yu, Danielle Holmes, Daniel Schwienbacher, Anders Kringhøj, Mark R. van Blankenstein, Alexander M. Jakob, Fay E. Hudson, Kohei M. Itoh, Riley J. Murray, Robin Blume-Kohout, Namit Anand, Andrew S. Dzurak, David N. Jamieson, Valerio Scarani, Andrea Morello	Spin precession is a textbook example of dynamics of a quantum system that exactly mimics its classical counterpart. Here we challenge this view by certifying the quantumness of exotic states of a nuclear spin through its uniform precession. The key to this result is measuring the positivity, instead of the expectation value, of the $x$-projection of the precessing spin, and using a spin > 1/2 qudit that is not restricted to semi-classical spin coherent states. The experiment is performed on a single spin-7/2 $^{123}$Sb nucleus, implanted in a silicon nanoelectronic device, amenable to high-fidelity preparation, control, and projective single-shot readout. Using Schrödinger cat states and other bespoke states of the nucleus, we violate the classical bound by 19 standard deviations, proving that no classical probability distribution can explain the statistics of this spin precession, and highlighting our ability to prepare quantum resource states with high fidelity in a single atomic-scale qudit.	Translated: 2024-10-31 15:46:26 Published: 2024-10-11
# Test-Time Intensity Consistency Adaptation for Shadow Detection ( http://arxiv.org/abs/2410.07695v1 ) License: see link	Leyi Zhu, Weihuang Liu, Xinyi Chen, Zimeng Li, Xuhang Chen, Zhen Wang, Chi-Man Pun	Shadow detection is crucial for accurate scene understanding in computer vision, yet it is challenged by the diverse appearances of shadows caused by variations in illumination, object geometry, and scene context. Deep learning models often struggle to generalize to real-world images due to the limited size and diversity of training datasets. To address this, we introduce TICA, a novel framework that leverages light-intensity information during test-time adaptation to enhance shadow detection accuracy. TICA exploits the inherent inconsistencies in light intensity across shadow regions to guide the model toward a more consistent prediction. A basic encoder-decoder model is initially trained on a labeled dataset for shadow detection. Then, during the testing phase, the network is adjusted for each test sample by enforcing consistent intensity predictions between two augmented versions of the input image. This consistency training specifically targets both foreground and background intersection regions to identify shadow regions within images accurately for robust adaptation. Extensive evaluations on the ISTD and SBU shadow detection datasets reveal that TICA significantly outperforms existing state-of-the-art methods, achieving superior results in balanced error rate (BER).	Translated: 2024-10-31 15:25:43 Published: 2024-10-11
# Test-Time Intensity Consistency Adaptation for Shadow Detection ( http://arxiv.org/abs/2410.07695v2 ) License: see link	Leyi Zhu, Weihuang Liu, Xinyi Chen, Zimeng Li, Xuhang Chen, Zhen Wang, Chi-Man Pun	Shadow detection is crucial for accurate scene understanding in computer vision, yet it is challenged by the diverse appearances of shadows caused by variations in illumination, object geometry, and scene context. Deep learning models often struggle to generalize to real-world images due to the limited size and diversity of training datasets. To address this, we introduce TICA, a novel framework that leverages light-intensity information during test-time adaptation to enhance shadow detection accuracy. TICA exploits the inherent inconsistencies in light intensity across shadow regions to guide the model toward a more consistent prediction. A basic encoder-decoder model is initially trained on a labeled dataset for shadow detection. Then, during the testing phase, the network is adjusted for each test sample by enforcing consistent intensity predictions between two augmented versions of the input image. This consistency training specifically targets both foreground and background intersection regions to identify shadow regions within images accurately for robust adaptation. Extensive evaluations on the ISTD and SBU shadow detection datasets reveal that TICA significantly outperforms existing state-of-the-art methods, achieving superior results in balanced error rate (BER).	Translated: 2024-10-31 15:25:43 Published: 2024-10-11
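The balanced error rate (BER) named in the abstract above is commonly computed in the shadow detection literature as the average of the two per-class error rates, expressed in percent. The sketch below follows that common definition, which is an assumption here since the abstract does not spell it out; the function name and the flat 0/1 mask format are illustrative.

```python
def balanced_error_rate(pred, truth):
    """BER in percent for flat binary masks (1 = shadow, 0 = non-shadow)."""
    tp = fn = tn = fp = 0
    for p, t in zip(pred, truth):
        if t and p:
            tp += 1   # shadow pixel correctly detected
        elif t and not p:
            fn += 1   # shadow pixel missed
        elif not t and not p:
            tn += 1   # non-shadow pixel correctly rejected
        else:
            fp += 1   # non-shadow pixel flagged as shadow
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the ground truth")
    # Average the shadow and non-shadow accuracies, then convert to an error rate.
    return 100.0 * (1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp)))

# Hypothetical flattened masks for a six-pixel image.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
ber = balanced_error_rate(pred, truth)
print(f"BER = {ber:.2f}")
```

Because each class contributes equally regardless of how many pixels it covers, BER is less sensitive than plain pixel accuracy to the usual imbalance between shadow and non-shadow regions; lower values are better.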
# Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning ( http://arxiv.org/abs/2410.07787v1 ) License: see link	Mariano Ramírez Montero, Ebrahim Shahabi, Giovanni Franzese, Jens Kober, Barbara Mazzolai, Cosimo Della Santina	Soft robots have the potential to revolutionize the use of robotic systems with their capability of establishing safe, robust, and adaptable interactions with their environment, but their precise control remains challenging. In contrast, traditional rigid robots offer high accuracy and repeatability but lack the flexibility of soft robots. We argue that combining these characteristics in a hybrid robotic platform can significantly enhance overall capabilities. This work presents a novel hybrid robotic platform that integrates a rigid manipulator with a fully developed soft arm. This system is equipped with the intelligence necessary to perform flexible and generalizable tasks autonomously through imitation learning. The physical softness and machine learning enable our platform to achieve highly generalizable skills, while the rigid components ensure precision and repeatability.	Translated: 2024-10-31 14:56:00 Published: 2024-10-11
# Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning ( http://arxiv.org/abs/2410.07787v2 ) License: see link	Mariano Ramírez Montero, Ebrahim Shahabi, Giovanni Franzese, Jens Kober, Barbara Mazzolai, Cosimo Della Santina	Soft robots have the potential to revolutionize the use of robotic systems with their capability of establishing safe, robust, and adaptable interactions with their environment, but their precise control remains challenging. In contrast, traditional rigid robots offer high accuracy and repeatability but lack the flexibility of soft robots. We argue that combining these characteristics in a hybrid robotic platform can significantly enhance overall capabilities. This work presents a novel hybrid robotic platform that integrates a rigid manipulator with a fully developed soft arm. This system is equipped with the intelligence necessary to perform flexible and generalizable tasks autonomously through imitation learning. The physical softness and machine learning enable our platform to achieve highly generalizable skills, while the rigid components ensure precision and repeatability.	Translated: 2024-10-31 14:56:00 Published: 2024-10-11
# BA-Net: Bridge Attention in Deep Neural Networks ( http://arxiv.org/abs/2410.07860v1 ) License: see link	Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song	Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap, we introduce bridge attention, a novel approach designed to facilitate more effective integration and information flow between different convolutional layers. Our work extends the original bridge attention model (BAv1) by introducing an adaptive selection operator, which reduces information redundancy and optimizes the overall information exchange. This enhancement results in the development of BAv2, which achieves substantial performance improvements in the ImageNet classification task, obtaining Top-1 accuracies of 80.49% and 81.75% when using ResNet50 and ResNet101 as backbone networks, respectively. These results surpass the retrained baselines by 1.61% and 0.77%, respectively. Furthermore, BAv2 outperforms other existing channel attention techniques, such as the classical SENet101, exceeding its retrained performance by 0.52%. Additionally, integrating BAv2 into advanced convolutional networks and vision transformers has led to significant gains in performance across a wide range of computer vision tasks, underscoring its broad applicability.	Translated: 2024-10-31 14:25:50 Published: 2024-10-11
# BA-Net:ディープニューラルネットワークにおけるブリッジ注意 BA-Net: Bridge Attention in Deep Neural Networks ( http://arxiv.org/abs/2410.07860v2 ) ライセンス: Link先を確認	Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song,	(参考訳) 注意機構、特にチャネルアテンションは、多くのコンピュータビジョンタスクに大きな影響を与えている。その効果にもかかわらず、既存の多くのメソッドは、主に個々の畳み込み層に適用される複雑な注意モジュールを通してパフォーマンスを最適化することに焦点を当てており、しばしば複数の層にまたがる相乗的相互作用を見落としている。このギャップに対応するために、異なる畳み込み層間のより効率的な統合と情報フローを促進するために設計された新しいアプローチであるブリッジアテンションを導入する。本研究は,情報冗長性を低減し,全体の情報交換を最適化する適応選択演算子を導入することにより,元のブリッジアテンションモデル(BAv1)を拡張した。 BAv2はImageNet分類タスクにおいて、それぞれResNet50とResNet101をバックボーンネットワークとして使用する場合、80.49%と81.75%のTop-1アキュラシーを得る。これらの結果は、それぞれ1.61%、0.77%のリトレーニングベースラインを上回っている。さらに、BAv2は、従来のSENet101のような既存のチャンネルアテンション技術よりも0.52%向上し、BAv2を高度な畳み込みネットワークやビジョントランスフォーマーに統合することで、幅広いコンピュータビジョンタスクのパフォーマンスが大幅に向上し、その幅広い適用性を裏付けている。 Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap, we introduce bridge attention, a novel approach designed to facilitate more effective integration and information flow between different convolutional layers. Our work extends the original bridge attention model (BAv1) by introducing an adaptive selection operator, which reduces information redundancy and optimizes the overall information exchange. This enhancement results in the development of BAv2, which achieves substantial performance improvements in the ImageNet classification task, obtaining Top-1 accuracies of 80.49% and 81.75% when using ResNet50 and ResNet101 as backbone networks, respectively. These results surpass the retrained baselines by 1.61% and 0.77%, respectively. 
Furthermore, BAv2 outperforms other existing channel attention techniques, such as the classical SENet101, exceeding its retrained performance by 0.52%. Additionally, integrating BAv2 into advanced convolutional networks and vision transformers has led to significant gains in performance across a wide range of computer vision tasks, underscoring its broad applicability.	翻訳日:2024-10-31 14:25:50 公開日:2024-10-11
# 関数表現統一フレームワーク The Function-Representation Unification Framework ( http://arxiv.org/abs/2410.07928v1 ) ライセンス: Link先を確認	Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon,	(参考訳) 認知アーキテクチャは、人工的な認知を開発する研究の最前線です。しかし、分離されたメモリとプログラムモデルから問題にアプローチする。この計算モデルには、知識検索ヒューリスティックという根本的な問題がある。本稿では,メモリとプログラムが結合した新しい計算モデルである関数表現を用いて,この問題を解決することを提案する。本稿では、これらの関数表現の実装と利用に関するフレームワーク全体を提案し、数学的定義と証明を通してそれらの可能性を探る。また、複数のFunction-Representationを編成するさまざまな方法について話し、これらのFunction-Representationsが実装できる関数の種類を探る。最後に、提案の限界についても検討する。 Cognitive Architectures are the forefront of our research into developing an artificial cognition. However, they approach the problem from a separated memory and program model of computation. This model of computation poses a fundamental problem: the knowledge retrieval heuristic. In this paper we propose to solve this problem by using a new model of computation, one where the memory and the program are united: the Function-Representation. We propose a whole framework about how to implement and use these Function-Representations, and we explore their potential through mathematical definitions and proofs. We also talk about different ways to organise multiple Function-Representations, and explore the kind of functions that these Function-Representations can implement. Finally, we also explore the limitations of our proposal.	翻訳日:2024-10-31 13:53:52 公開日:2024-10-11
# 計算関数表現モデル The Function-Representation Model of Computation ( http://arxiv.org/abs/2410.07928v2 ) ライセンス: Link先を確認	Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon,	(参考訳) 認知アーキテクチャは、人工的な認知を開発する研究の最前線です。しかし、分離されたメモリとプログラムモデルから問題にアプローチする。この計算モデルには、知識検索ヒューリスティックという根本的な問題がある。本稿では,メモリとプログラムが結合した新しい計算モデルである関数表現を用いて,この問題を解決することを提案する。本稿では,これらの関数表現の実装と利用に基づく新しい計算モデルを提案し,その可能性について数学的定義と証明を用いて検討する。また、複数のFunction-Representationを編成するさまざまな方法について話し、これらのFunction-Representationsが実装できる関数の種類を探る。最後に、提案の限界についても検討する。 Cognitive Architectures are the forefront of our research into developing an artificial cognition. However, they approach the problem from a separated memory and program model of computation. This model of computation poses a fundamental problem: the knowledge retrieval heuristic. In this paper we propose to solve this problem by using a new model of computation, one where the memory and the program are united: the Function-Representation. We propose a novel model of computation based on implementing and using these Function-Representations, and we explore its potential through mathematical definitions and proofs. We also talk about different ways to organise multiple Function-Representations, and explore the kind of functions that these Function-Representations can implement. Finally, we also explore the limitations of our proposal.	翻訳日:2024-10-31 13:53:52 公開日:2024-10-11
# Omni-MATH:大規模言語モデルのためのユニバーサルオリンピックレベルの数学ベンチマーク Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models ( http://arxiv.org/abs/2410.07985v1 ) ライセンス: Link先を確認	Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、数学的推論能力に大きなブレークスルーをもたらした。しかし、GSM8KやMATHのような既存のベンチマークは高い精度で解決されている(例えば、OpenAI o1はMATHデータセットで94.8%を達成した)。このギャップを埋めるために、オリンピアードレベルでのLLMの数学的推論を評価するために設計された、包括的で挑戦的なベンチマークを提案する。既存のOlympiad関連のベンチマークとは異なり、我々のデータセットは数学のみに重点を置いており、厳密な人間のアノテーションを使った4428の競合レベルの問題の膨大なコレクションを含んでいる。これらの問題は33以上のサブドメインに厳密に分類され、Olympiad-mathematical reasoningにおけるモデル性能の総合的な評価を可能にしている。さらに,このベンチマークに基づいて詳細な分析を行った。実験の結果,最も先進的なモデルであるOpenAI o1-miniとOpenAI o1-previewでさえ,60.54%と52.55%の精度で,オリンピアードレベルの問題に悩まされ,オリンピアードレベルの数学的推論において重大な課題が浮き彫りにされていることがわかった。 Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises a vast collection of 4428 competition-level problems with rigorous human annotation. These problems are meticulously categorized into over 33 sub-domains and span more than 10 distinct difficulty levels, enabling a holistic assessment of model performance in Olympiad-mathematical reasoning. Furthermore, we conducted an in-depth analysis based on this benchmark. 
Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, with 60.54% and 52.55% accuracy, highlighting significant challenges in Olympiad-level mathematical reasoning.	翻訳日:2024-10-31 06:15:07 公開日:2024-10-11
# Omni-MATH:大規模言語モデルのためのユニバーサルオリンピックレベルの数学ベンチマーク Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models ( http://arxiv.org/abs/2410.07985v2 ) ライセンス: Link先を確認	Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、数学的推論能力に大きなブレークスルーをもたらした。しかし、GSM8KやMATHのような既存のベンチマークは高い精度で解決されている(例えば、OpenAI o1はMATHデータセットで94.8%を達成した)。このギャップを埋めるために、オリンピアードレベルでのLLMの数学的推論を評価するために設計された、包括的で挑戦的なベンチマークを提案する。既存のOlympiad関連のベンチマークとは異なり、我々のデータセットは数学のみに重点を置いており、厳密な人間のアノテーションを使った4428の競合レベルの問題の膨大なコレクションを含んでいる。これらの問題は33以上のサブドメインに厳密に分類され、Olympiad-mathematical reasoningにおけるモデル性能の総合的な評価を可能にしている。さらに,このベンチマークに基づいて詳細な分析を行った。実験の結果,最も先進的なモデルであるOpenAI o1-miniとOpenAI o1-previewでさえ,60.54%と52.55%の精度で,オリンピアードレベルの問題に悩まされ,オリンピアードレベルの数学的推論において重大な課題が浮き彫りにされていることがわかった。 Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises a vast collection of 4428 competition-level problems with rigorous human annotation. These problems are meticulously categorized into over 33 sub-domains and span more than 10 distinct difficulty levels, enabling a holistic assessment of model performance in Olympiad-mathematical reasoning. Furthermore, we conducted an in-depth analysis based on this benchmark. 
Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, with 60.54% and 52.55% accuracy, highlighting significant challenges in Olympiad-level mathematical reasoning.	翻訳日:2024-10-31 06:15:07 公開日:2024-10-11
# ロボットマニピュレーションのための相乗的・一般化・効率的なデュアルシステムを目指して Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation ( http://arxiv.org/abs/2410.08001v1 ) ライセンス: Link先を確認	Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao,	(参考訳) 多様な動的環境下での多目的ロボットシステムの運用に対する需要が増大するにつれ、幅広い適応性と高レベルの推論を容易にするために、大規模なクロス・エボディメント・データ・コーパスを活用するジェネリスト・ポリシーの重要性が強調されている。しかし、ジェネラリストは非効率な推論と費用対効果の訓練に苦慮した。代わりに、スペシャリストポリシーは特定のドメインデータに対してキュレーションされ、タスクレベルの精度を効率よく向上させる。しかし、幅広いアプリケーションに対する一般化能力は欠如している。これらの観測から着想を得たRoboDualは、一般論と専門政策の両方の利点を補う相乗的二重システムである。視覚言語アクション(VLA)に基づくジェネラリストの高レベルなタスク理解と離散化されたアクション出力に基づいて、多段階のアクションロールアウトのために、拡散トランスフォーマーベースのスペシャリストを考案した。 OpenVLAと比較すると、RoboDualは実世界の設定が26.7%改善し、CALVINが12%向上した。デモデータの5%のみを使用して、強力なパフォーマンスを維持し、実世界のデプロイにおける3.8倍の制御周波数を実現する。コードは一般に公開されている。私たちのプロジェクトページは、https://opendrivelab.com/RoboDual/でホストされています。 The increasing demand for versatile robotic systems to operate in diverse and dynamic environments has emphasized the importance of a generalist policy, which leverages a large cross-embodiment data corpus to facilitate broad adaptability and high-level reasoning. However, the generalist would struggle with inefficient inference and cost-expensive training. The specialist policy, instead, is curated for specific domain data and excels at task-level precision with efficiency. Yet, it lacks the generalization capacity for a wide range of applications. Inspired by these observations, we introduce RoboDual, a synergistic dual-system that supplements the merits of both generalist and specialist policy. A diffusion transformer-based specialist is devised for multi-step action rollouts, exquisitely conditioned on the high-level task understanding and discretized action output of a vision-language-action (VLA) based generalist. Compared to OpenVLA, RoboDual achieves 26.7% improvement in real-world setting and 12% gain on CALVIN by introducing a specialist policy with merely 20M trainable parameters. 
It maintains strong performance with 5% of demonstration data only, and enables a 3.8 times higher control frequency in real-world deployment. Code would be made publicly available. Our project page is hosted at: https://opendrivelab.com/RoboDual/	翻訳日:2024-10-31 06:05:02 公開日:2024-10-11
# ロボットマニピュレーションのための相乗的・一般化・効率的なデュアルシステムを目指して Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation ( http://arxiv.org/abs/2410.08001v2 ) ライセンス: Link先を確認	Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao,	(参考訳) 多様な動的環境下での多目的ロボットシステムの運用に対する需要が増大するにつれ、幅広い適応性と高レベルの推論を容易にするために、大規模なクロス・エボディメント・データ・コーパスを活用するジェネリスト・ポリシーの重要性が強調されている。しかし、ジェネラリストは非効率な推論と費用対効果の訓練に苦慮した。代わりに、スペシャリストポリシーは特定のドメインデータに対してキュレーションされ、タスクレベルの精度を効率よく向上させる。しかし、幅広いアプリケーションに対する一般化能力は欠如している。これらの観測から着想を得たRoboDualは、一般論と専門政策の両方の利点を補う相乗的二重システムである。視覚言語アクション(VLA)に基づくジェネラリストの高レベルなタスク理解と離散化されたアクション出力に基づいて、多段階のアクションロールアウトのために、拡散トランスフォーマーベースのスペシャリストを考案した。 OpenVLAと比較すると、RoboDualは実世界の設定が26.7%改善し、CALVINが12%向上した。デモデータの5%のみを使用して、強力なパフォーマンスを維持し、実世界のデプロイにおける3.8倍の制御周波数を実現する。コードは一般に公開されている。私たちのプロジェクトページは、https://opendrivelab.com/RoboDual/でホストされています。 The increasing demand for versatile robotic systems to operate in diverse and dynamic environments has emphasized the importance of a generalist policy, which leverages a large cross-embodiment data corpus to facilitate broad adaptability and high-level reasoning. However, the generalist would struggle with inefficient inference and cost-expensive training. The specialist policy, instead, is curated for specific domain data and excels at task-level precision with efficiency. Yet, it lacks the generalization capacity for a wide range of applications. Inspired by these observations, we introduce RoboDual, a synergistic dual-system that supplements the merits of both generalist and specialist policy. A diffusion transformer-based specialist is devised for multi-step action rollouts, exquisitely conditioned on the high-level task understanding and discretized action output of a vision-language-action (VLA) based generalist. Compared to OpenVLA, RoboDual achieves 26.7% improvement in real-world setting and 12% gain on CALVIN by introducing a specialist policy with merely 20M trainable parameters. 
It maintains strong performance with 5% of demonstration data only, and enables a 3.8 times higher control frequency in real-world deployment. Code would be made publicly available. Our project page is hosted at: https://opendrivelab.com/RoboDual/	翻訳日:2024-10-31 06:05:02 公開日:2024-10-11
# 高速フィードフォワード3次元ガウススプラッティング圧縮 Fast Feedforward 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2410.08017v1 ) ライセンス: Link先を確認	Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai,	(参考訳) 3D Gaussian Splatting (3DGS)は、新しいビュー合成のためのリアルタイムかつ高忠実なレンダリングを推し進めているが、ストレージ要件は広く採用される上で課題となる。様々な圧縮技術が提案されているが、既存の3DGSでは圧縮を実現するためにシーンごとの最適化が必要であり、圧縮が遅くなる。この問題を解決するために,1つのフィードフォワードパスで3DGS表現を高速に圧縮できる最適化フリーモデルであるFCGS(Fast Compression of 3D Gaussian Splatting)を導入し,圧縮時間を数分から数秒に短縮する。圧縮効率を向上させるために,サイズと忠実度のバランスをとるようにガウス属性を異なるエントロピー制約経路に割り当てるマルチパスエントロピーモジュールを提案する。また,非構造ガウス系ブロブ間の冗長性を取り除くために,ガウス間およびガウス内のコンテキストモデルの両方を慎重に設計する。全体として、FCGSは忠実度を維持しながら20倍以上の圧縮比を達成し、シーン毎の最適化に基づくほとんどのSOTA手法を上回る。私たちのコードは、https://github.com/YihangChen-ee/FCGS で利用可能です。 With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for their widespread adoption. Although various compression techniques have been proposed, previous art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression sluggish and slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths for balance between size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods. Our code is available at: https://github.com/YihangChen-ee/FCGS.	翻訳日:2024-10-31 05:55:14 公開日:2024-10-11
# 高速フィードフォワード3次元ガウススプラッティング圧縮 Fast Feedforward 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2410.08017v2 ) ライセンス: Link先を確認	Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai,	(参考訳) 3D Gaussian Splatting (3DGS)は、新しいビュー合成のためのリアルタイムかつ高忠実なレンダリングを推し進めているが、ストレージ要件は広く採用される上で課題となる。様々な圧縮技術が提案されているが、既存の3DGSでは圧縮を実現するためにシーンごとの最適化が必要であり、圧縮が遅くなる。この問題を解決するために,1つのフィードフォワードパスで3DGS表現を高速に圧縮できる最適化フリーモデルであるFCGS(Fast Compression of 3D Gaussian Splatting)を導入し,圧縮時間を数分から数秒に短縮する。圧縮効率を向上させるために,サイズと忠実度のバランスをとるようにガウス属性を異なるエントロピー制約経路に割り当てるマルチパスエントロピーモジュールを提案する。また,非構造ガウス系ブロブ間の冗長性を取り除くために,ガウス間およびガウス内のコンテキストモデルの両方を慎重に設計する。全体として、FCGSは忠実度を維持しながら20倍以上の圧縮比を達成し、シーン毎の最適化に基づくほとんどのSOTA手法を上回る。私たちのコードは、https://github.com/YihangChen-ee/FCGS で利用可能です。 With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for their widespread adoption. Although various compression techniques have been proposed, previous art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression sluggish and slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths for balance between size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods. Our code is available at: https://github.com/YihangChen-ee/FCGS.	翻訳日:2024-10-31 05:55:13 公開日:2024-10-11
# 内部解釈可能性のための回路探索の計算複雑性 The Computational Complexity of Circuit Discovery for Inner Interpretability ( http://arxiv.org/abs/2410.08025v1 ) ライセンス: Link先を確認	Federico Adolfi, Martina G. Vilas, Todd Wareham,	(参考訳) 機械学習、認知/脳科学、社会におけるニューラルネットワークの応用として提案されているものの多くは、回路発見による内部解釈可能性の実現可能性に依存している。これは、実行可能なアルゴリズムオプションを経験的および理論的に探索することを要求する。ヒューリスティックスの設計とテストの進歩にもかかわらず、それらが解こうとする問題の複雑性が理解されていない現状では、そのスケーラビリティと忠実さには懸念がある。これを解決するために,古典的・パラメータ化された計算複雑性理論を用いて回路探索について検討する:(1)記述,説明,予測,制御のアフォーダンスの観点から,回路探索クエリを推論するための概念的足場を記述する;(2)機械論的説明を捉える包括的なクエリセットを形式化し,その分析のための形式的フレームワークを提案する;(3)それを用いて,多層パーセプトロン(例えば,トランスフォーマーの一部)における実用上関心のある多くのクエリの変種と緩和の複雑性を確定する。私たちの発見は、難しい複雑性の風景を明らかにする。多くのクエリは計算困難(NP-hard, $\Sigma^p_2$-hard)であり、モデル/回路の特徴(例えば深さ)を制約しても固定パラメータ計算困難(W[1]-hard)のままであり、加法的、乗法的、確率的近似スキームの下でも近似不可能である。この状況をナビゲートするために、よりよく理解されたヒューリスティックスを用いてこれらの難解な問題の一部(NP-対$\Sigma^p_2$-complete)に取り組むための変換が存在することを証明し、有用なアフォーダンスを保持するより控えめなクエリの計算容易性(PTIME)または固定パラメータ計算容易性(FPT)を証明する。このフレームワークは、解釈可能性クエリの範囲と限界を理解し、実行可能な選択肢を探究し、リソース要求を既存のアーキテクチャと将来のアーキテクチャの間で比較することを可能にする。 Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. 
To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries that capture mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons (part of, e.g., transformers). Our findings reveal a challenging complexity landscape. Many queries are intractable (NP-hard, $\Sigma^p_2$-hard), remain fixed-parameter intractable (W[1]-hard) when constraining model/circuit features (e.g., depth), and are inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems (NP- vs. $\Sigma^p_2$-complete) with better-understood heuristics, and prove the tractability (PTIME) or fixed-parameter tractability (FPT) of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands among existing and future architectures.	翻訳日:2024-10-31 05:55:13 公開日:2024-10-11
# 内部解釈可能性のための回路探索の計算複雑性 The Computational Complexity of Circuit Discovery for Inner Interpretability ( http://arxiv.org/abs/2410.08025v2 ) ライセンス: Link先を確認	Federico Adolfi, Martina G. Vilas, Todd Wareham,	(参考訳) 機械学習、認知/脳科学、社会におけるニューラルネットワークの応用として提案されているものの多くは、回路発見による内部解釈可能性の実現可能性に依存している。これは、実行可能なアルゴリズムオプションを経験的および理論的に探索することを要求する。ヒューリスティックスの設計とテストの進歩にもかかわらず、それらが解こうとする問題の複雑性が理解されていない現状では、そのスケーラビリティと忠実さには懸念がある。これを解決するために,古典的・パラメータ化された計算複雑性理論を用いて回路探索について検討する:(1)記述,説明,予測,制御のアフォーダンスの観点から,回路探索クエリを推論するための概念的足場を記述する;(2)機械論的説明を捉える包括的なクエリセットを形式化し,その分析のための形式的フレームワークを提案する;(3)それを用いて,多層パーセプトロン(例えば,トランスフォーマーの一部)における実用上関心のある多くのクエリの変種と緩和の複雑性を確定する。私たちの発見は、難しい複雑性の風景を明らかにする。多くのクエリは計算困難(NP-hard, $\Sigma^p_2$-hard)であり、モデル/回路の特徴(例えば深さ)を制約しても固定パラメータ計算困難(W[1]-hard)のままであり、加法的、乗法的、確率的近似スキームの下でも近似不可能である。この状況をナビゲートするために、よりよく理解されたヒューリスティックスを用いてこれらの難解な問題の一部(NP-対$\Sigma^p_2$-complete)に取り組むための変換が存在することを証明し、有用なアフォーダンスを保持するより控えめなクエリの計算容易性(PTIME)または固定パラメータ計算容易性(FPT)を証明する。このフレームワークは、解釈可能性クエリの範囲と限界を理解し、実行可能な選択肢を探究し、リソース要求を既存のアーキテクチャと将来のアーキテクチャの間で比較することを可能にする。 Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. 
To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries that capture mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons (part of, e.g., transformers). Our findings reveal a challenging complexity landscape. Many queries are intractable (NP-hard, $\Sigma^p_2$-hard), remain fixed-parameter intractable (W[1]-hard) when constraining model/circuit features (e.g., depth), and are inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems (NP- vs. $\Sigma^p_2$-complete) with better-understood heuristics, and prove the tractability (PTIME) or fixed-parameter tractability (FPT) of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands among existing and future architectures.	翻訳日:2024-10-31 05:55:13 公開日:2024-10-11
# SPA: 効果的な身体表現を可能にする3次元空間認識 SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ( http://arxiv.org/abs/2410.08208v1 ) ライセンス: Link先を確認	Haoyi Zhu, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He,	(参考訳) 本稿では,具体的AIにおける3次元空間認識の重要性を強調する表現学習フレームワークであるSPAを紹介する。提案手法は,多視点画像上での識別可能なニューラルレンダリングを利用して,固有空間理解を備えたバニラビジョントランス (ViT) を実現する。本稿では,8つのシミュレータにまたがる268のタスクを,単一タスクおよび言語条件のマルチタスクシナリオにおいて多種多様なポリシーでカバーし,これまでに最も包括的な表現学習の評価を行った。 SPAは、AI、ビジョン中心のタスク、マルチモーダルアプリケーションに特化して設計されたものを含む、10以上の最先端表現メソッドを一貫して上回り、トレーニングデータが少ない。さらに,実際のシナリオにおいて実世界の実験を行い,その有効性を確認する。これらの結果は,表現学習における3次元空間認識の重要性を浮き彫りにした。私たちの最強のモデルは、トレーニングに6000時間以上を要し、すべてのコードとモデルの重みをオープンソースにして、具体的表現学習における将来の研究を促進することにコミットしています。プロジェクトページ: https://haoyizhu.github.io/spa/。 In this paper, we introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. Our approach leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. The results are compelling: SPA consistently outperforms more than 10 state-of-the-art representation methods, including those specifically designed for embodied AI, vision-centric tasks, and multi-modal applications, while using less training data. Furthermore, we conduct a series of real-world experiments to confirm its effectiveness in practical scenarios. These results highlight the critical role of 3D spatial awareness for embodied representation learning. Our strongest model takes more than 6000 GPU hours to train and we are committed to open-sourcing all code and model weights to foster future research in embodied representation learning. Project Page: https://haoyizhu.github.io/spa/.	
翻訳日:2024-10-31 04:46:03 公開日:2024-10-11
# SPA: 効果的な身体表現を可能にする3次元空間認識 SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ( http://arxiv.org/abs/2410.08208v2 ) ライセンス: Link先を確認	Haoyi Zhu, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He,	(参考訳) 本稿では,具体的AIにおける3次元空間認識の重要性を強調する表現学習フレームワークであるSPAを紹介する。提案手法は,多視点画像上での識別可能なニューラルレンダリングを利用して,固有空間理解を備えたバニラビジョントランス (ViT) を実現する。本稿では,8つのシミュレータにまたがる268のタスクを,単一タスクおよび言語条件のマルチタスクシナリオにおいて多種多様なポリシーでカバーし,これまでに最も包括的な表現学習の評価を行った。 SPAは、AI、ビジョン中心のタスク、マルチモーダルアプリケーションに特化して設計されたものを含む、10以上の最先端表現メソッドを一貫して上回り、トレーニングデータが少ない。さらに,実際のシナリオにおいて実世界の実験を行い,その有効性を確認する。これらの結果は,表現学習における3次元空間認識の重要性を浮き彫りにした。私たちの最強のモデルは、トレーニングに6000時間以上を要し、すべてのコードとモデルの重みをオープンソースにして、具体的表現学習における将来の研究を促進することにコミットしています。プロジェクトページ: https://haoyizhu.github.io/spa/。 In this paper, we introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. Our approach leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. The results are compelling: SPA consistently outperforms more than 10 state-of-the-art representation methods, including those specifically designed for embodied AI, vision-centric tasks, and multi-modal applications, while using less training data. Furthermore, we conduct a series of real-world experiments to confirm its effectiveness in practical scenarios. These results highlight the critical role of 3D spatial awareness for embodied representation learning. Our strongest model takes more than 6000 GPU hours to train and we are committed to open-sourcing all code and model weights to foster future research in embodied representation learning. Project Page: https://haoyizhu.github.io/spa/.	
翻訳日:2024-10-31 04:46:03 公開日:2024-10-11
# オルタナティブビルドからのバイナリの比較におけるバイナリ等価性のレベル Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds ( http://arxiv.org/abs/2410.08427v1 ) ライセンス: Link先を確認	Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan,	(参考訳) ソフトウェアサプライチェーンのセキュリティ上の課題に応えて、いくつかの組織が独立したオープンソースプロジェクトを構築し、その結果のバイナリをリリースするインフラストラクチャを構築した。ビルドプラットフォームの可変性は、妥協されたビルド環境の検出を容易にするため、セキュリティを強化することができる。さらに、ビルドプラットフォームのセキュリティ姿勢を改善し、ビルド中に実績情報を集めることで、結果のアーティファクトをより信頼性の高いものにすることができる。これらのサービスは、Google、Oracle、RedHatから利用可能である。同じソースから構築された複数のバイナリが利用可能になったことで、新たな課題と機会が生まれ、"Does build B?"や"Can build A revealed a compromiseed build B?"といった疑問が提起される。そのような質問に答えるためには、バイナリ間の等価性の概念が必要である。ビットワイド平等に基づく明らかなアプローチは、実際は重大な欠点があり、代替概念を選択することに価値があることを実証する。我々は、クローン検出タイプにインスパイアされた同値のレベルを導入することで、これを概念化する。いくつかの実験を通して、これらの新しいレベルの価値を実証する。我々は、異なるプロバイダによって同じソースから構築されたJavaバイナリからなるデータセットを構築し、合計14,156対のバイナリを生成する。次に、それらのjarファイルのコンパイルされたクラスファイルを比較した結果、3,750対のjar(26.49%)に対して、少なくとも1つの異なるファイルが存在し、jarファイルとそれらの暗号化ハッシュが異なることを強制することがわかった。しかし、新しい同値性レベルに基づいて、これらの多くが事実上同値であることを示すことができる。半合成データセット上のいくつかの候補同値関係を評価した結果、同値であるべき、あるいは同値でなくてもよいバイナリのペアからなるオラクルが得られた。 In response to challenges in software supply chain security, several organisations have created infrastructures to independently build commodity open source projects and release the resulting binaries. Build platform variability can strengthen security as it facilitates the detection of compromised build environments. Furthermore, by improving the security posture of the build platform and collecting provenance information during the build, the resulting artifacts can be used with greater trust. Such offerings are now available from Google, Oracle and RedHat. The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: 'Does build A confirm the integrity of build B?' or 'Can build A reveal a compromised build B?'. To answer such questions requires a notion of equivalence between binaries. 
We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types. We demonstrate the value of these new levels through several experiments. We construct a dataset consisting of Java binaries built from the same sources independently by different providers, resulting in 14,156 pairs of binaries in total. We then compare the compiled class files in those jar files and find that for 3,750 pairs of jars (26.49%) there is at least one such file that is different, also forcing the jar files and their cryptographic hashes to be different. However, based on the new equivalence levels, we can still establish that many of them are practically equivalent. We evaluate several candidate equivalence relations on a semi-synthetic dataset that provides oracles consisting of pairs of binaries that either should be, or must not be equivalent.	翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
# 有限温度崩壊における量子場相互作用のためのマスター方程式の厳密解 Exact solution of the master equation for interacting quantized fields at finite temperature decay ( http://arxiv.org/abs/2410.08428v1 ) ライセンス: Link先を確認	L. Hernández-Sánchez, I. A. Bocanegra-Garay, I. Ramos-Prieto, F. Soto-Eguibar, H. M. Moya-Cessa,	(参考訳) 有限温度崩壊における2つの量子化場の相互作用を含む量子系のマルコフ力学を解析する。超作用素技術を利用し、2つの非ユニタリ変換を適用することにより、リンドブラッド・マスター方程式を実効的な非エルミート的ハミルトニアンを持つフォン・ノイマン様方程式に再構成する。さらに、このハミルトニアンを対角化するために追加の非ユニタリ変換が使用され、リンドブラッドマスター方程式の正確な解を導出することができる。この方法は、完全な量子状態における任意の初期状態の進化を計算するための枠組みを提供する。特定の例として、最初に空洞内で相互作用する2つの識別不可能な光子の光子一致率を示す。 We analyze the Markovian dynamics of a quantum system involving the interaction of two quantized fields at finite temperature decay. Utilizing superoperator techniques and applying two non-unitary transformations, we reformulate the Lindblad master equation into a von Neumann-like equation with an effective non-Hermitian Hamiltonian. Furthermore, an additional non-unitary transformation is employed to diagonalize this Hamiltonian, enabling us to derive an exact solution to the Lindblad master equation. This method provides a framework to calculate the evolution of any initial state in a fully quantum regime. As a specific example, we present the photon coincidence rates for two indistinguishable photons initially interacting within a cavity.	翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
# 大規模言語モデルにおける Retrieval Augmented Generation とその医学的適合度評価における一般化可能性 Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness ( http://arxiv.org/abs/2410.08431v1 ) ライセンス: Link先を確認	Yu He Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Chang-Fu Kuo, Shao-Chun Wu, Vesela P. Kovacheva, Daniel Shu Wei Ting,	(参考訳) 大規模言語モデル(LLM)は医学的応用の可能性を示すが、専門的な臨床知識が欠如していることが多い。 Retrieval Augmented Generation (RAG)は、ドメイン固有の情報によるカスタマイズを可能にし、医療に適している。本研究は,手術適応の判定と術前指導におけるRAGモデルの精度,一貫性,安全性について検討した。 35の局所的および23の国際的術前ガイドラインを用いてLLM-RAGモデルを開発し、人為的な反応に対して試験を行った。合計3,682件の回答が得られた。臨床文書はLlamaindexを用いて処理され, GPT3.5, GPT4, Claude-3を含む10個のLLMが評価された。術前指導の7つの側面に焦点をあてて14の臨床シナリオを解析した。正しい回答を判断するために確立されたガイドラインと専門家の判断が用いられ、人為的な回答が比較として役立った。 LLM-RAGモデルでは、20秒以内に反応が生成され、臨床医 (10分) よりも有意に速かった。 GPT4 LLM-RAGモデルが最も精度が高く(96.4%対86.6%、p=0.016)、幻覚は無く、臨床医に匹敵する正しい指示が得られた。結果は地域と国際両方のガイドラインで一致していた。本研究は, LLM-RAGモデルの有効性を実証し, その効率性, 拡張性, 信頼性を明らかにした。 Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using Llamaindex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. 
The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), with no hallucinations and producing correct instructions comparable to clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.	翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
# MYCROFT: Towards Effective and Efficient External Data Augmentation ( http://arxiv.org/abs/2410.08432v1 ) License: see link	Zain Sarwar, Van Tran, Arjun Nitin Bhagoji, Nick Feamster, Ben Y. Zhao, Supriyo Chakraborty,	Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to proprietary and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to improve model performance. To address this challenge, we propose Mycroft, a data-efficient method that enables model trainers to evaluate the relative utility of different data sources while working with a constrained data-sharing budget. By leveraging feature space distances and gradient matching, Mycroft identifies small but informative data subsets from each owner, allowing model trainers to maximize performance with minimal data exposure. Experimental results across four tasks in two domains show that Mycroft converges rapidly to the performance of the full-information baseline, where all data is shared. Moreover, Mycroft is robust to noise and can effectively rank data owners by utility. Mycroft can pave the way for democratized training of high performance ML models.	Translated: 2024-10-31 03:26:42 Published: 2024-10-11
# SoK: Software Compartmentalization ( http://arxiv.org/abs/2410.08434v1 ) License: see link	Hugo Lefeuvre, Nathan Dautenhahn, David Chisnall, Pierre Olivier,	Decomposing large systems into smaller components with limited privileges has long been recognized as an effective means to minimize the impact of exploits. Despite historical roots, demonstrated benefits, and a plethora of research efforts in academia and industry, the compartmentalization of software is still not a mainstream practice. This paper investigates why, and how this status quo can be improved. Noting that existing approaches are fraught with inconsistencies in terminology and analytical methods, we propose a unified model for the systematic analysis, comparison, and directing of compartmentalization approaches. We use this model to review 211 research efforts and analyze 61 mainstream compartmentalized systems, confronting them to understand the limitations of both research and production works. Among others, our findings reveal that mainstream efforts largely rely on manual methods, custom abstractions, and legacy mechanisms, poles apart from recent research. We conclude with recommendations: compartmentalization should be solved holistically; progress is needed towards simplifying the definition of compartmentalization policies; towards better challenging our threat models in the light of confused deputies and hardware limitations; as well as towards bridging the gaps we pinpoint between research and mainstream needs. This paper not only maps the historical and current landscape of compartmentalization, but also sets forth a framework to foster their evolution and adoption.	Translated: 2024-10-31 03:26:42 Published: 2024-10-11
# Symbolic Music Generation with Fine-grained Interactive Textural Guidance ( http://arxiv.org/abs/2410.08435v1 ) License: see link	Tingyu Zhu, Haoyu Liu, Zhimin Jiang, Zeyu Zheng,	The problem of symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To overcome these difficulties, we introduce Fine-grained Textural Guidance (FTG) within diffusion models to correct errors in the learned distributions. By incorporating FTG, the diffusion models improve the accuracy of music generation, which makes them well-suited for advanced tasks such as progressive music generation, improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effect of the FTG approach. We provide numerical experiments and a demo page for interactive music generation with user input to showcase the effectiveness of our approach.	Translated: 2024-10-31 03:26:42 Published: 2024-10-11
# Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models ( http://arxiv.org/abs/2410.08436v1 ) License: see link	Zi'ou Zheng, Christopher Malon, Martin Renqiang Min, Xiaodan Zhu,	When performing complex multi-step reasoning tasks, the ability of Large Language Models (LLMs) to derive structured intermediate proof steps is important for ensuring that the models truly perform the desired reasoning and for improving models' explainability. This paper is centred around a focused study: whether the current state-of-the-art generalist LLMs can leverage the structures in a few examples to better construct the proof structures with in-context learning. Our study specifically focuses on structure-aware demonstration and structure-aware pruning. We demonstrate that they both help improve performance. A detailed analysis is provided to help understand the results.	Translated: 2024-10-31 03:26:42 Published: 2024-10-11
# $\forall$uto$\exists$$\lor\!\land$L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks ( http://arxiv.org/abs/2410.08437v1 ) License: see link	Rushang Karia, Daniel Bramblett, Daksh Dobhal, Siddharth Srivastava,	This paper presents $\forall$uto$\exists$$\lor\!\land$L, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. $\forall$uto$\exists$$\lor\!\land$L is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) ability to evaluate LLMs of increasing sophistication by auto-generating tasks at different levels of difficulty; (b) auto-generation of ground truth that eliminates dependence on expensive and time-consuming human annotation; (c) the use of automatically generated, randomized datasets that mitigate the ability of successive LLMs to overfit to static datasets used in many contemporary benchmarks. Empirical analysis shows that an LLM's performance on $\forall$uto$\exists$$\lor\!\land$L is highly indicative of its performance on a diverse array of other benchmarks focusing on translation and reasoning tasks, making it a valuable autonomous evaluation paradigm in settings where hand-curated datasets can be hard to obtain and/or update.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics ( http://arxiv.org/abs/2410.08439v1 ) License: see link	Josiah C. Kratz, Jacob Adamczyk,	Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage memory of past environments to better survive previously-encountered stressors. From a control perspective, this adaptability poses significant challenges in driving cell populations toward extinction, and is thus an open question with great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies to control cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# Slow Convergence of Interacting Kalman Filters in Word-of-Mouth Social Learning ( http://arxiv.org/abs/2410.08447v1 ) License: see link	Vikram Krishnamurthy, Cristian Rojas,	We consider word-of-mouth social learning involving $m$ Kalman filter agents that operate sequentially. The first Kalman filter receives the raw observations, while each subsequent Kalman filter receives a noisy measurement of the conditional mean of the previous Kalman filter. The prior is updated by the $m$-th Kalman filter. When $m=2$, and the observations are noisy measurements of a Gaussian random variable, the covariance goes to zero as $k^{-1/3}$ for $k$ observations, instead of $O(k^{-1})$ in the standard Kalman filter. In this paper we prove that for $m$ agents, the covariance decreases to zero as $k^{-1/(2^m-1)}$, i.e., the learning slows down exponentially with the number of agents. We also show that by artificially weighting the prior at each time, the learning rate can be made optimal as $k^{-1}$. The implication is that in word-of-mouth social learning, artificially re-weighting the prior can yield the optimal learning rate.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
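To make the slowdown in the word-of-mouth abstract concrete, the following is a minimal sketch that tabulates the decay exponent and the number of observations needed to shrink the covariance by a given factor. It reads the rate as $k^{-1/(2^m-1)}$, which is the form consistent with the stated $m=2$ case $k^{-1/3}$ and the standard-filter case $k^{-1}$; that reading, the neglect of constant factors, and the function names `decay_exponent` and `observations_needed` are assumptions made for illustration, not taken from the paper.

```python
from fractions import Fraction


def decay_exponent(m: int) -> Fraction:
    """Exponent a in covariance ~ k**(-a) for m sequential agents.

    Assumed form: a = 1 / (2**m - 1), so m=1 gives 1 and m=2 gives 1/3.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    return Fraction(1, 2**m - 1)


def observations_needed(m: int, reduction_factor: int) -> int:
    """Observations k needed so that k**(-a) equals 1/reduction_factor.

    Constants are ignored; solving k**(-a) = 1/R gives k = R**(2**m - 1).
    """
    if reduction_factor < 1:
        raise ValueError("reduction_factor must be at least 1")
    return reduction_factor ** (2**m - 1)


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, decay_exponent(m), observations_needed(m, 10))
```

Under these assumptions, a tenfold covariance reduction takes on the order of 10 observations for a single filter, 1,000 for two agents, and 10,000,000 for three, which is the sense in which learning slows exponentially in the number of agents.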
# Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise ( http://arxiv.org/abs/2410.08449v1 ) License: see link	George Yin, Vikram Krishnamurthy,	We analyze the finite sample regret of a decreasing step size stochastic gradient algorithm. We assume correlated noise and use a perturbed Lyapunov function as a systematic approach for the analysis. Finally we analyze the escape time of the iterates using large deviations theory.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning ( http://arxiv.org/abs/2410.08451v1 ) License: see link	Michael H. Freedman,	Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of Neural Networks (NNs). Their proof divides the representation of a multivariate function into two steps: The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer whose image is patterned in such a way that a subsequent dynamic can then be defined to solve for the second inter-layer map. I interpret this pattern as "minor concentration" of the almost everywhere defined Jacobians of the interlayer map. Minor concentration amounts to sparsity for higher exterior powers of the Jacobians. We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs and suggest two classes of experiments to test this hypothesis.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# Kolmogorov-Arnold Neural Networks for High-Entropy Alloys Design ( http://arxiv.org/abs/2410.08452v1 ) License: see link	Yagnik Bandyopadhyay, Harshil Avlani, Houlong L. Zhuang,	A wide range of deep learning-based machine learning techniques are extensively applied to the design of high-entropy alloys (HEAs), yielding numerous valuable insights. Kolmogorov-Arnold Networks (KAN) is a recently developed architecture that aims to improve both the accuracy and interpretability of input features. In this work, we explore three different datasets for HEA design and demonstrate the application of KAN for both classification and regression models. In the first example, we use a KAN classification model to predict the probability of single-phase formation in high-entropy carbide ceramics based on various properties such as mixing enthalpy and valence electron concentration. In the second example, we employ a KAN regression model to predict the yield strength and ultimate tensile strength of HEAs based on their chemical composition and process conditions including annealing time, cold rolling percentage, and homogenization temperature. The third example involves a KAN classification model to determine whether a certain composition is an HEA or non-HEA, followed by a KAN regressor model to predict the bulk modulus of the identified HEA, aiming to identify HEAs with high bulk modulus. In all three examples, KAN either outperforms or matches the multilayer perceptron (MLP) on accuracy metrics such as F1 score for classification, and Mean Square Error (MSE) and coefficient of determination (R2) for regression, demonstrating the efficacy of KAN in handling both classification and regression tasks. We provide a promising direction for future research to explore advanced machine learning techniques, which lead to more accurate predictions and better interpretability of complex materials, ultimately accelerating the discovery and optimization of HEAs with desirable properties.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# AdvDiffuser: Generating Adversarial Safety-Critical Driving Scenarios via Guided Diffusion ( http://arxiv.org/abs/2410.08453v1 ) License: see link	Yuting Xie, Xianda Guo, Cong Wang, Kunhua Liu, Long Chen,	Safety-critical scenarios are infrequent in natural driving environments but hold significant importance for the training and testing of autonomous driving systems. The prevailing approach involves generating safety-critical scenarios automatically in simulation by introducing adversarial adjustments to natural environments. These adjustments are often tailored to specific tested systems, thereby disregarding their transferability across different systems. In this paper, we propose AdvDiffuser, an adversarial framework for generating safety-critical driving scenarios through guided diffusion. By incorporating a diffusion model to capture plausible collective behaviors of background vehicles and a lightweight guide model to effectively handle adversarial scenarios, AdvDiffuser facilitates transferability. Experimental results on the nuScenes dataset demonstrate that AdvDiffuser, trained on offline driving logs, can be applied to various tested systems with minimal warm-up episode data and outperform other existing methods in terms of realism, diversity, and adversarial performance.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# Why pre-training is beneficial for downstream classification tasks? ( http://arxiv.org/abs/2410.08455v1 ) License: see link	Xin Jiang, Xu Cheng, Zechao Li,	Pre-training has exhibited notable benefits to downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits still remain unclear. To this end, we propose to quantitatively and explicitly explain effects of pre-training on the downstream task from a novel game-theoretic view, which also sheds new light into the learning behavior of deep neural networks (DNNs). Specifically, we extract and quantify the knowledge encoded by the pre-trained model, and further track the changes of such knowledge during the fine-tuning process. Interestingly, we discover that only a small amount of pre-trained model's knowledge is preserved for the inference of downstream tasks. However, such preserved knowledge is very challenging for a model training from scratch to learn. Thus, with the help of this exclusively learned and useful knowledge, the model fine-tuned from pre-training usually achieves better performance than the model training from scratch. Besides, we discover that pre-training can guide the fine-tuned model to learn target knowledge for the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification ( http://arxiv.org/abs/2410.08456v1 ) License: see link	Eugene P. W. Ang, Shan Lin, Alex C. Kot,	Supervised Person Re-identification (Person ReID) methods have achieved excellent performance when training and testing within one camera network. However, they usually suffer from considerable performance degradation when applied to different camera systems. In recent years, many Domain Adaptation Person ReID methods have been proposed, achieving impressive performance without requiring labeled data from the target domain. However, these approaches still need the unlabeled data of the target domain during the training process, making them impractical in many real-world scenarios. Our work focuses on the more practical Domain Generalized Person Re-identification (DG-ReID) problem. Given one or more source domains, it aims to learn a generalized model that can be applied to unseen target domains. One promising research direction in DG-ReID is the use of implicit deep semantic feature expansion, and our previous method, Domain Embedding Expansion (DEX), is one such example that achieves powerful results in DG-ReID. However, in this work we show that DEX and other similar implicit deep semantic feature expansion methods, due to limitations in their proposed loss function, fail to reach their full potential on large evaluation benchmarks as they have a tendency to saturate too early. Leveraging on this analysis, we propose Unified Deep Semantic Expansion, our novel framework that unifies implicit and explicit semantic feature expansion techniques in a single framework to mitigate this early over-fitting and achieve a new state-of-the-art (SOTA) in all DG-ReID benchmarks. Further, we apply our method on more general image retrieval tasks, also surpassing the current SOTA in all of these benchmarks by wide margins.	Translated: 2024-10-31 03:16:22 Published: 2024-10-11
# Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients ( http://arxiv.org/abs/2410.08457v1 ) License: see link	Yan Li, Mingyi Li, Xiao Zhang, Guangwei Xu, Feng Chen, Yuan Yuan, Yifei Zou, Mengying Zhao, Jianbo Lu, Dongxiao Yu,	In this work, we study how to unlock the potential of massive heterogeneous weak computing power to collaboratively train large-scale models on dispersed datasets. In order to improve both efficiency and accuracy in resource-adaptive collaborative learning, we take the first step to consider the unstructured pruning, varying submodel architectures, knowledge loss, and straggler challenges simultaneously. We propose a novel semi-asynchronous collaborative training framework, namely ${Co\text{-}S}^2{P}$, with data distribution-aware structured pruning and cross-block knowledge transfer mechanism to address the above concerns. Furthermore, we provide theoretical proof that ${Co\text{-}S}^2{P}$ can achieve asymptotic optimal convergence rate of $O(1/\sqrt{N^EQ})$. Finally, we conduct extensive experiments on a real-world hardware testbed, in which 16 heterogeneous Jetson devices can be united to train large-scale models with parameters up to 0.11 billion. The experimental results demonstrate that ${Co\text{-}S}^2{P}$ improves accuracy by up to 8.8% and resource utilization by up to 1.2$\times$ compared to state-of-the-art methods, while reducing memory consumption by approximately 22% and training time by about 24% on all resource-limited devices.	Translated: 2024-10-31 03:06:36 Published: 2024-10-11
# Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both ( http://arxiv.org/abs/2410.08458v1 ) License: see link	Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy,	Reward modeling of human preferences is one of the cornerstones of building usable generative large language models (LLMs). While traditional RLHF-based alignment methods explicitly maximize the expected rewards from a separate reward model, more recent supervised alignment methods like Direct Preference Optimization (DPO) circumvent this phase to avoid problems including model drift and reward overfitting. Although popular due to its simplicity, DPO and similar direct alignment methods can still lead to degenerate policies, and rely heavily on the Bradley-Terry-based preference formulation to model reward differences between pairs of candidate outputs. This formulation is challenged by non-deterministic or noisy preference labels, for example when human scoring of two candidate outputs is of low confidence. In this paper, we introduce DRDO (Direct Reward Distillation and policy-Optimization), a supervised knowledge distillation-based preference alignment method that simultaneously models rewards and preferences to avoid such degeneracy. DRDO directly mimics rewards assigned by an oracle while learning human preferences from a novel preference likelihood formulation. Our experimental results on the Ultrafeedback and TL;DR datasets demonstrate that policies trained using DRDO surpass previous methods such as DPO and e-DPO in terms of expected rewards and are more robust, on average, to noisy preference signals as well as out-of-distribution (OOD) settings.	Translated: 2024-10-31 03:06:36 Published: 2024-10-11
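The DRDO abstract refers to the Bradley-Terry preference formulation without spelling it out. As a rough illustration only, the sketch below implements the standard Bradley-Terry model, in which the probability that output A is preferred over output B is the logistic function of their reward difference. It is not the DRDO objective or the authors' code; the function names and the scalar-reward interface are assumptions made for this example.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """P(A preferred over B) under the standard Bradley-Terry model."""
    return _sigmoid(reward_a - reward_b)


def preference_nll(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one (chosen, rejected) preference label."""
    return -math.log(bradley_terry_prob(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Near-equal rewards give a probability close to 0.5, so the label
    # carries little information about which output is better.
    print(bradley_terry_prob(1.00, 0.99))
    print(bradley_terry_prob(3.0, 0.0))
    print(preference_nll(3.0, 0.0))
```

When the two rewards are close, the modelled preference probability sits near 0.5, which gives one way to see why low-confidence or noisy labels are hard for a purely pairwise formulation.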
# Diverse Deep Feature Ensemble Learning for Omni-Domain Generalized Person Re-identification ( http://arxiv.org/abs/2410.08460v1 ) License: see link	Eugene P. W. Ang, Shan Lin, Alex C. Kot,	Person Re-identification (Person ReID) has progressed to a level where single-domain supervised Person ReID performance has saturated. However, such methods experience a significant drop in performance when trained and tested across different datasets, motivating the development of domain generalization techniques. However, our research reveals that domain generalization methods significantly underperform single-domain supervised methods on single dataset benchmarks. An ideal Person ReID method should be effective regardless of the number of domains involved, and when test domain data is available for training it should perform as well as state-of-the-art (SOTA) fully supervised methods. This is a paradigm that we call Omni-Domain Generalization Person ReID (ODG-ReID). We propose a way to achieve ODG-ReID by creating deep feature diversity with self-ensembles. Our method, Diverse Deep Feature Ensemble Learning (D2FEL), deploys unique instance normalization patterns that generate multiple diverse views and recombines these views into a compact encoding. To the best of our knowledge, our work is one of few to consider omni-domain generalization in Person ReID, and we advance the study of applying feature ensembles in Person ReID. D2FEL significantly improves and matches the SOTA performance for major domain generalization and single-domain supervised benchmarks.	Translated: 2024-10-31 03:06:36 Published: 2024-10-11
# プライバシを前方に進める - 合成データ生成によるスマート車内の情報漏洩の軽減 Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation ( http://arxiv.org/abs/2410.08462v1 ) ライセンス: Link先を確認	Krish Parikh,	(参考訳) スマートカーは大量のデータを生成し、そのほとんどが機密性があり、プライバシー侵害のリスクがある。攻撃者がこれらのデータセット内の匿名メタデータをプロファイリングドライバに活用する傾向にあるため、イノベーションや進行中の研究を妨げることなく、情報漏洩を緩和するソリューションを見つけることが重要である。合成データは、これらのプライバシー問題に対処するための有望なツールとして登場し、現実のデータ関係の複製を可能にすると同時に、機密情報を開示するリスクを最小限にする。本稿では,これらの課題に対処するための合成データの利用について検討する。まず、14の車載センサーを包括的に分類し、潜在的な攻撃を特定し、その脆弱性を分類することから始めます。次に、PVS(Passive Vehicular Sensor)データセットを使用して、100万以上のデータポイントを含むTabular Variational Autoencoder(TVAE)モデルで合成データを生成する。最後に、これらを3つのコアメトリクス – 忠実度、ユーティリティ、プライバシ – に対して評価する。その結果, 運転者のプロファイリングを防止しつつ, 本来の意図でテストした場合, 90.1%の統計的類似度と78%の分類精度を達成できた。コードはhttps://github.com/krish-parikh/Synthetic-Data-Generationにある。 Smart vehicles produce large amounts of data, much of which is sensitive and at risk of privacy breaches. As attackers increasingly exploit anonymised metadata within these datasets to profile drivers, it's important to find solutions that mitigate this information leakage without hindering innovation and ongoing research. Synthetic data has emerged as a promising tool to address these privacy concerns, as it allows for the replication of real-world data relationships while minimising the risk of revealing sensitive information. In this paper, we examine the use of synthetic data to tackle these challenges. We start by proposing a comprehensive taxonomy of 14 in-vehicle sensors, identifying potential attacks and categorising their vulnerability. We then focus on the most vulnerable signals, using the Passive Vehicular Sensor (PVS) dataset to generate synthetic data with a Tabular Variational Autoencoder (TVAE) model, which included over 1 million data points. Finally, we evaluate this against 3 core metrics: fidelity, utility, and privacy. 
Our results show that the synthetic data achieves 90.1% statistical similarity and 78% classification accuracy when tested for its original intended purpose, while also preventing profiling of the driver. The code can be found at https://github.com/krish-parikh/Synthetic-Data-Generation	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
# ARCap:拡張現実フィードバックによるロボット学習のための高品質な人間デモ収集 ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback ( http://arxiv.org/abs/2410.08464v1 ) ライセンス: Link先を確認	Sirui Chen, Chen Wang, Kaden Nguyen, Li Fei-Fei, C. Karen Liu,	(参考訳) 人間の実演による模倣学習の進歩は,ロボットの操り方を教える上で有望な成果を上げている。トレーニングデータセットをさらにスケールアップするために、最近の研究は、物理的なロボットハードウェアを必要とせずにポータブルなデータ収集デバイスを使い始めた。しかし、データ収集中にオンボットフィードバックがないため、データ品質はユーザーの専門知識に大きく依存しており、多くのデバイスは特定のロボットの体格に限定されている。本稿では,拡張現実(AR)と触覚警告を通じて視覚的フィードバックを提供する携帯型データ収集システムARCapを提案する。広範にわたるユーザスタディを通じて,ARCapは,ロボットキネマティクスにマッチし,シーンとの衝突を避けるロボット実行可能なデータ収集を可能にする。 ARCapから収集されたデータにより、ロボットは散らかった環境での操作や長い水平交叉操作といった困難なタスクを実行できる。 ARCapは完全にオープンソースで、キャリブレーションが簡単で、すべてのコンポーネントは既製の製品から作られている。詳細と結果は、私たちのWebサイトにある。 Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products. More details and results can be found on our website: https://stanford-tml.github.io/ARCap	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
# Omni-Domain Generalized Person Re-Identificationのための配向分岐経路 Aligned Divergent Pathways for Omni-Domain Generalized Person Re-Identification ( http://arxiv.org/abs/2410.08466v1 ) ライセンス: Link先を確認	Eugene P. W. Ang, Shan Lin, Alex C. Kot,	(参考訳) 人物再識別 (Person ReID) は、完全教師ありおよびドメイン一般化のPerson ReIDにおいて著しく進歩している。しかし、一方のタスク領域向けに開発された手法は、他方にはうまく転用できない。理想的なPerson ReIDメソッドは、トレーニングやテストに関わるドメインの数に関係なく有効であるべきです。さらに、対象ドメインからのトレーニングデータが与えられた場合、少なくとも最先端(SOTA)の完全教師ありPerson ReIDメソッドと同等の性能を示すべきである。我々は、このパラダイムをOmni-Domain Generalization Person ReID(ODG-ReID)と呼び、互換性のあるバックボーンアーキテクチャを複数の多様な経路に拡張することで、これを実現する方法を提案する。提案手法であるAligned Divergent Pathways (ADP) は,まずベースアーキテクチャを元のバックボーンのテールをコピーしてマルチブランチ構造に変換する。 DyMAIN(Dynamic Max-Deviance Adaptive Instance Normalization)を設計し、Omniドメイン方向に対して堅牢な一般化特徴の学習を促進し、DyMAINをADPのブランチに適用する。提案したPMoC(Phased Mixture-of-Cosines)は,より多様な学習を行うために,枝間で安定な学習率スケジュールと乱流的な学習率スケジュールの混合を協調する。最後に,提案したDimensional Consistency Metric Loss(DCML)を用いて,枝間の特徴空間を再整列する。 ADPは、マルチソースドメインの一般化および同一ドメイン内の教師ありReIDにおいて、最先端(SOTA)の結果を上回る。さらに,本手法は,Person ReIDタスクに対するOmni-Domain Generalizationを達成し,幅広い単一ソース領域の一般化ベンチマークの改善を示す。 Person Re-identification (Person ReID) has advanced significantly in fully supervised and domain generalized Person ReID. However, methods developed for one task domain transfer poorly to the other. An ideal Person ReID method should be effective regardless of the number of domains involved in training or testing. Furthermore, given training data from the target domain, it should perform at least as well as state-of-the-art (SOTA) fully supervised Person ReID methods. We call this paradigm Omni-Domain Generalization Person ReID, referred to as ODG-ReID, and propose a way to achieve this by expanding compatible backbone architectures into multiple diverse pathways. Our method, Aligned Divergent Pathways (ADP), first converts a base architecture into a multi-branch structure by copying the tail of the original backbone. 
We design a module, Dynamic Max-Deviance Adaptive Instance Normalization (DyMAIN), that encourages learning generalized features that are robust to omni-domain directions, and apply DyMAIN to the branches of ADP. Our proposed Phased Mixture-of-Cosines (PMoC) coordinates a mix of stable and turbulent learning rate schedules among branches to further diversify learning. Finally, we realign the feature space between branches with our proposed Dimensional Consistency Metric Loss (DCML). ADP outperforms state-of-the-art (SOTA) results for multi-source domain generalization and for supervised ReID within the same domain. Furthermore, our method demonstrates improvement on a wide range of single-source domain generalization benchmarks, achieving Omni-Domain Generalization over Person ReID tasks.	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
# 可解な広範囲相互作用を持つ格子フェルミオン Lattice fermions with solvable wide range interactions ( http://arxiv.org/abs/2410.08467v1 ) ライセンス: Link先を確認	Ryu Sasaki,	(参考訳) 広範囲相互作用を持つ厳密に解ける(スピンレス)格子フェルミオンは、数年前にOdakeと著者によって報告された、厳密に解ける定常かつ可逆的なマルコフ連鎖 $\mathcal{K}^R$ に基づいて明示的に構成される。定常分布 $\pi$ に対する $\mathcal{K}^R$ の可逆性は、正の古典的ハミルトニアン $\mathcal{H}^R$ につながる。 $\mathcal{H}^R$ の厳密な可解性は、著者が最近提唱した原理に基づき、スピンレス格子フェルミオン $c_x$, $c_x^\dagger$, $\mathcal{H}^R_f=\sum_{x,y\in\mathcal{X}}c_x^\dagger\mathcal{H}^R(x,y) c_y$ の可解性を保証する。可逆マルコフ連鎖 $\mathcal{K}^R$ はアスキースキームの離散直交多項式の直交測度の畳み込みによって構成される。広い範囲の相互作用を持つフェルミオン系のいくつかの明示的な例を示す。 Exactly solvable (spinless) lattice fermions with wide range interactions are constructed explicitly based on exactly solvable stationary and reversible Markov chains $\mathcal{K}^R$ reported a few years earlier by Odake and myself. The reversibility of $\mathcal{K}^R$ with the stationary distribution $\pi$ leads to a positive classical Hamiltonian $\mathcal{H}^R$. The exact solvability of $\mathcal{H}^R$ warrants that of a spinless lattice fermion $c_x$, $c_x^\dagger$, $\mathcal{H}^R_f=\sum_{x,y\in\mathcal{X}}c_x^\dagger\mathcal{H}^R(x,y) c_y$ based on the principle advocated recently by myself. The reversible Markov chains $\mathcal{K}^R$ are constructed by convolutions of the orthogonality measures of the discrete orthogonal polynomials of the Askey scheme. Several explicit examples of the fermion systems with wide range interactions are presented.	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
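The step from a reversible Markov chain to a symmetric matrix, and from a symmetric matrix to a solvable quadratic fermion Hamiltonian, can be sketched with standard linear algebra. This is a generic illustration, not the paper's derivation: the matrix $T$ and the column convention for $K$ are assumptions introduced here, and how $\mathcal{H}^R$ relates to $T$ (for instance by a shift or a sign) is not stated in the abstract.

```latex
% Assumption: K(x,y) is the probability of moving from y to x,
% so reversibility (detailed balance) reads K(x,y)\,\pi(y) = K(y,x)\,\pi(x).
T(x,y) := \pi(x)^{-1/2}\, K(x,y)\, \pi(y)^{1/2}
\quad\Longrightarrow\quad
T(y,x) = \pi(y)^{-1/2}\, \frac{K(x,y)\,\pi(y)}{\pi(x)}\, \pi(x)^{1/2} = T(x,y).

% Any real symmetric matrix M has real eigenvalues \lambda_n and orthonormal
% real eigenvectors \phi_n, so the quadratic fermion Hamiltonian built from it
% is diagonal in the rotated modes:
\sum_{x,y} c_x^\dagger\, M(x,y)\, c_y
  = \sum_n \lambda_n\, \hat c_n^\dagger\, \hat c_n ,
\qquad
\hat c_n := \sum_x \phi_n(x)\, c_x .
```

Knowing the spectrum and eigenvectors of the single-particle matrix therefore determines the many-fermion spectrum, which is the sense in which solvability carries over.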
# DAT:人間エンゲージメント推定のためのModality-Group Fusionを用いた対話認識変換器 DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation ( http://arxiv.org/abs/2410.08470v1 ) ライセンス: Link先を確認	Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong,	(参考訳) エンゲージメント推定は、人間の社会的行動を理解する上で重要な役割を担い、感情コンピューティングや人間とコンピュータの相互作用といった分野における研究の関心を惹きつける。本稿では,対話における人間のエンゲージメントを推定するために,音声・視覚入力のみに依存し,言語に依存しないモダリティ・グループ・フュージョン(MGF)を用いた対話対応トランスフォーマフレームワーク(DAT)を提案する。具体的には、音声・視覚コンテンツ全体を推測する前に、各人ごとのモーダル内での音響特徴と視覚的特徴を独立に融合するモーダル群融合戦略を用いる。この戦略はモデルの性能と堅牢性を大幅に向上させる。さらに、対象者のエンゲージメントレベルをより正確に推定するために、紹介された対話意識変換器は、参加者の行動と会話相手からの手がかりの両方を考慮する。提案手法は,MultiMediate'24が実施したマルチドメインエンゲージメント推定チャレンジで厳密に検証され,ベースラインモデルに対するエンゲージメントレベル回帰精度の顕著な改善が示された。提案手法は,NoXiベーステストセットの平均CCCスコア0.76,NoXiベース,NoXi-Add,MPIIGIテストセットの平均CCC0.64を達成する。 Engagement estimation plays a crucial role in understanding human social behaviors, attracting increasing research interests in fields such as affective computing and human-computer interaction. In this paper, we propose a Dialogue-Aware Transformer framework (DAT) with Modality-Group Fusion (MGF), which relies solely on audio-visual input and is language-independent, for estimating human engagement in conversations. Specifically, our method employs a modality-group fusion strategy that independently fuses audio and visual features within each modality for each person before inferring the entire audio-visual content. This strategy significantly enhances the model's performance and robustness. Additionally, to better estimate the target participant's engagement levels, the introduced Dialogue-Aware Transformer considers both the participant's behavior and cues from their conversational partners. Our method was rigorously tested in the Multi-Domain Engagement Estimation Challenge held by MultiMediate'24, demonstrating notable improvements in engagement-level regression precision over the baseline model. 
Notably, our approach achieves a CCC score of 0.76 on the NoXi Base test set and an average CCC of 0.64 across the NoXi Base, NoXi-Add, and MPIIGI test sets.	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
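The CCC scores above refer to the concordance correlation coefficient, which penalizes both weak correlation and systematic offset between predictions and labels. The following is a minimal sketch of the usual population-statistics formula, not the challenge's official scorer; the function name and the sample values are illustrative assumptions.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    CCC = 2 * cov / (var_pred + var_gold + (mean_pred - mean_gold) ** 2),
    using population (divide-by-n) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are the same constant; the ratio is undefined.
        return float("nan")
    return 2.0 * cov / denom


perfect = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

A constant offset lowers the score even when the two sequences are perfectly correlated, which is why CCC is stricter than Pearson correlation for regression-style engagement labels.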
# Deep Graph Convolutional Networksのより深い洞察:安定性と一般化 Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization ( http://arxiv.org/abs/2410.08473v1 ) ライセンス: Link先を確認	Guangrui Yang, Ming Li, Han Feng, Xiaosheng Zhuang,	(参考訳) グラフ畳み込みネットワーク(GCN)は、グラフ学習タスクの強力なモデルとして登場し、様々な領域で有望なパフォーマンスを示している。彼らの経験的成功は明らかだが、理論的な観点から本質的な能力を理解する必要性が高まっている。既存の理論的研究は主に単層GCNの解析に重点を置いているが、深いGCNの安定性と一般化に関する包括的な理論的探索は依然として限られている。本稿では,このギャップを深いGCNの安定性と一般化特性を掘り下げることで橋渡しする。本理論により,深いGCNの安定性と一般化は,グラフフィルタ演算子の絶対固有値の最大値やネットワークの深さなど,特定の要因の影響を受けていることが明らかとなった。我々の理論的研究は、深いGCNの安定性と一般化特性のより深い理解に寄与し、より信頼性が高く良好なモデルを開発するための道を開く可能性がある。 Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks, exhibiting promising performance in various domains. While their empirical success is evident, there is a growing need to understand their essential ability from a theoretical perspective. Existing theoretical research has primarily focused on the analysis of single-layer GCNs, while a comprehensive theoretical exploration of the stability and generalization of deep GCNs remains limited. In this paper, we bridge this gap by delving into the stability and generalization properties of deep GCNs, aiming to provide valuable insights by characterizing rigorously the associated upper bounds. Our theoretical results reveal that the stability and generalization of deep GCNs are influenced by certain key factors, such as the maximum absolute eigenvalue of the graph filter operators and the depth of the network. Our theoretical studies contribute to a deeper understanding of the stability and generalization properties of deep GCNs, potentially paving the way for developing more reliable and well-performing models.	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
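One of the quantities named above, the maximum absolute eigenvalue of a graph filter operator, can be estimated numerically. The sketch below assumes the commonly used renormalized adjacency filter $D^{-1/2}(A+I)D^{-1/2}$ and plain power iteration; the abstract does not say which filters the paper analyzes, and the helper names are hypothetical.

```python
import math


def normalized_filter(edges, n):
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph on n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def max_abs_eigenvalue(s, iters=200):
    """Power-iteration estimate of the largest |eigenvalue| of a symmetric matrix."""
    n = len(s)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm start vector
    est = 0.0
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        est = math.sqrt(sum(x * x for x in w))  # ||S v|| with ||v|| = 1
        if est == 0.0:
            break
        v = [x / est for x in w]
    return est


# Three-node path graph 0 - 1 - 2 (an arbitrary toy example).
PATH_FILTER = normalized_filter([(0, 1), (1, 2)], 3)
path_radius = max_abs_eigenvalue(PATH_FILTER)
```

For this particular filter the estimate stays at or below 1, so stacking layers does not amplify signals through the filter itself; other filters can behave differently.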
# GIVE:知識グラフにインスパイアされた正当性外挿による構造化推論 GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation ( http://arxiv.org/abs/2410.08475v1 ) ライセンス: Link先を確認	Jiashu He, Mingyu Derek Ma, Jinxuan Fan, Dan Roth, Wei Wang, Alejandro Ribeiro,	(参考訳) 既存の検索に基づく大規模言語モデル(LLM)の推論手法は、ドメイン知識と明示的な推論チェーンを提供するために、非パラメトリックな知識ソースの密度と品質に大きく依存している。しかし、包括的知識源は高価であり、科学ドメインやコーナードメインのために構築することができない場合もある。この課題に対処するために,グラフインスパイアされた正当性外挿法(GIVE)を導入する。これは,パラメトリックメモリと非パラメトリックメモリを統合し,知識検索と忠実な推論プロセスの両面を,非常にスパースな知識グラフ上で強化する新しい推論フレームワークである。外部構造知識を活用してLCMを刺激し,関連する概念間の相互関係をモデル化することにより,金の回答検索よりも専門家の問題解決に類似した論理的,ステップワイドな推論手法を実現する。具体的には、このフレームワークはLLMに対して、クエリを重要な概念と属性に分解し、関連するエンティティを持つエンティティグループを構築し、これらのエンティティグループにまたがるノードペア間の潜在的な関係を探索することによって、拡張された推論チェーンを構築するよう促す。提案手法は, 事実と外挿の両方のリンクを組み込んで, 包括的理解と応答生成を可能にする。バイオメディカルおよびコモンセンスQAにおける推論・インセンスベンチマークの大規模な実験により,提案手法の有効性が示された。具体的には、GIVE は GPT3.5-turbo を追加のトレーニングコストなしで GPT4 のような先進モデルより優れており、そのため、外部リソースの限られた特殊なタスクに対処するための構造化情報の統合と LLM の内部推論能力の有効性が強調される。 Existing retrieval-based reasoning approaches for large language models (LLMs) heavily rely on the density and quality of the non-parametric knowledge source to provide domain knowledge and explicit reasoning chain. However, inclusive knowledge sources are expensive and sometimes infeasible to build for scientific or corner domains. To tackle the challenges, we introduce Graph Inspired Veracity Extrapolation (GIVE), a novel reasoning framework that integrates the parametric and non-parametric memories to enhance both knowledge retrieval and faithful reasoning processes on very sparse knowledge graphs. By leveraging the external structured knowledge to inspire LLM to model the interconnections among relevant concepts, our method facilitates a more logical and step-wise reasoning approach akin to experts' problem-solving, rather than gold answer retrieval. 
Specifically, the framework prompts LLMs to decompose the query into crucial concepts and attributes, construct entity groups with relevant entities, and build an augmented reasoning chain by probing potential relationships among node pairs across these entity groups. Our method incorporates both factual and extrapolated linkages to enable comprehensive understanding and response generation. Extensive experiments on reasoning-intensive biomedical and commonsense QA benchmarks demonstrate the effectiveness of our proposed method. Specifically, GIVE enables GPT3.5-turbo to outperform advanced models like GPT4 without any additional training cost, thereby underscoring the efficacy of integrating structured information with the internal reasoning ability of LLMs for tackling specialized tasks with limited external resources.	翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
# 動的語彙による生成 Generation with Dynamic Vocabulary ( http://arxiv.org/abs/2410.08481v1 ) ライセンス: Link先を確認	Yanting Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Xiaoling Wang,	(参考訳) 言語モデルのための動的語彙を新たに導入する。生成中に任意のテキストスパンを含むことができる。これらのテキストスパンは、従来の静的語彙のトークンと同様に、生成の基本的な構成要素として機能する。複数トークンをアトミックに生成する能力が、生成品質と効率の両方を向上させることを示す(標準言語モデルと比較すると,MAUVEメトリックは25%向上し,レイテンシは20%低下する)。動的語彙はプラグイン・アンド・プレイ方式で展開できるため、様々なダウンストリームアプリケーションには魅力的である。例えば、動的語彙は訓練のない方法で異なる領域に適用できることを示す。また、質問応答タスクにおいて、信頼性の高い引用を生成するのにも役立つ(回答精度を損なうことなく、引用結果を大幅に強化する)。 We introduce a new dynamic vocabulary for language models. It can include arbitrary text spans during generation. These text spans act as basic generation bricks, akin to tokens in traditional static vocabularies. We show that the ability to generate multiple tokens atomically improves both generation quality and efficiency (compared to the standard language model, the MAUVE metric increases by 25% and latency decreases by 20%). The dynamic vocabulary can be deployed in a plug-and-play way and is thus attractive for various downstream applications. For example, we demonstrate that dynamic vocabulary can be applied to different domains in a training-free manner. It also helps to generate reliable citations in question answering tasks (substantially enhancing citation results without compromising answer accuracy).	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# GFVCを超えて - 適応的なビジュアルトークンを備えたプログレッシブな顔ビデオ圧縮フレームワーク Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens ( http://arxiv.org/abs/2410.08485v1 ) ライセンス: Link先を確認	Bolin Chen, Shanzhi Yin, Zihan Zhang, Jie Chen, Ru-Ling Liao, Lingyu Zhu, Shiqi Wang, Yan Ye,	(参考訳) 近年、深層生成モデルにより、将来性のある速度歪み性能と多種多様なアプリケーション機能に向けて、顔映像符号化の進歩が著しく進んでいる。従来のハイブリッドビデオ符号化のパラダイムを超えて、GFVC(Generative Face Video Compression)は、深層生成モデルの強力な能力と初期のモデルベース符号化(MBC)の哲学を頼りに、視覚的顔信号のコンパクトな表現と現実的な再構築を容易にし、超低ビットレートの顔ビデオ通信を実現する。しかし、これらのGFVCアルゴリズムは、不安定な再構成品質と限られたビットレート範囲に直面することがある。これらの問題に対処するために, 適応型視覚トークンを用いた新しいプログレッシブ・フェイス・ビデオ圧縮フレームワークであるPFVCを提案し, 再構成ロバスト性と帯域幅インテリジェンスとの異例なトレードオフを実現する。特に、提案したPFVCのエンコーダは、高次元の顔信号をプログレッシブな方法で適応的な視覚トークンに投影し、デコーダは、これらの適応的な視覚トークンを運動推定や信号合成のために、異なる粒度レベルで再構築することができる。実験により,提案したPFVCフレームワークは,最新のVersatile Video Coding(VVC)コーデックや最先端GFVCアルゴリズムと比較して,符号化の柔軟性と速度歪み性能を向上できることを示した。プロジェクトのページはhttps://github.com/Berlin0610/PFVCで見ることができる。 Recently, deep generative models have greatly advanced the progress of face video coding towards promising rate-distortion performance and diverse application functionalities. Beyond traditional hybrid video coding paradigms, Generative Face Video Compression (GFVC) relying on the strong capabilities of deep generative models and the philosophy of early Model-Based Coding (MBC) can facilitate the compact representation and realistic reconstruction of visual face signal, thus achieving ultra-low bitrate face video communication. However, these GFVC algorithms are sometimes faced with unstable reconstruction quality and limited bitrate ranges. To address these problems, this paper proposes a novel Progressive Face Video Compression framework, namely PFVC, that utilizes adaptive visual tokens to realize exceptional trade-offs between reconstruction robustness and bandwidth intelligence. 
In particular, the encoder of the proposed PFVC projects the high-dimensional face signal into adaptive visual tokens in a progressive manner, whilst the decoder can further reconstruct these adaptive visual tokens for motion estimation and signal synthesis with different granularity levels. Experimental results demonstrate that the proposed PFVC framework can achieve better coding flexibility and superior rate-distortion performance in comparison with the latest Versatile Video Coding (VVC) codec and the state-of-the-art GFVC algorithms. The project page can be found at https://github.com/Berlin0610/PFVC.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# 量子信頼実行環境のための量子オペレーティングシステム Quantum Operating System Support for Quantum Trusted Execution Environments ( http://arxiv.org/abs/2410.08486v1 ) ライセンス: Link先を確認	Theodoros Trochatos, Jakub Szefer,	(参考訳) クラウドベースの量子コンピューティングへの依存が高まり、量子コンピューティングの機密性と完全性を保証することが最重要である。 Quantum Trusted Execution Environments (QTEE) は、リモートクラウドベースの量子コンピュータに送信されたユーザの量子回路を保護するために提案されている。しかし、QTEEの配備にはQTEEのハードウェアと操作をサポートする量子オペレーティングシステム(QOS)が必要である。この作業では、クラウドプラットフォーム上でのセキュアな量子タスク実行に必要なステップをサポートし実現する、QOSのための最初のアーキテクチャを導入している。 With the growing reliance on cloud-based quantum computing, ensuring the confidentiality and integrity of quantum computations is paramount. Quantum Trusted Execution Environments (QTEEs) have been proposed to protect users' quantum circuits when they are submitted to remote cloud-based quantum computers. However, deployment of QTEEs necessitates a Quantum Operating System (QOS) that can support QTEE hardware and operation. This work introduces the first architecture for a QOS to support and enable essential steps required for secure quantum task execution on cloud platforms.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# コントラストフリー血管造影のためのCAS-GAN CAS-GAN for Contrast-free Angiography Synthesis ( http://arxiv.org/abs/2410.08490v1 ) ライセンス: Link先を確認	De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Hao Li, Tian-Yu Xiang, Zeng-Guang Hou,	(参考訳) ヨウ化コントラスト剤は、多くの介入手順で広く利用されるが、患者にかなりの健康リスクをもたらす。 CAS-GANは「仮想コントラスト剤」として機能する新規なGANフレームワークであり, 血管意味指導によるX線アンジオグラフィーを合成し, 介入処理中のヨウ素化剤への依存を低減させる。具体的には,医学的先行知識を活用して,X線アンジオグラフィーを背景および血管成分に分解する。特殊予測器は、これらのコンポーネント間の相互関係をマップする。さらに、生成した画像の視覚的忠実度を高めるために、血管意味誘導ジェネレータとそれに対応する損失関数を導入する。 XCADデータセットにおいて、CAS-GANのFIDは5.94,MMDは0.017であった。これらの有望な結果はCAS-GANの臨床応用の可能性を強調している。 Iodinated contrast agents are widely utilized in numerous interventional procedures, yet they pose substantial health risks to patients. This paper presents CAS-GAN, a novel GAN framework that serves as a "virtual contrast agent" to synthesize X-ray angiographies via disentanglement representation learning and vessel semantic guidance, thereby reducing the reliance on iodinated agents during interventional procedures. Specifically, our approach disentangles X-ray angiographies into background and vessel components, leveraging medical prior knowledge. A specialized predictor then learns to map the interrelationships between these components. Additionally, a vessel semantic-guided generator and a corresponding loss function are introduced to enhance the visual fidelity of generated images. Experimental results on the XCAD dataset demonstrate the state-of-the-art performance of our CAS-GAN, achieving an FID of 5.94 and an MMD of 0.017. These promising results highlight CAS-GAN's potential for clinical applications.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# 自動走行におけるエッジケース検出のシステムレビュー:方法,課題,今後の方向性 A Systematic Review of Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions ( http://arxiv.org/abs/2410.08491v1 ) ライセンス: Link先を確認	Saeed Rahmani, Sabine Rieder, Erwin de Gelder, Marcel Sonntag, Jorge Lorente Mallada, Sytze Kalisvaart, Vahid Hashemi, Simeon C. Calvert,	(参考訳) 自動車両(AV)の急速な開発は、安全と効率を高めることで輸送に革命をもたらすことを約束している。しかしながら、様々な現実世界の状況における信頼性を確保することは、特にエッジケースとして知られる稀で予期せぬ状況のため、重要な課題である。エッジケースの検出には多くのアプローチが存在するが、これらのテクニックを体系的にレビューする包括的な調査が欠如している。本稿では,エッジケースの検出と評価手法の体系的分類を実践的かつ階層的に検討することにより,このギャップを埋める。本分類は, AVモジュールによる検出手法の分類, 知覚関連, 軌跡関連エッジケースの分類, および基礎となる方法論と理論に基づく2つのレベルで構成されている。我々は、この分類を「知識駆動」アプローチと呼ばれる新しいクラスを導入することで拡張する。さらに,エッジケース検出手法の評価手法と,エッジケースの同定手法について検討した。我々の知る限りでは、すべてのAVサブシステムにおけるエッジケース検出手法を包括的にカバーし、知識駆動エッジケースについて議論し、検出方法の評価手法を探求する最初の調査である。この構造化・多面解析は、AVのターゲットとなる研究とモジュラーテストを促進することを目的としている。さらに、様々なアプローチの長所と短所を特定し、課題と今後の方向性について議論することにより、効率的なエッジケース検出を通じて自動運転(AD)システムの安全性と信頼性を高めるために、AV開発者、研究者、政策立案者を支援することを目的とする。 The rapid development of automated vehicles (AVs) promises to revolutionize transportation by enhancing safety and efficiency. However, ensuring their reliability in diverse real-world conditions remains a significant challenge, particularly due to rare and unexpected situations known as edge cases. Although numerous approaches exist for detecting edge cases, there is a notable lack of a comprehensive survey that systematically reviews these techniques. This paper fills this gap by presenting a practical, hierarchical review and systematic classification of edge case detection and assessment methodologies. Our classification is structured on two levels: first, categorizing detection approaches according to AV modules, including perception-related and trajectory-related edge cases; and second, based on underlying methodologies and theories guiding these techniques. 
We extend this taxonomy by introducing a new class called "knowledge-driven" approaches, which is largely overlooked in the literature. Additionally, we review the techniques and metrics for the evaluation of edge case detection methods and identified edge cases. To our knowledge, this is the first survey to comprehensively cover edge case detection methods across all AV subsystems, discuss knowledge-driven edge cases, and explore evaluation techniques for detection methods. This structured and multi-faceted analysis aims to facilitate targeted research and modular testing of AVs. Moreover, by identifying the strengths and weaknesses of various approaches and discussing the challenges and future directions, this survey intends to assist AV developers, researchers, and policymakers in enhancing the safety and reliability of automated driving (AD) systems through effective edge case detection.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# ミニマックス問題に対するよりシャープなリスク境界に向けて Towards Sharper Risk Bounds for Minimax Problems ( http://arxiv.org/abs/2410.08497v1 ) ライセンス: Link先を確認	Bowei Zhu, Shaojie Li, Yong Liu,	(参考訳) ミニマックス問題は、敵対的学習、堅牢な最適化、強化学習などの機械学習で成功している。理論解析において、一般化誤差と最適化誤差によって構成される現在の最適な過剰リスク境界は、強凸強凹(SC-SC)設定において1/nレートを示す。既存の研究は主に最適化誤差の特定のアルゴリズムによるミニマックス問題に焦点を合わせており、一般化性能についてはほとんど研究されていないため、より優れた過剰リスク境界が制限されている。本稿では,一様局所収束を用いた主関数の勾配によって測定される一般化境界について検討する。我々は,非凸強凹(NC-SC)確率ミニマックス問題に対して,よりシャープな高確率一般化誤差境界を求める。さらに、外層に対して、ポリアック・ロジャシエヴィチ条件の下で次元に依存しない結果を与える。一般化誤差境界に基づいて、経験的サドル点(ESP)、勾配降下上昇(GDA)、確率的勾配降下上昇(SGDA)などの一般的なアルゴリズムを分析した。我々は、さらに合理的な仮定のもとで、より優れた過剰主リスク境界を導出する。これは、我々の知る限り、ミニマックス問題における既存の結果よりもn倍高速である。 Minimax problems have achieved success in machine learning such as adversarial training, robust optimization, and reinforcement learning. For theoretical analysis, current optimal excess risk bounds, which are composed of generalization error and optimization error, present 1/n-rates in strongly-convex-strongly-concave (SC-SC) settings. Existing studies mainly focus on minimax problems with specific algorithms for optimization error, with only a few studies on generalization performance, which limits better excess risk bounds. In this paper, we study the generalization bounds measured by the gradients of primal functions using uniform localized convergence. We obtain a sharper high probability generalization error bound for nonconvex-strongly-concave (NC-SC) stochastic minimax problems. Furthermore, we provide dimension-independent results under the Polyak-Lojasiewicz condition for the outer layer. Based on our generalization error bound, we analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA) and stochastic gradient descent ascent (SGDA). We derive better excess primal risk bounds with further reasonable assumptions, which, to the best of our knowledge, are n times faster than existing results in minimax problems.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
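Gradient descent ascent (GDA), one of the algorithms analyzed above, is easy to illustrate on a toy strongly-convex-strongly-concave objective. The objective $f(x, y) = x^2/2 + xy - y^2/2$, the step size, and the iteration count below are assumptions chosen for illustration; they are not taken from the paper, and the sketch says nothing about its risk bounds.

```python
def gda(x, y, lr=0.1, steps=200):
    """Simultaneous gradient descent ascent on f(x, y) = x**2/2 + x*y - y**2/2.

    x is the minimizing variable and y the maximizing variable;
    the unique saddle point of f is (0, 0).
    """
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        # Descend in x and ascend in y, both from the same iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_final, y_final = gda(1.0, 1.0)
```

With this step size each update shrinks the distance to the saddle point by a constant factor, so the iterates spiral into the origin. SGDA replaces the exact gradients with stochastic estimates computed from sampled data.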
# 計算イメージングにおける隠れ特性について On a Hidden Property in Computational Imaging ( http://arxiv.org/abs/2410.08498v1 ) ライセンス: Link先を確認	Yinan Feng, Yinpeng Chen, Yueh Lee, Youzuo Lin,	(参考訳) FWI(Full Waveform Inversion)、CT(Computed Tomography)、EM(Electromagnetic)インバージョンなど、様々な科学的・医学的応用において、計算イメージングは重要な役割を担っている。これらの手法は、両モダリティが複雑な数学的方程式によって制御される測定データ(例えば、FWIの地震波形データ)から物理特性(例えば、FWIの音響速度マップ)を再構成することで、逆問題に対処する。本稿では, 異なる支配方程式にもかかわらず, 3つの逆問題 (FWI, CT, EM) が潜在空間内に隠れた性質を共有することを実証的に示す。具体的には、FWI を例として、両モダリティ(速度マップと地震波形データ)が、潜在空間において同じ一方向波動方程式のセットに従うが、線形に相関する異なる初期条件を持つことを示す。このことは、潜在埋め込み空間への射影の後、2つのモダリティが、その初期条件を通して連結された同じ方程式の異なる解に対応することを示唆している。実験により,この隠れた性質は3つの画像問題すべてに一貫性があることが確認された。 Computational imaging plays a vital role in various scientific and medical applications, such as Full Waveform Inversion (FWI), Computed Tomography (CT), and Electromagnetic (EM) inversion. These methods address inverse problems by reconstructing physical properties (e.g., the acoustic velocity map in FWI) from measurement data (e.g., seismic waveform data in FWI), where both modalities are governed by complex mathematical equations. In this paper, we empirically demonstrate that despite their differing governing equations, three inverse problems (FWI, CT, and EM inversion) share a hidden property within their latent spaces. Specifically, using FWI as an example, we show that both modalities (the velocity map and seismic waveform data) follow the same set of one-way wave equations in the latent space, yet have distinct initial conditions that are linearly correlated. This suggests that after projection into the latent embedding space, the two modalities correspond to different solutions of the same equation, connected through their initial conditions. Our experiments confirm that this hidden property is consistent across all three imaging problems, providing a novel perspective for understanding these computational imaging tasks.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# ログレベルの提案のための大規模言語モデルの調査とベンチマーク Studying and Benchmarking Large Language Models For Log Level Suggestion ( http://arxiv.org/abs/2410.08499v1 ) ライセンス: Link先を確認	Yi Wen Heng, Zeyang Ma, Zhenhao Li, Dong Jae Kim, Tse-Hsun Chen,	(参考訳) 大規模言語モデル(LLM)は、ソフトウェア工学を含む様々な分野における研究の焦点となり、その能力はますます活用されている。近年の研究では,LLMをソフトウェア開発ツールやフレームワークに統合し,テキストおよびコード関連タスクのパフォーマンス向上の可能性を明らかにしている。ログレベルはロギングステートメントの重要な部分であり、開発者はシステム実行中に記録された情報を制御できる。ログメッセージが自然言語とコードライクな変数を混在することが多いことから、LLMの言語翻訳能力は、ロギングステートメントに適した冗長度を決定するために応用できる。本稿では,12個のオープンソースLLMの性能に及ぼす特性と学習パラダイムの影響を,ログレベルの提案で詳細に分析する。機密情報を効果的に保護し、データセキュリティを維持しながら、社内コードの利用を可能にするため、オープンソースモデルを選択しました。我々は,Zero-shot,Few-shot,fine-tuningなどの手法を多種多様なLLMで検証し,正確なログレベル提案のための最も効果的な組み合わせを特定する。私たちの研究は、9つの大規模なJavaシステムで実施された実験によって支援されています。その結果,より小規模なLLMは適切な指導と適切な手法で効果的に動作可能であるが,ログレベルを提案する能力には依然として大きな改善の余地があることがわかった。 Large Language Models (LLMs) have become a focal point of research across various domains, including software engineering, where their capabilities are increasingly leveraged. Recent studies have explored the integration of LLMs into software development tools and frameworks, revealing their potential to enhance performance in text and code-related tasks. Log level is a key part of a logging statement that allows software developers to control the information recorded during system runtime. Given that log messages often mix natural language with code-like variables, LLMs' language translation abilities could be applied to determine the suitable verbosity level for logging statements. In this paper, we undertake a detailed empirical analysis to investigate the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion. We opted for open-source models because they enable us to utilize in-house code while effectively protecting sensitive information and maintaining data security. 
We examine several prompting strategies, including Zero-shot, Few-shot, and fine-tuning techniques, across different LLMs to identify the most effective combinations for accurate log level suggestions. Our research is supported by experiments conducted on 9 large-scale Java systems. The results indicate that although smaller LLMs can perform effectively with appropriate instruction and suitable techniques, there is still considerable potential for improvement in their ability to suggest log levels.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# セマンティック-トポ-メトリ表現誘導LLM推論による空中視覚・言語ナビゲーション Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning ( http://arxiv.org/abs/2410.08500v1 ) ライセンス: Link先を確認	Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao,	(参考訳) 空中VLN(Aerial Vision-and-Language Navigation)は、無人航空機(Unmanned Aerial Vehicles、UAV)が自然言語の指示や視覚的手がかりを通じて屋外の環境を航行できるようにする新しいタスクである。屋外空間の複雑な空間的関係のため、依然として困難である。本稿では,大規模言語モデル(LLM)をアクション予測のエージェントとして導入する,空中VLNタスクのためのエンドツーエンドゼロショットフレームワークを提案する。具体的には,LLMの空間的推論能力を高めるために,Semantic-Topo-Metric Representation (STMR) を開発した。これは、ランドマークの指示関連セマンティックマスクを、周囲のランドマークの位置情報を含むトップダウンマップに抽出して投影することで達成される。さらに、この地図は、LLMへのテキストプロンプトとして距離メトリクスを持つ行列表現に変換され、命令に従って動作予測を行う。実環境およびシミュレーション環境での実験により,提案手法の有効性と堅牢性が示され,AerialVLN-Sデータセット上でのOracle Success Rate(OSR)において、15.9%と12.5%の改善(絶対)を達成した。 Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues. It remains challenging due to the complex spatial relationships in outdoor aerial scenes. In this paper, we propose an end-to-end zero-shot framework for aerial VLN tasks, where the large language model (LLM) is introduced as our agent for action prediction. Specifically, we develop a novel Semantic-Topo-Metric Representation (STMR) to enhance the spatial reasoning ability of LLMs. This is achieved by extracting and projecting instruction-related semantic masks of landmarks into a top-down map that contains the location information of surrounding landmarks. Further, this map is transformed into a matrix representation with distance metrics as the text prompt to the LLM, for action prediction according to the instruction. Experiments conducted in real and simulated environments demonstrate the effectiveness and robustness of our method, achieving 15.9% and 12.5% improvements (absolute) in Oracle Success Rate (OSR) on the AerialVLN-S dataset.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# 対人訓練はロバスト性を改善する:構造化データに基づく特徴学習過程の理論解析 Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data ( http://arxiv.org/abs/2410.08503v1 ) ライセンス: Link先を確認	Binghui Li, Yuanzhi Li,	(参考訳) 敵のトレーニングは、敵の摂動に対して堅牢であるようにディープニューラルネットワークをトレーニングするための広く応用されたアプローチである。しかし、実際は敵の訓練が経験的な成功をおさめているが、なぜ敵の例が存在するのか、また敵の訓練方法がモデル堅牢性をどのように改善するかはいまだ不明である。本稿では, 特徴学習理論の観点から, 対角的例と対角的学習アルゴリズムの理論的理解を提供する。具体的には, 頑健な特徴, 摂動に抵抗するがスパースに抵抗する特徴, 摂動に敏感な非破壊的特徴, という2つのタイプの特徴から構成される。我々は、構造化データを学ぶために、2層スムーズなReLU畳み込みニューラルネットワークを訓練する。まず、ネットワーク学習者は、標準的な訓練(経験的リスクよりも緩やかな降下)を用いることで、頑健な特徴よりも非破壊的特徴を学習し、それによって、負の非破壊的特徴方向に整合した摂動によって生じる逆の例が導かれることを証明した。そこで, 勾配に基づく逆数学習アルゴリズムについて検討し, モデル更新において, 逆数を求めるために勾配を上昇させ, 経験的リスクよりも勾配を降下させ, モデル更新を行う。本手法は,頑健な特徴学習を効果的に強化し,非ロバストな特徴学習を抑え,ネットワークの堅牢性を向上させることができることを示す。最後に,MNIST, CIFAR10, SVHNなどの実画像データセットを用いた実験により, 理論的知見を実証的に検証した。 Adversarial training is a widely-applied approach to training deep neural networks to be robust against adversarial perturbation. However, although adversarial training has achieved empirical success in practice, it still remains unclear why adversarial examples exist and how adversarial training methods improve model robustness. In this paper, we provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory. Specifically, we focus on a multiple classification setting, where the structured data can be composed of two types of features: the robust features, which are resistant to perturbation but sparse, and the non-robust features, which are susceptible to perturbation but dense. We train a two-layer smoothed ReLU convolutional neural network to learn our structured data. 
First, we prove that by using standard training (gradient descent over the empirical risk), the network learner primarily learns the non-robust feature rather than the robust feature, which thereby leads to the adversarial examples that are generated by perturbations aligned with negative non-robust feature directions. Then, we consider the gradient-based adversarial training algorithm, which runs gradient ascent to find adversarial examples and runs gradient descent over the empirical risk at adversarial examples to update models. We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning to improve the network robustness. Finally, we also empirically validate our theoretical findings with experiments on real-image datasets, including MNIST, CIFAR10 and SVHN.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
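The alternating scheme described in the entry above (gradient ascent on the input to find an adversarial example, then gradient descent on the model at that example) can be sketched on a toy problem. The sketch below assumes a linear logistic model and a single signed ascent step inside an L-infinity ball; the paper analyses a two-layer smoothed-ReLU CNN, so the model, the data, the step rule, and all function names here are illustrative assumptions only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grads(w, x, y):
    """Gradients of the logistic loss log(1 + exp(-y * <w, x>)); y is +1 or -1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y * sigmoid(-margin)          # derivative of the loss w.r.t. <w, x>
    grad_w = [coeff * xi for xi in x]
    grad_x = [coeff * wi for wi in w]
    return grad_w, grad_x

def perturb(w, x, y, eps):
    """One signed gradient-ascent step on the loss, bounded by eps per coordinate."""
    _, grad_x = loss_grads(w, x, y)
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad_x)]

def adversarial_train(data, eps=0.1, lr=0.5, epochs=200):
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, y in data:
            x_adv = perturb(w, x, y, eps)        # inner step: ascent on the input
            g_w, _ = loss_grads(w, x_adv, y)     # gradient at the adversarial point
            grad = [a + b / len(data) for a, b in zip(grad, g_w)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]  # outer step: descent on w
    return w

# Toy data: the first feature separates the classes, the second is unreliable.
data = [([2.0, 0.3], 1), ([1.5, -0.2], 1), ([-2.0, -0.3], -1), ([-1.5, 0.2], -1)]
w = adversarial_train(data)
print(w)
```

On this toy data the learned weight on the first (reliable) feature dominates, which loosely mirrors the abstract's claim that adversarial training favours robust features; it is not a reproduction of the paper's result.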
# 弱教師あり腹腔鏡画像セグメンテーションに対するベイズ的アプローチ A Bayesian Approach to Weakly-supervised Laparoscopic Image Segmentation ( http://arxiv.org/abs/2410.08509v1 ) ライセンス: Link先を確認	Zhou Zheng, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kensaku Mori,	(参考訳) 本稿では,スパースアノテーションを用いた弱教師あり腹腔鏡画像セグメンテーションについて検討する。本稿では,モデルセグメンテーションの精度と解釈可能性の向上を目的として,ベイズ的枠組みを包括的に構築し,ロバストかつ理論的に検証された手法を確実にする新しいベイズ的深層学習手法を提案する。提案手法は,観察画像とそれに対応する弱いアノテーションを直接訓練する従来の手法と異なる。その代わり、得られたデータから画像とラベルの同時分布を推定する。これにより、画像とその高品質な擬似ラベルのサンプリングが容易になり、一般化可能なセグメンテーションモデルのトレーニングが可能になる。モデルの各コンポーネントは確率的定式化によって表現され、コヒーレントで解釈可能な構造を提供する。この確率的性質は、スパースアノテーションから正確で実践的な学習の恩恵を受け、不確実性を定量化する能力に私たちのモデルを装備する。 2つの公開腹腔鏡的データセットによる広範囲な評価の結果,既存の手法よりも優れた結果が得られた。さらに, 本法はスクリブル教師あり心臓多構造セグメンテーションに適応し, 従来法と比較して高い性能を示した。コードはhttps://github.com/MoriLabNU/Bayesian_WSSで公開されている。 In this paper, we study weakly-supervised laparoscopic image segmentation with sparse annotations. We introduce a novel Bayesian deep learning approach designed to enhance both the accuracy and interpretability of the model's segmentation, founded upon a comprehensive Bayesian framework, ensuring a robust and theoretically validated method. Our approach diverges from conventional methods that directly train using observed images and their corresponding weak annotations. Instead, we estimate the joint distribution of both images and labels given the acquired data. This facilitates the sampling of images and their high-quality pseudo-labels, enabling the training of a generalizable segmentation model. Each component of our model is expressed through probabilistic formulations, providing a coherent and interpretable structure. This probabilistic nature benefits accurate and practical learning from sparse annotations and equips our model with the ability to quantify uncertainty. Extensive evaluations with two public laparoscopic datasets demonstrated the efficacy of our method, which consistently outperformed existing methods. 
Furthermore, our method was adapted for scribble-supervised cardiac multi-structure segmentation, presenting competitive performance compared to previous methods. The code is available at https://github.com/MoriLabNU/Bayesian_WSS.	翻訳日:2024-10-30 23:34:54 公開日:2024-10-11
# コヒーレンス変動からみた量子速度限界 Quantum Speed Limit in Terms of Coherence Variations ( http://arxiv.org/abs/2410.08514v1 ) ライセンス: Link先を確認	Zi-yi Mai, CHang-shui Yu,	(参考訳) コヒーレンス(Coherence)は、量子情報処理における最も基本的な量子資源である。物理的システムがいかに早くコヒーレンスやデコヒーレンスを得るかは重要な要素である。本稿では,動的プロセスによる量子コヒーレンスの変化に基づいて,達成可能な量子速度制限を提案する。これは、2次元の量子状態に対して、それに対応するダイナミクスが必ず見つけ、測地線に沿ってあるコヒーレンスな変化で他の状態へと進化することを示している。応用として、縮退力学と散逸力学のコヒーレンス量子速度限界について検討する。劣化力学は我々のコヒーレンス量子速度限界を飽和させ、同じ人口を持つ状態のデコヒーレンスは他のものよりも速くなることが示されている。しかし、散逸力学は反対の振る舞いを持つ。さらに、上述のダイナミクスに対する境界のより強い厳密さを比較によって示す。 Coherence is the most fundamental quantum resource in quantum information processing. How fast a physical system gets coherence or decoherence is a critical ingredient. We present an attainable quantum speed limit based on the variation of quantum coherence subject to a dynamical process. It indicates that for a 2-dimensional quantum state, one can always find corresponding dynamics driving it to evolve along the geodesic to another state with certain coherence variation. As applications, we study the coherence quantum speed limits of the dephasing and dissipative dynamics. It is shown that the dephasing dynamics can saturate our coherence quantum speed limit, and the decoherence of the state with identical populations will be faster than others. However, the dissipative dynamics have the opposite behavior. In addition, we illustrate a stronger tightness of our bound for the mentioned dynamics by comparison.	翻訳日:2024-10-30 23:24:45 公開日:2024-10-11
# WasmWalker: WebAssemblyプログラム分析を改善するパスベースのコード表現 WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis ( http://arxiv.org/abs/2410.08517v1 ) ライセンス: Link先を確認	Mohammad Robati Shirzad, Patrick Lam,	(参考訳) WebAssembly(Wasm)は、Webブラウザでほぼネイティブなコードの実行を可能にする低レベルのバイナリ言語である。 Wasmは、ゲーム、オーディオ、ビデオ処理、クラウドコンピューティングなどのアプリケーションで有用であることが証明され、Web開発におけるJavaScriptのハイパフォーマンスで低オーバーヘッドな代替手段を提供する。すべての主要なブラウザがWebAssemblyを迅速かつ広く採用していることにより、この新しいテクノロジをサポートする分析ツールが誕生した。ディープラーニングプログラム分析モデルは、AST(Abstract Syntax Tree)対応のコード表現に含まれるプログラム構造情報から大きな恩恵を受けることができる。このようなコード表現を得るために、Ubuntu 18.04リポジトリのソースパッケージからコンパイルされたWebAssemblyバイナリファイルの大規模なデータセットのWebAssembly TextフォーマットでASTパスを実証分析した。収集したパスを精査した結果、これらのバイナリファイルに3,352のユニークなパスしか現れていないことがわかった。この知見により、WebAssemblyバイナリ用の2つの新しいコード表現を提案する。これらの新しい表現は、固定サイズのコード埋め込みを生成するだけでなく、シーケンス・ツー・シーケンス・モデルに追加情報を提供するのに役立つ。最終的に、我々のアプローチは、プログラム分析モデルがWasmバイナリから新しい性質を明らかにするのに役立つ。 2つのアプリケーションで新しいコード表現を評価しました。 (i)メソッド名予測及び方法 (ii)正確な戻り型を復元する。本研究は,従来の手法よりも新しい手法が優れていることを示すものである。具体的には,従来の最先端技術であるSnowWhiteと比較して,メソッド名予測におけるTop-1(Top-5)の精度が5.36%(11.31%)向上し,精度が8.02%(7.92%)向上した。 WebAssembly, or Wasm, is a low-level binary language that enables execution of near-native-performance code in web browsers. Wasm has proven to be useful in applications including gaming, audio and video processing, and cloud computing, providing a high-performance, low-overhead alternative to JavaScript in web development. The fast and widespread adoption of WebAssembly by all major browsers has created an opportunity for analysis tools that support this new technology. Deep learning program analysis models can greatly benefit from the program structure information included in Abstract Syntax Tree (AST)-aware code representations. To obtain such code representations, we performed an empirical analysis on the AST paths in the WebAssembly Text format of a large dataset of WebAssembly binary files compiled from source packages in the Ubuntu 18.04 repositories. 
After refining the collected paths, we discovered that only 3,352 unique paths appeared across all of these binary files. With this insight, we propose two novel code representations for WebAssembly binaries. These novel representations serve not only to generate fixed-size code embeddings but also to supply additional information to sequence-to-sequence models. Ultimately, our approach helps program analysis models uncover new properties from Wasm binaries, expanding our understanding of their potential. We evaluated our new code representation on two applications: (i) method name prediction and (ii) recovering precise return types. Our results demonstrate the superiority of our novel technique over previous methods. More specifically, our new method resulted in 5.36% (11.31%) improvement in Top-1 (Top-5) accuracy in method name prediction and 8.02% (7.92%) improvement in recovering precise return types, compared to the previous state-of-the-art technique, SnowWhite.	翻訳日:2024-10-30 23:24:45 公開日:2024-10-11
# ハイブリッド変圧器モデルと意味的フィルタリング手法による法的エンティティ認識の改善 Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach ( http://arxiv.org/abs/2410.08521v1 ) ライセンス: Link先を確認	Duraimurugan Rajamanickam,	(参考訳) 法的エンティティ認識(LER)は、契約分析、コンプライアンス監視、訴訟支援などの法的ワークフローを自動化する上で重要である。ルールベースのシステムや古典的な機械学習モデルを含む既存のアプローチは、特にあいまいさやネストされたエンティティ構造を扱う際に、法的文書やドメイン特異性の複雑さに悩まされている。本稿では,意味的類似性に基づくフィルタリング機構を導入することで,法律テキスト処理用に微調整されたトランスフォーマモデルである Legal-BERT の精度と精度を向上させる新しいハイブリッドモデルを提案する。 15,000の注釈付き法律文書のデータセット上で、F1スコア93.4%を達成し、従来の手法よりも精度とリコールが大幅に向上したことを示す。 Legal Entity Recognition (LER) is critical in automating legal workflows such as contract analysis, compliance monitoring, and litigation support. Existing approaches, including rule-based systems and classical machine learning models, struggle with the complexity of legal documents and domain specificity, particularly in handling ambiguities and nested entity structures. This paper proposes a novel hybrid model that enhances the accuracy and precision of Legal-BERT, a transformer model fine-tuned for legal text processing, by introducing a semantic similarity-based filtering mechanism. We evaluate the model on a dataset of 15,000 annotated legal documents, achieving an F1 score of 93.4%, demonstrating significant improvements in precision and recall over previous methods.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# グラフ畳み込みニューラルネットワークによるリンクレベル自転車容積推定におけるデータ空間の影響評価 Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach ( http://arxiv.org/abs/2410.08522v1 ) ライセンス: Link先を確認	Mohit Gupta, Debjit Bhowmick, Meead Saberi, Shirui Pan, Ben Beck,	(参考訳) 自転車の正確な体積推定は、将来の自転車インフラへの投資に関する情報決定に不可欠である。従来のリンクレベルの容積推定モデルは、交通のモーター化に有効であるが、スパースデータと自転車の移動パターンの複雑な性質のために、自転車のコンテキストに適用した場合、重大な課題に直面している。我々の知る限り、リンクレベルの自転車の体積をモデル化するために、グラフ畳み込みネットワーク(GCN)アーキテクチャを利用する最初の研究を示す。オーストラリア,メルボルン市全体での年間平均自転車数(AADB)を,Strava Metro の自転車数データを用いて推定した。 GCNモデルの有効性を評価するため、線形回帰、サポートベクトルマシン、ランダムフォレストといった従来の機械学習モデルと比較した。以上の結果から,GCNモデルはAADB数予測において従来のモデルよりも優れた性能を示し,自転車交通データに固有の空間依存性を捉える能力を示した。さらに、GCNアーキテクチャの性能に様々なレベルのデータ空間がどう影響するかについても検討する。 GCNアーキテクチャは、最大80%のスパーシリティレベルで良好に機能するが、その制限は、データのスパーシリティがさらに増加するにつれて明らかになり、自転車の体積推定における極端なデータスパーシリティの処理に関するさらなる研究の必要性を強調している。本研究は, 自転車インフラの整備と持続可能な交通の促進をめざして, 都市計画者にとって貴重な知見を提供する。 Accurate bicycling volume estimation is crucial for making informed decisions about future investments in bicycling infrastructure. Traditional link-level volume estimation models are effective for motorised traffic but face significant challenges when applied to the bicycling context because of sparse data and the intricate nature of bicycling mobility patterns. To the best of our knowledge, we present the first study to utilize a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes. We estimate the Annual Average Daily Bicycle (AADB) counts across the City of Melbourne, Australia using Strava Metro bicycling count data. To evaluate the effectiveness of the GCN model, we benchmark it against traditional machine learning models, such as linear regression, support vector machines, and random forest. Our results show that the GCN model performs better than these traditional models in predicting AADB counts, demonstrating its ability to capture the spatial dependencies inherent in bicycle traffic data. 
We further investigate how varying levels of data sparsity affect the performance of the GCN architecture. The GCN architecture performs well up to a sparsity level of 80%, but its limitations become apparent as data sparsity increases further, emphasizing the need for further research on handling extreme data sparsity in bicycling volume estimation. Our findings offer valuable insights for city planners aiming to improve bicycling infrastructure and promote sustainable transportation.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# IGNN-Solver: 暗黙のグラフニューラルネットワークのためのグラフニューラルネットワーク IGNN-Solver: A Graph Neural Solver for Implicit Graph Neural Networks ( http://arxiv.org/abs/2410.08524v1 ) ライセンス: Link先を確認	Junchao Lin, Zenan Ling, Zhanbo Feng, Feng Zhou, Jingwen Xu, Robert C Qiu,	(参考訳) 単一層で強い表現力を示すインプリシットグラフニューラルネットワーク(IGNN)は,近年,過度なスムーシング問題を効果的に軽減しつつ,基礎となるグラフの長距離依存性(LRD)を捕捉する際,顕著な性能を示した。しかし、IGNNは計算コストのかかる固定点反復に依存するため、大幅なスピードとスケーラビリティの制限が生じ、大規模グラフへの応用が妨げられる。 IGNNの高速な固定点解法を実現するために,一般化されたAnderson Acceleration法を利用した新しいグラフニューラルネットワークIGNN-Solverを提案し,グラフ依存時間プロセスとして繰り返し更新を学習する。大規模な実験では、IGNN-Solverは推論を著しく加速し、精度を犠牲にすることなく$1.5\times$から$8\times$のスピードアップを達成する。さらに、グラフの規模が大きくなるにつれて、この利点はますます顕著になり、現実世界のアプリケーションに大規模に展開する上で役立ちます。 Implicit graph neural networks (IGNNs), which exhibit strong expressive power with a single layer, have recently demonstrated remarkable performance in capturing long-range dependencies (LRD) in underlying graphs while effectively mitigating the over-smoothing problem. However, IGNNs rely on computationally expensive fixed-point iterations, which lead to significant speed and scalability limitations, hindering their application to large-scale graphs. To achieve fast fixed-point solving for IGNNs, we propose a novel graph neural solver, IGNN-Solver, which leverages the generalized Anderson Acceleration method, parameterized by a small GNN, and learns iterative updates as a graph-dependent temporal process. Extensive experiments demonstrate that the IGNN-Solver significantly accelerates inference, achieving a $1.5\times$ to $8\times$ speedup without sacrificing accuracy. Moreover, this advantage becomes increasingly pronounced as the graph scale grows, facilitating its large-scale deployment in real-world applications.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
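To give a feel for why Anderson acceleration speeds up fixed-point solving, here is a minimal sketch of the classical method with a history window of one, applied to a scalar map. It is not IGNN-Solver: the paper's solver is a generalized variant whose parameters are produced by a small GNN, whereas the mixing coefficient below is computed in closed form. The function name and the test map `cos` are assumptions chosen for illustration.

```python
import math

def anderson_m1(g, x0, tol=1e-10, max_iter=50):
    """Solve x = g(x) for a scalar x using Anderson acceleration, window 1.

    Returns the approximate fixed point and the number of evaluations of g
    made inside the loop.
    """
    g_prev = g(x0)
    r_prev = g_prev - x0          # residual of the first iterate
    x = g_prev                    # the first update is a plain fixed-point step
    for k in range(1, max_iter + 1):
        g_cur = g(x)
        r_cur = g_cur - x
        if abs(r_cur) < tol:
            return x, k
        dr = r_cur - r_prev
        if dr == 0.0:
            x_next = g_cur        # fall back to a plain step if residuals coincide
        else:
            # Least-squares mixing weight for the two most recent residuals.
            theta = r_cur / dr
            x_next = g_cur - theta * (g_cur - g_prev)
        g_prev, r_prev = g_cur, r_cur
        x = x_next
    return x, max_iter

root, n_evals = anderson_m1(math.cos, 1.0)
print(root, n_evals)
```

Plain iteration of `x = cos(x)` contracts by roughly a factor of 0.67 per step, so it needs dozens of steps to reach the same tolerance; the accelerated version needs far fewer. In the scalar case with a window of one, this update coincides with the secant method.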
# I Am the One and Only, Your Cyber BFF: GenAIの影響を理解するには人類型AIの影響を理解する必要がある "I Am the One and Only, Your Cyber BFF": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI ( http://arxiv.org/abs/2410.08526v1 ) ライセンス: Link先を確認	Myra Cheng, Alicia DeVrio, Lisa Egede, Su Lin Blodgett, Alexandra Olteanu,	(参考訳) 最先端のジェネレーティブAI(GenAI)システムの多くは、人為的行動、すなわち人間に類似したアウトプットを生成する傾向が増している。このことが、人類型AIシステムのようなネガティブな影響の可能性への懸念を増大させているが、AI開発、展開、使用における人為的同型は、見過ごされ、調査され、未特定のままである。この観点からは、人為的AIの社会的影響をマッピングすることなく、生成AIの社会的影響を徹底的にマッピングすることはできないと論じ、行動への呼びかけを概説する。 Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that are perceived to be human-like. While this has led to scholars increasingly raising concerns about possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this perspective, we argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI, and outline a call to action.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# LLMにおける下流性能予測のためのスケーリング法則 Scaling Laws for Predicting Downstream Performance in LLMs ( http://arxiv.org/abs/2410.08527v1 ) ライセンス: Link先を確認	Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji,	(参考訳) 学習前の大規模言語モデル(LLM)の下流性能の正確な推定は,開発プロセスの指導に不可欠である。スケーリング法則解析は、ターゲットLLMの性能を予測するために、かなり小さなサンプリング言語モデル(LM)の統計を利用する。下流のパフォーマンス予測にとって重要な課題は、タスク固有の計算しきい値を超えて発生するLLMの創発的能力にある。本研究では,性能評価のための計算効率の高い指標として,事前学習損失に着目した。我々の2段階のアプローチは、まず一連のサンプリングモデルを用いて計算資源(例えばFLOP)を事前学習損失にマッピングする関数を推定し、続いてクリティカルな「創発的なフェーズ」の後、トレーニング前の損失を下流タスクのパフォーマンスにマッピングする。予備実験では、7Bパラメータと13BパラメータのLCMの性能を3BまでのサンプリングLMを用いて正確に予測し、それぞれ5%と10%の誤差マージンを達成し、FLOPs-to-Performanceアプローチを著しく上回った。これは、事前トレーニング中に複数のソースからのデータセットを統合する必要性に対処する、パフォーマンス予測の基本的なアプローチであるFLP-Mを動機付けている。 FLP-Mは、データソース間のFLOPに基づいて、ドメイン固有の事前トレーニング損失を予測するために、電力法解析関数を拡張し、複数のドメイン固有の損失と下流のパフォーマンスの間の非線形関係をモデル化するために、2層ニューラルネットワークを使用する。 FLP-Mは、特定の比率とより小さなサンプリング用LMを訓練した3B LLMを利用することで、多くのベンチマークで10%の誤差マージンで3Bおよび7B LLMの性能を効果的に予測できる。 Precise estimation of downstream performance in large language models (LLMs) prior to training is essential for guiding their development process. Scaling laws analysis utilizes the statistics of a series of significantly smaller sampling language models (LMs) to predict the performance of the target LLM. For downstream performance prediction, the critical challenge lies in the emergent abilities in LLMs that occur beyond task-specific computational thresholds. In this work, we focus on the pre-training loss as a more computation-efficient metric for performance estimation. Our two-stage approach consists of first estimating a function that maps computational resources (e.g., FLOPs) to the pre-training Loss using a series of sampling models, followed by mapping the pre-training loss to downstream task Performance after the critical "emergent phase". 
In preliminary experiments, this FLP solution accurately predicts the performance of LLMs with 7B and 13B parameters using a series of sampling LMs up to 3B, achieving error margins of 5% and 10%, respectively, and significantly outperforming the FLOPs-to-Performance approach. This motivates FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training, specifically blending general corpora with code data to accurately represent the common necessity. FLP-M extends the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources, and employs a two-layer neural network to model the non-linear relationship between multiple domain-specific loss and downstream performance. By utilizing a 3B LLM trained on a specific ratio and a series of smaller sampling LMs, FLP-M can effectively forecast the performance of 3B and 7B LLMs across various data mixtures for most benchmarks within 10% error margins.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
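The two-stage idea in the entry above (FLOPs to pre-training loss, then loss to downstream performance) can be sketched with ordinary least squares. The sketch assumes a pure power law `loss = a * FLOPs ** (-b)` for the first stage and a straight line for the second stage; the abstract does not state the exact functional forms, and every number below is synthetic and chosen only so that the fit can be checked.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_power_law(flops, losses):
    """Fit loss = a * flops ** (-b) with a straight-line fit in log-log space."""
    slope, intercept = fit_line([math.log(c) for c in flops],
                                [math.log(l) for l in losses])
    return math.exp(intercept), -slope

# Synthetic "sampling model" statistics, for illustration only.
flops = [1e18, 1e19, 1e20, 1e21]
losses = [10.0 * c ** -0.05 for c in flops]
scores = [1.0 - 0.5 * l for l in losses]

# Stage 1: compute -> loss.  Stage 2: loss -> downstream score.
a, b = fit_power_law(flops, losses)
slope, intercept = fit_line(losses, scores)

target_flops = 1e22
predicted_loss = a * target_flops ** (-b)
predicted_score = slope * predicted_loss + intercept
print(round(predicted_loss, 3), round(predicted_score, 3))
```

Routing the prediction through the loss, rather than fitting FLOPs to performance directly, is the design choice the abstract argues for; the multi-source FLP-M variant replaces the second stage with a small neural network over several domain-specific losses.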
# VOVTrack: オープン語彙オブジェクト追跡のためのビデオの可能性を探る VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking ( http://arxiv.org/abs/2410.08529v1 ) ライセンス: Link先を確認	Zekun Qian, Ruize Han, Junhui Hou, Linqi Song, Wei Feng,	(参考訳) Open-vocabulary Multi-Object Tracking (OVMOT)は、ビデオにおける多様なオブジェクトカテゴリの検出と追跡に関わる重要な新しい課題であり、既知のカテゴリ(ベースクラス)と未知のカテゴリ(ノベルクラス)の両方を含んでいる。この問題は、OVD(Open-vocabulary Object Detection)とMOT(Multi-object Tracking)の複雑さを併せ持つ。 OVMOT の既存のアプローチは、OVD と MOT の方法論を別個のモジュールとして統合することが多く、主に画像中心のレンズによる問題に焦点を当てている。本稿では,この課題に対処するため,ビデオオブジェクト追跡の観点からMOTおよびビデオ中心トレーニングに関連するオブジェクト状態を統合する新しい手法VOVTrackを提案する。まず、追跡中の物体の追跡関連状態を考察し、時間変化物体のより正確な位置決めと分類(検出)のための新しいプロンプト誘導型注意機構を提案する。その後,自己教師型オブジェクト類似性学習手法を定式化し,時間的オブジェクト関連(追跡)を容易にすることによって,アノテーションを使わずに生のビデオデータを活用する。実験の結果、VOVTrackは既存の手法よりも優れており、オープン語彙追跡タスクの最先端ソリューションとして確立されている。 Open-vocabulary multi-object tracking (OVMOT) represents a critical new challenge involving the detection and tracking of diverse object categories in videos, encompassing both seen categories (base classes) and unseen categories (novel classes). This issue amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT). Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens. In this paper, we propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint. First, we consider the tracking-related state of the objects during tracking and propose a new prompt-guided attention mechanism for more accurate localization and classification (detection) of the time-varying objects. Subsequently, we leverage raw video data without annotations for training by formulating a self-supervised object similarity learning technique to facilitate temporal object association (tracking). 
Experimental results underscore that VOVTrack outperforms existing methods, establishing itself as a state-of-the-art solution for the open-vocabulary tracking task.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# Ego3DT:エゴ中心のビデオですべての3Dオブジェクトを追跡する Ego3DT: Tracking Every 3D Object in Ego-centric Videos ( http://arxiv.org/abs/2410.08530v1 ) ライセンス: Link先を確認	Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang,	(参考訳) インテリジェンスへの関心の高まりは、現代の研究にエゴ中心の視点をもたらした。この領域における重要な課題の1つは、エゴ中心のビデオにおける物体の正確な位置決めと追跡である。本稿では,エゴ中心映像からの物体の3次元再構成と追跡のための新しいゼロショット手法を提案する。 Ego3DTは,エゴ環境内のオブジェクトの検出とセグメンテーション情報を最初に識別し,抽出する新しいフレームワークである。 Ego3DTは、隣接するビデオフレームからの情報を利用して、事前に訓練された3Dシーン再構成モデルを用いて、エゴビューの3Dシーンを動的に構築する。さらに,エゴ中心ビデオにおける物体の3次元追跡軌道を安定的に作成するための動的階層化機構を革新した。さらに,本手法の有効性は, HOTAの1.04x - 2.90xの2つの新たにコンパイルされたデータセットに対して, 多様なエゴ中心のシナリオにおいて, 提案手法の堅牢性と精度を示す広範な実験によって裏付けられている。 The growing interest in embodied intelligence has brought ego-centric perspectives to contemporary research. One significant challenge within this realm is the accurate localization and tracking of objects in ego-centric videos, primarily due to the substantial variability in viewing angles. Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects from the ego-centric video. We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment. Utilizing information from adjacent video frames, Ego3DT dynamically constructs a 3D scene of the ego view using a pre-trained 3D scene reconstruction model. Additionally, we have innovated a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos. Moreover, the efficacy of our approach is corroborated by extensive experiments on two newly compiled datasets, with 1.04x - 2.90x in HOTA, showcasing the robustness and accuracy of our method in diverse ego-centric scenarios.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# 画像生成に視覚的優先順位を必要とする拡散モデル Diffusion Models Need Visual Priors for Image Generation ( http://arxiv.org/abs/2410.08531v1 ) ライセンス: Link先を確認	Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou,	(参考訳) 従来のクラス誘導拡散モデルは一般的に正しいセマンティックな内容の画像を生成できるが、テクスチャの詳細に苦しむことが多い。この制限は、粗い条件情報のみを提供するクラス事前の使用に起因している。この問題に対処するために,Diffusion on Diffusion (DoD)を提案する。Diffusion on Diffusionは,以前に生成したサンプルから視覚的先行情報を抽出し,拡散サンプリングの初期段階から視覚的先行情報を活用する拡散モデルについて,豊富なガイダンスを提供する。具体的には,各段階における条件付きサンプルから冗長な詳細情報を排除し,ガイダンスのセマンティック情報のみを保持する圧縮再構成手法を用いた潜伏埋め込みモジュールを提案する。我々は、人気のあるImageNet-$256 \times 256$データセット上でDoDを評価し、7$\times$トレーニングコストをSiTやDiTと比較して削減し、FID-50Kスコアよりもパフォーマンスが向上した。私たちの最大のモデルであるDoD-XLは、FID-50Kスコアが1.83で、100万のトレーニングステップしか達成していません。 Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and limited conditional information. To address this issue, we propose Diffusion on Diffusion (DoD), an innovative multi-stage generation framework that first extracts visual priors from previously generated samples, then provides rich guidance for the diffusion model leveraging visual priors from the early stages of diffusion sampling. Specifically, we introduce a latent embedding module that employs a compression-reconstruction approach to discard redundant detail information from the conditional samples in each stage, retaining only the semantic information for guidance. We evaluate DoD on the popular ImageNet-$256 \times 256$ dataset, reducing 7$\times$ training cost compared to SiT and DiT with even better performance in terms of the FID-50K score. Our largest model DoD-XL achieves an FID-50K score of 1.83 with only 1 million training steps, which surpasses other state-of-the-art methods without bells and whistles during inference.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
# 複数の情報源からの観測データを用いたロバストオフライン政策学習 Robust Offline Policy Learning with Observational Data from Multiple Sources ( http://arxiv.org/abs/2410.08537v1 ) ライセンス: Link先を確認	Aldo Gael Carranza, Susan Athey,	(参考訳) 複数の異種データソースからの観測帯域フィードバックデータを用いて、多様なターゲット設定を安定的に一般化するパーソナライズされた決定ポリシーを学習する。そこで本研究では,ソース分布の一般混合条件下での一様に低い後悔度を確保するために,最小限の後悔度最適化手法を提案する。我々は,この目的に合わせたポリシー学習アルゴリズムを開発し,2つの頑健なオフラインポリシー評価手法と,最小限の最適化のための非回帰学習アルゴリズムを組み合わせた。我々の後悔分析は、この手法が全ソースにわたる全データの適度な消滅率まで、最小限の最悪の混合を達成していることを示している。分析,拡張,実験結果は,複数のデータソースから堅牢な意思決定ポリシーを学習する上で,このアプローチの利点を示すものである。 We consider the problem of using observational bandit feedback data from multiple heterogeneous data sources to learn a personalized decision policy that robustly generalizes across diverse target settings. To achieve this, we propose a minimax regret optimization objective to ensure uniformly low regret under general mixtures of the source distributions. We develop a policy learning algorithm tailored to this objective, combining doubly robust offline policy evaluation techniques and no-regret learning algorithms for minimax optimization. Our regret analysis shows that this approach achieves the minimal worst-case mixture regret up to a moderated vanishing rate of the total data across all sources. Our analysis, extensions, and experimental results demonstrate the benefits of this approach for learning robust decision policies from multiple data sources.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
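The doubly robust offline policy evaluation that the entry above builds on can be sketched as the standard estimator below. It covers only the evaluation step for a single data source; the minimax-regret objective over source mixtures and the no-regret learner described in the abstract are not shown. The log format and all names are assumptions made for the example.

```python
def doubly_robust_value(logs, policy, outcome_model, n_actions):
    """Doubly robust estimate of a target policy's value from logged bandit data.

    logs          : list of (context, action, reward, propensity) tuples, where
                    propensity is the logging probability of the taken action.
    policy        : function(context, action) -> probability under the target policy.
    outcome_model : function(context, action) -> predicted reward.
    """
    total = 0.0
    for x, a, r, p in logs:
        # Model-based value of the target policy at this context.
        direct = sum(policy(x, b) * outcome_model(x, b) for b in range(n_actions))
        # Importance-weighted correction for the outcome model's error.
        correction = policy(x, a) / p * (r - outcome_model(x, a))
        total += direct + correction
    return total / len(logs)

# Toy logs: two actions chosen uniformly at random, true reward = context * action.
logs = [(1.0, 0, 0.0, 0.5), (1.0, 1, 1.0, 0.5), (2.0, 1, 2.0, 0.5), (2.0, 0, 0.0, 0.5)]

def always_one(x, a):
    return 1.0 if a == 1 else 0.0

value_exact_model = doubly_robust_value(logs, always_one, lambda x, a: x * a, 2)
value_zero_model = doubly_robust_value(logs, always_one, lambda x, a: 0.0, 2)
print(value_exact_model, value_zero_model)
```

Both calls agree on this toy data even though the second uses a deliberately wrong outcome model, because the logging propensities are correct. That is the "doubly robust" property: the estimate stays consistent if either the outcome model or the propensities are right.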
# Kaleidoscope: 不均一なマルチエージェント強化学習のための学習可能なマスク Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2410.08540v1 ) ライセンス: Link先を確認	Xinran Li, Ling Pan, Jun Zhang,	(参考訳) マルチエージェント強化学習(MARL)では、パラメータ共有がサンプリング効率を高めるために一般的に用いられる。しかし、完全なパラメータ共有の一般的なアプローチは、しばしばエージェント間の均質なポリシーをもたらし、ポリシーの多様性から得られるパフォーマンス上の利点を制限する可能性がある。この限界に対処するために、我々は高サンプル効率を維持しながら政策の不均一性を育む新しい適応型部分パラメータ共有スキームである 'emph{Kaleidoscope} を導入する。具体的には、Kaleidoscopeは、異なるエージェントに対して複数の異なる学習可能なマスクのセットとともに共通のパラメータのセットを維持し、パラメータの共有を規定している。パラメータ共有の効率を犠牲にすることなく、これらのマスク間の相違を促進することで、ポリシーネットワーク間の多様性を促進する。この設計により、カレイドスコープは広いポリシー表現能力で高効率を動的にバランスさせ、様々な環境における全パラメータ共有と非パラメータ共有のギャップを効果的に埋めることができる。我々はさらに、Keleidoscopeを、価値評価の改善に役立つアクタークリティカルアルゴリズムの文脈におけるアンサンブルを批判するために拡張し、マルチエージェント粒子環境、マルチエージェントMuJoCo、スタークラフトマルチエージェントチャレンジv2を含む広範囲な環境における実験的な評価を行い、既存のパラメータ共有アプローチと比較してKaleidoscopeの優れた性能を示し、MARLの性能向上の可能性を示している。コードは \url{https://github.com/LXXXXR/Kaleidoscope} で公開されている。 In multi-agent reinforcement learning (MARL), parameter sharing is commonly employed to enhance sample efficiency. However, the popular approach of full parameter sharing often leads to homogeneous policies among agents, potentially limiting the performance benefits that could be derived from policy diversity. To address this critical limitation, we introduce \emph{Kaleidoscope}, a novel adaptive partial parameter sharing scheme that fosters policy heterogeneity while still maintaining high sample efficiency. Specifically, Kaleidoscope maintains one set of common parameters alongside multiple sets of distinct, learnable masks for different agents, dictating the sharing of parameters. It promotes diversity among policy networks by encouraging discrepancy among these masks, without sacrificing the efficiencies of parameter sharing. 
This design allows Kaleidoscope to dynamically balance high sample efficiency with a broad policy representational capacity, effectively bridging the gap between full parameter sharing and non-parameter sharing across various environments. We further extend Kaleidoscope to critic ensembles in the context of actor-critic algorithms, which could help improve value estimations. Our empirical evaluations across extensive environments, including multi-agent particle environment, multi-agent MuJoCo and StarCraft multi-agent challenge v2, demonstrate the superior performance of Kaleidoscope compared with existing parameter sharing approaches, showcasing its potential for performance enhancement in MARL. The code is publicly available at https://github.com/LXXXXR/Kaleidoscope.	翻訳日:2024-10-30 23:24:44 公開日:2024-10-11
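The core idea of one shared parameter set plus per-agent masks can be illustrated with a few lines of plain Python. This sketch assumes hard binary masks obtained by thresholding a sigmoid of per-agent scores, and a simple disagreement rate as the diversity measure; the actual Kaleidoscope masks are learned end to end and its diversity term may be defined differently, so treat every name and number here as hypothetical.

```python
import math

def binary_mask(scores, threshold=0.5):
    """Turn per-parameter scores into a 0/1 mask via a thresholded sigmoid."""
    return [1 if 1.0 / (1.0 + math.exp(-s)) > threshold else 0 for s in scores]

def agent_weights(shared, scores, threshold=0.5):
    """Effective weights of one agent: shared parameters with masked entries zeroed."""
    return [w * m for w, m in zip(shared, binary_mask(scores, threshold))]

def mask_diversity(all_scores, threshold=0.5):
    """Mean fraction of positions where two agents' masks disagree."""
    masks = [binary_mask(s, threshold) for s in all_scores]
    pairs = [(i, j) for i in range(len(masks)) for j in range(i + 1, len(masks))]
    diff = sum(sum(a != b for a, b in zip(masks[i], masks[j])) for i, j in pairs)
    return diff / (len(pairs) * len(masks[0]))

shared = [0.8, -1.2, 0.5, 2.0]       # one parameter vector shared by all agents
scores = [
    [3.0, -2.0, 1.0, 4.0],           # agent 0 drops the second parameter
    [3.0, 2.0, -1.0, 4.0],           # agent 1 drops the third parameter
]
weights_0 = agent_weights(shared, scores[0])
weights_1 = agent_weights(shared, scores[1])
diversity = mask_diversity(scores)
print(weights_0, weights_1, diversity)
```

Because both agents still read from the same `shared` list, gradient updates to it would benefit every agent, while the masks let their policies differ. A hard threshold is not differentiable, so a trainable version would need a differentiable relaxation or a similar device.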
# AIにおける人間性: 大規模言語モデルの個性を検出する Humanity in AI: Detecting the Personality of Large Language Models ( http://arxiv.org/abs/2410.08545v1 ) ライセンス: Link先を確認	Baohua Zhan, Yongyi Huang, Wenyao Cui, Huaping Zhang, Jianyun Shang,	(参考訳) アンケートは,Large Language Models (LLMs) の個性を検出する一般的な方法である。しかしながら、その信頼性は幻覚(LLMが不正確または無関係に反応する)と、提示されたオプションの順序に対する応答の感度の2つの主要な問題によってしばしば損なわれる。これらの課題に対処するために,テキストマイニングとアンケート手法を組み合わせることを提案する。テキストマイニングは、オプションの順序に影響されることなく、LSMの反応から心理的特徴を抽出することができる。さらに, 本手法は特定の回答に依存しないため, 幻覚の影響を低減させる。両手法のスコアの正規化とルート平均二乗誤差の計算により,本手法の有効性を検証した。 LLMの性格特性の起源をさらに解明するために, BERT や GPT などの事前学習言語モデル (PLM) と ChatGPT のような会話モデル (ChatLLM) の両方で実験を行った。その結果,LLMには特定の個性があることが明らかとなった。例えば,ChatGPTとChatGLMは「良心」の性格特性を示す。さらに, LLMの個人性は, 事前学習したデータから導かれることがわかった。 ChatLLMを訓練するために使用される命令データは、個性を含むデータの生成を高め、その隠された個性を公開することができる。結果と人間の平均的性格スコアを比較し,ChatLLMsにおけるPLMにおけるFLAN-T5とChatGPTの性格は,それぞれ0.34と0.22のスコア差で人間と類似していることがわかった。 Questionnaires are a common method for detecting the personality of Large Language Models (LLMs). However, their reliability is often compromised by two main issues: hallucinations (where LLMs produce inaccurate or irrelevant responses) and the sensitivity of responses to the order of the presented options. To address these issues, we propose combining text mining with questionnaires method. Text mining can extract psychological features from the LLMs' responses without being affected by the order of options. Furthermore, because this method does not rely on specific answers, it reduces the influence of hallucinations. By normalizing the scores from both methods and calculating the root mean square error, our experiment results confirm the effectiveness of this approach. To further investigate the origins of personality traits in LLMs, we conduct experiments on both pre-trained language models (PLMs), such as BERT and GPT, as well as conversational models (ChatLLMs), such as ChatGPT. 
The results show that LLMs do contain certain personalities, for example, ChatGPT and ChatGLM exhibit the personality traits of 'Conscientiousness'. Additionally, we find that the personalities of LLMs are derived from their pre-trained data. The instruction data used to train ChatLLMs can enhance the generation of data containing personalities and expose their hidden personality. We compare the results with the human average personality score, and we find that the personality of FLAN-T5 in PLMs and ChatGPT in ChatLLMs is more similar to that of a human, with score differences of 0.34 and 0.22, respectively.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
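The validation step mentioned in the entry above (normalize the scores from both methods, then compute the root mean square error) amounts to a short calculation. The sketch below assumes questionnaire scores on a 1 to 5 scale, text-mining scores already in the range 0 to 1, and min-max rescaling; the abstract does not say which scales or normalization the authors use, and the five trait scores are made up for the example.

```python
import math

def normalize(scores, low, high):
    """Rescale raw scores from the range [low, high] to [0, 1]."""
    return [(s - low) / (high - low) for s in scores]

def rmse(a, b):
    """Root mean square error between two equally long score lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical scores for five personality traits, for illustration only.
questionnaire = [3.5, 4.0, 2.5, 3.0, 4.5]        # assumed 1-5 scale
text_mining = [0.60, 0.80, 0.40, 0.50, 0.90]     # assumed to lie in [0, 1]

q_norm = normalize(questionnaire, 1.0, 5.0)
error = rmse(q_norm, text_mining)
print(round(error, 4))
```

A small error indicates that the two methods place the model at similar points on each trait, which is how the abstract uses the metric to argue that the combined approach is consistent.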
# 反射多重エントロピーとそのホログラフィック双対 Reflected multi-entropy and its holographic dual ( http://arxiv.org/abs/2410.08546v1 ) ライセンス: Link先を確認	Ma-Ke Yuan, Mingyi Li, Yang Zhou,	(参考訳) 反射多エントロピーと呼ばれる正準浄化による多エントロピーの混合状態一般化を導入する。我々はこの測度のホログラフィック双対を提案する。三者系の場合、大きな$c$の極限でツイスト作用素の6点関数を用いて場理論計算を行う。零温度と有限温度の両方において、場の理論的な結果はホログラフィックの結果と正確に一致し、この新しい測度に対するホログラフィック予想を支持した。 We introduce a mixed-state generalization of the multi-entropy through the canonical purification, which we call reflected multi-entropy. We propose the holographic dual of this measure. For the tripartite case, a field-theoretical calculation is performed using a six-point function of twist operators in the large-$c$ limit. At both zero and finite temperature, the field-theoretical results match the holographic results exactly, supporting our holographic conjecture of this new measure.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# 量子状態群アクション Quantum State Group Actions ( http://arxiv.org/abs/2410.08547v1 ) ライセンス: Link先を確認	Saachi Mutreja, Mark Zhandry,	(参考訳) 暗号グループのアクションは、量子後暗号の主要な候補であり、量子暗号プロトコルの開発にも使われてきた。本研究では、量子状態の集合に作用する群からなる量子状態群作用について検討する。 1) ある設定では、統計的(クエリ境界付きでさえも)セキュリティは、量子後古典群アクションと類似して不可能である。 2) 量子状態群の動作を構築し, 暗号学者によって提案された多くの計算問題がそれに対して成り立つことを証明した。構成によっては、我々の証明は無条件であり、LWEに依存しているか、量子ランダムオラクルモデルに依存している。我々の分析は古典的な集団行動に直接当てはまらないが、暗号学者による量子後仮定に明らかな欠陥はないという、少なくとも正当性チェックを与えると論じている。 3) 我々の量子状態群アクションは、グループアクションに基づくものと非折り畳みハッシュに基づくものという、2つの既存の量子マネースキームを統一することができる。また、古典的および量子的鍵分布を統一する方法についても説明する。 Cryptographic group actions are a leading contender for post-quantum cryptography, and have also been used in the development of quantum cryptographic protocols. In this work, we explore quantum state group actions, which consist of a group acting on a set of quantum states. We show the following results: 1. In certain settings, statistical (even query bounded) security is impossible, analogously to post-quantum classical group actions. 2. We construct quantum state group actions and prove that many computational problems that have been proposed by cryptographers hold for it. Depending on the construction, our proofs are either unconditional, rely on LWE, or rely on the quantum random oracle model. While our analysis does not directly apply to classical group actions, we argue it gives at least a sanity check that there are no obvious flaws in the post-quantum assumptions made by cryptographers. 3. Our quantum state group action allows for unifying two existing quantum money schemes: those based on group actions, and those based on non-collapsing hashes. We also explain how they can unify classical and quantum key distribution.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# イノベーションとプライバシのバランスをとる - 自然言語処理アプリケーションにおけるデータセキュリティ戦略 Balancing Innovation and Privacy: Data Security Strategies in Natural Language Processing Applications ( http://arxiv.org/abs/2410.08553v1 ) ライセンス: Link先を確認	Shaobo Liu, Guiran Liu, Binrong Zhu, Yuanshuai Luo, Linxiao Wu, Rui Wang,	(参考訳) 本研究は,チャットボット,感情分析,機械翻訳などの共通アプリケーションにおけるユーザデータの保護を目的とした,差分プライバシーに基づく新しいアルゴリズムを導入することにより,自然言語処理(NLP)におけるプライバシ保護に対処する。 NLP技術の普及により、ユーザデータのセキュリティとプライバシ保護が重要な問題となり、緊急に解決する必要がある。本稿では,ユーザの機密情報の漏洩を効果的に防止する新しいプライバシ保護アルゴリズムを提案する。差分プライバシー機構を導入することにより、ランダムノイズを付加しながらデータ解析結果の精度と信頼性を確保することができる。この方法は,データ漏洩によるリスクを軽減するだけでなく,ユーザのプライバシ保護を図りながら,効率的なデータ処理を実現する。データ匿名化や同型暗号化といった従来のプライバシ手法と比較して,本手法はデータ解析の精度を維持しつつ,計算効率とスケーラビリティの面で大きな利点をもたらす。提案アルゴリズムの有効性は、精度(0.89)、精度(0.85)、リコール(0.88)などの性能指標で示され、プライバシーとユーティリティのバランスをとる他の方法よりも優れている。プライバシー保護規制がますます厳しくなっているため、企業や開発者は、プライバシーリスクに対処するための効果的な措置を講じなければならない。我々の研究は、NLP分野におけるプライバシ保護技術の応用に重要な参考を提供し、技術革新とユーザプライバシのバランスを達成する必要性を強調している。将来的には、テクノロジの継続的な進歩により、プライバシ保護はデータ駆動アプリケーションの中核的な要素となり、業界全体の健全な開発を促進するでしょう。 This research addresses privacy protection in Natural Language Processing (NLP) by introducing a novel algorithm based on differential privacy, aimed at safeguarding user data in common applications such as chatbots, sentiment analysis, and machine translation. With the widespread application of NLP technology, the security and privacy protection of user data have become important issues that need to be solved urgently. This paper proposes a new privacy protection algorithm designed to effectively prevent the leakage of user sensitive information. By introducing a differential privacy mechanism, our model ensures the accuracy and reliability of data analysis results while adding random noise. This method not only reduces the risk caused by data leakage but also achieves effective processing of data while protecting user privacy. 
Compared to traditional privacy methods like data anonymization and homomorphic encryption, our approach offers significant advantages in terms of computational efficiency and scalability while maintaining high accuracy in data analysis. The proposed algorithm's efficacy is demonstrated through performance metrics such as accuracy (0.89), precision (0.85), and recall (0.88), outperforming other methods in balancing privacy and utility. As privacy protection regulations become increasingly stringent, enterprises and developers must take effective measures to deal with privacy risks. Our research provides an important reference for the application of privacy protection technology in the field of NLP, emphasizing the need to achieve a balance between technological innovation and user privacy. In the future, with the continuous advancement of technology, privacy protection will become a core element of data-driven applications and promote the healthy development of the entire industry.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
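The entry above describes adding random noise under a differential privacy mechanism but does not say which mechanism is used. As a hedged illustration only, the sketch below shows the textbook Laplace mechanism applied to a counting query. The function names, the sample records, and the choice of a counting query are assumptions made for this example and are not taken from the paper.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: count positive-sentiment messages without exposing
# whether any single user's message is in the data.
rng = random.Random(0)
records = [{"label": "pos"}] * 30 + [{"label": "neg"}] * 70
noisy = private_count(records, lambda r: r["label"] == "pos", epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy and a noisier answer, which is the privacy/utility trade-off the abstract refers to.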
# 高齢者の安全, プライバシを重視し, アクセシブルな電子支払アプリケーションの設計 Design of Secure, Privacy-focused, and Accessible E-Payment Applications for Older Adults ( http://arxiv.org/abs/2410.08555v1 ) ライセンス: Link先を確認	Sanchari Das,	(参考訳) 電子決済は、今日のデジタル経済においてトランザクションの利便性に不可欠であり、セキュリティ、プライバシ、ユーザビリティの向上の必要性を強調しながら、高齢者にとってますます重要になっている。そこで本稿では,65歳以上の高齢者400名を対象に,多要素認証(MFA)やQRコードによる受取者追加などの機能を備えた電子支払モバイルアプリケーションの高忠実度プロトタイプの評価を行った。以上の結果から, この人口動態の具体的なニーズを満たすために, アプリケーション用に調整した beta バージョンを開発した。特に、参加者の約91%は、専門家が推奨するMFAに比べて、従来の知識ベースとシングルモード認証を好んだ。我々は,高齢者のセキュリティ,プライバシ,ユーザビリティ要件に対処する包括的電子支払いソリューションの開発を目的としたレコメンデーションを提供することで,結論付けた。 E-payments are essential for transactional convenience in today's digital economy and are becoming increasingly important for older adults, emphasizing the need for enhanced security, privacy, and usability. To address this, we conducted a survey-based study with 400 older adults aged 65 and above to evaluate a high-fidelity prototype of an e-payment mobile application, which included features such as multi-factor authentication (MFA) and QR code-based recipient addition. Based on our findings, we developed a tailored beta version of the application to meet the specific needs of this demographic. Notably, approximately 91% of participants preferred traditional knowledge-based and single-mode authentication compared to expert-recommended MFA. We concluded by providing recommendations aimed at developing inclusive e-payment solutions that address the security, privacy, and usability requirements of older adults.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# MUSO:Over-Parameterized Regimesでエキサイティングな機械学習を実現する MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes ( http://arxiv.org/abs/2410.08557v1 ) ライセンス: Link先を確認	Ruikai Yang, Mingzhen He, Zhengbao He, Youmei Qiu, Xiaolin Huang,	(参考訳) マシン・アンラーニング(MU)とは、訓練されたモデルを特定のデータでトレーニングされたことがないかのように振る舞うことである。今日の過度パラメータ化モデルでは、ニューラルネットワークが支配する一般的なアプローチは、手動でデータをレバーブルし、よく訓練されたモデルを微調整することである。出力空間のMUモデルを近似することができるが、パラメータ空間のMUを正確に達成できるかどうかという疑問は残る。本稿では、ランダムな特徴技術を用いて分析フレームワークを構築することにより、この問題に答える。確率勾配勾配勾配によるモデル最適化の前提の下では、過パラメータ化線形モデルが特定のデータを許容することで正確なMUを達成できることを理論的に証明した。また、本研究を実世界の非線形ネットワークに拡張し、未学習と再学習のタスクを統一する交互最適化アルゴリズムを提案する。このアルゴリズムの有効性は、数値実験によって確認され、現在の最先端の手法と比較して、様々なシナリオで学習する際の優れた性能、特に類似のラベリングに基づくMUアプローチよりも優れていることを強調している。 Machine unlearning (MU) is to make a well-trained model behave as if it had never been trained on specific data. In today's over-parameterized models, dominated by neural networks, a common approach is to manually relabel data and fine-tune the well-trained model. It can approximate the MU model in the output space, but the question remains whether it can achieve exact MU, i.e., in the parameter space. We answer this question by employing random feature techniques to construct an analytical framework. Under the premise of model optimization via stochastic gradient descent, we theoretically demonstrated that over-parameterized linear models can achieve exact MU through relabeling specific data. We also extend this work to real-world nonlinear networks and propose an alternating optimization algorithm that unifies the tasks of unlearning and relabeling. The algorithm's effectiveness, confirmed through numerical experiments, highlights its superior performance in unlearning across various scenarios compared to current state-of-the-art methods, particularly excelling over similar relabeling-based MU approaches.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# 複合埋め込み予測アーキテクチャを用いた12誘導心電図の一般表現 Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive architecture ( http://arxiv.org/abs/2410.08559v1 ) ライセンス: Link先を確認	Sehun Kim,	(参考訳) 本稿では,ECG-JEPA (Joint Embedding Predictive Architecture) と呼ばれる12誘導心電図解析のための自己教師付き学習手法を提案する。 ECG-JEPAは、ECGデータのセマンティック表現を学ぶためにマスキング戦略を採用している。既存の方法とは異なり、ECG-JEPAは生データを再構築するのではなく、隠された表現レベルで予測する。このアプローチはECG領域にいくつかの利点をもたらす:(1)標準ECGで一般的なノイズのような不要な詳細を発生させないこと、(2)生信号間のnaive L2損失の制限に対処すること。もうひとつの重要な貢献は、12リードのECGデータであるCross-Pattern Attention (CroPA)用に調整された、特別なマスク付きアテンションの導入である。 CroPAは、モデルがパッチ間の関係を効果的にキャプチャすることを可能にする。さらに、ECG-JEPAは非常にスケーラブルで、大規模なデータセットの効率的なトレーニングを可能にする。私たちのコードはhttps://github.com/sehunfromdaegu/ECG_JEPAで公開されています。 We propose a self-supervised learning method for 12-lead Electrocardiogram (ECG) analysis, named ECG Joint Embedding Predictive Architecture (ECG-JEPA). ECG-JEPA employs a masking strategy to learn semantic representations of ECG data. Unlike existing methods, ECG-JEPA predicts at the hidden representation level rather than reconstructing raw data. This approach offers several advantages in the ECG domain: (1) it avoids producing unnecessary details, such as noise, which is common in standard ECG; and (2) it addresses the limitations of naive L2 loss between raw signals. Another key contribution is the introduction of a special masked attention tailored for 12-lead ECG data, Cross-Pattern Attention (CroPA). CroPA enables the model to effectively capture inter-patch relationships. Additionally, ECG-JEPA is highly scalable, allowing efficient training on large datasets. Our code is openly available at https://github.com/sehunfromdaegu/ECG_JEPA.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# 民事訴訟の訴訟の原因としての類似のフレーム Similar Phrases for Cause of Actions of Civil Cases ( http://arxiv.org/abs/2410.08564v1 ) ライセンス: Link先を確認	Ho-Chien Huang, Chao-Lin Liu,	(参考訳) 台湾の司法制度では、関連する法的判断を特定するために、行動原因(COA)が不可欠である。しかし、標準化されたCOAラベルの欠如は、基本手法を用いてケースをフィルタリングする際の課題を生んでいる。本研究は, 埋込法とクラスタリング法を利用して, 引用法論文に基づいてCOAの類似性を解析することによってこの問題に対処する。この研究は、ディース係数やピアソンの相関係数など、様々な類似度尺度を実装している。アンサンブルモデルはランキングを組み合わせ、ソーシャルネットワーク分析は関連するCOAのクラスタを特定する。このアプローチは、COA間の不明瞭な関係を明らかにすることで法的な分析を強化し、民事法を超えた法研究に潜在的に応用する可能性がある。 In the Taiwanese judicial system, Cause of Actions (COAs) are essential for identifying relevant legal judgments. However, the lack of standardized COA labeling creates challenges in filtering cases using basic methods. This research addresses this issue by leveraging embedding and clustering techniques to analyze the similarity between COAs based on cited legal articles. The study implements various similarity measures, including Dice coefficient and Pearson's correlation coefficient. An ensemble model combines rankings, and social network analysis identifies clusters of related COAs. This approach enhances legal analysis by revealing inconspicuous connections between COAs, offering potential applications in legal research beyond civil law.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
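The entry above names the Dice coefficient and Pearson's correlation coefficient as similarity measures between causes of action (COAs) based on cited legal articles. The sketch below shows the two standard formulas on made-up inputs. Representing a COA as a set of cited articles (for Dice) or as a vector of citation counts (for Pearson) is an assumption made for illustration, and the article labels are hypothetical.

```python
import math


def dice_coefficient(a: set, b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two sets of cited articles."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2.0 * len(a & b) / (len(a) + len(b))


def pearson(x: list, y: list) -> float:
    """Pearson correlation between two equal-length citation-count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Hypothetical COAs described by the articles that their judgments cite.
coa_a = {"Art. 184", "Art. 195", "Art. 213"}
coa_b = {"Art. 195", "Art. 213", "Art. 227"}
set_similarity = dice_coefficient(coa_a, coa_b)

# The same idea with citation counts over a fixed, ordered list of articles.
counts_a = [12, 7, 3, 0]
counts_b = [10, 8, 0, 5]
count_similarity = pearson(counts_a, counts_b)
```

Dice only looks at which articles are shared, while Pearson also reflects how often each article is cited, which is one reason to combine several measures in an ensemble ranking.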
# Baichuan-Omni技術報告 Baichuan-Omni Technical Report ( http://arxiv.org/abs/2410.08565v1 ) ライセンス: Link先を確認	Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang, Bowen Ding, Wei Song, Zhenglin Cheng, Yuqi Huo, Song Chen, Xu Li, Da Pan, Shusen Zhang, Xin Wu, Zheng Liang, Jun Liu, Tao Zhang, Keer Lu, Yaqi Zhao, Yanjun Shen, Fan Yang, Kaicheng Yu, Tao Lin, Jianhua Xu, Zenan Zhou, Weipeng Chen,	(参考訳) GPT-4oの健全なマルチモーダル機能とインタラクティブな体験は、実用アプリケーションにおけるその重要な役割を浮き彫りにしている。本稿では,画像,ビデオ,音声,テキストのモダリティを同時処理・解析できる,オープンソースの7B Multimodal Large Language Model (MLLM) であるBaichuan-Omniを紹介する。本稿では、7Bモデルから始まり、2段階のマルチモーダルアライメントと、オーディオ、画像、ビデオ、テキストモダルをまたいだマルチタスク微調整を行う効果的なマルチモーダルトレーニングスキーマを提案する。このアプローチは、視覚的および音声的データを効果的に扱う能力を備えた言語モデルである。様々なOmni-modalベンチマークとマルチモーダルベンチマークにまたがる強力なパフォーマンスを実証し、この貢献は、マルチモーダル理解とリアルタイムインタラクションの進歩において、オープンソースコミュニティの競争基盤となることを目的としている。 The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-Omni, the first open-source 7B Multimodal Large Language Model (MLLM) adept at concurrently processing and analyzing modalities of image, video, audio, and text, while delivering an advanced multimodal interactive experience and strong performance. We propose an effective multimodal training schema starting with 7B model and proceeding through two stages of multimodal alignment and multitask fine-tuning across audio, image, video, and text modal. This approach equips the language model with the ability to handle visual and audio data effectively. Demonstrating strong performance across various omni-modal and multimodal benchmarks, we aim for this contribution to serve as a competitive baseline for the open-source community in advancing multimodal understanding and real-time interaction.	翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
# 透過的・反射的物体に対する拡散法による深度塗布 Diffusion-Based Depth Inpainting for Transparent and Reflective Objects ( http://arxiv.org/abs/2410.08567v1 ) ライセンス: Link先を確認	Tianyu Sun, Dingchang Hu, Yixiang Dai, Guijin Wang,	(参考訳) 我々の日常生活に共通する透明で反射的な物体は、その独特の視覚的・光学的特性から、3Dイメージング技術に重大な課題をもたらす。この種の物体に直面して、RGB-Dカメラは正確な空間情報で実際の深度を捉えることができない。この問題に対処するために,透過的および反射的オブジェクトに特化して設計された拡散型Depth InpaintingフレームワークであるDITRを提案する。このネットワークは、リージョンプロポーザルステージとディープス・インペインティングステージの2つのステージで構成されている。 DITRは光学的および幾何学的深さ損失を動的に解析し、それらを自動的に塗布する。さらに、総合的な実験結果から、DITRは堅牢な適応性を持つ透明で反射性のある物体の深部塗布作業に極めて効果的であることが示された。 Transparent and reflective objects, which are common in our everyday lives, present a significant challenge to 3D imaging techniques due to their unique visual and optical properties. Faced with these types of objects, RGB-D cameras fail to capture the real depth value with their accurate spatial information. To address this issue, we propose DITR, a diffusion-based Depth Inpainting framework specifically designed for Transparent and Reflective objects. This network consists of two stages, including a Region Proposal stage and a Depth Inpainting stage. DITR dynamically analyzes the optical and geometric depth loss and inpaints them automatically. Furthermore, comprehensive experimental results demonstrate that DITR is highly effective in depth inpainting tasks of transparent and reflective objects with robust adaptability.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# CNNを用いたモデルパラメータと勾配の適応フィルタによるGPRフルウェーブフォームインバージョン GPR Full-Waveform Inversion through Adaptive Filtering of Model Parameters and Gradients Using CNN ( http://arxiv.org/abs/2410.08568v1 ) ライセンス: Link先を確認	Peng Jiang, Kun Wang, Jiaxing Wang, Zeliang Feng, Shengjie Qiao, Runhuai Deng, Fengkai Zhang,	(参考訳) GPRフルウェーブフォームインバージョンは、地下特性モデルを反復的に最適化し、波形情報全体と一致させる。しかし、波動場継続から導かれるモデル勾配は、ゴースト値や送信機や受信機における過大な値などの誤差を含むことが多い。さらに、これらの勾配に基づいて更新されたモデルでは、しばしば異常な体や偽の異常の明確な特徴が示され、正確な逆転結果を得るのが困難である。これらの問題に対処するために、モデルパラメータと勾配を適応的にフィルタリングするために、組み込み畳み込みニューラルネットワーク(CNN)を組み込んだ、新しいフルウェーブフォーム・インバージョン(FWI)フレームワークを導入しました。具体的には、フォワードモデリングプロセスの前にCNNモジュールを組み込んで、FWIプロセス全体を差別化できるようにします。この設計では、ディープラーニングライブラリのオートグレードツールを活用し、前方計算中にモデル値がCNNモジュールを通過し、バックプロパゲーション時にモデル勾配がCNNモジュールを通過することができる。実験により、フォワード計算中にモデルパラメータをフィルタリングし、バックプロパゲーション時にモデル勾配が最終的に高品質な反転結果をもたらすことが示されている。 GPR full-waveform inversion optimizes the subsurface property model iteratively to match the entire waveform information. However, the model gradients derived from wavefield continuation often contain errors, such as ghost values and excessively large values at transmitter and receiver points. Furthermore, models updated based on these gradients frequently exhibit unclear characterization of anomalous bodies or false anomalies, making it challenging to obtain accurate inversion results. To address these issues, we introduced a novel full-waveform inversion (FWI) framework that incorporates an embedded convolutional neural network (CNN) to adaptively filter model parameters and gradients. Specifically, we embedded the CNN module before the forward modeling process and ensured the entire FWI process remains differentiable. This design leverages the auto-grad tool of the deep learning library, allowing model values to pass through the CNN module during forward computation and model gradients to pass through the CNN module during backpropagation. 
Experiments have shown that filtering the model parameters during forward computation and the model gradients during backpropagation can ultimately yield high-quality inversion results.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 連続変数を用いた量子アニーリングによる線形回帰 Linear Regression Using Quantum Annealing with Continuous Variables ( http://arxiv.org/abs/2410.08569v1 ) ライセンス: Link先を確認	Asuka Koura, Takashi Imoto, Katsuki Ura, Yuichiro Matsuzaki,	(参考訳) 線形回帰はデータ解析手法であり、教師あり学習に分類される。既知のデータを利用することで、未知のデータを予測することができる。近年、量子アニール(QA)を用いて線形回帰を行い、パラメータを二進数を用いて離散値に近似する手法が研究されている。しかし、このアプローチには限界があり、精度を向上させるためにキュービットの数を増やす必要がある。本稿では,連続変数を利用した線形回帰手法を提案する。特に、ボソン系は、離散近似に頼らずに線形回帰の最適化を容易にする。我々の新しいアプローチの大きな利点は、断熱条件が満たされる限り、キュービット数を増やすことなく精度を確保することができることである。 Linear regression is a data analysis technique, which is categorized as supervised learning. By utilizing known data, we can predict unknown data. Recently, researchers have explored the use of quantum annealing (QA) to perform linear regression where parameters are approximated to discrete values using binary numbers. However, this approach has a limitation: we need to increase the number of qubits to improve the accuracy. Here, we propose a novel linear regression method using QA that leverages continuous variables. In particular, the boson system facilitates the optimization of linear regression without resorting to discrete approximations, as it directly manages continuous variables while engaging in QA. The major benefit of our new approach is that it can ensure accuracy without increasing the number of qubits as long as the adiabatic condition is satisfied.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 高ボリュームデータ環境におけるAI駆動型データ品質モニタリングのための理論的枠組み A Theoretical Framework for AI-driven data quality monitoring in high-volume data environments ( http://arxiv.org/abs/2410.08576v1 ) ライセンス: Link先を確認	Nikhil Bangad, Vivekananda Jayaram, Manjunatha Sughaturu Krishnappa, Amey Ram Banarse, Darshan Mohan Bidkar, Akshay Nagpal, Vidyasagar Parlapalli,	(参考訳) 本稿では,高ボリューム環境におけるデータ品質維持の課題に対処するために,AIによるデータ品質監視システムに関する理論的枠組みを提案する。本稿では,ビッグデータのスケール,速度,多様性の管理における従来の手法の限界について検討し,高度な機械学習技術を活用した概念的アプローチを提案する。我々のフレームワークは、リアルタイムでスケーラブルなデータ品質管理のための異常検出、分類、予測分析を組み込んだシステムアーキテクチャの概要を述べる。主なコンポーネントは、インテリジェントデータ取り込み層、適応前処理機構、コンテキスト認識機能抽出、AIベースの品質評価モジュールなどである。継続的学習のパラダイムは私たちのフレームワークの中心であり、進化するデータパターンと品質要件への適応性を保証する。また、既存のデータエコシステム内でのスケーラビリティ、プライバシ、統合といった問題にも対処しています。実際の成果は提供されていないが、将来の研究と実装のための堅牢な理論的基盤を築き、データ品質管理を推進し、動的環境におけるAI駆動ソリューションの探索を奨励している。 This paper presents a theoretical framework for an AI-driven data quality monitoring system designed to address the challenges of maintaining data quality in high-volume environments. We examine the limitations of traditional methods in managing the scale, velocity, and variety of big data and propose a conceptual approach leveraging advanced machine learning techniques. Our framework outlines a system architecture that incorporates anomaly detection, classification, and predictive analytics for real-time, scalable data quality management. Key components include an intelligent data ingestion layer, adaptive preprocessing mechanisms, context-aware feature extraction, and AI-based quality assessment modules. A continuous learning paradigm is central to our framework, ensuring adaptability to evolving data patterns and quality requirements. We also address implications for scalability, privacy, and integration within existing data ecosystems. 
While practical results are not provided, the framework lays a robust theoretical foundation for future research and implementations, advancing data quality management and encouraging the exploration of AI-driven solutions in dynamic environments.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 非拘束的部分モジュラー最大化確率帯域に対する対数回帰法 Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit ( http://arxiv.org/abs/2410.08578v1 ) ライセンス: Link先を確認	Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel,	(参考訳) オンラインの非制約部分モジュラー最大化問題(Online USM)について,確率的帯域幅フィードバックを用いて検討した。この枠組みでは、決定者は、既知の有界区間で値を取る非単調な部分モジュラ函数からノイズの報奨を受ける。本稿では、オフラインおよびオンラインのフル情報設定から、Double-Greedy-Explore-then-Commit(DG-ETC)を適用したDG-ETCを提案する。 DG-ETC は O(d log(dT)) 問題依存上界を 1/2-近似擬回帰に対して満たし、O(dT^{2/3}log(dT)^{1/3}) の問題に依存しない上界を同時に満たし、既存のアプローチより優れている。そのために, 部分モジュラ函数に対する硬さの概念を導入し, この戦略でそれらを最大化することがいかに困難であるかを特徴付ける。 We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function, taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies an O(d log(dT)) problem-dependent upper bound for the 1/2-approximate pseudo-regret, as well as an O(dT^{2/3}log(dT)^{1/3}) problem-free one at the same time, outperforming existing approaches. To that end, we introduce a notion of hardness for submodular functions, characterizing how difficult it is to maximize them with this type of strategy.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 翻訳改訂におけるフィードバック強化のためのAIの統合 -学生エンゲージメントの混合手法の検討- Integrating AI for Enhanced Feedback in Translation Revision- A Mixed-Methods Investigation of Student Engagement ( http://arxiv.org/abs/2410.08581v1 ) ライセンス: Link先を確認	Simin Xu, Yanfang Su, Kanglong Liu,	(参考訳) 教育におけるフィードバックの重要性が確立されているにもかかわらず、人工知能(AI)によるフィードバックの適用、特にChatGPTのような言語モデルからのフィードバックは、翻訳教育においてまだ検討されていない。本研究は,ChatGPTによる翻訳過程における教師の学生の関与について検討した。定量的・質的な分析と翻訳・修正実験を組み合わせた混合手法を用いて, フィードバック, 翻訳前・修正後, 修正プロセス, 学生の振り返りについて検討した。その結果、認知的・感情的・行動的側面の複雑な相互関係が、学生のAIフィードバックとその後の改訂に影響を及ぼすことが明らかとなった。具体的には、学生はフィードバックが理解可能であるにもかかわらず、リビジョンプロセスにかなりの認知的努力を注いでいることが示唆された。さらに、彼らはフィードバックモデルに対して適度な感情的満足感を示した。行動は認知的・情緒的要因に大きく影響されたが,いくつかの矛盾がみられた。この研究は、翻訳教育におけるAI生成フィードバックの潜在的な応用に関する新たな洞察を提供し、言語教育環境におけるAIツールの統合に関するさらなる研究の道を開く。 Despite the well-established importance of feedback in education, the application of Artificial Intelligence (AI)-generated feedback, particularly from language models like ChatGPT, remains understudied in translation education. This study investigates the engagement of master's students in translation with ChatGPT-generated feedback during their revision process. A mixed-methods approach, combining a translation-and-revision experiment with quantitative and qualitative analyses, was employed to examine the feedback, translations pre-and post-revision, the revision process, and student reflections. The results reveal complex interrelations among cognitive, affective, and behavioural dimensions influencing students' engagement with AI feedback and their subsequent revisions. Specifically, the findings indicate that students invested considerable cognitive effort in the revision process, despite finding the feedback comprehensible. Additionally, they exhibited moderate affective satisfaction with the feedback model. Behaviourally, their actions were largely influenced by cognitive and affective factors, although some inconsistencies were observed. 
This research provides novel insights into the potential applications of AI-generated feedback in translation teaching and opens avenues for further investigation into the integration of AI tools in language teaching settings.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# DeBiFormer: 変形可能なエージェントバイレベルルーティングアテンションを備えたビジョントランス DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention ( http://arxiv.org/abs/2410.08582v1 ) ライセンス: Link先を確認	Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Tohgoroh Matsui, Hironobu Fujiyoshi,	(参考訳) 様々な注目モジュールを持つ視覚変換器は、視覚タスクにおいて優れた性能を示す。 DATのような空間適応的注意を用いた場合、画像分類において強い結果が得られたが、変形可能な点から選択されたキー値対は意味的セグメンテーションタスクの微調整時に意味的関連性が欠如している。 BiFormerのクエリ対応の空間的注意は、各クエリをトップkのルーティングリージョンに集中させることを目指している。しかし、注意計算では、選択されたキーと値のペアは、無関係なクエリが多すぎることで影響を受け、より重要なクエリへの注意が減少する。これらの問題に対処するために,エージェントクエリを用いたキー値ペアの選択を最適化し,アテンションマップにおけるクエリの解釈可能性を高める,変形可能なバイレベルルーティングアテンション(DBRA)モジュールを提案する。そこで本研究では,DBRAモジュールで構築した汎用視覚変換器であるDeformable Bi-level Routing Attention Transformer (DeBiFormer)を紹介する。 DeBiFormerは画像分類、オブジェクト検出、セマンティックセグメンテーションなど様々なコンピュータビジョンタスクで検証されており、その効果の強い証拠を提供している。 Vision Transformers with various attention modules have demonstrated superior performance on vision tasks. While using sparsity-adaptive attention, such as in DAT, has yielded strong results in image classification, the key-value pairs selected by deformable points lack semantic relevance when fine-tuning for semantic segmentation tasks. The query-aware sparsity attention in BiFormer seeks to focus each query on top-k routed regions. However, during attention calculation, the selected key-value pairs are influenced by too many irrelevant queries, reducing attention on the more important ones. To address these issues, we propose the Deformable Bi-level Routing Attention (DBRA) module, which optimizes the selection of key-value pairs using agent queries and enhances the interpretability of queries in attention maps. Based on this, we introduce the Deformable Bi-level Routing Attention Transformer (DeBiFormer), a novel general-purpose vision transformer built with the DBRA module. 
DeBiFormer has been validated on various computer vision tasks, including image classification, object detection, and semantic segmentation, providing strong evidence of its effectiveness. Code is available at https://github.com/maclong01/DeBiFormer	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# シークエンシャルレコメンデーションのためのインテント強化データ拡張 Intent-Enhanced Data Augmentation for Sequential Recommendation ( http://arxiv.org/abs/2410.08583v1 ) ライセンス: Link先を確認	Shuai Chen, Zhoujun Li,	(参考訳) インテント強化シーケンシャルレコメンデーションアルゴリズムの研究は、シーケンシャルレコメンデーションタスクのためのユーザ行動データに基づいて、動的ユーザインテントをよりよくマイニングする方法に焦点を当てている。様々なデータ拡張手法が、現在のシーケンシャルレコメンデーションアルゴリズムに広く適用され、ユーザの意図を捕捉する能力を効果的に向上する。しかし、これらの広く使われているデータ拡張方法は、トレーニングデータに過剰なノイズを導入し、ユーザの意図を曖昧にし、レコメンデーション性能に悪影響を及ぼすような、大量のランダムサンプリングに依存することが多い。さらに、これらの手法は、拡張データを利用するための限られたアプローチを持ち、拡張サンプルを完全に活用することができない。本稿では,インテント・セグメンテーションの挿入によるユーザ行動系列に基づく正と負のサンプルを構成するシーケンシャルレコメンデーション(IESRec)のためのインテント強化データ拡張手法を提案する。一方、生成された正のサンプルは、元のトレーニングデータと混合され、推奨性能を改善するために一緒に訓練される。一方、生成した正および負のサンプルは、自己教師付きトレーニングによる推奨性能を高めるために、対照的な損失関数を構築するために使用される。最後に、メインレコメンデーションタスクは、対照的な学習損失最小化タスクと共同で訓練される。実世界の3つのデータセットの実験により、IESRecモデルの有効性が検証された。 The research on intent-enhanced sequential recommendation algorithms focuses on how to better mine dynamic user intent based on user behavior data for sequential recommendation tasks. Various data augmentation methods are widely applied in current sequential recommendation algorithms, effectively enhancing the ability to capture user intent. However, these widely used data augmentation methods often rely on a large amount of random sampling, which can introduce excessive noise into the training data, blur user intent, and thus negatively affect recommendation performance. Additionally, these methods have limited approaches to utilizing augmented data, failing to fully leverage the augmented samples. We propose an intent-enhanced data augmentation method for sequential recommendation (IESRec), which constructs positive and negative samples based on user behavior sequences through intent-segment insertion. On one hand, the generated positive samples are mixed with the original training data, and they are trained together to improve recommendation performance. 
On the other hand, the generated positive and negative samples are used to build a contrastive loss function, enhancing recommendation performance through self-supervised training. Finally, the main recommendation task is jointly trained with the contrastive learning loss minimization task. Experiments on three real-world datasets validate the effectiveness of our IESRec model.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# ZipVL:動的トークンスカラー化とKVキャッシュ圧縮を併用した高能率視覚言語モデル ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression ( http://arxiv.org/abs/2410.08584v1 ) ライセンス: Link先を確認	Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang,	(参考訳) 大規模視覚言語モデル(LVLM)の効率は、プリフィルフェーズにおける注意機構の計算的ボトルネックと、特に高解像度画像やビデオを含むシナリオにおいて、デコードフェーズにおいてキー値(KV)キャッシュを取得する際のメモリボトルネックによって制約される。視覚コンテンツは、しばしば相当な冗長性を示し、その結果、LVLM内の高度に疎い注意マップが生成される。この空間は、注意計算を加速したり、様々なアプローチでKVキャッシュを圧縮するために利用することができる。しかしながら、ほとんどの研究はこれらのボトルネックの1つにのみ対処することに集中しており、異なる層やタスクに関する空間の動的調整を適切にサポートしていない。本稿では,LVLMのための効率的な推論フレームワークZipVLを提案する。この比率は、固定されたハイパーパラメータではなく、アテンションスコアの層別分布に基づいて適応的に決定される。次に,正規化された注目スコアに基づいて重要なトークンを選択し,それらの重要なトークンのみにアテンション機構を実行し,プリフィルフェーズを高速化する。復号フェーズにおけるメモリボトルネックを軽減するため、KVキャッシュに混合精度量子化を適用し、重要トークンのキャッシュにハイビット量子化を用いる一方、重要でないキャッシュには低ビット量子化を適用する。実験により、ZipVLはプリフィルフェーズを2.6$\times$で高速化し、GPUメモリ使用量を50.0%削減できることを示した。 The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos. Visual content often exhibits substantial redundancy, resulting in highly sparse attention maps within LVLMs. This sparsity can be leveraged to accelerate attention computation or compress the KV cache through various approaches. However, most studies focus on addressing only one of these bottlenecks and do not adequately support dynamic adjustment of sparsity concerning distinct layers or tasks. In this paper, we present ZipVL, an efficient inference framework designed for LVLMs that resolves both computation and memory bottlenecks through a dynamic ratio allocation strategy of important tokens. 
This ratio is adaptively determined based on the layer-specific distribution of attention scores, rather than fixed hyper-parameters, thereby improving efficiency for less complex tasks while maintaining high performance for more challenging ones. Then we select important tokens based on their normalized attention scores and perform attention mechanism solely on those important tokens to accelerate the prefill phase. To mitigate the memory bottleneck in the decoding phase, we employ mixed-precision quantization to the KV cache, where high-bit quantization is used for caches of important tokens, while low-bit quantization is applied to those of less importance. Our experiments demonstrate that ZipVL can accelerate the prefill phase by 2.6$\times$ and reduce GPU memory usage by 50.0%, with a minimal accuracy reduction of only 0.2% on Video-MME benchmark over LongVA-7B model, effectively enhancing the generation efficiency of LVLMs.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
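The entry above says the ratio of important tokens is set adaptively from each layer's attention-score distribution, and that the KV cache then uses higher precision for important tokens. The abstract does not give the exact rule, so the sketch below is only one plausible reading: keep the smallest set of tokens whose normalized scores reach a cumulative threshold, then assign bit widths per token. The threshold `tau`, the bit widths, and both function names are hypothetical and are not taken from the ZipVL code.

```python
def select_important_tokens(scores, tau=0.95):
    """Return sorted indices of the smallest token set whose normalized
    attention scores sum to at least tau (an assumed selection rule)."""
    total = sum(scores)
    if total <= 0:
        return list(range(len(scores)))  # no usable signal: keep every token
    # Visit tokens from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += scores[i] / total
        if mass >= tau:
            break
    return sorted(kept)


def assign_kv_bits(num_tokens, important, high_bits=4, low_bits=2):
    """Give important tokens a higher-precision KV cache entry (assumed widths)."""
    important = set(important)
    return [high_bits if i in important else low_bits for i in range(num_tokens)]


# A peaked (sparse) attention distribution keeps few tokens,
# while a flat one keeps many, so the ratio adapts per layer.
peaked_layer = [97, 1, 1, 1]
flat_layer = [25, 25, 25, 25]
kept_peaked = select_important_tokens(peaked_layer, tau=0.85)
kept_flat = select_important_tokens(flat_layer, tau=0.85)
bits = assign_kv_bits(len(peaked_layer), kept_peaked)
```

Under this reading, a fixed `tau` yields a different token ratio for each layer and input, which is the contrast with a fixed sparsity hyper-parameter that the abstract draws.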
# ViT3DアライメントによるLLaMA3の3次元医用画像生成 ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation ( http://arxiv.org/abs/2410.08588v1 ) ライセンス: Link先を確認	Siyou Li, Beining Xu, Yihao Luo, Dong Nie, Le Zhang,	(参考訳) 医用画像から詳細なテキストレポートを作成するための医療報告自動生成(MRG)がこの領域で重要な課題となっている。 MRGシステムは、レポート作成に必要な時間と労力を削減し、診断効率を向上させることで、放射線学的ワークフローを向上させることができる。本研究では,マルチモーダル大言語モデルを用いたMRGの自動生成手法を提案する。具体的には、M3D-CLIPから導入された3D Vision Transformer (ViT3D)画像エンコーダを用いて、3Dスキャンを処理し、Asclepius-Llama3-8Bを言語モデルとして使用し、自動回帰復号によりテキストレポートを生成する。実験の結果,MRGタスク検証セットでは平均グリーンスコア0.3,視覚質問応答(VQA)タスク検証セットでは平均0.61,ベースラインモデルでは平均グリーンスコア0.3を達成できた。提案手法は,LLaMA3のVT3DアライメントによるMRGとVQAの自動タスクの有効性を示す。 Automatic medical report generation (MRG), which aims to produce detailed text reports from medical images, has emerged as a critical task in this domain. MRG systems can enhance radiological workflows by reducing the time and effort required for report writing, thereby improving diagnostic efficiency. In this work, we present a novel approach for automatic MRG utilizing a multimodal large language model. Specifically, we employed the 3D Vision Transformer (ViT3D) image encoder introduced from M3D-CLIP to process 3D scans and use the Asclepius-Llama3-8B as the language model to generate the text reports by auto-regressive decoding. The experiment shows our model achieved an average Green score of 0.3 on the MRG task validation set and an average accuracy of 0.61 on the visual question answering (VQA) task validation set, outperforming the baseline model. Our approach demonstrates the effectiveness of the ViT3D alignment of LLaMA3 for automatic MRG and VQA tasks by tuning the model on a small dataset.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 階層クラスタリングによるスパースミキサーのリトレーニングフリーマージ Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering ( http://arxiv.org/abs/2410.08589v1 ) ライセンス: Link先を確認	I-Chun Chen, Hsu-Shen Liu, Wei-Fang Sun, Chen-Hao Chao, Yen-Chang Hsu, Chun-Yi Lee,	(参考訳) SMOE(Sparse Mixture-of-Experts)モデルは、大規模な言語モデル開発において重要なブレークスルーとなる。これらのモデルは、推論コストを比例的に増加させることなく、性能改善を可能にする。タスク実行中に小さなパラメータセットを選択的に活性化することにより、SMoEはモデルのキャパシティを向上させる。しかし、専門家の増加に対応するために必要なメモリフットプリントがかなり大きいため、彼らのデプロイメントは依然として困難である。この制約により、限られたハードウェアリソースを持つ環境では実現不可能になる。この課題に対処するために,SMoEモデルパラメータをリトレーニングせずに削減するタスクに依存しないエキスパートマージフレームワークであるHyerarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE)を提案する。従来の手法とは異なり、HC-SMoEは専門家の出力に基づいた階層的なクラスタリングを採用している。このアプローチは、マージプロセスがルーティング決定の影響を受けないことを保証する。アウトプットベースのクラスタリング戦略は、専門家間の機能的類似性を捉え、多くの専門家とモデルに適応可能なソリューションを提供する。我々は8つのゼロショット言語タスクに関する広範な実験を通じてアプローチを検証するとともに、QwenやMixtralといった大規模SMoEモデルにおいてその効果を実証する。我々の総合的な結果はHC-SMoEが一貫して高いパフォーマンスを達成していることを示している。 Sparse Mixture-of-Experts (SMoE) models represent a significant breakthrough in large language model development. These models enable performance improvements without a proportional increase in inference costs. By selectively activating a small set of parameters during task execution, SMoEs enhance model capacity. However, their deployment remains challenging due to the substantial memory footprint required to accommodate the growing number of experts. This constraint renders them less feasible in environments with limited hardware resources. To address this challenge, we propose Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework that reduces SMoE model parameters without retraining. Unlike previous methods, HC-SMoE employs hierarchical clustering based on expert outputs. This approach ensures that the merging process remains unaffected by routing decisions. 
The output-based clustering strategy captures functional similarities between experts, offering an adaptable solution for models with numerous experts. We validate our approach through extensive experiments on eight zero-shot language tasks and demonstrate its effectiveness in large-scale SMoE models such as Qwen and Mixtral. Our comprehensive results demonstrate that HC-SMoE consistently achieves strong performance, which highlights its potential for real-world deployment.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
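The HC-SMoE abstract above describes grouping experts by the similarity of their outputs and then merging each group, without giving implementation details. The following is a minimal sketch of that general idea, not the authors' method: it assumes each expert is summarized by a mean output vector over some calibration inputs, uses Euclidean distance with average linkage, and merges a group by uniformly averaging weights. The function names `cluster_experts` and `merge_weights` and all toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_experts(outputs, num_groups):
    """Average-linkage agglomerative clustering over per-expert output vectors.

    outputs[i] is an assumed summary (e.g. mean output) of expert i.
    Returns a list of clusters, each a sorted list of expert indices.
    """
    clusters = [[i] for i in range(len(outputs))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(euclidean(outputs[i], outputs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_groups:
        # Pick the closest pair of clusters and merge them (a < b always holds).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def merge_weights(weights, clusters):
    """Replace each cluster by one expert whose weights are the member average."""
    merged = []
    for members in clusters:
        dim = len(weights[members[0]])
        merged.append(
            [sum(weights[i][d] for i in members) / len(members) for d in range(dim)]
        )
    return merged


# Toy data: experts 0 and 1 behave alike, as do experts 2 and 3.
toy_outputs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_weights = [[1.0], [3.0], [10.0], [20.0]]
groups = cluster_experts(toy_outputs, num_groups=2)
merged = merge_weights(toy_weights, groups)
```

Because the grouping depends only on what the experts output, not on how often the router selects them, a sketch like this is unaffected by routing statistics, which is the property the abstract emphasizes.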
# VIBES -- ビジョンバックボーン効率の良い選択 VIBES -- Vision Backbone Efficient Selection ( http://arxiv.org/abs/2410.08592v1 ) ライセンス: Link先を確認	Joris Guerin, Shray Bansal, Amirreza Shaban, Paulo Mann, Harshvardhan Gazula,	(参考訳) この作業は、特定の目標タスクに対して、高性能な事前学習された視覚バックボーンを効率的に選択する課題に対処する。有限個のバックボーン内の徹底的な探索はこの問題を解決できるが、大規模なデータセットやバックボーンプールでは現実的ではない。この問題を解決するために、Vision Backbone Efficient Selection (VIBES)を導入します。我々は、VIBESに対処するための単純で効果的なヒューリスティックをいくつか提案し、それらを4つのコンピュータビジョンデータセットで評価する。提案手法は,1つのGPU上での検索予算が1時間以内であっても,ジェネリックベンチマークより優れたバックボーンを同定できることを示す。 VIBESはベンチマークからタスク固有の最適化へのパラダイムシフトだと考えています。 This work tackles the challenge of efficiently selecting high-performance pre-trained vision backbones for specific target tasks. Although exhaustive search within a finite set of backbones can solve this problem, it becomes impractical for large datasets and backbone pools. To address this, we introduce Vision Backbone Efficient Selection (VIBES), which aims to quickly find well-suited backbones, potentially trading off optimality for efficiency. We propose several simple yet effective heuristics to address VIBES and evaluate them across four diverse computer vision datasets. Our results show that these approaches can identify backbones that outperform those selected from generic benchmarks, even within a limited search budget of one hour on a single GPU. We reckon VIBES marks a paradigm shift from benchmarks to task-specific optimization.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# VERIFIED: きめ細かいビデオ理解のためのビデオコーパスモーメント検索ベンチマーク VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding ( http://arxiv.org/abs/2410.08593v1 ) ライセンス: Link先を確認	Houlun Chen, Xin Wang, Hong Chen, Zeyang Zhang, Wei Feng, Bin Huang, Jia Jia, Wenwu Zhu,	(参考訳) 既存のビデオコーパスモーメント検索(VCMR)は粗粒度の理解に限られており、きめ細かいクエリが与えられたときの正確なビデオモーメントのローカライゼーションを妨げる。本稿では,部分的に一致する他の候補を含むコーパスから最も一致するモーメントをローカライズすることを手法に求める,より難易度の高いきめ細かいVCMRベンチマークを提案する。データセット構築の効率を改善し,高品質なデータアノテーションを保証するために, VERIFIEDを提案する。これは,信頼性の高い(RelIable)きめ細かな(FInE-grained)静的情報(statics)と動的情報(Dynamics)を備えたキャプションを生成するための,自動ビデオテキスト(VidEo-text)アノテーションパイプラインである。具体的には,大規模言語モデル (LLM) と大規模マルチモーダルモデル (LMM) を利用し,提案する静的・動的情報強化キャプションモジュールによって,ビデオ毎に多様なきめ細かいキャプションを生成する。 LLMの幻覚による不正確なアノテーションをフィルタリングするために,摂動を加えたハードネガティブで拡張したコントラスト損失とマッチング損失を用いてビデオ基礎モデルを微調整する,きめ細かさを考慮したノイズ評価器(Fine-Granularity Aware Noise Evaluator)を提案する。 VERIFIEDを用いて、高レベルのアノテーション品質を示すCharades-FIG、DiDeMo-FIG、ActivityNet-FIGを含むより難易度の高いきめ細かいVCMRベンチマークを構築する。提案したデータセット上で、いくつかの最先端VCMRモデルを評価し、VCMRにおけるきめ細かいビデオ理解には依然として大きな改善の余地があることを明らかにする。コードとデータセットは https://github.com/hlchen23/VERIFIED にある。 Existing Video Corpus Moment Retrieval (VCMR) is limited to coarse-grained understanding, which hinders precise video moment localization when given fine-grained queries. In this paper, we propose a more challenging fine-grained VCMR benchmark requiring methods to localize the best-matched moment from the corpus with other partially matched candidates. To improve the dataset construction efficiency and guarantee high-quality data annotations, we propose VERIFIED, an automatic VidEo-text annotation pipeline to generate captions with RelIable FInE-grained statics and Dynamics. Specifically, we resort to large language models (LLM) and large multimodal models (LMM) with our proposed Statics and Dynamics Enhanced Captioning modules to generate diverse fine-grained captions for each video. 
To filter out the inaccurate annotations caused by the LLM hallucination, we propose a Fine-Granularity Aware Noise Evaluator where we fine-tune a video foundation model with disturbed hard-negatives augmented contrastive and matching losses. With VERIFIED, we construct a more challenging fine-grained VCMR benchmark containing Charades-FIG, DiDeMo-FIG, and ActivityNet-FIG which demonstrate a high level of annotation quality. We evaluate several state-of-the-art VCMR models on the proposed dataset, revealing that there is still significant scope for fine-grained video understanding in VCMR. Code and datasets are available at https://github.com/hlchen23/VERIFIED.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 何が猫を殺したのか?物語における好奇心(とサスペンス、驚き)の論理的形式化に向けて What killed the cat? Towards a logical formalization of curiosity (and suspense, and surprise) in narratives ( http://arxiv.org/abs/2410.08597v1 ) ライセンス: Link先を確認	Florence Dupin de Saint-Cyr, Anne-Gwenn Bosser, Benjamin Callac, Eric Maisel,	(参考訳) 物語の緊張の中心にある3つの感情(好奇心、サスペンス、驚き)を形式化する統一的な枠組みを提供する。このフレームワークは、非単調な推論に基づいて構築され、世界のデフォルトの振る舞いをコンパクトに表現し、ストーリーを受け取るエージェントの感情進化をシミュレートすることができる。認識、好奇心、驚き、サスペンスの概念を定式化した後、私たちは定義によって引き起こされる特性を探求し、それらを検出するための計算複雑性について研究する。最後に、物語を聴く特定のエージェントに対して、これらの感情の強さを評価する手段を提案する。 We provide a unified framework in which the three emotions at the heart of narrative tension (curiosity, suspense and surprise) are formalized. This framework is built on nonmonotonic reasoning which allows us to compactly represent the default behavior of the world and to simulate the affective evolution of an agent receiving a story. After formalizing the notions of awareness, curiosity, surprise and suspense, we explore the properties induced by our definitions and study the computational complexity of detecting them. We finally propose means to evaluate these emotions' intensity for a given agent listening to a story.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# 意味的知識チューニングを用いた大規模言語モデルのパラメータ効率の良い微調整 Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning ( http://arxiv.org/abs/2410.08598v1 ) ライセンス: Link先を確認	Nusrat Jahan Prottasha, Asif Mahmud, Md. Shohanur Islam Sobuj, Prakash Bhat, Md Kowsher, Niloofar Yousefi, Ozlem Ozmen Garibay,	(参考訳) 大規模言語モデル (LLM) は, 計算コストの低さから, プロンプトを用いた特殊タスクにおいて近年顕著に普及している。接頭辞のチューニングのような標準的な方法は、意味を欠いた特別な変更可能なトークンを使用し、しばしば不足する、最高のパフォーマンスのための広範なトレーニングを必要とする。そこで本研究では,ランダムトークンの代わりに有意な単語を用いるプロンプトおよびプレフィックスチューニングのための,セマンティック知識チューニング(SK-Tuning)と呼ばれる新しい手法を提案する。この方法は、固定LLMを使用して、ゼロショット機能を通じてプロンプトの意味的内容を理解し、処理する。これに続いて、処理されたプロンプトと入力テキストを統合して、特定のタスクにおけるモデルの性能を改善する。実験の結果,SK-Tuningはテキスト分類や理解などのタスクにおいて,他のチューニング手法と比較して,学習時間やパラメータの少ない,優れたパフォーマンスを示すことがわかった。このアプローチは、言語タスクの処理におけるLLMの効率性と有効性を最適化するための有望な方法を提供する。 Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens. This method involves using a fixed LLM to understand and process the semantic content of the prompt through zero-shot capabilities. Following this, it integrates the processed prompt with the input text to improve the model's performance on particular tasks. Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding compared to other tuning methods. This approach offers a promising method for optimizing the efficiency and effectiveness of LLMs in processing language tasks.	翻訳日:2024-10-30 23:04:57 公開日:2024-10-11
# StraGo: プロンプト最適化のための戦略的ガイダンスの活用 StraGo: Harnessing Strategic Guidance for Prompt Optimization ( http://arxiv.org/abs/2410.08601v1 ) ライセンス: Link先を確認	Yurong Wu, Yan Gao, Bin Benjamin Zhu, Zineng Zhou, Xiaodi Sun, Sheng Yang, Jian-Guang Lou, Zhiming Ding, Linjun Yang,	(参考訳) プロンプトエンジニアリングは、さまざまなアプリケーションにまたがる大規模言語モデル(LLM)の能力を活用する上で重要である。既存のプロンプト最適化手法はプロンプトの有効性を改善するが、しばしばプロンプトドリフトを引き起こし、新しく生成されたプロンプトは失敗事例に対処する一方で、以前成功していた事例に悪影響を及ぼす可能性がある。さらに、これらの手法はプロンプト最適化タスクにおいてLLMの本質的な能力に大きく依存する傾向にある。本稿では,StraGo(Strategic-Guided Optimization)について紹介する。これは,成功事例と失敗事例の両方から得られた知見を活用して,最適化目標を達成するための重要な要因を特定することによって,プロンプトドリフトを緩和する新しいアプローチである。 StraGoはハウツー(how-to-do)の方法論を採用し、コンテキスト内学習を統合して、具体的で実行可能な戦略を定式化し、プロンプト最適化のための詳細なステップバイステップのガイダンスを提供する。推論、自然言語理解、ドメイン固有の知識、産業応用など、広範囲のタスクにわたる大規模な実験は、StraGoの優れた性能を実証している。プロンプト最適化の新たな最先端を確立し、安定的で効果的なプロンプト改善を提供する能力を示している。 Prompt engineering is pivotal for harnessing the capabilities of large language models (LLMs) across diverse applications. While existing prompt optimization methods improve prompt effectiveness, they often lead to prompt drifting, where newly generated prompts can adversely impact previously successful cases while addressing failures. Furthermore, these methods tend to rely heavily on LLMs' intrinsic capabilities for prompt optimization tasks. In this paper, we introduce StraGo (Strategic-Guided Optimization), a novel approach designed to mitigate prompt drifting by leveraging insights from both successful and failed cases to identify critical factors for achieving optimization objectives. StraGo employs a how-to-do methodology, integrating in-context learning to formulate specific, actionable strategies that provide detailed, step-by-step guidance for prompt optimization. Extensive experiments conducted across a range of tasks, including reasoning, natural language understanding, domain-specific knowledge, and industrial applications, demonstrate StraGo's superior performance. 
It establishes a new state-of-the-art in prompt optimization, showcasing its ability to deliver stable and effective prompt improvements.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# MergePrint: 大規模言語モデルのマージに対するロバストフィンガープリント MergePrint: Robust Fingerprinting against Merging Large Language Models ( http://arxiv.org/abs/2410.08604v1 ) ライセンス: Link先を確認	Shojiro Yamabe, Tsubasa Takahashi, Futa Waseda, Koki Wataoka,	(参考訳) 大きな言語モデル(LLM)のトレーニングコストが上昇するにつれて、その知的財産を保護することがますます重要になっている。複数の専門家モデルを複数のタスクを実行できる単一のモデルに統合するモデルマージは、無許可かつ悪意のある利用のリスクを高めている。フィンガープリント技術はモデルの所有権を主張するために研究されているが、既存の手法は主に微調整に焦点を当てており、モデルのマージは未調査のままである。このギャップに対処するために,モデルマージ後にも所有権の主張を保持するために,頑健な指紋を埋め込む新しい指紋認証手法MergePrintを提案する。マージ後のモデルの重みをシミュレートする擬似マージモデルに対して最適化することで、マージ後も検出可能な指紋を生成する。さらに、指紋入力を最適化して性能劣化を最小限に抑え、ターゲット入力からの特定の出力による検証を可能にする。このアプローチは、モデルマージによる不正流用の場合に所有権を主張するための、実用的なフィンガープリント戦略を提供する。 As the cost of training large language models (LLMs) rises, protecting their intellectual property has become increasingly critical. Model merging, which integrates multiple expert models into a single model capable of performing multiple tasks, presents a growing risk of unauthorized and malicious usage. While fingerprinting techniques have been studied for asserting model ownership, existing methods have primarily focused on fine-tuning, leaving model merging underexplored. To address this gap, we propose a novel fingerprinting method MergePrint that embeds robust fingerprints designed to preserve ownership claims even after model merging. By optimizing against a pseudo-merged model, which simulates post-merged model weights, MergePrint generates fingerprints that remain detectable after merging. Additionally, we optimize the fingerprint inputs to minimize performance degradation, enabling verification through specific outputs from targeted inputs. This approach provides a practical fingerprinting strategy for asserting ownership in cases of misappropriation through model merging.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# 生成的敵対的ネットワークを用いたテキスト・ツー・イメージ Text-To-Image with Generative Adversarial Networks ( http://arxiv.org/abs/2410.08608v1 ) ライセンス: Link先を確認	Mehrshad Momen-Tayefeh,	(参考訳) 人間のテキストから現実的な画像を生成することは、コンピュータビジョン(CV)の分野で最も難しい問題の一つである。与えられた記述の意味は、既存のテキスト・ツー・イメージのアプローチによって大まかに反映できる。本稿では,テキストから画像を生成するGAN(Generative Adversarial Networks)に基づく5つの異なる手法の簡単な比較を行うことを主な目的とする。各モデルアーキテクチャは解像度の異なる画像を生成する。さらに、得られた最高の解像度と最悪の解像度は、それぞれ64×64、256×256である。また、各モデルの精度を示す指標をいくつかチェックして比較した。本研究では、これらの異なるアプローチを重要な指標で比較することにより、この問題の最良のモデルを見出した。 Generating realistic images from human texts is one of the most challenging problems in the field of computer vision (CV). The meaning of descriptions given can be roughly reflected by existing text-to-image approaches. In this paper, our main purpose is to propose a brief comparison between five different methods based on Generative Adversarial Networks (GAN) to generate images from text. In addition, each model architecture synthesizes images at a different resolution. Furthermore, the best and worst obtained resolutions are 64×64 and 256×256, respectively. However, we checked and compared several metrics that indicate the accuracy of each model. Also, by doing this study, we found out the best model for this problem by comparing these different approaches on essential metrics.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# 共役セマンティックプールによる事前学習型視覚言語モデルによるOOD検出の改善 Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models ( http://arxiv.org/abs/2410.08611v1 ) ライセンス: Link先を確認	Mengyuan Chen, Junyu Gao, Changsheng Xu,	(参考訳) ゼロショットアウト・オブ・ディストリビューション(OOD)検出のための簡単なパイプラインは、広範囲なセマンティックプールから潜在的OODラベルを選択し、訓練済みの視覚言語モデルを利用して、イン・ディストリビューション(ID)とOODラベルの両方の分類を実行する。本稿では,OOD ラベルが活性化される可能性を高めつつ,これらの OOD ラベルのアクティベーションの相互依存性を低く抑えつつ,セマンティックプールの拡大が求められることを理論的に論じる。自然な拡張法はより大きな語彙を採用することであるが、多くの同義語や非一般的な単語の必然的な導入は上記の要件を満たすことに失敗し、実行可能な拡張法が単に語彙から単語を選択することを超えることを示唆している。 OOD検出は、入力画像をID/OODクラスグループに正しく分類することを目的としているため、標準クラス名ではないがプロセスに有利なOODラベル候補を“メイクアップ”することができる。元のセマンティックプールが未修正の特定のクラス名から成り立っていることを観察し、修正されたスーパークラス名からなる共役セマンティックプール(CSP)を構築し、それぞれ異なるカテゴリで類似したプロパティを共有するクラスタセンターとして機能する。確立された理論と一致し、CSPによるOODラベル候補の拡大が要求を満たし、FPR95において既存の作品の7.89%を上回ります。コードはhttps://github.com/MengyuanChen21/NeurIPS2024-CSPで公開されている。 A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification on both in-distribution (ID) and OOD labels. In this paper, we theorize that enhancing performance requires expanding the semantic pool, while increasing the expected probability of selected OOD labels being activated by OOD samples, and ensuring low mutual dependence among the activations of these OOD labels. A natural expansion manner is to adopt a larger lexicon; however, the inevitable introduction of numerous synonyms and uncommon words fails to meet the above requirements, indicating that viable expansion manners move beyond merely selecting words from a lexicon. Since OOD detection aims to correctly classify input images into ID/OOD class groups, we can "make up" OOD label candidates which are not standard class names but beneficial for the process. 
Observing that the original semantic pool is comprised of unmodified specific class names, we correspondingly construct a conjugated semantic pool (CSP) consisting of modified superclass names, each serving as a cluster center for samples sharing similar properties across different categories. Consistent with our established theory, expanding OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95. Code is available at https://github.com/MengyuanChen21/NeurIPS2024-CSP.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
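The record above reports its gain in FPR95 without defining the metric. As a hedged reminder, the sketch below computes FPR95 under one common convention, which is an assumption here and not taken from the paper: in-distribution (ID) samples are the positive class, a higher score means "more ID-like", and the threshold is set so that at least 95% of ID samples are kept. The function name `fpr_at_95_tpr` and the toy scores are hypothetical.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted at the threshold keeping >= 95% of ID samples.

    Assumes higher score = more in-distribution, and ID is the positive class.
    """
    ranked = sorted(id_scores, reverse=True)
    # Ceiling of 0.95 * n, done in integer arithmetic to avoid float rounding.
    keep = (95 * len(ranked) + 99) // 100
    threshold = ranked[keep - 1]
    false_positives = sum(1 for score in ood_scores if score >= threshold)
    return false_positives / len(ood_scores)


# Toy scores: 20 ID samples scored 1..20, and four OOD samples.
toy_id = list(range(1, 21))
toy_ood = [0, 1, 2, 30]
toy_fpr = fpr_at_95_tpr(toy_id, toy_ood)
```

Lower values are better: an FPR95 of 0 would mean no OOD sample is mistaken for ID while 95% of ID samples are still accepted.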
# Synth-SONAR:Dual Diffusion ModelとGPT Promptingによる多様性とリアリズムの強化によるソナー画像合成 Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting ( http://arxiv.org/abs/2410.08612v1 ) ライセンス: Link先を確認	Purushothaman Natarajan, Kamal Basha, Athira Nambiar,	(参考訳) ソナー画像合成は水中探査、海洋生物学、防衛における応用の進展に不可欠である。従来の手法は、ソナーセンサーを使ってデータ品質と多様性を危険にさらす、広範囲で高価なデータ収集に依存していることが多い。これらの制約を克服するために,拡散モデルとGPTプロンプトを利用した新しいソナー画像合成フレームワーク,Synth-SONARを提案する。まず、生成AIベースのスタイルインジェクション技術と、公開されている実/シミュレーションデータを統合することで、ソナー研究のための最大のソナーデータコーパスの1つを生成する。第二に、二重テキスト条件ソナー拡散モデル階層は、粗くきめ細かなソナー画像を品質と多様性を向上して合成する。第3に、高レベル(粗度)と低レベル(詳細)のテキストベースのソナー生成手法は、視覚言語モデル(VLM)とGPTプロンプトで利用可能な高度な意味情報を活用する。推測中,本手法はテキストのプロンプトから多彩でリアルなソナー画像を生成し,テキスト記述とソナー画像生成のギャップを埋める。このことは、私たちの知る限り、初めてソナー画像にGPTプロンプティングを応用したことを意味する。 Synth-SONARは高品質な合成ソナーデータセットの作成において最先端の結果を達成し、その多様性とリアリズムを著しく向上させる。 Sonar image synthesis is crucial for advancing applications in underwater exploration, marine biology, and defence. Traditional methods often rely on extensive and costly data collection using sonar sensors, jeopardizing data quality and diversity. To overcome these limitations, this study proposes a new sonar image synthesis framework, Synth-SONAR leveraging diffusion models and GPT prompting. The key novelties of Synth-SONAR are threefold: First, by integrating Generative AI-based style injection techniques along with publicly available real/simulated data, thereby producing one of the largest sonar data corpus for sonar research. Second, a dual text-conditioning sonar diffusion model hierarchy synthesizes coarse and fine-grained sonar images with enhanced quality and diversity. Third, high-level (coarse) and low-level (detailed) text-based sonar generation methods leverage advanced semantic information available in visual language models (VLMs) and GPT-prompting. 
During inference, the method generates diverse and realistic sonar images from textual prompts, bridging the gap between textual descriptions and sonar image generation. This marks the application of GPT-prompting in sonar imagery for the first time, to the best of our knowledge. Synth-SONAR achieves state-of-the-art results in producing high-quality synthetic sonar datasets, significantly enhancing their diversity and realism.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# 参照リモートセンシング画像セグメンテーションのためのクロスモーダル双方向相互作用モデル Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation ( http://arxiv.org/abs/2410.08613v1 ) ライセンス: Link先を確認	Zhe Dong, Yuzhe Sun, Yanfeng Gu, Tianzhu Liu,	(参考訳) 自然言語表現とリモートセンシング画像が与えられた場合、参照リモートセンシング画像セグメンテーション(RRSIS)の目標は、参照表現によって識別された対象物の画素レベルマスクを生成することである。自然のシナリオとは対照的に、RRSISの表現は複雑な地理空間的関係を伴い、対象物は規模が大きく異なり、視覚的サリエンシが欠如しているため、正確なセグメンテーションを達成するのが困難になる。上記の課題に対処するため、クロスモーダル双方向相互作用モデル(CroBIM)と呼ばれる新しいRRSISフレームワークが提案されている。具体的には,文脈認識型プロンプト変調(CAPM)モジュールは,空間的位置関係とタスク固有の知識を言語的特徴に統合することにより,対象物を捕捉する能力を向上する。さらに,言語情報をマルチスケールの視覚的特徴に統合するために,言語誘導型特徴集約(LGFA)モジュールを導入し,特徴集約を強化するために注意欠陥補償機構を組み込んだ。最後に、相互作用デコーダ(MID)は、カスケードされた双方向のクロスアテンションを通じて、クロスモーダルな特徴アライメントを強化し、正確なセグメンテーションマスク予測を可能にするように設計されている。 RRSISの研究をさらに推し進めるために、52,472個の画像・言語・ラベルの三つ組からなる新しい大規模ベンチマークデータセットRISBenchを構築した。 RISBenchと他の2つの一般的なデータセットでの大規模なベンチマークは、既存の最先端(SOTA)メソッドよりも提案されたCroBIMの優れたパフォーマンスを示している。 CroBIMのソースコードとRISBenchデータセットはhttps://github.com/HIT-SIRS/CroBIMで公開される予定である。 Given a natural language expression and a remote sensing image, the goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. In contrast to natural scenarios, expressions in RRSIS often involve complex geospatial relationships, with target objects of interest that vary significantly in scale and lack visual saliency, thereby increasing the difficulty of achieving precise segmentation. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). Specifically, a context-aware prompt modulation (CAPM) module is designed to integrate spatial positional relationships and task-specific knowledge into the linguistic features, thereby enhancing the ability to capture the target object. 
Additionally, a language-guided feature aggregation (LGFA) module is introduced to integrate linguistic information into multi-scale visual features, incorporating an attention deficit compensation mechanism to enhance feature aggregation. Finally, a mutual-interaction decoder (MID) is designed to enhance cross-modal feature alignment through cascaded bidirectional cross-attention, thereby enabling precise segmentation mask prediction. To further foster research on RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets. Extensive benchmarking on RISBench and two other prevalent datasets demonstrates the superior performance of the proposed CroBIM over existing state-of-the-art (SOTA) methods. The source code for CroBIM and the RISBench dataset will be publicly available at https://github.com/HIT-SIRS/CroBIM	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# 自然言語により誘導される敵対的画像 Natural Language Induced Adversarial Images ( http://arxiv.org/abs/2410.08620v1 ) ライセンス: Link先を確認	Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yingpeng Dong, Xiaolin Hu,	(参考訳) 敵対的攻撃の研究は、ディープラーニングモデルの脆弱性を示し、より堅牢なモデルの構築を支援するため、AIセキュリティにとって重要である。画像に対する敵対的攻撃は最も広く研究されており、ノイズベースの攻撃、画像編集ベースの攻撃、潜在空間ベースの攻撃などがある。しかし、これらの手法によって作られた敵対的サンプルは十分な意味情報を欠くことが多く、人間が自然条件下でのディープラーニングモデルの失敗モードを理解することは困難である。この制限に対処するため,自然言語により誘導される敵対的画像攻撃手法を提案する。中心となる考え方は、テキスト・ツー・イメージモデルを利用して、対象モデルの誤分類を引き起こすよう悪意を持って構成された入力プロンプトから敵対的画像を生成することである。より自然な敵対的画像の合成に商用テキスト・ツー・イメージモデルを採用するために、勾配を必要とせずに離散的な敵対的プロンプトを最適化する適応型遺伝的アルゴリズム(GA)と、クエリ効率を向上させる適応型単語空間削減手法を提案する。さらに、生成された画像のセマンティック一貫性を維持するためにCLIPを使用した。実験の結果、"foggy"、"humid"、"stretching"などの高頻度の意味情報が容易に分類誤りを引き起こすことがわかった。この敵対的意味情報は、生成された画像だけでなく、現実世界で撮影された写真にも存在している。また,一部の敵対的意味情報が未知の分類タスクに転移できることも確認した。さらに,本攻撃手法は異なるテキスト・ツー・イメージモデル(例えば,Midjourney,DALL-E 3など)と画像分類器に転移できる。私たちのコードは、https://github.com/zxp555/Natural-Language-Induced-Adversarial-Imagesで利用可能です。 Research of adversarial attacks is important for AI security because it shows the vulnerability of deep learning models and helps to build more robust models. Adversarial attacks on images are most widely studied, which include noise-based attacks, image editing-based attacks, and latent space-based attacks. However, the adversarial examples crafted by these methods often lack sufficient semantic information, making it challenging for humans to understand the failure modes of deep learning models under natural conditions. To address this limitation, we propose a natural language induced adversarial image attack method. The core idea is to leverage a text-to-image model to generate adversarial images given input prompts, which are maliciously constructed to lead to misclassification for a target model. 
To adopt commercial text-to-image models for synthesizing more natural adversarial images, we propose an adaptive genetic algorithm (GA) for optimizing discrete adversarial prompts without requiring gradients and an adaptive word space reduction method for improving query efficiency. We further used CLIP to maintain the semantic consistency of the generated images. In our experiments, we found that some high-frequency semantic information such as "foggy", "humid", "stretching", etc. can easily cause classifier errors. This adversarial semantic information exists not only in generated images but also in photos captured in the real world. We also found that some adversarial semantic information can be transferred to unknown classification tasks. Furthermore, our attack method can transfer to different text-to-image models (e.g., Midjourney, DALL-E 3, etc.) and image classifiers. Our code is available at: https://github.com/zxp555/Natural-Language-Induced-Adversarial-Images.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
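The record above mentions a genetic algorithm that optimizes discrete prompts without gradients. The sketch below shows only the generic skeleton of such a search (selection, one-point crossover, word-level mutation, elitism) over a small vocabulary. It is not the paper's adaptive GA and omits its word space reduction. The names `genetic_search`, `toy_score`, `VOCAB`, and `TRIGGERS` are hypothetical, and `toy_score` is a stand-in for the black-box step of querying a text-to-image model and a classifier.

```python
import random

# Hypothetical vocabulary; the trigger words echo examples quoted in the abstract.
VOCAB = ["sunny", "foggy", "humid", "clear", "bright", "stretching", "calm", "dry"]
TRIGGERS = {"foggy", "humid", "stretching"}


def toy_score(prompt):
    """Stand-in fitness: counts trigger words. A real attack would query models here."""
    return sum(1 for word in prompt if word in TRIGGERS)


def genetic_search(vocab, length, score, generations=30, pop_size=20,
                   mutation_rate=0.2, seed=0):
    """Gradient-free search over word sequences; returns (best_prompt, best_score_history)."""
    rng = random.Random(seed)
    population = [[rng.choice(vocab) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        elite = population[: pop_size // 2]  # elitism: the top half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, length)  # one-point crossover (needs length >= 2)
            child = mother[:cut] + father[cut:]
            # Mutate each word independently with probability mutation_rate.
            child = [rng.choice(vocab) if rng.random() < mutation_rate else word
                     for word in child]
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


best_prompt, best_history = genetic_search(VOCAB, length=4, score=toy_score)
```

Because the top half of each generation is carried over unchanged, the best score in `best_history` never decreases, which keeps the number of wasted black-box queries down in a setting where each query is expensive.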
# Weak Supervision を用いた長文質問応答のための文脈情報検索 Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision ( http://arxiv.org/abs/2410.08623v1 ) ライセンス: Link先を確認	Philipp Christmann, Svitlana Vakulenko, Ionut Teodor Sorodoc, Bill Byrne, Adrià de Gispert,	(参考訳) LFQA(Long-form Question answering)は、エンドユーザの質問に対する詳細な回答を生成し、直接的な回答以上の関連情報を提供することを目的としている。しかし、既存のレトリバーは通常、質問を直接対象とする情報に最適化されており、そのような文脈情報を取りこぼしている。さらに、関連するコンテキストに関するトレーニングデータが不足している。そこで本稿では,コンテキスト情報の検索を最適化する弱い監視手法を提案し,比較する。実験では、長文質問応答のためのデータセットであるASQAのエンドツーエンドQAパフォーマンスが改善されている。重要なことに、より文脈的な情報が検索されるにつれて、LFQAの関連ページリコールを14.7%改善し、生成した長文回答の根拠性(groundedness)を12.5%改善する。最後に、会話型QAデータセットの実験を通して、長文の回答が、起こりうる後続の質問を先取りしていることが多いことを示す。 Long-form question answering (LFQA) aims at generating in-depth answers to end-user questions, providing relevant information beyond the direct answer. However, existing retrievers are typically optimized towards information that directly targets the question, missing out on such contextual information. Furthermore, there is a lack of training data for relevant context. To this end, we propose and compare different weak supervision techniques to optimize retrieval for contextual information. Experiments demonstrate improvements on the end-to-end QA performance on ASQA, a dataset for long-form question answering. Importantly, as more contextual information is retrieved, we improve the relevant page recall for LFQA by 14.7% and the groundedness of generated long-form answers by 12.5%. Finally, we show that long-form answers often anticipate likely follow-up questions, via experiments on a conversational QA dataset.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# クロスドメインFew-shotグラフ異常検出に向けて Towards Cross-domain Few-shot Graph Anomaly Detection ( http://arxiv.org/abs/2410.08629v1 ) ライセンス: Link先を確認	Jiazhen Chen, Sichao Fu, Zhibin Zhang, Zheng Ma, Mingbin Feng, Tony S. Wirjanto, Qinmu Peng,	(参考訳) Few-shotグラフ異常検出(GAD)は、限られた数のラベル付きトレーニングノードのガイダンスの下で、豊富なラベルなしテストノード間の異常パターンを識別することを目的として、最近注目を集めている。既存のFew-shot GADアプローチでは、スパースラベルを持つターゲットネットワークへの迅速な適応を容易にするために、リッチなラベル付き補助ネットワークで訓練されたメタトレーニング手法を採用するのが一般的である。しかし、これらの手法は、補助ネットワークとターゲットネットワークが同一のデータ分布に従うと仮定することが多く、この仮定は実際の環境ではほとんど成り立たない。本稿では,関連性はあるが異なる領域からの補助グラフを用いて,スパースラベル付きターゲットグラフ内の異常を識別することを目的とする,クロスドメインFew-shot GADのより一般的で複雑なシナリオについて検討する。ここでの課題は、ソースとターゲットドメイン間のデータ分布の相違に加え、ターゲットドメインにおけるスパースラベリングの不確実性によって複雑化されているため、簡単ではない。本稿では,上記の課題に対処するために,CDFS-GADと呼ばれるシンプルで効果的なフレームワークを提案する。 CDFS-GADは、クロスドメイン機能アライメントの強化を目的とした、ドメイン適応型グラフコントラスト学習モジュールを最初に導入する。次に、各ドメインに適したドメイン固有の特徴を抽出するプロンプトチューニングモジュールを更に設計する。さらに、ドメイン適応型ハイパースフィア分類損失は、ドメイン依存ノルムを利用して、最小限の監督下での正常インスタンスと異常インスタンスの識別を強化するために提案される。最後に、予測されたスコアをさらに洗練し、Few-shot設定での信頼性を高めるための自己学習戦略が導入された。 12個の実世界のクロスドメインデータペアに対する大規模な実験は、既存のGAD法と比較して提案したCDFS-GADフレームワークの有効性を実証している。 Few-shot graph anomaly detection (GAD) has recently garnered increasing attention, which aims to discern anomalous patterns among abundant unlabeled test nodes under the guidance of a limited number of labeled training nodes. Existing few-shot GAD approaches typically adopt meta-training methods trained on richly labeled auxiliary networks to facilitate rapid adaptation to target networks that possess sparse labels. However, these proposed methods often assume that the auxiliary and target networks follow the same data distribution, an assumption that rarely holds in practical settings. This paper explores a more prevalent and complex scenario of cross-domain few-shot GAD, where the goal is to identify anomalies within sparsely labeled target graphs using auxiliary graphs from a related, yet distinct domain. 
The challenge here is nontrivial owing to inherent data distribution discrepancies between the source and target domains, compounded by the uncertainties of sparse labeling in the target domain. In this paper, we propose a simple and effective framework, termed CDFS-GAD, specifically designed to tackle the aforementioned challenges. CDFS-GAD first introduces a domain-adaptive graph contrastive learning module, which is aimed at enhancing cross-domain feature alignment. Then, a prompt tuning module is further designed to extract domain-specific features tailored to each domain. Moreover, a domain-adaptive hypersphere classification loss is proposed to enhance the discrimination between normal and anomalous instances under minimal supervision, utilizing domain-sensitive norms. Lastly, a self-training strategy is introduced to further refine the predicted scores, enhancing its reliability in few-shot settings. Extensive experiments on twelve real-world cross-domain data pairs demonstrate the effectiveness of the proposed CDFS-GAD framework in comparison to various existing GAD methods.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# CryoFM:Cryo-EM密度のフローベース基礎モデル CryoFM: A Flow-based Foundation Model for Cryo-EM Densities ( http://arxiv.org/abs/2410.08631v1 ) ライセンス: Link先を確認	Yi Zhou, Yilai Li, Jing Yuan, Quanquan Gu,	(参考訳) クライオ電子顕微鏡(cryo-EM)は構造生物学と薬物発見において強力な技術であり、高分解能で生体分子の研究を可能にする。構造生物学者によるcryo-EMを用いた重要な進歩により、様々な解像度で38,626以上のタンパク質密度マップが作成されている。しかし、cryo-EMデータ処理アルゴリズムは、生体分子密度マップの知識の恩恵をまだ十分に受けておらず、最近の少数のモデルがデータ駆動型であるものの特定のタスクに限定されている。本研究では, 生成モデルとして設計された基礎モデルであるCryoFMを提案する。これは高品質密度マップの分布を学習し, 下流タスクに効果的に一般化する。フローマッチングに基づいて構築されたCryoFMは、生体分子密度マップの事前分布を正確に捉えるように訓練されている。さらに,CryoFMを柔軟な事前分布として活用するフロー事後サンプリング手法を導入し,cryo-EMおよびクライオ電子トモグラフィー(cryo-ET)における複数の下流タスクに対して,微調整を必要とせずにほとんどのタスクで最先端の性能を達成し,これらの分野におけるより広範な応用のための基礎モデルとしての可能性を示す。 Cryo-electron microscopy (cryo-EM) is a powerful technique in structural biology and drug discovery, enabling the study of biomolecules at high resolution. Significant advancements by structural biologists using cryo-EM have led to the production of over 38,626 protein density maps at various resolutions. However, cryo-EM data processing algorithms have yet to fully benefit from our knowledge of biomolecular density maps, with only a few recent models being data-driven but limited to specific tasks. In this study, we present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps and generalizing effectively to downstream tasks. Built on flow matching, CryoFM is trained to accurately capture the prior distribution of biomolecular density maps. Furthermore, we introduce a flow posterior sampling method that leverages CryoFM as a flexible prior for several downstream tasks in cryo-EM and cryo-electron tomography (cryo-ET) without the need for fine-tuning, achieving state-of-the-art performance on most tasks and demonstrating its potential as a foundational model for broader applications in these fields.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
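The CryoFM record says the model is built on flow matching but gives no formulas. As a hedged reminder of what a flow-matching objective commonly looks like, the block below writes the widely used straight-line variant. The symbols $x_0$ (noise sample), $x_1$ (a density map drawn from the data), $v_\theta$ (the learned velocity field), and the choice of a linear path are assumptions made for illustration, not details taken from the paper.

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,x_1,
  \qquad x_0 \sim \mathcal{N}(0, I),\quad x_1 \sim p_{\text{data}},\quad t \sim \mathcal{U}[0, 1], \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}
  \bigl\lVert v_\theta(x_t, t) - (x_1 - x_0) \bigr\rVert^2 .
\end{aligned}
```

Under this path the regression target $x_1 - x_0$ is exactly the time derivative of $x_t$, so a trained $v_\theta$ can be integrated from $t = 0$ to $t = 1$ to turn noise into a sample from the learned prior.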
# ビーコンとしての言葉:高レベル言語プロンプトを用いたRLエージェントの誘導 Words as Beacons: Guiding RL Agents with High-Level Language Prompts ( http://arxiv.org/abs/2410.08632v1 ) ライセンス: Link先を確認	Unai Ruiz-Gonzalez, Alain Andres, Pedro G. Bascoy, Javier Del Ser,	(参考訳) 強化学習(RL)におけるスパース報酬環境は、探索に重大な課題をもたらし、しばしば非効率または不完全学習プロセスに繋がる。この問題に対処するために,複雑なタスクをサブゴールに分解することでエージェントの学習プロセスを導くために,LLM(Large Language Models)を「教師」として活用する教師学習型RLフレームワークを提案する。構造と目的のテキスト記述に基づいてRL環境を理解する能力が本質的に備わっているため、LLMは人間と同じような方法で環境のために定義されたタスクを達成するためのサブゴールを提供することができる。エージェントに対する位置的ターゲット,オブジェクト表現,LLMが直接生成する言語に基づく命令の3種類のサブゴールが提案されている。さらに,学習期間中にのみ LLM を問い合わせることができ,エージェントが LLM を介さずに環境内で操作できることを示す。提案手法の性能評価には,MiniGrid ベンチマークの様々な手続き的に生成された環境にサブゴールを付与する3つのオープンソース LLM (Llama, DeepSeek, Qwen) の評価を行った。実験結果から,このカリキュラムベースのアプローチは複雑なタスクの学習を加速し,学習過程における最大30～200倍の収束を達成し,複雑なタスクの探索を促進することが示されている。 Sparse reward environments in reinforcement learning (RL) pose significant challenges for exploration, often leading to inefficient or incomplete learning processes. To tackle this issue, this work proposes a teacher-student RL framework that leverages Large Language Models (LLMs) as "teachers" to guide the agent's learning process by decomposing complex tasks into subgoals. Due to their inherent capability to understand RL environments based on a textual description of structure and purpose, LLMs can provide subgoals to accomplish the task defined for the environment in a similar fashion to how a human would do. In doing so, three types of subgoals are proposed: positional targets relative to the agent, object representations, and language-based instructions generated directly by the LLM. More importantly, we show that it is possible to query the LLM only during the training phase, enabling agents to operate within the environment without any LLM intervention. 
We assess the performance of this proposed framework by evaluating three state-of-the-art open-source LLMs (Llama, DeepSeek, Qwen) eliciting subgoals across various procedurally generated environments of the MiniGrid benchmark. Experimental results demonstrate that this curriculum-based approach accelerates learning and enhances exploration in complex tasks, achieving up to 30 to 200 times faster convergence in training steps compared to recent baselines designed for sparse reward environments.	翻訳日:2024-10-30 22:54:46 公開日:2024-10-11
# Transformers Provably Solve Parity Efficiently with Chain of Thought ( http://arxiv.org/abs/2410.08633v1 ) License: see linked page	Juno Kim, Taiji Suzuki,	This work provides the first theoretical analysis of training transformers to solve complex problems by recursively generating intermediate states, analogous to fine-tuning for chain-of-thought (CoT) reasoning. We consider training a one-layer transformer to solve the fundamental $k$-parity problem, extending the work on RNNs by Wies et al. (2023). We establish three key results: (1) Any finite-precision gradient-based algorithm, without intermediate supervision, requires substantial iterations to solve parity with finite samples. (2) In contrast, when intermediate parities are incorporated into the loss function, our model can learn parity in one gradient update when aided by teacher forcing, where ground-truth labels of the reasoning chain are provided at each generation step. (3) Even without teacher forcing, where the model must generate CoT chains end-to-end, parity can be learned efficiently if augmented data is employed to internally verify the soundness of intermediate steps. These results rigorously show that task decomposition and stepwise reasoning naturally arise from optimizing transformers with CoT; moreover, self-consistency checking can improve reasoning ability, aligning with empirical studies of CoT.	Translated: 2024-10-30 22:54:46 Published: 2024-10-11
# GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning ( http://arxiv.org/abs/2410.08634v1 ) License: see linked page	Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang,	Federated learning (FL) is a commonly used distributed algorithm for mobile users (MUs) to train artificial intelligence (AI) models; however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. To address these challenges, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted personalized federated semi-supervised learning method, called GFed. In local training, we use a GAI model to learn from large amounts of unlabeled data and apply knowledge distillation-based semi-supervised learning to train the local FL model using the knowledge acquired from the GAI model. In global aggregation, we obtain the new local FL model by fusing the local and global FL models in specific proportions, allowing each local model to incorporate knowledge from others while preserving its personalized characteristics. Second, we propose an explainable AI mechanism for FL, named XFed. Specifically, in local training, we apply a decision tree to match the input and output of the local FL model. In global aggregation, we use t-distributed stochastic neighbor embedding (t-SNE) to visualize the local models before and after aggregation. Finally, simulation results validate the effectiveness of the proposed XPFL framework.	Translated: 2024-10-30 22:54:46 Published: 2024-10-11
# Efficient line search for optimizing Area Under the ROC Curve in gradient descent ( http://arxiv.org/abs/2410.08635v1 ) License: see linked page	Jadon Fowler, Toby Dylan Hocking,	Receiver Operating Characteristic (ROC) curves are useful for evaluation in binary classification and changepoint detection, but difficult to use for learning since the Area Under the Curve (AUC) is piecewise constant (gradient zero almost everywhere). Recently, the Area Under Min (AUM) of false positive and false negative rates has been proposed as a differentiable surrogate for AUC. In this paper we study the piecewise linear/constant nature of the AUM/AUC, and propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of gradient descent (line search), when optimizing a linear model. Remarkably, our proposed line search algorithm has the same log-linear asymptotic time complexity as gradient descent with constant step size, but it computes a complete representation of the AUM/AUC as a function of step size. In our empirical study of binary classification problems, we verify that our proposed algorithm is fast and exact; in changepoint detection problems we show that the proposed algorithm is just as accurate as grid search, but faster.	Translated: 2024-10-30 22:54:46 Published: 2024-10-11
# Analog simulation of noisy quantum circuits ( http://arxiv.org/abs/2410.08639v1 ) License: see linked page	Etienne Granet, Kévin Hémery, Henrik Dreyer,	It is well known that simulating quantum circuits with low but non-zero hardware noise is more difficult than simulating them without noise. It requires either performing density matrix simulations (which incur a space overhead) or sampling over "quantum trajectories" in which Kraus operators are inserted randomly (which incurs a runtime overhead). We propose a simulation technique based on a representation of hardware noise in terms of trajectories generated by operators that remain close to identity at low noise. This representation significantly reduces the variance over the quantum trajectories, speeding up noisy simulations by factors around $10$ to $100$. As a by-product, we provide a formula to factorize multiple-Pauli channels into a concatenation of single Pauli channels.	Translated: 2024-10-30 22:54:46 Published: 2024-10-11
# Multi-Source Temporal Attention Network for Precipitation Nowcasting ( http://arxiv.org/abs/2410.08641v1 ) License: see linked page	Rafael Pablos Sarabia, Joachim Nyborg, Morten Birk, Jeppe Liborius Sjørup, Anders Lillevang Vesterholt, Ira Assent,	Precipitation nowcasting is crucial across various industries and plays a significant role in mitigating and adapting to climate change. We introduce an efficient deep learning model for precipitation nowcasting, capable of predicting rainfall up to 8 hours in advance with greater accuracy than existing operational physics-based and extrapolation-based models. Our model leverages multi-source meteorological data and physics-based forecasts to deliver high-resolution predictions in both time and space. It captures complex spatio-temporal dynamics through temporal attention networks and is optimized using data quality maps and dynamic thresholds. Experiments demonstrate that our model outperforms the state of the art and highlight its potential for fast, reliable responses to evolving weather conditions.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# More than Memes: A Multimodal Topic Modeling Approach to Conspiracy Theories on Telegram ( http://arxiv.org/abs/2410.08642v1 ) License: see linked page	Elisabeth Steffen,	Research on conspiracy theories and related content online has traditionally focused on textual data. To address the increasing prevalence of (audio-)visual data on social media, and to capture the evolving and dynamic nature of this communication, researchers have begun to explore the potential of unsupervised approaches for analyzing multimodal online content. Our research contributes to this field by exploring the potential of multimodal topic modeling for analyzing conspiracy theories in German-language Telegram channels. Our work uses the BERTopic topic modeling approach in combination with CLIP for the analysis of textual and visual data. We analyze a corpus of ~40,000 Telegram messages posted in October 2023 in 571 German-language Telegram channels known for disseminating conspiracy theories and other deceptive content. We explore the potentials and challenges of this approach for studying a medium-sized corpus of user-generated, text-image online content. We offer insights into the dominant topics across modalities, different text and image genres discovered during the analysis, quantitative inter-modal topic analyses, and a qualitative case study of textual, visual, and multimodal narrative strategies in the communication of conspiracy theories.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets ( http://arxiv.org/abs/2410.08643v1 ) License: see linked page	Toby Dylan Hocking, Gabrielle Thibault, Cameron Scott Bodine, Paul Nelson Arellano, Alexander F Shenkin, Olivia Jasmine Lindly,	In many real-world applications of machine learning, we are interested in knowing whether it is possible to train on the data that we have gathered so far, and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc.). Another question is whether data subsets are similar enough that it is beneficial to combine subsets during model training. We propose SOAK, Same/Other/All K-fold cross-validation, a new method which can be used to answer both questions. SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets. We show results of using SOAK on six new real data sets (with geographic/temporal subsets, to check if predictions are accurate on new subsets), three image-pair data sets (subsets are different image types, to check that we get smaller prediction error on similar images), and 11 benchmark data sets with predefined train/test splits (to check similarity of predefined splits).	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Boosting Open-Vocabulary Object Detection by Handling Background Samples ( http://arxiv.org/abs/2410.08645v1 ) License: see linked page	Ruizhe Zeng, Lu Zhang, Xu Yang, Zhiyong Liu,	Open-vocabulary object detection is the task of accurately detecting objects from a candidate vocabulary list that includes both base and novel categories. Currently, numerous open-vocabulary detectors have achieved success by leveraging the impressive zero-shot capabilities of CLIP. However, we observe that CLIP models struggle to effectively handle background images (i.e., images without corresponding labels) due to their language-image learning methodology. This limitation results in suboptimal performance for open-vocabulary detectors that rely on CLIP when processing background samples. In this paper, we propose Background Information Representation for open-vocabulary Detector (BIRDet), a novel approach to address the limitations of CLIP in handling background samples. Specifically, we design Background Information Modeling (BIM) to replace the single, fixed background embedding in mainstream open-vocabulary detectors with dynamic scene information, and prompt it into image-related background representations. This method effectively enhances the ability to classify oversized regions as background. In addition, we introduce Partial Object Suppression (POS), an algorithm that utilizes the ratio of overlap area to address the issue of misclassifying partial regions as foreground. Experiments on OV-COCO and OV-LVIS benchmarks demonstrate that our proposed model is capable of achieving performance enhancements across various open-vocabulary detectors.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Fully Unsupervised Dynamic MRI Reconstruction via Diffeo-Temporal Equivariance ( http://arxiv.org/abs/2410.08646v1 ) License: see linked page	Andrew Wang, Mike Davies,	Reconstructing dynamic MRI image sequences from undersampled accelerated measurements is crucial for faster and higher spatiotemporal resolution real-time imaging of cardiac motion, free breathing motion and many other applications. Classical paradigms, such as gated cine MRI, assume periodicity, disallowing imaging of true motion. Supervised deep learning methods are fundamentally flawed as, in dynamic imaging, ground truth fully-sampled videos are impossible to truly obtain. We propose an unsupervised framework to learn to reconstruct dynamic MRI sequences from undersampled measurements alone by leveraging natural geometric spatiotemporal equivariances of MRI. Dynamic Diffeomorphic Equivariant Imaging (DDEI) significantly outperforms state-of-the-art unsupervised methods such as SSDU on highly accelerated dynamic cardiac imaging. Our method is agnostic to the underlying neural network architecture and can be used to adapt the latest models and post-processing approaches. Our code and video demos are at https://github.com/Andrewwango/ddei.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# E-Motion: Future Motion Simulation via Event Sequence Diffusion ( http://arxiv.org/abs/2410.08649v1 ) License: see linked page	Song Wu, Zhiyu Zhu, Junhui Hou, Guangming Shi, Jinjian Wu,	Forecasting a typical object's future motion is a critical task for interpreting and interacting with dynamic environments in computer vision. Event-based sensors, which can capture changes in the scene with exceptional temporal granularity, may offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable. Inspired by that, we propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework. Specifically, we first take pre-trained stable video diffusion models and adapt them to the event sequence dataset. This process facilitates the transfer of extensive knowledge from RGB videos to an event-centric domain. Moreover, we introduce an alignment mechanism that utilizes reinforcement learning techniques to enhance the reverse generation trajectory of the diffusion model, ensuring improved performance and accuracy. Through extensive testing and validation, we demonstrate the effectiveness of our method in various complex scenarios, showcasing its potential to revolutionize motion flow prediction in computer vision applications such as autonomous vehicle guidance, robotic navigation, and interactive media. Our findings suggest a promising direction for future research in enhancing the interpretative power and predictive accuracy of computer vision systems.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Edge AI Collaborative Learning: Bayesian Approaches to Uncertainty Estimation ( http://arxiv.org/abs/2410.08651v1 ) License: see linked page	Gleb Radchenko, Victoria Andrea Fill,	Recent advancements in edge computing have significantly enhanced the AI capabilities of Internet of Things (IoT) devices. However, these advancements introduce new challenges in knowledge exchange and resource management, particularly in addressing the spatiotemporal data locality in edge computing environments. This study examines algorithms and methods for deploying distributed machine learning within autonomous, network-capable, AI-enabled edge devices. We focus on determining confidence levels in learning outcomes considering the spatial variability of data encountered by independent agents. Using collaborative mapping as a case study, we explore the application of the Distributed Neural Network Optimization (DiNNO) algorithm extended with Bayesian neural networks (BNNs) for uncertainty estimation. We implement a 3D environment simulation using the Webots platform to simulate collaborative mapping tasks, decouple the DiNNO algorithm into independent processes for asynchronous network communication in distributed learning, and integrate distributed uncertainty estimation using BNNs. Our experiments demonstrate that BNNs can effectively support uncertainty estimation in a distributed learning context, with precise tuning of learning hyperparameters crucial for effective uncertainty assessment. Notably, applying Kullback-Leibler divergence for parameter regularization resulted in a 12-30% reduction in validation loss during distributed BNN training compared to other regularization strategies.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Emergence of second-order coherence in the superradiant emission from a free-space atomic ensemble ( http://arxiv.org/abs/2410.08652v1 ) License: see linked page	Giovanni Ferioli, Igor Ferrier-Barbut, Antoine Browaeys,	We investigate the evolution of the second-order temporal coherence during the emission of a superradiant burst by an elongated cloud of cold Rb atoms in free space. To do so, we measure the two-time intensity correlation function $g_N^{(2)}(t_1, t_2)$ following the pulsed excitation of the cloud. By monitoring $g_N^{(2)}(t, t)$ during the burst, we observe the establishment of second-order coherence, and contrast it with the situation where the cloud is initially prepared in a steady state. We compare our findings to the predictions of the Dicke model, using an effective atom number to account for finite size effects, finding that the model reproduces the observed trend at early times. For longer times, we observe a subradiant decay, a feature that goes beyond Dicke's model. Finally, we measure $g_N^{(2)}(t_1, t_2)$ at different times and observe the appearance of anti-correlations during the burst, which are not present when starting from a steady state.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Finite Sample Complexity Analysis of Binary Segmentation ( http://arxiv.org/abs/2410.08654v1 ) License: see linked page	Toby Dylan Hocking,	Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space or time, and as a sub-routine for decision tree learning. In theory it should be extremely fast for $N$ data and $K$ splits, $O(N K)$ in the worst case, and $O(N \log K)$ in the best case. In this paper we describe new methods for analyzing the time and space complexity of binary segmentation for a given finite $N$, $K$, and minimum segment length parameter. First, we describe algorithms that can be used to compute the best and worst case number of splits the algorithm must consider. Second, we describe synthetic data that achieve the best and worst case and which can be used to test for correct implementation of the algorithm. Finally, we provide an empirical analysis of real data which suggests that binary segmentation is often close to optimal speed in practice.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction ( http://arxiv.org/abs/2410.08656v1 ) License: see linked page	Yuanyuan Zhang, Rui Yang, Yutao Yue, Eng Gee Lim,	Millimeter-wave radar promises to provide robust and accurate vital sign monitoring in an unobtrusive manner. However, the radar signal might be distorted in propagation by ambient noise or random body movement, obscuring the subtle cardiac activities and impairing vital sign recovery. In particular, the recovery of the electrocardiogram (ECG) signal relies heavily on the deep-learning model and is sensitive to noise. Therefore, this work deconstructs radar-based ECG recovery into three individual tasks and proposes a multi-task learning (MTL) framework, radarODE-MTL, to increase the robustness against consistent and abrupt noise. In addition, to alleviate the potential conflicts in optimizing individual tasks, a novel multi-task optimization strategy, eccentric gradient alignment (EGA), is proposed to dynamically trim the task-specific gradients based on task difficulties in orthogonal space. The proposed radarODE-MTL with EGA is evaluated on the public dataset with prominent improvements in accuracy, and the performance remains consistent under noise. The experimental results indicate that radarODE-MTL could reconstruct accurate ECG signals robustly from radar signals and imply the application prospect in real-life situations. The code is available at: http://github.com/ZYY0844/radarODE-MTL.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Transmission through Cantor structured Dirac comb potential ( http://arxiv.org/abs/2410.08658v1 ) License: see linked page	Mohammad Umar,	In this study, we introduce the Cantor-structured Dirac comb potential, referred to as the Cantor Dirac comb (CDC-$\rho_{N}$) potential system, and investigate non-relativistic quantum tunneling through this novel potential configuration. This system is engineered by positioning delta potentials at the boundaries of each rectangular potential segment of the Cantor potential. This study is the first to investigate quantum tunneling through a fractal geometric Dirac comb potential. This potential system exemplifies a particular instance of the super periodic potential (SPP), a broader class of potentials that generalize locally periodic potentials. Utilizing the theoretical framework of SPP, we derive a closed-form expression for the transmission probability for this potential architecture. We report various transmission characteristics, including the appearance of band-like features and the scaling behavior of the reflection coefficient with wave vector $k$, which is governed by a scaling function expressed as a finite product of the Laue function. A particularly striking feature of the system is the occurrence of sharp transmission resonances, which may prove useful in applications such as highly sharp transmission filters.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# Carefully Structured Compression: Efficiently Managing StarCraft II Data ( http://arxiv.org/abs/2410.08659v1 ) License: see linked page	Bryce Ferenczi, Rhys Newbury, Michael Burke, Tom Drummond,	Creation and storage of datasets are often overlooked input costs in machine learning, as many datasets are simple image-label pairs or plain text. However, datasets with more complex structures, such as those from the real-time strategy game StarCraft II, require more deliberate thought and strategy to reduce cost of ownership. We introduce a serialization framework for StarCraft II that reduces the cost of dataset creation and storage and improves usage ergonomics. We benchmark against the most comparable existing dataset from AlphaStar-Unplugged and highlight the benefit of our framework in terms of both the cost of creation and storage. We use our dataset to train deep learning models that exceed the performance of comparable models trained on other datasets. The dataset conversion and usage framework introduced is open source and can be used as a framework for datasets with similar characteristics such as digital twin simulations. Pre-converted StarCraft II tournament data is also available online.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# QEFT: Quantization for Efficient Fine-Tuning of LLMs ( http://arxiv.org/abs/2410.08661v1 ) License: see linked page	Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park,	With the rapid growth in the use of fine-tuning for large language models (LLMs), optimizing fine-tuning while keeping inference efficient has become highly important. However, this is a challenging task as it requires improvements in all aspects, including inference speed, fine-tuning speed, memory consumption, and, most importantly, model quality. Previous studies have attempted to achieve this by combining quantization with fine-tuning, but they have failed to enhance all four aspects simultaneously. In this study, we propose a new lightweight technique called Quantization for Efficient Fine-Tuning (QEFT). QEFT accelerates both inference and fine-tuning, is supported by robust theoretical foundations, offers high flexibility, and maintains good hardware compatibility. Our extensive experiments demonstrate that QEFT matches the quality and versatility of full-precision parameter-efficient fine-tuning, while using fewer resources. Our code is available at https://github.com/xvyaward/qeft.	Translated: 2024-10-30 22:45:00 Published: 2024-10-11
# DistDD: Distributed Data Distillation Aggregation through Gradient Matching ( http://arxiv.org/abs/2410.08665v1 ) License: see link	Peiran Wang, Haohan Wang,	In this paper, we introduce DistDD, a novel approach within the federated learning (FL) framework that reduces the need for repetitive communication by distilling data directly on clients' devices. Unlike traditional federated learning, which requires iterative model updates across nodes, DistDD performs a one-time distillation process that extracts a global distilled dataset, maintaining the privacy standards of federated learning while significantly cutting communication costs. By leveraging DistDD's distilled dataset, FL developers can achieve just-in-time parameter tuning and neural architecture search (NAS) over FL without repeating the whole FL process multiple times. We provide a detailed convergence proof of the DistDD algorithm, reinforcing its mathematical stability and reliability for practical applications. Our experiments demonstrate the effectiveness and robustness of DistDD, particularly in non-i.i.d. and mislabeled data scenarios, showcasing its potential to handle complex real-world data challenges differently from conventional federated learning methods. We also evaluate DistDD in a NAS use case and demonstrate its effectiveness and communication savings.	Translated: 2024-10-30 22:35:13 Published: 2024-10-11
# DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization ( http://arxiv.org/abs/2410.08666v1 ) License: see link	Yanfeng Jiang, Zelan Yang, Bohua Chen, Shen Li, Yong Li, Tao Li,	Large language models achieve exceptional performance on various downstream tasks through supervised fine-tuning. However, the diversity of downstream tasks and practical requirements makes deploying multiple full-parameter fine-tuned models challenging. Current methods that compress the delta weight struggle to achieve ultra-high compression and therefore fail to minimize the deployment overhead. To address this issue, we propose DeltaDQ, a novel distribution-driven delta compression framework that uses Group-wise Dropout and Separate Quantization to achieve ultra-high compression of the delta weight. We observe that the matrix-computed intermediate results for the delta weight exhibit extremely small variance and min-max range, a property we refer to as Balanced Intermediate Results. Exploiting this phenomenon, we introduce Group-wise Dropout to perform dropout on the delta weight using an optimal group size. Furthermore, using Separate Quantization, sparse weights are quantized and decomposed to achieve a lower bit width. Experimental results show that DeltaDQ achieves 16x compression with improved accuracy compared to baselines for WizardMath and WizardCoder models across different parameter scales. Moreover, DeltaDQ supports ultra-high compression ratios, achieving 128x compression for the WizardMath-7B model and 512x compression for the WizardMath-70B model.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction ( http://arxiv.org/abs/2410.08669v1 ) License: see link	Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu,	Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community for learning rich and transferable scene representations. Nonetheless, existing pre-training methods for motion prediction have largely focused on specific model architectures and single datasets, limiting their scalability and generalizability. To address these challenges, we propose SmartPretrain, a general and scalable SSL framework for motion prediction that is both model-agnostic and dataset-agnostic. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms to effectively represent spatiotemporal evolution and interactions without imposing architectural constraints. Additionally, SmartPretrain employs a dataset-agnostic scenario sampling strategy that integrates multiple datasets, enhancing data volume, diversity, and robustness. Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits, and main metrics. For instance, SmartPretrain reduces the MissRate of Forecast-MAE by 10.6%. These results highlight SmartPretrain's effectiveness as a unified, scalable solution for motion prediction, breaking free from the limitations of the small-data regime. Code is available at https://github.com/youngzhou1999/SmartPretrain	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# SpikeBottleNet: Energy Efficient Spike Neural Network Partitioning for Feature Compression in Device-Edge Co-Inference Systems ( http://arxiv.org/abs/2410.08673v1 ) License: see link	Maruf Hassan, Steven Davy,	The advent of intelligent mobile applications highlights the crucial demand for deploying powerful deep learning models on resource-constrained mobile devices. An effective solution in this context is the device-edge co-inference framework, which partitions a deep neural network between a mobile device and a nearby edge server. This approach requires balancing on-device computation and communication costs, which is often achieved by transmitting compressed intermediate features. Conventional deep neural network architectures require continuous data processing, leading to substantial energy consumption by edge devices. This motivates exploring binary, event-driven activations enabled by spiking neural networks (SNNs), known for their extreme energy efficiency. In this research, we propose a novel architecture named SpikeBottleNet, which significantly improves on the existing architecture by integrating SNNs. A key aspect of our investigation is the development of an intermediate feature compression technique specifically designed for SNNs. This technique leverages a split computing approach for SNNs to partition complex architectures, such as Spike ResNet50. Experimental results demonstrate that SpikeBottleNet achieves a bit compression ratio of up to 256x in the final convolutional layer while maintaining high classification accuracy, with only a 2.5% reduction. Moreover, compared to the baseline BottleNet++ architecture, our framework reduces the transmitted feature size at earlier splitting points by 75%. Furthermore, in terms of the energy efficiency of edge devices, our methodology surpasses the baseline by a factor of up to 98, demonstrating significant enhancements in both efficiency and performance.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# Guidelines for Fine-grained Sentence-level Arabic Readability Annotation ( http://arxiv.org/abs/2410.08674v1 ) License: see link	Nizar Habash, Hanada Taha-Thomure, Khalid N. Elmadani, Zeina Zeino, Abdallah Abushmaes,	This paper presents the foundational framework and initial findings of the Balanced Arabic Readability Evaluation Corpus (BAREC) project, designed to address the need for comprehensive Arabic language resources aligned with diverse readability levels. Inspired by the Taha/Arabi21 readability reference, BAREC aims to provide a standardized reference for assessing sentence-level Arabic text readability across 19 distinct levels, ranging in targets from kindergarten to postgraduate comprehension. Our ultimate goal with BAREC is to create a comprehensive and balanced corpus that represents a wide range of genres, topics, and regional variations through a multifaceted approach combining manual annotation with AI-driven tools. This paper focuses on our meticulous annotation guidelines, demonstrated through the analysis of 10,631 sentences/phrases (113,651 words). The average pairwise inter-annotator agreement, measured by Quadratic Weighted Kappa, is 79.9%, reflecting a high level of substantial agreement. We also report competitive results for benchmarking automatic readability assessment. We will make the BAREC corpus and guidelines openly accessible to support Arabic language research and education.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
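The agreement figure in the abstract above is a Quadratic Weighted Kappa. As a rough illustration of how that statistic is usually computed for two annotators over ordinal levels, here is a minimal sketch. The function name, the integer encoding of levels, and the toy labels are assumptions made for illustration; nothing here is taken from the BAREC tooling.

```python
def quadratic_weighted_kappa(a, b, num_levels):
    """Quadratic Weighted Kappa for two equal-length lists of integer
    labels in [0, num_levels). Requires num_levels >= 2 and at least two
    distinct rating patterns (otherwise the denominator is zero)."""
    n = len(a)
    # observed[i][j]: number of items rated i by annotator a and j by b.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels))
              for j in range(num_levels)]
    num = 0.0
    den = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic penalty: disagreements cost more the further apart
            # the two levels are.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # chance-level count
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den
```

With 19 readability levels, `num_levels` would be 19; the quadratic weighting means that confusing adjacent levels is penalized far less than confusing distant ones.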
# Bukva: Russian Sign Language Alphabet ( http://arxiv.org/abs/2410.08675v1 ) License: see link	Karina Kvanchiani, Petr Surovtsev, Alexander Nagaev, Elizaveta Petrova, Alexander Kapitanov,	This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl. Dactyl is a component of sign languages in which distinct hand movements represent individual letters of a written language. This method is used to spell words that have no specific signs, such as proper nouns or technical terms. An alphabet learning simulator is an essential application of isolated dactyl recognition. There is a notable data shortage in isolated dactyl recognition: existing Russian dactyl datasets lack subject heterogeneity, contain insufficient samples, or cover only static signs. We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition. It contains 3,757 videos with more than 101 samples for each RSL alphabet sign, including dynamic ones. We used crowdsourcing platforms to increase subject heterogeneity, resulting in the participation of 155 deaf and hard-of-hearing experts in the dataset creation. We use a TSM (Temporal Shift Module) block to handle static and dynamic signs effectively, achieving 83.6% top-1 accuracy with real-time, CPU-only inference. The dataset, demo code, and pre-trained models are publicly available.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# The Design Space of in-IDE Human-AI Experience ( http://arxiv.org/abs/2410.08676v1 ) License: see link	Agnia Sergeyuk, Ekaterina Koshchenko, Ilya Zakharov, Timofey Bryksin, Maliheh Izadi,	The integration of AI-driven tools within Integrated Development Environments (IDEs) is reshaping the software development lifecycle. Existing research highlights that users expect these tools to be efficient, context-aware, accurate, user-friendly, customizable, and secure. However, a major gap remains in understanding developers' needs and challenges, particularly when interacting with AI systems in IDEs and from the perspectives of different user groups. In this work, we address this gap through structured interviews with 35 developers from three different groups (Adopters, Churners, and Non-Users of AI in IDEs) to create a comprehensive Design Space of in-IDE Human-AI Experience. Our results highlight key areas of Technology Improvement, Interaction, and Alignment in in-IDE AI systems, as well as Simplifying Skill Building and Programming Tasks. Our key findings stress the need for AI systems that are more personalized, proactive, and reliable. We also emphasize the importance of context-aware and privacy-focused solutions and better integration with existing workflows. Furthermore, our findings show that while Adopters appreciate advanced features and non-interruptive integration, Churners emphasize the need for improved reliability and privacy. Non-Users, in contrast, focus on skill development and ethical concerns as barriers to adoption. Lastly, we provide recommendations for industry practitioners aiming to enhance AI integration within developer workflows.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# On the impact of key design aspects in simulated Hybrid Quantum Neural Networks for Earth Observation ( http://arxiv.org/abs/2410.08677v1 ) License: see link	Lorenzo Papa, Alessandro Sebastianelli, Gabriele Meoni, Irene Amerini,	Quantum computing has introduced novel perspectives for tackling and improving machine learning tasks. Moreover, the integration of quantum technologies with well-known deep learning (DL) architectures has emerged as a potential research trend gaining traction across various domains, such as Earth Observation (EO) and many other research fields. However, prior related works in the EO literature have mainly focused on convolutional architectural advancements, leaving several essential topics unexplored. Consequently, this research investigates, through three case studies, fundamental aspects of hybrid quantum machine models for EO tasks, aiming to provide solid groundwork for future research towards more adequate simulations and looking at the post-NISQ era. In more detail, we first (1) investigate how different quantum libraries behave when training hybrid quantum models, assessing their computational efficiency and effectiveness. Second, (2) we analyze the stability and sensitivity to initialization values (i.e., seed values) in both traditional models and their quantum-enhanced counterparts. Finally, (3) we explore the benefits of hybrid quantum attention-based models in EO applications, examining how integrating quantum circuits into ViTs can improve model performance.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations ( http://arxiv.org/abs/2410.08681v1 ) License: see link	Bryce Ferenczi, Michael Burke, Tom Drummond,	Various works have aimed at combining the inference efficiency of recurrent models with the training parallelism of multi-head attention for sequence modeling. However, most of these works focus on tasks with fixed-dimension observation spaces, such as individual tokens in language modeling or pixels in image completion. To handle an observation space of varying size, we propose a novel algorithm that alternates between cross-attention between a 2D latent state and the observation, and a discounted cumulative sum over the sequence dimension to efficiently accumulate historical information. We find this resampling cycle is critical for performance. To evaluate efficient sequence modeling in this domain, we introduce two multi-agent intention tasks: simulated agents chasing bouncing particles, and micromanagement analysis in professional StarCraft II games. Our algorithm achieves comparable accuracy with a lower parameter count and faster training and inference compared to existing methods.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
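The abstract above mentions a discounted cumulative sum over the sequence dimension. A minimal sketch of that operation in isolation, using the recurrence h_t = gamma * h_(t-1) + x_t, is given below. The scalar discount `gamma`, the list-of-lists representation, and the function name are assumptions for illustration; the cross-attention step and the authors' actual implementation are not reproduced here.

```python
def discounted_cumsum(xs, gamma):
    """Return [h_0, h_1, ...] where h_0 = x_0 and
    h_t = gamma * h_{t-1} + x_t, applied elementwise to each vector."""
    out = []
    state = None
    for x in xs:
        if state is None:
            state = list(x)  # first step: the state is the first input
        else:
            # Older information decays by gamma at every step.
            state = [gamma * s + v for s, v in zip(state, x)]
        out.append(state)
    return out

history = discounted_cumsum([[1.0], [1.0], [1.0]], gamma=0.5)
```

Because each step only needs the previous state, the same recurrence can be evaluated one step at a time at inference and as a parallel scan during training, which is the usual motivation for this kind of accumulator.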
# Hands-on Introduction to Randomized Benchmarking ( http://arxiv.org/abs/2410.08683v1 ) License: see link	Ana Silva, Eliska Greplova,	The goal of this tutorial is to provide an overview of the main principles behind randomized benchmarking techniques. A newcomer to the field faces the challenge that a considerable amount of background knowledge is required to become familiar with the topic. Our purpose is to ease this process by providing a pedagogical introduction to randomized benchmarking. Every chapter is supplemented with an accompanying Python notebook illustrating the essential steps of each protocol.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
# Uncertainty Estimation and Out-of-Distribution Detection for LiDAR Scene Semantic Segmentation ( http://arxiv.org/abs/2410.08687v1 ) License: see link	Hanieh Shojaei, Qianqian Zou, Max Mehltretter,	Safe navigation in new environments requires autonomous vehicles and robots to accurately interpret their surroundings, relying on LiDAR scene segmentation, out-of-distribution (OOD) obstacle detection, and uncertainty computation. We propose a method to distinguish in-distribution (ID) from OOD samples and to quantify both epistemic and aleatoric uncertainties using the feature space of a single deterministic model. After training a semantic segmentation network, a Gaussian Mixture Model (GMM) is fitted to its feature space. OOD samples are detected by checking whether their squared Mahalanobis distances to each Gaussian component conform to a chi-squared distribution, eliminating the need for an additional OOD training set. Given that the estimated mean and covariance matrix of a multivariate Gaussian distribution follow Gaussian and Inverse-Wishart distributions, multiple GMMs are generated by sampling from these distributions to assess epistemic uncertainty through classification variability. Aleatoric uncertainty is derived from the entropy of responsibility values within Gaussian components. Comparing our method with deep ensembles and logit-sampling for uncertainty computation demonstrates its superior performance in real-world applications for quantifying epistemic and aleatoric uncertainty, as well as for detecting OOD samples. While deep ensembles miss some highly uncertain samples, our method successfully detects them and assigns high epistemic uncertainty.	Translated: 2024-10-30 22:35:12 Published: 2024-10-11
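The OOD check described above (squared Mahalanobis distance compared against a chi-squared distribution) can be sketched for a toy 2-D feature space. The sketch relies on two standard facts: for a d-dimensional Gaussian the squared Mahalanobis distance follows a chi-squared distribution with d degrees of freedom, and for d = 2 its quantile has the closed form -2 ln(1 - p). The thresholding rule, the 0.99 level, and all names are assumptions for illustration, not the paper's exact procedure.

```python
import math

def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a Gaussian."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def chi2_quantile_2dof(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom,
    whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - p)

def is_ood(x, components, p=0.99):
    """Assumed rule: flag x as OOD when it lies outside the p-level
    region of every (mean, cov) component of the mixture."""
    threshold = chi2_quantile_2dof(p)
    return all(mahalanobis_sq_2d(x, m, c) > threshold for m, c in components)

# Toy mixture with a single unit-covariance component at the origin.
components = [((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))]
near = is_ood((0.5, 0.5), components)
far = is_ood((10.0, 10.0), components)
```

In a real feature space the dimensionality is much higher, so the chi-squared quantile would come from a statistics library rather than the 2-degree-of-freedom closed form used here.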
# Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers ( http://arxiv.org/abs/2410.08688v1 ) License: see link	Jin Cao, Deyu Meng, Xiangyong Cao,	While previous works typically target isolated degradation types, recent research has increasingly focused on composite degradations, which involve a complex interplay of multiple isolated degradations. Recognizing the challenges posed by the exponential number of possible degradation combinations, we propose Universal Image Restoration (UIR), a new task setting that requires models to be trained on a set of degradation bases and then to remove, in a zero-shot manner, any degradation that these bases can compose. Inspired by Chain-of-Thought, which prompts LLMs to address problems step by step, we propose Chain-of-Restoration (CoR), which instructs models to remove unknown composite degradations step by step. By integrating a simple Degradation Discriminator into pre-trained multi-task models, CoR lets a model remove one degradation basis per step and continue until the image is fully restored from the unknown composite degradation. Extensive experiments show that CoR significantly improves model performance in removing composite degradations, achieving results comparable to or surpassing those of State-of-The-Art (SoTA) methods trained on all degradations. The code will be released at https://github.com/toummHus/Chain-of-Restoration.	Translated: 2024-10-30 22:25:15 Published: 2024-10-11
# Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping ( http://arxiv.org/abs/2410.08695v1 ) License: see link	Yue Yang, Shuibai Zhang, Wenqi Shao, Kaipeng Zhang, Yi Bin, Yu Wang, Ping Luo,	Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across multimodal tasks such as visual perception and reasoning, leading to good performance on various multimodal evaluation benchmarks. However, these benchmarks are static and overlap with the pre-training data, resulting in fixed complexity constraints and data contamination issues. This raises concerns about the validity of the evaluation. To address these two challenges, we introduce a dynamic multimodal evaluation protocol called Vision-Language Bootstrapping (VLB). VLB provides a robust and comprehensive assessment of LVLMs with reduced data contamination and flexible complexity. To this end, VLB dynamically generates new visual question-answering samples through a multimodal bootstrapping module that modifies both images and language, while a judge module ensures that newly generated samples remain consistent with the original ones. By composing various bootstrapping strategies, VLB offers dynamic variants of existing benchmarks with diverse complexities, enabling the evaluation to co-evolve with the ever-evolving capabilities of LVLMs. Extensive experimental results across multiple benchmarks, including SEEDBench, MMBench, and MME, show that VLB significantly reduces data contamination and exposes performance limitations of LVLMs.	Translated: 2024-10-30 22:25:15 Published: 2024-10-11
# AMPO: Automatic Multi-Branched Prompt Optimization ( http://arxiv.org/abs/2410.08696v1 ) License: see link	Sheng Yang, Yurong Wu, Yan Gao, Zineng Zhou, Bin Benjamin Zhu, Xiaodi Sun, Jian-Guang Lou, Zhiming Ding, Anbang Hu, Yuan Fang, Yunsong Li, Junyan Chen, Linjun Yang,	Prompt engineering is important for enhancing the performance of large language models (LLMs). When dealing with complex issues, prompt engineers tend to distill multiple patterns from examples and inject relevant solutions to optimize the prompts, achieving satisfying results. However, existing automatic prompt optimization techniques are limited to producing single-flow instructions and struggle to handle diverse patterns. In this paper, we present AMPO, an automatic prompt optimization method that can iteratively develop a multi-branched prompt using failure cases as feedback. Our goal is to explore a novel way of structuring prompts with multiple branches to better handle multiple patterns in complex tasks, for which we introduce three modules: Pattern Recognition, Branch Adjustment, and Branch Pruning. In experiments across five tasks, AMPO consistently achieves the best results. Additionally, our approach demonstrates significant optimization efficiency due to our adoption of a minimal search strategy.	Translated: 2024-10-30 22:25:15 Published: 2024-10-11
# SocialGaze: Improving the Integration of Human Social Norms in Large Language Models ( http://arxiv.org/abs/2410.08698v1 ) License: see link	Anvesh Rao Vijjini, Rakesh R. Menon, Jiayi Fu, Shashank Srivastava, Snigdha Chaturvedi,	While much research in the last few years has explored enhancing the reasoning capabilities of large language models (LLMs), there is a gap in understanding the alignment of these models with social values and norms. We introduce the task of judging social acceptance, which requires models to judge and rationalize the acceptability of people's actions in social situations. For example, is it socially acceptable for a neighbor to ask others in the community to keep their pets indoors at night? We find that LLMs' understanding of social acceptance is often misaligned with human consensus. To alleviate this, we introduce SocialGaze, a multi-step prompting framework in which a language model verbalizes a social situation from multiple perspectives before forming a judgment. Our experiments demonstrate that the SocialGaze approach improves alignment with human judgments by up to 11 F1 points with the GPT-3.5 model. We also identify biases and correlations in how LLMs assign blame, related to features such as gender (males are significantly more likely to be judged unfairly) and age (LLMs are more aligned with humans for older narrators).	Translated: 2024-10-30 22:25:15 Published: 2024-10-11
# プログレッシブ・プルーニング:ストリームベースの通信の匿名性を推定する Progressive Pruning: Estimating Anonymity of Stream-Based Communication ( http://arxiv.org/abs/2410.08700v1 ) ライセンス: Link先を確認	Christoph Döpmann, Maximilian Weisenseel, Florian Tschorsch,	(参考訳) データのストリームは今日のインターネット上でユビキタスな通信モデルになっている。強い匿名通信においては、これは例えば多くのmixnet設計で仮定されている、単一の独立したメッセージという従来の概念と矛盾する。本研究では,ストリーム通信に固有の匿名性要因について検討する。本稿では,ストリームの匿名度を推定する手法であるProgressive Pruningを紹介する。交差点攻撃(intersection attack)を模倣することにより、トラフィック分析攻撃に対するストリームの感受性をキャプチャする。本手法を,ストリーム通信の個別に設計した例のシミュレーションと,新しいTorFSシミュレータを用いたTorの大規模シミュレーションの両方に適用し,ストリーム長,ユーザ数,ネットワーク上のストリームの分散が,匿名性に相互依存的な影響を与えることを確かめる。我々の研究は、将来的にストリームベースの通信に強力な匿名性を提供するために解決すべき課題に注意を向けている。 Streams of data have become the ubiquitous communication model on today's Internet. For strong anonymous communication, this conflicts with the traditional notion of single, independent messages, as assumed e.g. by many mixnet designs. In this work, we investigate the anonymity factors that are inherent to stream communication. We introduce Progressive Pruning, a methodology suitable for estimating the anonymity level of streams. By mimicking an intersection attack, it captures the susceptibility of streams to traffic analysis attacks. We apply it to simulations of tailored examples of stream communication as well as to large-scale simulations of Tor using our novel TorFS simulator, finding that the stream length, the number of users, and how streams are distributed over the network have interdependent impacts on anonymity. Our work draws attention to challenges that need to be solved in order to provide strong anonymity for stream-based communication in the future.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
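The abstract above says the method works "by mimicking an intersection attack". As a rough illustration of that general idea only (not the paper's Progressive Pruning methodology, whose details are not given here), the sketch below intersects the sets of users observed to be active in each round in which a target stream sent traffic. The function name `intersection_attack` and the toy user names are hypothetical.

```python
from typing import Iterable, Set


def intersection_attack(observations: Iterable[Set[str]]) -> Set[str]:
    """Return the users who were active in every observed round.

    Each element of `observations` is the set of users seen sending
    traffic during one round in which the target stream was active.
    """
    candidates = None
    for active_users in observations:
        if candidates is None:
            candidates = set(active_users)  # first round: everyone is a candidate
        else:
            candidates &= active_users      # later rounds: prune absent users
    return candidates if candidates is not None else set()


# Toy data (assumed for illustration): three observation rounds.
rounds = [
    {"alice", "bob", "carol", "dave"},
    {"alice", "carol", "erin"},
    {"alice", "carol", "frank"},
]
remaining = intersection_attack(rounds)
print(sorted(remaining))  # ['alice', 'carol']
```

The size of the remaining candidate set is one simple proxy for an anonymity level: the longer a stream lasts, the more rounds an observer can intersect, and the smaller the set tends to become.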
# Obelia: DAGベースのブロックチェーンを数百のバリデータにスケールアップ Obelia: Scaling DAG-Based Blockchains to Hundreds of Validators ( http://arxiv.org/abs/2410.08701v1 ) ライセンス: Link先を確認	George Danezis, Lefteris Kokoris-Kogias, Alberto Sonnino, Mingwei Tian,	(参考訳) Obeliaは、プルーフ・オブ・ステークシステムで使われるDAGベースの構造化コンセンサスプロトコルを改善し、数百のバリデータに対応できるように効果的にスケールできるようにする。 Obeliaは2層のバリデータシステムを実装している。すなわち、現在のプロトコルと同様にブロックを提案する高ステークのコアバリデータのグループと、時折ブロックを作成する低ステークの補助バリデータのより大きなグループである。 Obeliaは補助バリデータにインセンティブを与えて、復旧中のコアバリデータを支援させ、既存のプロトコルとシームレスに統合する。 Obeliaは、数百のバリデータへのスケーリング時や、多数の補助バリデータが信頼できない場合であっても、元のプロトコルと比較して目に見えるオーバーヘッドを発生させないことを示す。 Obelia improves upon structured DAG-based consensus protocols used in proof-of-stake systems, allowing them to effectively scale to accommodate hundreds of validators. Obelia implements a two-tier validator system: a core group of high-stake validators that propose blocks as in current protocols, and a larger group of lower-stake auxiliary validators that occasionally author blocks. Obelia incentivizes auxiliary validators to assist recovering core validators and integrates seamlessly with existing protocols. We show that Obelia does not introduce visible overhead compared to the original protocol, even when scaling to hundreds of validators, or when a large number of auxiliary validators are unreliable.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
# 局所プローブ単発読み出しによる個々の原子核のスピン寿命の研究 The spin lifetime of an individual atomic nucleus investigated via local-probe single-shot readout ( http://arxiv.org/abs/2410.08704v1 ) ライセンス: Link先を確認	Evert W. Stolte, Jinwon Lee, Hester Vennema, Rik Broekhoven, Esther Teng, Allard Katan, Lukas M. Veldman, Philip Willke, Sander Otte,	(参考訳) 核スピンの長寿命な磁気状態は、環境からの優れた隔離に由来する。同時に、スピン状態を読み書きするためには、周囲との限られた相互作用が必要である。したがって、核スピンの原子環境に関する詳細な知識と制御は、量子情報応用の条件を最適化するための鍵となる。走査型トンネル顕微鏡(STM)と電子スピン共鳴(ESR)を組み合わせることで、超微細相互作用を通じて個々の核スピンの原子スケール情報が得られる。このアプローチは核スピンのエネルギー準位をマッピングすることに成功したが、時間領域における固有の振る舞いについての洞察は限られている。ここでは,STMを用いた個別の$^{\text{49}}$Ti核スピンの単発読み出しを実演する。パルス測定方式を用いることで、その固有寿命は5.3 $\pm$ 0.5秒であることがわかった。さらに、ESR駆動と直流トンネル電流の両方に対する応答を調べることにより、核スピンのポンピングおよび緩和機構に光を当てた。この結果は、同じ原子中の電子スピンとのフリップフロップ相互作用を含むモデル計算によって支持される。これらの発見は、核スピン緩和の性質に関する原子スケールの洞察を与え、原子組立量子ビットプラットフォームの開発に関係している。 Nuclear spins owe their long-lived magnetic states to their excellent isolation from their environment. At the same time, a limited degree of interaction with their surroundings is necessary for reading and writing the spin state. Detailed knowledge of and control over the atomic environment of a nuclear spin is therefore key to optimizing conditions for quantum information applications. Scanning tunnelling microscopy (STM), combined with electron spin resonance (ESR), provides atomic-scale information of individual nuclear spins via the hyperfine interaction. While this approach proved successful in mapping the nuclear spin energy levels, insight into its intrinsic behaviour in the time domain remains limited. Here, we demonstrate single-shot readout of an individual $^{\text{49}}$Ti nuclear spin with an STM. Employing a pulsed measurement scheme, we find its intrinsic lifetime to be 5.3 $\pm$ 0.5 seconds. Furthermore, we shed light on the pumping and relaxation mechanism of the nuclear spin by investigating its response to both ESR driving and DC tunneling current, which is supported by model calculations involving flip-flop interactions with the electron spin in the same atom. These findings give an atomic-scale insight into the nature of nuclear spin relaxation and are relevant for the development of atomically-assembled qubit platforms.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
# 次元相関による離散拡散の蒸留 Distillation of Discrete Diffusion through Dimensional Correlations ( http://arxiv.org/abs/2410.08709v1 ) ライセンス: Link先を確認	Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji,	(参考訳) 拡散モデルは、生成モデリングの様々な分野において例外的な性能を示した。 VAEやGANといった競合手法よりも、サンプルの品質と多様性が優れていることが多いが、反復的な性質のためサンプリング速度が遅い。近年、蒸留技術と整合性モデルは連続領域においてこの問題を緩和しているが、離散拡散モデルはより高速な生成に向けていくつかの固有の課題を持っている。特に、現在の文献では、計算上の制約により、異なる次元(ピクセル、位置)間の相関はモデリングと損失関数の両方で無視されている。本稿では,拡張性を維持しながら次元相関を扱える離散拡散の「混合」モデルを提案し,既存のモデルの繰り返しを蒸留する損失関数のセットを提供する。我々のアプローチは2つの主要な理論的洞察に支えられている。第一に、次元独立モデルは多くのサンプリングステップを実行することを許された場合、データ分布をうまく近似できる。第二に、我々の損失関数により、混合モデルは次元相関を学習することで、そのような多ステップの従来モデルをわずか数ステップに蒸留できる。 CIFAR-10データセットで事前学習した連続時間離散拡散モデルを蒸留することにより,提案手法が実際に機能することを実証的に示した。 Diffusion models have demonstrated exceptional performances in various fields of generative modeling. While they often outperform competitors including VAEs and GANs in sample quality and diversity, they suffer from slow sampling speed due to their iterative nature. Recently, distillation techniques and consistency models are mitigating this issue in continuous domains, but discrete diffusion models have some specific challenges towards faster generation. Most notably, in the current literature, correlations between different dimensions (pixels, locations) are ignored, both by its modeling and loss functions, due to computational limitations. In this paper, we propose "mixture" models in discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: first, that dimensionally independent models can well approximate the data distribution if they are allowed to conduct many sampling steps, and second, that our loss functions enable mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. We empirically demonstrate that our proposed method for discrete diffusions works in practice, by distilling a continuous-time discrete diffusion model pretrained on the CIFAR-10 dataset.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
# トランスフォーマーの文脈内学習によるオンチップ学習 On-Chip Learning via Transformer In-Context Learning ( http://arxiv.org/abs/2410.08711v1 ) ライセンス: Link先を確認	Jan Finkbeiner, Emre Neftci,	(参考訳) 自己回帰型のデコーダのみのトランスフォーマーは、スケーラブルなシーケンス処理と生成モデルの主要なコンポーネントとなっている。しかし、トランスフォーマーの自己注意機構では、各時間ステップ(トークン)ごとにメインメモリから先行トークンの射影を転送する必要があるため、従来のプロセッサ上での性能は著しく制限される。自己注意は動的なフィードフォワード層として見ることができ、その行列は局所的なシナプス可塑性の結果と同様に入力系列に依存する。この知見を用いて、オンチップの可塑性プロセッサを用いて自己注意を計算するニューロモルフィックなデコーダのみのトランスフォーマーモデルを提案する。興味深いことに、トランスフォーマーは訓練によって、推論中に入力コンテキストを「学習」できるようになる。本稿では,Loihi 2プロセッサ上でのトランスフォーマーの文脈内学習能力を,数ショットの分類問題を解くことで実証する。これにより、事前訓練されたモデルの重要性、特に単純で局所的な、バックプロパゲーションを必要としない学習ルールを見つける能力を強調する。こうした学習ルールは、ハードウェアフレンドリな形でオンチップ学習と適応を可能にする。 Autoregressive decoder-only transformers have become key components for scalable sequence processing and generation models. However, the transformer's self-attention mechanism requires transferring prior token projections from the main memory at each time step (token), thus severely limiting their performance on conventional processors. Self-attention can be viewed as a dynamic feed-forward layer, whose matrix is input sequence-dependent similarly to the result of local synaptic plasticity. Using this insight, we present a neuromorphic decoder-only transformer model that utilizes an on-chip plasticity processor to compute self-attention. Interestingly, the training of transformers enables them to "learn" the input context during inference. We demonstrate this in-context learning ability of transformers on the Loihi 2 processor by solving a few-shot classification problem. With this, we emphasize the importance of pretrained models, especially their ability to find simple, local, backpropagation-free learning rules that enable on-chip learning and adaptation in a hardware-friendly manner.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
# 海上障害物検出における表面反射の影響 Impact of Surface Reflections in Maritime Obstacle Detection ( http://arxiv.org/abs/2410.08713v1 ) ライセンス: Link先を確認	Samed Yalçın, Hazım Kemal Ekenel,	(参考訳) 海上障害物検出は、無人水上艇の自律航行において障害物となりうるものを検出することを目的としている。海上障害物検出の文脈では、水面は特定の状況下で鏡のように振る舞い、画像に反射を引き起こす。以前の研究では、海上障害物検出タスクにおける物体検出器の偽陽性の原因として表面反射が指摘されていた。本研究では,表面反射が実際に検出性能に悪影響を及ぼすことを示す。公開する2つのカスタムデータセットでテストすることで、反射の影響を測定する。第1のものは反射のある画像を含み、第2のものでは反射がインペインティングで除去されている。反射は様々な検出器でmAPを1.2から9.6ポイント低下させる。反射上の偽陽性を除去するために,Heatmap Based Sliding Filter という新しいフィルタ手法を提案する。提案手法は, 真陽性への影響を最小限に抑えつつ, 偽陽性の総数を34.64%削減することを示した。また、定性的解析を行い、提案手法が実際に反射上の偽陽性を除去することを示す。データセットはhttps://github.com/SamedYalcin/MRAD で入手できる。 Maritime obstacle detection aims to detect possible obstacles for autonomous driving of unmanned surface vehicles. In the context of maritime obstacle detection, the water surface can act like a mirror under certain circumstances, causing reflections on imagery. Previous works have indicated surface reflections as a source of false positives for object detectors in maritime obstacle detection tasks. In this work, we show that surface reflections indeed adversely affect detector performance. We measure the effect of reflections by testing on two custom datasets, which we make publicly available. The first one contains imagery with reflections, while in the second reflections are inpainted. We show that the reflections reduce mAP by 1.2 to 9.6 points across various detectors. To remove false positives on reflections, we propose a novel filtering approach named Heatmap Based Sliding Filter. We show that the proposed method reduces the total number of false positives by 34.64% while minimally affecting true positives. We also conduct qualitative analysis and show that the proposed method indeed removes false positives on the reflections. The datasets can be found at https://github.com/SamedYalcin/MRAD.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
# 量子アニーリングを用いたECDLPの一般化解法 The generalized method of solving ECDLP using quantum annealing ( http://arxiv.org/abs/2410.08725v1 ) ライセンス: Link先を確認	Łukasz Dzierzkowski,	(参考訳) 本稿では,素体上の楕円曲線離散対数問題(ECDLP)を二次制約なし二値最適化(QUBO)問題に変換する手法の一般化を提案する。元の方法は、与えられた楕円曲線モデルが完全な算術を持つことを要求する。新しい方法にはそのような制限はなく、画期的である。上記の障害はもはや問題ではないため、アルゴリズムの最新バージョンは任意の楕円曲線モデルに使用することができる。その結果、楕円曲線の任意のモデルにおいて、量子アニーリングを用いてECDLPを解くことができる。 This paper presents a generalization of a method allowing the transformation of the Elliptic Curve Discrete Logarithm Problem (ECDLP) over prime fields to the Quadratic Unconstrained Binary Optimization (QUBO) problem. The original method requires that a given elliptic curve model has complete arithmetic. The new one has no such restriction, which is a breakthrough. Since the mentioned obstacle is no longer a problem, the latest version of the algorithm may be used for any elliptic curve model. As a result, one may use quantum annealing to solve ECDLP on any given model of elliptic curves.	翻訳日:2024-10-30 22:25:15 公開日:2024-10-11
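For readers unfamiliar with the QUBO form mentioned above, the following minimal sketch shows what a QUBO instance looks like and how a tiny one can be solved by brute force. It does not reproduce the paper's ECDLP-to-QUBO transformation; the toy constraint, the matrix `Q`, and the function names are assumptions chosen purely for illustration. A QUBO asks for a binary vector $x$ minimizing $\sum_{i,j} Q_{ij} x_i x_j$.

```python
from itertools import product


def qubo_energy(Q, x):
    """Evaluate sum_{i,j} Q[i][j] * x[i] * x[j] for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Enumerate all 2^n assignments; only feasible for very small n."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)


# Toy constraint "x0 + x1 = 1" written as the penalty (x0 + x1 - 1)^2.
# Using x^2 = x for binary variables, this equals -x0 - x1 + 2*x0*x1 + 1;
# the constant +1 is dropped because it does not change the minimizer.
Q = [
    [-1, 2],
    [0, -1],
]
best, energy = brute_force_qubo(Q)
print(best, energy)  # (0, 1) -1
```

A quantum annealer plays the role of `brute_force_qubo` here: it searches for a low-energy assignment of the same objective, which is why a problem has to be expressed in this form first.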
# 次元の喪失:生成拡散における幾何学的記憶 Losing dimensions: Geometric memorization in generative diffusion ( http://arxiv.org/abs/2410.08727v1 ) ライセンス: Link先を確認	Beatrice Achilli, Enrico Ventura, Gianluigi Silvestri, Bao Pham, Gabriel Raya, Dmitry Krotov, Carlo Lucibello, Luca Ambrogioni,	(参考訳) 生成拡散過程は、統計物理学の基本概念と深く結びついている最先端の機械学習モデルである。データセットのサイズやネットワークの容量によって、それらの挙動は、ガラス相転移と呼ばれる現象において、連想記憶状態から一般化フェーズへ遷移することが知られている。ここでは、統計物理学の手法を用いて、生成拡散における記憶理論を多様体上に支持されたデータに拡張する。理論的および実験的な結果から,異なる接部分空間は、その方向に沿ったデータの局所的分散に依存する異なる臨界時間とデータセットサイズにおいて、記憶効果によって失われることが示唆された。おそらく直感に反して、ある条件下では、高い分散の部分空間が記憶効果によって最初に失われることが分かる。これは、データの顕著な特徴が、個々のトレーニングポイントへの完全な崩壊を伴わずに記憶されるような、次元の選択的な喪失につながる。我々は、画像データセットと線形多様体の両方で訓練されたネットワーク上での包括的な実験により、我々の理論を検証し、理論的予測との顕著な定性的一致を得た。 Generative diffusion processes are state-of-the-art machine learning models deeply connected with fundamental concepts in statistical physics. Depending on the dataset size and the capacity of the network, their behavior is known to transition from an associative memory regime to a generalization phase in a phenomenon that has been described as a glassy phase transition. Here, using statistical physics techniques, we extend the theory of memorization in generative diffusion to manifold-supported data. Our theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions. Perhaps counterintuitively, we find that, under some conditions, subspaces of higher variance are lost first due to memorization effects. This leads to a selective loss of dimensionality where some prominent features of the data are memorized without a full collapse on any individual training point. We validate our theory with a comprehensive set of experiments on networks trained both in image datasets and on linear manifolds, which result in a remarkable qualitative agreement with the theoretical predictions.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# 言語識別のためのN-gramから事前学習多言語モデルへ From N-grams to Pre-trained Multilingual Models For Language Identification ( http://arxiv.org/abs/2410.08728v1 ) ライセンス: Link先を確認	Thapelo Sindane, Vukosi Marivate,	(参考訳) 本稿では,南アフリカの11言語を対象としたN-gramモデルとLarge Pre-trained Multilingual Model for Language Identification (LID)について検討する。 N-gramモデルでは、各言語を効率的にモデル化する対象言語の効果的な頻度分布を確立するために、有効なデータサイズ選択が依然として不可欠であることを示し、言語ランキングを改善した。事前学習された多言語モデルに対しては、mBERT、RemBERT、XLM-r、Afri中心の多言語モデル(AfriBERTa、Afro-XLMr、AfroLM、Serengeti)の多言語モデル群をカバーする広範な実験を行う。さらに、これらのモデルを利用可能な大規模言語識別ツールと比較する: コンパクト言語検出器 v3 (CLD V3)、AfroLID、GlotLID、OpenLID。これらのことから,SerengetiはN-gramsからTransformerに平均して優れたモデルであることを示す。さらに,NHCLT + Vukzenzele corpus で訓練した軽量な BERT ベース LID モデル (za_BERT_lid) を提案する。 In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual models for Language Identification (LID) across 11 South African languages. For N-gram models, this study shows that effective data size selection remains crucial for establishing effective frequency distributions of the target languages, that efficiently model each language, thus, improving language ranking. For pre-trained multilingual models, we conduct extensive experiments covering a diverse set of massively pre-trained multilingual (PLM) models -- mBERT, RemBERT, XLM-r, and Afri-centric multilingual models -- AfriBERTa, Afro-XLMr, AfroLM, and Serengeti. We further compare these models with available large-scale Language Identification tools: Compact Language Detector v3 (CLD V3), AfroLID, GlotLID, and OpenLID to highlight the importance of focused-based LID. From these, we show that Serengeti is a superior model across models: N-grams to Transformers on average. Moreover, we propose a lightweight BERT-based LID model (za_BERT_lid) trained with NHCLT + Vukzenzele corpus, which performs on par with our best-performing Afri-centric models.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
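To make the N-gram side of the comparison above concrete, here is a minimal character-trigram language identifier. It is a generic sketch under stated assumptions, not the models evaluated in the paper: the function names, the two-language toy training set (English and Afrikaans, both among South Africa's official languages), and the relative-frequency scoring rule are all chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples, n=3):
    """Build one n-gram frequency profile (a Counter) per language."""
    return {
        lang: Counter(g for sentence in sentences for g in char_ngrams(sentence, n))
        for lang, sentences in samples.items()
    }


def identify(text, profiles, n=3):
    """Pick the language whose profile gives the text's n-grams the most mass."""
    grams = Counter(char_ngrams(text, n))

    def score(profile):
        total = sum(profile.values())
        # Counter returns 0 for unseen n-grams, so they simply add nothing.
        return sum(count * profile[g] / total for g, count in grams.items())

    return max(profiles, key=lambda lang: score(profiles[lang]))


# Tiny toy corpus (assumed); real profiles need far more data per language.
samples = {
    "en": ["the cat sat on the mat", "this is the house"],
    "af": ["die kat sit op die mat", "dit is die huis"],
}
profiles = train_profiles(samples)
print(identify("the dog is in the house", profiles))  # en
print(identify("die hond is in die huis", profiles))  # af
```

With so little training text the frequency estimates are crude, which is the practical point behind the abstract's remark that data size selection matters for establishing effective frequency distributions.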
# 大規模言語モデルにおける韓国法言語理解のための実践的ベンチマークの開発 Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models ( http://arxiv.org/abs/2410.08731v1 ) ライセンス: Link先を確認	Yeeun Kim, Young Rok Choi, Eunkyung Choi, Jinhwan Choi, Hai Jin Park, Wonseok Hwang,	(参考訳) 大規模言語モデル (LLM) は法域において顕著な性能を示しており、GPT-4 は米国におけるUniform Bar Exam をパスしている。しかし、非標準化タスクや英語以外の言語のタスクでは、その有効性は依然として限られている。このことは、適用前に各法体系内でのLLMの慎重な評価の必要性を浮き彫りにしている。ここでは,(1)7つの法的知識タスク(510例),(2)4つの法的推論タスク(288例),(3)韓国の司法試験(4ドメイン,53タスク,2,510例)からなる,LLMの韓国語法的理解を評価するためのベンチマークであるKBLを紹介する。最初の2つのデータセットは、弁護士との緊密な協力のもとで開発され、実践的なシナリオにおいて、認定された方法でLLMを評価する。さらに, 法律実務者が調査に広範な法律文書を多用することを考慮し, 内部知識にのみ依存するクローズドブック環境と, 韓国の法令や判例のコーパスを用いる検索拡張生成(RAG)環境の両面からLLMを評価した。結果は、大きな改善の余地と機会を示している。 Large language models (LLMs) have demonstrated remarkable performance in the legal domain, with GPT-4 even passing the Uniform Bar Exam in the U.S. However, their efficacy remains limited for non-standardized tasks and tasks in languages other than English. This underscores the need for careful evaluation of LLMs within each legal system before application. Here, we introduce KBL, a benchmark for assessing the Korean legal language understanding of LLMs, consisting of (1) 7 legal knowledge tasks (510 examples), (2) 4 legal reasoning tasks (288 examples), and (3) the Korean bar exam (4 domains, 53 tasks, 2,510 examples). The first two datasets were developed in close collaboration with lawyers to evaluate LLMs in practical scenarios in a certified manner. Furthermore, considering legal practitioners' frequent use of extensive legal documents for research, we assess LLMs in both a closed book setting, where they rely solely on internal knowledge, and a retrieval-augmented generation (RAG) setting, using a corpus of Korean statutes and precedents. The results indicate substantial room and opportunities for improvement.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# フェデレートラーニングにおける深層漏洩防御のための勾配スタンドイン Gradients Stand-in for Defending Deep Leakage in Federated Learning ( http://arxiv.org/abs/2410.08734v1 ) ライセンス: Link先を確認	H. Yi, H. Ren, C. Hu, Y. Li, J. Deng, X. Xie,	(参考訳) フェデレートラーニング(FL)はプライバシ保護の基盤となり、機密データをローカルに留めつつモデル勾配のみを中央サーバに送信する方向へとパラダイムをシフトさせている。この戦略は、プライバシ保護を強化し、集中型データストレージシステムに固有の脆弱性を最小限にするように設計されている。その革新的なアプローチにもかかわらず、最近の実証研究は、特に勾配の交換に関して、FLの潜在的な弱点を強調している。そこで本研究では,勾配漏洩を防ぐための新しく効果的な手法「AdaDefense」を提案する。モデルの収束は異なるタイプの最適化手法を用いて達成できるという考えに基づき、中央サーバ上のグローバル勾配集約に、実際の局所勾配ではなく局所的なスタンドイン(代替物)を使うことを提案する。提案手法は, 勾配漏洩を効果的に防止するだけでなく, モデル全体の性能にほとんど影響を与えないことを保証する。理論面では、勾配がどのようにして意図せず個人情報を漏洩しうるかを検討し、提案手法の有効性を裏付ける理論的枠組みを提示する。一般的なベンチマーク実験に支えられた広範な実証実験により、我々のアプローチがモデルの完全性を維持し、勾配漏洩に対して堅牢であることが確認され、安全かつ効率的なFLを追求する上で重要なステップとなる。 Federated Learning (FL) has become a cornerstone of privacy protection, shifting the paradigm towards localizing sensitive data while only sending model gradients to a central server. This strategy is designed to reinforce privacy protections and minimize the vulnerabilities inherent in centralized data storage systems. Despite its innovative approach, recent empirical studies have highlighted potential weaknesses in FL, notably regarding the exchange of gradients. In response, this study introduces a novel, efficacious method aimed at safeguarding against gradient leakage, namely, "AdaDefense". Following the idea that model convergence can be achieved by using different types of optimization methods, we suggest using a local stand-in rather than the actual local gradient for global gradient aggregation on the central server. This proposed approach not only effectively prevents gradient leakage, but also ensures that the overall performance of the model remains largely unaffected. Delving into the theoretical dimensions, we explore how gradients may inadvertently leak private information and present a theoretical framework supporting the efficacy of our proposed method. Extensive empirical tests, supported by popular benchmark experiments, validate that our approach maintains model integrity and is robust against gradient leakage, marking an important step in our pursuit of safe and efficient FL.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# Bad Neighbors:VPNプロバイダネットワークの理解について Bad Neighbors: On Understanding VPN Provider Networks ( http://arxiv.org/abs/2410.08737v1 ) ライセンス: Link先を確認	Teemu Rytilahti, Thorsten Holz,	(参考訳) VPN(Virtual Private Network)ソリューションは、インターネット上でプライベートネットワークを安全に接続するために使用される。企業環境でのメリットに加えて、VPNはプライバシを重視するユーザに対して、プライバシを保護し、位置情報ベースのコンテンツブロックや検閲を回避する手段として販売されている。これにより、世界中の多数の接続拠点を月額料金で提供するターンキーVPNサービスの市場が誕生した。 VPNプロバイダは、マーケティングにプライバシーとセキュリティの利点を多用しているが、そのような主張は一般的に測定と裏付けが難しい。 VPNエコシステムに関する研究はいくつかあるが、これまでのすべての研究は、分析において重要な部分を省略している。 (i) プロバイダは,自身のネットワークインフラストラクチャをどの程度適切に構成し,セキュアにしているか。 (ii) 顧客を他の顧客からどの程度守っているか。これらの質問に答えるために,VPNプロバイダとその数千のVPNエンドポイントを大規模に分析する自動計測システムを開発した。 VPNがインターネット上でルーティングできないIPアドレスを使って内部で動作しているという事実を考えると、通常はアクセスできないネットワークへのアクセスを可能にしてしまう可能性がある。適切に保護されていない場合は、これらのプロバイダの内部ネットワークを意図せず露出させるか、さらに悪い場合にはサービスに接続している他のクライアントまで露出させてしまう可能性がある。結果から,他のVPN顧客が直接露出していない場合であっても,テスト対象のVPNサービスプロバイダの大部分において,内部でルーティング可能なネットワークに向かうトラフィックのフィルタリングが広く欠如していることが示唆された。我々は、影響を受けたプロバイダや他の利害関係者にこの調査結果を開示し、状況を改善するためのガイダンスを提供した。 Virtual Private Network (VPN) solutions are used to connect private networks securely over the Internet. Besides their benefits in corporate environments, VPNs are also marketed to privacy-minded users to preserve their privacy, and to bypass geolocation-based content blocking and censorship. This has created a market for turnkey VPN services offering a multitude of vantage points all over the world for a monthly price. While VPN providers are heavily using privacy and security benefits in their marketing, such claims are generally hard to measure and substantiate. While there exist some studies on the VPN ecosystem, all prior works omit a critical part in their analyses: (i) How well do the providers configure and secure their own network infrastructure? and (ii) How well are they protecting their customers from other customers? To answer these questions, we have developed an automated measurement system with which we conduct a large-scale analysis of VPN providers and their thousands of VPN endpoints. Considering the fact that VPNs work internally using non-Internet-routable IP addresses, they might enable access to otherwise inaccessible networks. If not properly secured, this can inadvertently expose internal networks of these providers, or worse, even other clients connected to their services. Our results indicate a widespread lack of traffic filtering towards internally routable networks on the majority of tested VPN service providers, even in cases where no other VPN customers were directly exposed. We have disclosed our findings to the affected providers and other stakeholders, and offered guidance to improve the situation.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# MMLF:不確かさ推定によるオブジェクト検出のためのマルチモーダル・マルチクラスレイトフュージョン MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation ( http://arxiv.org/abs/2410.08739v1 ) ライセンス: Link先を確認	Qihang Yang, Yang Zhao, Hong Cheng,	(参考訳) 自律運転は、単一モーダルアプローチに関連する制限を克服するために、複数のモーダルからの情報を統合する高度な物体検出技術を必要とする。早期核融合と複雑度における多様なデータの整合性の課題は、深層核融合がもたらす過度な問題と相まって、決定レベルでの後期核融合の有効性を浮き彫りにした。後期核融合は、元の検出器のネットワーク構造を変更することなくシームレスな統合を保証する。本稿では,マルチクラス検出が可能なレイトフュージョンのための先駆的マルチモーダル・マルチクラスレイトフュージョン法を提案する。 KITTI検証と公式試験データセットを用いた核融合実験は、自律運転における多モード物体検出のための汎用的なソリューションとして我々のモデルを提示し、大幅な性能向上を示す。さらに,本手法では,不確実性解析を分類融合プロセスに組み込んで,モデルをより透明で信頼性の高いものにし,カテゴリ予測に対する信頼性の高い洞察を提供する。 Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations associated with single-modal approaches. The challenges of aligning diverse data in early fusion and the complexities, along with overfitting issues introduced by deep fusion, underscore the efficacy of late fusion at the decision level. Late fusion ensures seamless integration without altering the original detector's network structure. This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, designed for late fusion to enable multi-class detection. Fusion experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements, presenting our model as a versatile solution for multi-modal object detection in autonomous driving. Moreover, our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy and providing more reliable insights into category predictions.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# Hespi: 植物標本(herbarium)シートから情報を自動的に検出するパイプライン Hespi: A pipeline for automatically detecting information from herbarium specimen sheets ( http://arxiv.org/abs/2410.08740v1 ) ライセンス: Link先を確認	Robert Turnbull, Emily Fitzgerald, Karen Thompson, Joanne L. Birch,	(参考訳) 標本に関連付けられた生物多様性データは、生物科学、環境科学、気候科学、および保全科学において求められている。標本画像からのデータ抽出には速度の飛躍的な向上が必要であり、これらのデータの人手による転写への依存というボトルネックを解消しなければならない。我々は,先進的なコンピュータビジョン技術を用いて,植物標本のデジタル画像から、機関ラベル上の収集データのうちカタログ化前のサブセットを抽出する「Hespi」(HErbarium Specimen sheet PIpeline)を開発した。パイプラインは2つのオブジェクト検出モデルを統合する。第1のモデルはテキストベースのラベルのバウンディングボックスを検出し、第2のモデルは主要な機関ラベル上のテキストベースのデータフィールドのバウンディングボックスを検出する。このパイプラインは、テキストベースの機関ラベルを印刷、タイプ、手書き、またはその組み合わせとして分類し、データ抽出に光学文字認識(OCR)と手書き文字認識(HTR)を適用する。認識されたテキストは、分類群名の権威あるデータベースと照合して修正される。抽出されたテキストは、マルチモーダル大規模言語モデル(LLM)の助けを借りても補正される。 Hespiは、国際的な植物標本館の標本シート画像を含むテストデータセットについて、テキストを正確に検出し、抽出する。パイプラインのコンポーネントはモジュール化されており、ユーザは自身のデータで独自のモデルをトレーニングし、提供されたモデルの代わりに使用することができる。 Specimen associated biodiversity data are sought after for biological, environmental, climate, and conservation sciences. A rate shift is required for the extraction of data from specimen images to eliminate the bottleneck that the reliance on human-mediated transcription of these data represents. We applied advanced computer vision techniques to develop the 'Hespi' (HErbarium Specimen sheet PIpeline), which extracts a pre-catalogue subset of collection data on the institutional labels on herbarium specimens from their digital images. The pipeline integrates two object detection models; the first detects bounding boxes around text-based labels and the second detects bounding boxes around text-based data fields on the primary institutional label. The pipeline classifies text-based institutional labels as printed, typed, handwritten, or a combination and applies Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for data extraction. The recognized text is then corrected against authoritative databases of taxon names. The extracted text is also corrected with the aid of a multimodal Large Language Model (LLM). Hespi accurately detects and extracts text for test datasets including specimen sheet images from international herbaria. The components of the pipeline are modular and users can train their own models with their own data and use them in place of the models provided.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# Look Gauss, No Pose: 正確なポーズ初期化を伴わない Gaussian Splatting を用いた新規ビュー合成 Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization ( http://arxiv.org/abs/2410.08743v1 ) ライセンス: Link先を確認	Christian Schmidt, Jens Piekenbrinck, Bastian Leibe,	(参考訳) 3D Gaussian Splattingは、最近、ポーズ付きの一連の入力画像から高速で正確な新規ビュー合成を行うための強力なツールとして登場した。しかし、多くの新規ビュー合成アプローチと同様に、正確なカメラポーズ情報に依存しており、正確なカメラポーズの取得が難しい、あるいは不可能な現実のシナリオにおける適用性が制限されている。本稿では, カメラの外部パラメータを測光残差に対して最適化することにより, 3D Gaussian Splattingフレームワークの拡張を提案する。解析的勾配を導出し、その計算を既存の高性能CUDA実装と統合する。これにより、6-DoFカメラのポーズ推定などの下流タスクや、再構成とカメラ補正の同時実行が可能になる。特に,現実のシーンにおけるポーズ推定で高速な収束と高い精度を実現している。提案手法は,幾何学とカメラのポーズを同時に最適化することで、正確なポーズ情報を必要とせずに3次元シーンを高速に再構成でき、同時に新規ビュー合成において最先端の結果を達成する。我々のアプローチは、競合するほとんどの手法よりも最適化が大幅に高速で、レンダリングでは数倍高速である。実世界のシーンと、シミュレーション環境内の複雑な軌跡について結果を示し、LLFF上で最先端の結果を達成しつつ、最も効率的な競合手法と比較して実行時間を2分の1から4分の1に削減した。ソースコードはhttps://github.com/Schmiddo/noposegs で公開される予定である。 3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by two to four times compared to the most efficient competing method. Source code will be available at https://github.com/Schmiddo/noposegs.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# 最適輸送によるゼロショットオフライン模倣学習 Zero-Shot Offline Imitation Learning via Optimal Transport ( http://arxiv.org/abs/2410.08751v1 ) ライセンス: Link先を確認	Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, Georg Martius,	(参考訳) ゼロショットの模倣学習アルゴリズムは、テスト時にたった1つのデモから、目に見えない振る舞いを再現するという約束を持っている。既存の実践的なアプローチでは、専門家のデモンストレーションを一連の目標と見なし、ハイレベルなゴールセレクタと低レベルなゴール条件のポリシーで模倣を可能にする。しかし、この枠組みは、個々の目標を達成するためのエージェントの即時行動は、長期的な目的を損なう可能性がある。そこで本研究では,模倣学習に固有の占領目標を直接最適化することにより,この問題を緩和する新しい手法を提案する。本稿では,目標条件付き値関数を,学習世界モデルを用いて近似した占領地間距離に引き上げることを提案する。得られた手法は、オフラインで最適でないデータから学習することができ、複雑な連続ベンチマークで示すように、非ミオピックでゼロショットの模倣が可能である。 Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# PILLAR:AIによるプライバシ脅威モデリングツール PILLAR: an AI-Powered Privacy Threat Modeling Tool ( http://arxiv.org/abs/2410.08755v1 ) ライセンス: Link先を確認	Majid Mollaeefar, Andrea Bissoli, Silvio Ranise,	(参考訳) LLM(Large Language Models)の急速な進化により、プライバシエンジニアリングを含む幅広い分野に人工知能を適用する新たな可能性が解放された。現代のアプリケーションは、機密性の高いユーザーデータを扱いやすくなっているため、プライバシーを守ることがこれまで以上に重要になっている。プライバシーを効果的に保護するには、システム開発プロセスの初期に潜在的な脅威を特定し、対処する必要がある。 LINDDUNのようなフレームワークは、これらのリスクを明らかにするための構造化されたアプローチを提供するが、その価値にもかかわらず、かなりの手作業、専門家の入力、詳細なシステム知識を必要とすることが多い。これにより、プロセスに時間がかかり、エラーが発生しやすい。 LINDDUNのような現在のプライバシー脅威モデリング手法は、一般的に複雑なデータフロー図(DFD)の作成と分析と、潜在的なプライバシー問題を特定するためのシステム記述に依存している。これらのアプローチは徹底しているが、ユーザが提供するデータの正確さに大きく依存しているため、面倒である可能性がある。さらに、彼らはしばしば、どのように優先順位を付けるかを明確に示さずに、多くの脅威を発生させます。これらの課題に対応するために, LLMとLINDDUNフレームワークを統合する新たなツールであるPILLAR(Privacy Risk Identification with LINDDUN and LLM Analysis Report)を導入する。 PILLARは、DFDの生成、脅威の分類、リスクの優先順位付けなど、LINDDUNプロセスの重要な部分を自動化する。 LLMの機能を活用することで、PILLARはシステムの自然言語記述をユーザからの最小限の入力で包括的脅威モデルに変換し、開発者の作業負荷を低減し、プロセスの効率性と正確性を向上させることができる。 The rapid evolution of Large Language Models (LLMs) has unlocked new possibilities for applying artificial intelligence across a wide range of fields, including privacy engineering. As modern applications increasingly handle sensitive user data, safeguarding privacy has become more critical than ever. To protect privacy effectively, potential threats need to be identified and addressed early in the system development process. Frameworks like LINDDUN offer structured approaches for uncovering these risks, but despite their value, they often demand substantial manual effort, expert input, and detailed system knowledge. This makes the process time-consuming and prone to errors. Current privacy threat modeling methods, such as LINDDUN, typically rely on creating and analyzing complex data flow diagrams (DFDs) and system descriptions to pinpoint potential privacy issues. 
While these approaches are thorough, they can be cumbersome, relying heavily on the precision of the data provided by users. Moreover, they often generate a long list of threats without clear guidance on how to prioritize them, leaving developers unsure of where to focus their efforts. In response to these challenges, we introduce PILLAR (Privacy risk Identification with LINDDUN and LLM Analysis Report), a new tool that integrates LLMs with the LINDDUN framework to streamline and enhance privacy threat modeling. PILLAR automates key parts of the LINDDUN process, such as generating DFDs, classifying threats, and prioritizing risks. By leveraging the capabilities of LLMs, PILLAR can take natural language descriptions of systems and transform them into comprehensive threat models with minimal input from users, reducing the workload on developers and privacy experts while improving the efficiency and accuracy of the process.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# アーキテクチャに依存しないグラフ変換によるGNNの強化:体系的分析 Enhancing GNNs with Architecture-Agnostic Graph Transformations: A Systematic Analysis ( http://arxiv.org/abs/2410.08759v1 ) ライセンス: Link先を確認	Zhifei Li, Gerrit Großmann, Verena Wolf,	(参考訳) 近年、さまざまなグラフニューラルネットワーク(GNN)アーキテクチャが登場し、それぞれに独自の長所、短所、複雑さがある。 GNNの性能を高めるための前処理ステップとして,リワイヤリングやリフティング,中心性値によるノードアノテーションなど,さまざまなテクニックが採用されている。しかし、広く受け入れられているベストプラクティスは存在せず、アーキテクチャと前処理がパフォーマンスに与える影響は、しばしば不透明である。本研究では,標準データセット間の共通GNNアーキテクチャの性能に対する前処理ステップとして,グラフ変換が与える影響を系統的に検討する。これらのモデルは、表現性と呼ばれる非同型グラフを識別する能力に基づいて評価される。以上の結果から,特定の変換,特にノード特徴を中心性指標で拡張する変換は,一貫して表現性を向上させることが明らかとなった。しかし、こうした向上にはトレードオフが伴い、グラフ符号化のような手法は表現性を高める一方で、広く使われているPythonパッケージにおいて数値的不正確さをもたらす。さらに、3-WLと4-WLの区別不能なグラフを含む複雑なタスクに対処する場合、これらの前処理技術は限定的である。 In recent years, a wide variety of graph neural network (GNN) architectures have emerged, each with its own strengths, weaknesses, and complexities. Various techniques, including rewiring, lifting, and node annotation with centrality values, have been employed as pre-processing steps to enhance GNN performance. However, there are no universally accepted best practices, and the impact of architecture and pre-processing on performance often remains opaque. This study systematically explores the impact of various graph transformations as pre-processing steps on the performance of common GNN architectures across standard datasets. The models are evaluated based on their ability to distinguish non-isomorphic graphs, referred to as expressivity. Our findings reveal that certain transformations, particularly those augmenting node features with centrality measures, consistently improve expressivity. However, these gains come with trade-offs, as methods like graph encoding, while enhancing expressivity, introduce numerical inaccuracies in widely used Python packages. Additionally, we observe that these pre-processing techniques are limited when addressing complex tasks involving 3-WL and 4-WL indistinguishable graphs.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# FedNLのアンロック:自己完結型の計算最適化実装 Unlocking FedNL: Self-Contained Compute-Optimized Implementation ( http://arxiv.org/abs/2410.08760v1 ) ライセンス: Link先を確認	Konstantin Burlachenko, Peter Richtárik,	(参考訳) Federated Learning(FL)は、インテリジェントエージェントがローカルデータを共有することなく、機械学習(ML)モデルを分散的かつ協調的にトレーニングできる、新たなパラダイムである。最近の研究(arXiv:2106.02969)ではフェデレートニュートン学習(FedNL)アルゴリズムのファミリーが導入されており、FLと大規模最適化に二階法を適用するための重要なステップとなっている。しかし、FedNLの参照プロトタイプは、3つの重大な実用上の欠点を示している。 (i)サーバグレードのワークステーションで1回の実験を起動するのに4.8時間を要する。 (ii)プロトタイプはマルチノード設定をシミュレートするのみである。 (iii)リソース制約のあるアプリケーションへのプロトタイプの統合は困難である。理論と実践のギャップを埋めるため,単一ノードおよび複数ノード設定のためのFedNL,FedNL-LS,FedNL-PPの自己完結実装を提案する。我々の研究は上記の問題を解決し、実行時間(wall-clock time)を1000分の1に短縮する。これによりFedNLは、ロジスティック回帰のトレーニングにおいて、単一ノードではCVXPY (arXiv:1603.00943)、マルチノードではApache Spark (arXiv:1505.06807) および Ray/Scikit-Learn (arXiv:1712.05889) といった代替手段より優れている。最後に,実用指向の圧縮器として適応型TopLEKとキャッシュ対応RandSeqKの2つを提案する。これらはFedNLの理論を満たす。 Federated Learning (FL) is an emerging paradigm that enables intelligent agents to collaboratively train Machine Learning (ML) models in a distributed manner, eliminating the need for sharing their local data. The recent work (arXiv:2106.02969) introduces a family of Federated Newton Learn (FedNL) algorithms, marking a significant step towards applying second-order methods to FL and large-scale optimization. However, the reference FedNL prototype exhibits three serious practical drawbacks: (i) It requires 4.8 hours to launch a single experiment on a server-grade workstation; (ii) The prototype only simulates the multi-node setting; (iii) Prototype integration into resource-constrained applications is challenging. To bridge the gap between theory and practice, we present a self-contained implementation of FedNL, FedNL-LS, FedNL-PP for single-node and multi-node settings. Our work resolves the aforementioned issues and reduces the wall-clock time by a factor of 1000. 
With this, FedNL outperforms alternatives for training logistic regression in the single-node setting (CVXPY, arXiv:1603.00943) and in the multi-node setting (Apache Spark, arXiv:1505.06807; Ray/Scikit-Learn, arXiv:1712.05889). Finally, we propose two practice-oriented compressors for FedNL, adaptive TopLEK and cache-aware RandSeqK, which fulfill the theory of FedNL.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
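The abstract above mentions compressors for FedNL but does not define them. As a rough illustration of what a sparsifying compressor does in general, the following is a minimal sketch of plain Top-K sparsification. It is an assumption-laden stand-in and not the paper's adaptive TopLEK or RandSeqK; the function name `top_k` and its signature are hypothetical.

```python
def top_k(values, k):
    """Keep the k largest-magnitude entries of `values` and zero out the rest.

    Generic Top-K sparsification, shown only as an illustration; it is NOT
    the adaptive TopLEK compressor from the paper.
    """
    if k <= 0:
        return [0.0] * len(values)
    # Indices ordered by descending magnitude; Python's sort is stable,
    # so ties keep the earlier index.
    order = sorted(range(len(values)), key=lambda i: -abs(values[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


if __name__ == "__main__":
    print(top_k([0.5, -3.0, 2.0, 0.1], 2))
```

Only the kept entries (and their positions) would need to be communicated, which is the point of such a compressor.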
# 個人情報のクロスチェーン共有:不均一で相互運用可能なブロックチェーン Cross-chain Sharing of Personal Health Records: Heterogeneous and Interoperable Blockchains ( http://arxiv.org/abs/2410.08762v1 ) ライセンス: Link先を確認	Yongyang Lv, Xiaohong Li, Yingwenbo Wang, Kui Chen, Zhe Hou, Ruitao Feng,	(参考訳) 医療情報学の普及に伴い、貴重な個人健康記録(PHR)が生み出されている。同時に、ブロックチェーン技術は医療機関のセキュリティを強化した。しかしながら、これらの機関は、しばしば孤立したデータサイロとして機能し、PHRの潜在的な価値を制限する。異なるブロックチェーン上の病院間でのデータ共有の需要が高まるにつれ、クロスチェーンデータ共有の課題への対処が重要になる。ブロックチェーン間でPHRを共有する場合、医療用IoT(Internet of Things)デバイスの限られたストレージと計算能力は、大量のPHRのストレージと複雑な計算処理を複雑にする。さらに、さまざまなブロックチェーン暗号システムと内部攻撃のリスクにより、PHRのクロスチェーン共有はさらに複雑になる。本稿では、異種および相互運用可能なブロックチェーン間でPHRを共有する手法を提案する。医療用IoTデバイスは、InterPlanetary File SystemでリアルタイムPHRを暗号化し、保存することができる。拡張されたプロキシ再暗号化(PRE)アルゴリズムは、ブロックチェーン暗号システムの違いに対処する。多次元解析は、このスキームが堅牢なセキュリティと優れた性能を提供することを示した。 With the widespread adoption of medical informatics, a wealth of valuable personal health records (PHR) has been generated. Concurrently, blockchain technology has enhanced the security of medical institutions. However, these institutions often function as isolated data silos, limiting the potential value of PHRs. As the demand for data sharing between hospitals on different blockchains grows, addressing the challenge of cross-chain data sharing becomes crucial. When sharing PHRs across blockchains, the limited storage and computational capabilities of medical Internet of Things (IoT) devices complicate the storage of large volumes of PHRs and the handling of complex calculations. Additionally, varying blockchain cryptosystems and the risk of internal attacks further complicate the cross-chain sharing of PHRs. This paper proposes a scheme for sharing PHRs across heterogeneous and interoperable blockchains. Medical IoT devices can encrypt and store real-time PHRs in an InterPlanetary File System, requiring only simple operations for data sharing. An enhanced proxy re-encryption(PRE) algorithm addresses the differences in blockchain cryptosystems. 
Multi-dimensional analysis demonstrates that this scheme offers robust security and excellent performance.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# 法的質問応答システムの基盤性測定 Measuring the Groundedness of Legal Question-Answering Systems ( http://arxiv.org/abs/2410.08764v1 ) ライセンス: Link先を確認	Dietrich Trautmann, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, Martin Gajek,	(参考訳) 法的問合せのような高度な領域では、生成的AIシステムの正確性と信頼性が最重要となる。本研究は、AI生成応答の基盤性を評価するための様々な手法の総合的なベンチマークを示し、信頼性を大幅に向上することを目的としている。我々の実験には、類似度に基づくメトリクスと自然言語推論モデルが含まれており、応答が与えられた文脈で十分に確立されているかどうかを評価する。また,大規模言語モデルに対する異なるプロンプト戦略を探求し,非接地応答の検出を改善する。提案手法の有効性を,資料との整合性に着目した検索強化プロンプトから,法的なクエリと対応する応答に特化して設計されたグラウンドニング分類コーパスを用いて検証した。その結果, 生成応答の基底性分類の可能性を示し, マクロF1スコアは0.8。さらに、本手法は、通常、生成プロセスに従って、実際のアプリケーションに適合するかどうかを判断するために、レイテンシの観点から評価した。この機能は、追加の手動検証や自動応答再生を引き起こすプロセスに必須である。本研究は, 法的な環境下での生成AIの信頼性を向上させるために, 様々な検出手法の可能性を示すものである。 In high-stakes domains like legal question-answering, the accuracy and trustworthiness of generative AI systems are of paramount importance. This work presents a comprehensive benchmark of various methods to assess the groundedness of AI-generated responses, aiming to significantly enhance their reliability. Our experiments include similarity-based metrics and natural language inference models to evaluate whether responses are well-founded in the given contexts. We also explore different prompting strategies for large language models to improve the detection of ungrounded responses. We validated the effectiveness of these methods using a newly created grounding classification corpus, designed specifically for legal queries and corresponding responses from retrieval-augmented prompting, focusing on their alignment with source material. Our results indicate potential in groundedness classification of generated responses, with the best method achieving a macro-F1 score of 0.8. Additionally, we evaluated the methods in terms of their latency to determine their suitability for real-world applications, as this step typically follows the generation process. 
This capability is essential for processes that may trigger additional manual verification or automated response regeneration. In summary, this study demonstrates the potential of various detection methods to improve the trustworthiness of generative AI in legal settings.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
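The abstract above reports a macro-F1 score for groundedness classification. For readers unfamiliar with the metric, here is a minimal sketch of how macro-F1 is computed from labels. The label names `"grounded"` and `"ungrounded"` and the function name `macro_f1` are illustrative assumptions, not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = ["grounded", "grounded", "ungrounded", "ungrounded"]
    pred = ["grounded", "ungrounded", "ungrounded", "ungrounded"]
    print(round(macro_f1(truth, pred), 4))
```

Because every class contributes equally, macro-F1 penalizes a detector that does well only on the majority class.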
# スーパータグ特徴のニューラル不連続構成素解析への統合 Integrating Supertag Features into Neural Discontinuous Constituent Parsing ( http://arxiv.org/abs/2410.08766v1 ) ライセンス: Link先を確認	Lukas Mielczarek,	(参考訳) 構文解析は自然言語処理において必須であり、構成素構造は構文の記述として広く使われている。構成素性に関する伝統的な見方では、構成素は隣接した単語で構成されていることが要求されるが、これはドイツ語のような言語に共通する非局所的な依存関係を持つ構文の分析において困難を生じさせる。そのため、ドイツ語のNeGraやTIGER、英語のDPTBのような多くのツリーバンクでは、長距離依存は交差するエッジによって表現される。様々な文法形式が不連続な木を記述するために使われてきたが、解析の時間計算量が高い場合が多い。トランジションベースの構文解析は、明示的な文法の必要性を排除して、この要因を減らすことを目的としている。代わりに、ニューラルネットワークは、大きな注釈付きコーパスで教師付き学習を使用して、生テキスト入力が与えられた木を生成するように訓練される。 Coavoux と Cohen (2019) によって開発されたスタックフリーな遷移型構文解析器のエレガントな提案は、最悪の場合でも2次時間で、文上の任意の不連続構成素木を導出することを可能にしている。本研究の目的は,遷移型不連続構成素解析へのスーパータグ情報の導入を検討することである。 CCG(Steedman, 1989)のような語彙化された文法形式では、情報量の多いカテゴリが文中の単語に割り当てられ、文の構文を構成するためのビルディングブロックとして機能する。これらのスーパータグは単語の構造的役割と周囲のアイテムとの構文的関係を示す。本研究は,専用のスーパータガーをニューラルパーサの追加入力として使用する方法(パイプライン)と,構文解析とスーパータグ付けの両方を行うニューラルモデルを同時に訓練する方法(マルチタスク)により,スーパータグ情報を組み込むことを検討した。 CCGに加えて、いくつかのフレームワーク(LTAG-spinal、LCFRS)やシーケンスラベリングタスク(チャンキング、依存性解析)も、解析の補助タスクとしての適合性を比較する。 Syntactic parsing is essential in natural-language processing, with constituent structure being one widely used description of syntax. Traditional views of constituency demand that constituents consist of adjacent words, but this poses challenges in analysing syntax with non-local dependencies, common in languages like German. Therefore, in a number of treebanks like NeGra and TIGER for German and DPTB for English, long-range dependencies are represented by crossing edges. Various grammar formalisms have been used to describe discontinuous trees, often with high time complexities for parsing. Transition-based parsing aims at reducing this factor by eliminating the need for an explicit grammar. Instead, neural networks are trained to produce trees given raw text input using supervised learning on large annotated corpora. 
An elegant proposal for a stack-free transition-based parser developed by Coavoux and Cohen (2019) successfully allows for the derivation of any discontinuous constituent tree over a sentence in worst-case quadratic time. The purpose of this work is to explore the introduction of supertag information into transition-based discontinuous constituent parsing. In lexicalised grammar formalisms like CCG (Steedman, 1989) informative categories are assigned to the words in a sentence and act as the building blocks for composing the sentence's syntax. These supertags indicate a word's structural role and syntactic relationship with surrounding items. The study examines incorporating supertag information by using a dedicated supertagger as additional input for a neural parser (pipeline) and by jointly training a neural model for both parsing and supertagging (multi-task). In addition to CCG, several other frameworks (LTAG-spinal, LCFRS) and sequence labelling tasks (chunking, dependency parsing) will be compared in terms of their suitability as auxiliary tasks for parsing.	翻訳日:2024-10-30 22:15:28 公開日:2024-10-11
# リコンストラクション型チャネルプルーニングによるエッジデバイス上での多対象追跡の効率化 Efficient Multi-Object Tracking on Edge Devices via Reconstruction-Based Channel Pruning ( http://arxiv.org/abs/2410.08769v1 ) ライセンス: Link先を確認	Jan Müller, Adrian Pigors,	(参考訳) マルチオブジェクトトラッキング(MOT)技術の進歩は、重要なセキュリティとプライバシの懸念に対処しながらハイパフォーマンスを維持するという2つの課題を示す。センシティブな個人情報が関与する歩行者追跡のようなアプリケーションでは、データが外部サーバに送信された場合、プライバシー侵害やデータ誤用が重大な問題となる。これらのリスクを軽減するため、スマートカメラなどのエッジデバイス上でデータを直接処理することが、実行可能なソリューションとして浮上した。エッジコンピューティングは、センシティブな情報がローカルのままであることを保証する。しかし、エッジデバイス上でのMOTの実装には、その課題がないわけではない。エッジデバイスは通常、限られた計算資源を持ち、これらの制約の下でリアルタイムのパフォーマンスを提供できる高度に最適化されたアルゴリズムの開発を必要とする。最先端のMOTアルゴリズムの計算要求とエッジデバイスの能力の相違は、大きな障害を強調する。これらの課題に対処するために、現代のMOTシステムで使用されるような複雑なネットワークの圧縮に適したニューラルネットワークプルーニング手法を提案する。このアプローチは、NVIDIAのJetson Orin Nanoのような制限されたエッジデバイスの制約の中で、高い精度と効率を確保することで、MOTパフォーマンスを最適化する。本手法の適用により,高い精度を維持しつつモデルサイズを最大70%削減し,さらにJetson Orin Nanoの性能向上を実現し,エッジコンピューティングアプリケーションへのアプローチの有効性を実証する。 The advancement of multi-object tracking (MOT) technologies presents the dual challenge of maintaining high performance while addressing critical security and privacy concerns. In applications such as pedestrian tracking, where sensitive personal data is involved, the potential for privacy violations and data misuse becomes a significant issue if data is transmitted to external servers. To mitigate these risks, processing data directly on an edge device, such as a smart camera, has emerged as a viable solution. Edge computing ensures that sensitive information remains local, thereby aligning with stringent privacy principles and significantly reducing network latency. However, the implementation of MOT on edge devices is not without its challenges. Edge devices typically possess limited computational resources, necessitating the development of highly optimized algorithms capable of delivering real-time performance under these constraints. 
The disparity between the computational requirements of state-of-the-art MOT algorithms and the capabilities of edge devices emphasizes a significant obstacle. To address these challenges, we propose a neural network pruning method specifically tailored to compress complex networks, such as those used in modern MOT systems. This approach optimizes MOT performance by ensuring high accuracy and efficiency within the constraints of limited edge devices, such as NVIDIA's Jetson Orin Nano. By applying our pruning method, we achieve model size reductions of up to 70% while maintaining a high level of accuracy and further improving performance on the Jetson Orin Nano, demonstrating the effectiveness of our approach for edge computing applications.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
# 治療結果予測のための因果機械学習 Causal machine learning for predicting treatment outcomes ( http://arxiv.org/abs/2410.08770v1 ) ライセンス: Link先を確認	Stefan Feuerriegel, Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Konstantin Hess, Alicia Curth, Stefan Bauer, Niki Kilbertus, Isaac S. Kohane, Mihaela van der Schaar,	(参考訳) 因果機械学習(ML)は、有効性と毒性を含む治療結果を予測するフレキシブルでデータ駆動の方法を提供し、薬物の評価と安全性をサポートする。因果MLの重要な利点は、個別化された治療効果を推定できるため、臨床的な意思決定を個々の患者プロファイルにパーソナライズすることができることである。因果MLは、臨床治験データと、臨床登録や電子健康記録などの実世界のデータの両方と組み合わせて使用することができるが、バイアスや誤予測を避けるには注意が必要である。本稿では、因果ML(従来の統計学やMLのアプローチ)の利点を論じ、主要な構成要素と手順を概説する。最後に, 因果MLの信頼性とクリニックへの効果的な翻訳を推奨する。 Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes including efficacy and toxicity, thereby supporting the assessment and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, so that clinical decision-making can be personalized to individual patient profiles. Causal ML can be used in combination with both clinical trial data and real-world data, such as clinical registries and electronic health records, but caution is needed to avoid biased or incorrect predictions. In this Perspective, we discuss the benefits of causal ML (relative to traditional statistical or ML approaches) and outline the key components and steps. Finally, we provide recommendations for the reliable use of causal ML and effective translation into the clinic.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
# HpEIS: マルチメディアインタラクティブシステムのための手話埋め込み学習 HpEIS: Learning Hand Pose Embeddings for Multimedia Interactive Systems ( http://arxiv.org/abs/2410.08779v1 ) ライセンス: Link先を確認	Songpei Xu, Xuri Ge, Chaitanya Kaul, Roderick Murray-Smith,	(参考訳) 本稿では,ユーザのフレキシブルな手ポーズを,様々な手ポーズで訓練された可変オートエンコーダ(VAE)を用いて2次元の視覚空間にマッピングする仮想センサとして,HpEIS(Hand-pose Embedding Interactive System)を提案する。 HpEISは、カメラのみを外部手ポーズ取得装置として使用することにより、マルチメディアコレクションにおけるユーザ探索の視覚的解釈と誘導可能なサポートを可能にする。システム安定性とスムーズな要件に関する一般的なユーザビリティの問題について,専門家や未経験者のパイロット実験を通じて確認する。次に、手動データ強化、損失関数に反ジッタ正規化項を追加し、回転点の安定化と1ユーロフィルタに基づく後処理の平滑化を含む、安定性と平滑化の改善を図った。目標選択実験(n=12)において,動作指示窓条件を使わずに,タスク完了時間と目標地点までの最終距離を測定してHpEISを評価する。 HpEISは学習可能、柔軟、安定、スムーズな手の動きのインタラクション体験を提供する。 We present a novel Hand-pose Embedding Interactive System (HpEIS) as a virtual sensor, which maps users' flexible hand poses to a two-dimensional visual space using a Variational Autoencoder (VAE) trained on a variety of hand poses. HpEIS enables visually interpretable and guidable support for user explorations in multimedia collections, using only a camera as an external hand pose acquisition device. We identify general usability issues associated with system stability and smoothing requirements through pilot experiments with expert and inexperienced users. We then design stability and smoothing improvements, including hand-pose data augmentation, an anti-jitter regularisation term added to loss function, stabilising post-processing for movement turning points and smoothing post-processing based on One Euro Filters. In target selection experiments (n=12), we evaluate HpEIS by measures of task completion time and the final distance to target points, with and without the gesture guidance window condition. Experimental responses indicate that HpEIS provides users with a learnable, flexible, stable and smooth mid-air hand movement interaction experience.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
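The abstract above mentions smoothing post-processing based on One Euro Filters. The following is a minimal sketch of a standard One Euro Filter for a scalar signal, written from the publicly described algorithm; the parameter names and default values (`freq`, `min_cutoff`, `beta`, `d_cutoff`) are assumptions for illustration and are not the settings used in HpEIS.

```python
import math


class OneEuroFilter:
    """Adaptive low-pass filter: heavy smoothing at low speed, little lag at high speed."""

    def __init__(self, freq, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.freq = freq              # sampling frequency in Hz
        self.min_cutoff = min_cutoff  # minimum cutoff frequency in Hz
        self.beta = beta              # how strongly speed raises the cutoff
        self.d_cutoff = d_cutoff      # cutoff used when smoothing the derivative
        self.x_prev = None
        self.dx_prev = 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter at this cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.freq
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        if self.x_prev is None:
            # The first sample passes through unchanged.
            self.x_prev = x
            return x
        # Smoothed estimate of the signal's speed.
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat


if __name__ == "__main__":
    smooth = OneEuroFilter(freq=60.0, min_cutoff=1.0, beta=0.01)
    for sample in [0.0, 0.1, 0.05, 0.12, 0.9, 1.0]:
        print(smooth(sample))
```

For a 2D embedding such as the one described above, one filter instance per coordinate would be a natural choice, though the abstract does not say how the authors applied it.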
# VideoSAM: オープンワールドビデオセグメンテーション VideoSAM: Open-World Video Segmentation ( http://arxiv.org/abs/2410.08781v1 ) ライセンス: Link先を確認	Pinxue Guo, Zixu Zhao, Jianxiong Gao, Chongruo Wu, Tong He, Zheng Zhang, Tianjun Xiao, Wenqiang Zhang,	(参考訳) ビデオセグメンテーションは、ロボット工学と自動運転の進歩、特にビデオフレーム間の連続的な知覚とオブジェクトの関連が重要となるオープンワールド環境では、不可欠である。 Segment Anything Model(SAM)は静的画像セグメンテーションに優れているが、その能力をビデオセグメンテーションに拡張することは大きな課題である。私たちは2つの大きなハードルに取り組みます。 a) フレーム間のオブジェクトの関連付けにおけるSAMの埋め込み制限 b) 対象区分の粒度不整合この目的のために,動的環境におけるオブジェクト追跡とセグメンテーションの整合性を改善することで,これらの課題に対処するためのエンドツーエンドフレームワークであるVideoSAMを紹介した。 VideoSAMは集約されたバックボーンRADIOを統合し、類似度メトリクスを通じてオブジェクトアソシエーションを可能にし、安定したオブジェクトトラッキングのためのメモリメカニズムを備えたCycle-ack-Pairs Propagationを導入している。さらに,SAMデコーダ内に自己回帰型オブジェクトトークン機構を組み込んで,フレーム間の一貫した粒度を維持する。提案手法は, UVO と BURST のベンチマーク, および RoboTAP のロボットビデオで広範に評価され, 実世界のシナリオにおけるその有効性とロバスト性を示す。すべてのコードは利用可能です。 Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical. While the Segment Anything Model (SAM) has excelled in static image segmentation, extending its capabilities to video segmentation poses significant challenges. We tackle two major hurdles: a) SAM's embedding limitations in associating objects across frames, and b) granularity inconsistencies in object segmentation. To this end, we introduce VideoSAM, an end-to-end framework designed to address these challenges by improving object tracking and segmentation consistency in dynamic environments. VideoSAM integrates an agglomerated backbone, RADIO, enabling object association through similarity metrics and introduces Cycle-ack-Pairs Propagation with a memory mechanism for stable object tracking. Additionally, we incorporate an autoregressive object-token mechanism within the SAM decoder to maintain consistent granularity across frames. 
Our method is extensively evaluated on the UVO and BURST benchmarks, and robotic videos from RoboTAP, demonstrating its effectiveness and robustness in real-world scenarios. All codes will be available.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
# 因果次数の効率的な微分可能発見 Efficient Differentiable Discovery of Causal Order ( http://arxiv.org/abs/2410.08787v1 ) ライセンス: Link先を確認	Mathieu Chevalley, Arash Mehrjou, Patrick Schwab,	(参考訳) Intersort, Chevalley et al (2024) のアルゴリズムでは、DAG(Directed Acyclic Graph)モデルにおける変数の因果順序を発見するためのスコアベースの手法が提案され、既存の手法より優れている。しかし、パームタヘドロン上のスコアベースの方法として、Intersortは計算コストが高く、非微分可能であり、ゲノム学や気候モデルのような大規模データセットに関わる問題や、エンドツーエンドの勾配に基づく学習フレームワークに統合される可能性を制限する。我々は、異なるソート手法とランキング手法を用いてインターソートを改定することで、この制限に対処する。提案手法により、因果順序付けのスケーラブルで微分可能な最適化が可能となり、下流タスクの正規化として連続スコア関数を組み込むことができる。実験の結果,因果発見アルゴリズムは因果順序を規則化し,提案手法の有効性を実証した。我々の研究は、因果順序の規則化を微分可能なモデルの訓練に効率的に組み込むための扉を開くことで、純粋に関連性のある教師付き学習の長期的制限に対処する。 In the algorithm Intersort, Chevalley et al. (2024) proposed a score-based method to discover the causal order of variables in a Directed Acyclic Graph (DAG) model, leveraging interventional data to outperform existing methods. However, as a score-based method over the permutahedron, Intersort is computationally expensive and non-differentiable, limiting its ability to be utilised in problems involving large-scale datasets, such as those in genomics and climate models, or to be integrated into end-to-end gradient-based learning frameworks. We address this limitation by reformulating Intersort using differentiable sorting and ranking techniques. Our approach enables scalable and differentiable optimization of causal orderings, allowing the continuous score function to be incorporated as a regularizer in downstream tasks. Empirical results demonstrate that causal discovery algorithms benefit significantly from regularizing on the causal order, underscoring the effectiveness of our method. Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models and thereby addresses a long-standing limitation of purely associational supervised learning.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
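The abstract above refers to differentiable sorting and ranking without giving details. One common relaxation, shown here purely as an illustration and not as the method of the paper, replaces the hard comparison in a rank computation with a sigmoid so that ranks vary smoothly with the scores. The function name `soft_ranks` and the temperature parameter `tau` are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_ranks(scores, tau=0.1):
    """Smooth 0-based ranks: rank_i is approximately the number of scores below score_i.

    As tau -> 0 the result approaches the hard ranks; larger tau gives
    smoother (and therefore more useful) gradients.
    """
    return [
        sum(_sigmoid((s_i - s_j) / tau) for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]


if __name__ == "__main__":
    print(soft_ranks([0.0, 5.0, 2.0], tau=0.1))
    print(soft_ranks([0.0, 5.0, 2.0], tau=5.0))
```

In an autodiff framework the same expression would yield gradients with respect to the scores, which is what allows an ordering-based score to act as a regularizer.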
# Superpipeline: 大規模モデルにおけるGPUメモリ使用量削減のためのユニバーサルアプローチ Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models ( http://arxiv.org/abs/2410.08791v1 ) ライセンス: Link先を確認	Reza Abbasi, Sernam Lim,	(参考訳) 機械学習モデル、特に自然言語処理とコンピュータビジョンの急速な成長は、これらのモデルを限られたリソースでハードウェア上で実行する際の課題につながっている。本稿では,トレーニングと推論の両方において,制約のあるハードウェア上での大規模AIモデルの実行を最適化する新しいフレームワークであるSuperpipelineを紹介する。このアプローチでは、モデルを個々のレイヤに分割することでモデル実行を動的に管理し、これらのレイヤをGPUとCPUメモリ間で効率的に転送する。 Superpipelineは、モデル精度と許容可能な処理速度を維持しながら、実験でGPUメモリ使用量を最大60%削減する。これにより、利用可能なGPUメモリを超えるモデルを効果的に実行することが可能になる。推論や特定のモデルタイプに主にフォーカスする既存のソリューションとは異なり、Superpipelineは大規模言語モデル(LLM)、視覚言語モデル(VLM)、ビジョンベースモデルに適用できる。 Superpipelineのパフォーマンスを、さまざまなモデルとハードウェアセットアップでテストした。この手法には、GPUメモリ使用量と処理速度のバランスを微調整できる2つの重要なパラメータが含まれている。重要なのは、Superpipelineはモデルのパラメータの再トレーニングや変更を必要とせず、元のモデルの出力が変わらないことを保証することだ。 Superpipelineのシンプルさと柔軟性は、限られたハードウェア上で高度なAIモデルを扱う研究者や専門家にとって有用である。これにより、既存のハードウェアでより大きなモデルやより大きなバッチサイズを使用することが可能になり、多くの機械学習アプリケーションにまたがるイノベーションをスピードアップする可能性がある。この作業は、高度なAIモデルをよりアクセスしやすくし、リソース制限された環境でのデプロイメントを最適化するための重要なステップとなる。 Superpipelineのコードは https://github.com/abbasiReza/super-pipeline で公開されている。 The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new framework designed to optimize the execution of large AI models on constrained hardware during both training and inference. Our approach involves dynamically managing model execution by dividing models into individual layers and efficiently transferring these layers between GPU and CPU memory. Superpipeline reduces GPU memory usage by up to 60% in our experiments while maintaining model accuracy and acceptable processing speeds. This allows models that would otherwise exceed available GPU memory to run effectively. 
Unlike existing solutions that focus mainly on inference or specific model types, Superpipeline can be applied to large language models (LLMs), vision-language models (VLMs), and vision-based models. We tested Superpipeline's performance across various models and hardware setups. The method includes two key parameters that allow fine-tuning the balance between GPU memory use and processing speed. Importantly, Superpipeline does not require retraining or changing model parameters, ensuring that the original model's output remains unchanged. Superpipeline's simplicity and flexibility make it useful for researchers and professionals working with advanced AI models on limited hardware. It enables the use of larger models or bigger batch sizes on existing hardware, potentially speeding up innovation across many machine learning applications. This work marks an important step toward making advanced AI models more accessible and optimizing their deployment in resource-limited environments. The code for Superpipeline is available at https://github.com/abbasiReza/super-pipeline.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
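The layer-by-layer execution described above can be pictured with a small simulation. The sketch below uses plain Python callables as stand-ins for layers and a list as a stand-in for GPU residency; it does not touch a real GPU and is not the Superpipeline API. The name `run_offloaded` and the `window` parameter are hypothetical and are not claimed to correspond to the two parameters mentioned in the abstract.

```python
def run_offloaded(layers, x, window=2):
    """Apply `layers` in order while keeping at most `window` of them 'on GPU'.

    Returns the model output and the peak number of simultaneously
    resident layers, to show that residency stays bounded.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    on_gpu = []  # indices of layers currently resident on the simulated GPU
    peak = 0
    for i, layer in enumerate(layers):
        on_gpu.append(i)        # simulated CPU -> GPU transfer of layer i
        if len(on_gpu) > window:
            on_gpu.pop(0)       # simulated eviction of the oldest layer to CPU
        peak = max(peak, len(on_gpu))
        x = layer(x)            # the computation itself is unchanged
    return x, peak


if __name__ == "__main__":
    toy_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    print(run_offloaded(toy_layers, 5, window=2))
```

The output is identical to running all layers resident at once, which mirrors the abstract's point that the original model's output is unchanged; the cost in a real system would be the transfer time, which this simulation does not model.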
# VLM See, Robot Do:人間のデモビデオから視覚言語モデルによるロボット行動計画へ VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model ( http://arxiv.org/abs/2410.08792v1 ) ライセンス: Link先を確認	Beichen Wang, Juexiao Zhang, Shuwen Dong, Irving Fang, Chen Feng,	(参考訳) 視覚言語モデル(VLM)は、常識推論と一般化可能性の能力により、最近ロボット工学で採用されている。既存の研究では、自然言語命令からタスクと動作計画を生成したり、ロボット学習のためのトレーニングデータをシミュレートしたりするためにVLMを適用している。本研究では,VLMを用いて人間のデモ映像を解釈し,ロボットのタスク計画を生成する。本手法は,キーフレーム選択,視覚知覚,VLM推論をパイプラインに統合する。VLMが人間の実演を「見る(see)」ことができ,ロボットが「実行する(do)」ための対応する計画を説明できることから,SeeDoと名付けた。提案手法の有効性を検証するため,3つのカテゴリにまたがるピック・アンド・プレイス・タスクを実演する長期的な人間ビデオの集合を収集し,最新のビデオ入力VLMを含むいくつかのベースラインに対して,SeeDoを総合的にベンチマークする指標セットを設計した。実験はSeeDoの優れたパフォーマンスを示している。さらに、シミュレーション環境と実際のロボットアームの両方に、生成されたタスクプランを配置した。 Vision Language Models (VLMs) have recently been adopted in robotics for their capability in common sense reasoning and generalizability. Existing work has applied VLMs to generate task and motion planning from natural language instructions and simulate training data for robot learning. In this work, we explore using VLMs to interpret human demonstration videos and generate robot task plans. Our method integrates keyframe selection, visual perception, and VLM reasoning into a pipeline. We named it SeeDo because it enables the VLM to "see" human demonstrations and explain the corresponding plans to the robot for it to "do". To validate our approach, we collected a set of long-horizon human videos demonstrating pick-and-place tasks in three diverse categories and designed a set of metrics to comprehensively benchmark SeeDo against several baselines, including state-of-the-art video-input VLMs. The experiments demonstrate SeeDo's superior performance. We further deployed the generated task plans in both a simulation environment and on a real robot arm.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
# ソーシャルメディアにおけるうつ病モデルへのNLPアプローチの現状 : 新型コロナ後展望 On the State of NLP Approaches to Modeling Depression in Social Media: A Post-COVID-19 Outlook ( http://arxiv.org/abs/2410.08793v1 ) ライセンス: Link先を確認	Ana-Maria Bucur, Andreea-Codrina Moldovan, Krutika Parvatikar, Marcos Zampieri, Ashiqur R. KhudaBukhsh, Liviu P. Dinu,	(参考訳) ソーシャルメディアにおけるメンタルヘルス状態を予測するための計算的アプローチは、ここ数年で大きく研究されてきた。このトピックに関する複数の調査が公開されており、この領域における研究の総合的な説明をコミュニティに提供する。あらゆる精神状態の中で、うつ病は世界中で流行しているため、最も広く研究されている。 2020年初頭に始まった新型コロナウイルス(COVID-19)の世界的なパンデミックは、世界中のメンタルヘルスに大きな影響を与えた。新型コロナウイルスの感染拡大(ロックダウンなど)の鈍化と、その後の多くの国で経験した経済不況は、人々の生活やメンタルヘルスに大きな影響を与えている。研究によると、人口のうつ病率は50%以上増加している。本稿では、ソーシャルメディアにおける抑うつをモデル化するための自然言語処理(NLP)アプローチに関する調査を行い、新型コロナウイルス後展望を提供する。本調査は, ソーシャルメディアにおけるうつ病のモデル化に対するパンデミックの影響の理解に寄与する。新型コロナウイルスのパンデミックの状況において、最先端のアプローチと新しいデータセットがどのように使われているのかを概説する。最後に、公平性、説明責任、倫理を考慮したメンタルヘルスデータの収集・処理における倫理的問題についても論じる。 Computational approaches to predicting mental health conditions in social media have been substantially explored in the past years. Multiple surveys have been published on this topic, providing the community with comprehensive accounts of the research in this area. Among all mental health conditions, depression is the most widely studied due to its worldwide prevalence. The COVID-19 global pandemic, starting in early 2020, has had a great impact on mental health worldwide. Harsh measures employed by governments to slow the spread of the virus (e.g., lockdowns) and the subsequent economic downturn experienced in many countries have significantly impacted people's lives and mental health. Studies have shown a substantial increase of above 50% in the rate of depression in the population. In this context, we present a survey on natural language processing (NLP) approaches to modeling depression in social media, providing the reader with a post-COVID-19 outlook. This survey contributes to the understanding of the impacts of the pandemic on modeling depression in social media. 
We outline how state-of-the-art approaches and new datasets have been used in the context of the COVID-19 pandemic. Finally, we also discuss ethical issues in collecting and processing mental health data, considering fairness, accountability, and ethics.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
# M$^3$-Impute: 欠損値補完のためのマスク誘導表現学習 M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation ( http://arxiv.org/abs/2410.08794v1 ) ライセンス: Link先を確認	Zhongyi Yu, Zhenghao Wu, Shuhan Zhong, Weifeng Su, S. -H. Gary Chan, Chul-Ho Lee, Weipeng Zhuo,	(参考訳) 欠損値は、データ分析と機械学習に重大な課題をもたらす一般的な問題である。この問題は、欠損値を正確に埋める効果的な補完法の開発を必要とし、それによってデータセットの全体的な品質と有用性を向上する。しかし、既存の補完手法は、埋め込み初期化段階においてデータの「欠損性」情報を明示的に考慮しておらず、学習過程において絡み合った特徴相関とサンプル相関を十分にモデル化していないため、性能が低下する。本稿ではM$^3$-Imputeを提案し、新しいマスキング手法により、欠損性情報とこれらの相関を明示的に活用することを目的とする。 M$^3$-Imputeはまずデータを二部グラフとしてモデル化し、グラフニューラルネットワークを用いてノード埋め込みを学習する。ここで、洗練された埋め込み初期化処理が欠損性情報を直接取り込む。次に,M$^3$-Impute の新規な特徴相関ユニット (FRU) とサンプル相関ユニット (SRU) を用いて埋め込みを最適化し,補完のための特徴相関とサンプル相関を効果的に捉える。3種類の欠損設定の下で25のベンチマークデータセットを用いた実験の結果、M$^3$-Imputeは平均して20個で最良、4個で2番目に良いMAEスコアを達成し、その有効性が示された。 Missing values are a common problem that poses significant challenges to data analysis and machine learning. This problem necessitates the development of an effective imputation method to fill in the missing values accurately, thereby enhancing the overall quality and utility of the datasets. Existing imputation methods, however, fall short of explicitly considering the 'missingness' information in the data during the embedding initialization stage and modeling the entangled feature and sample correlations during the learning process, thus leading to inferior performance. We propose M$^3$-Impute, which aims to explicitly leverage the missingness information and such correlations with novel masking schemes. M$^3$-Impute first models the data as a bipartite graph and uses a graph neural network to learn node embeddings, where the refined embedding initialization process directly incorporates the missingness information. They are then optimized through M$^3$-Impute's novel feature correlation unit (FRU) and sample correlation unit (SRU) that effectively capture feature and sample correlations for imputation. 
Experiment results on 25 benchmark datasets under three different missingness settings show the effectiveness of M$^3$-Impute by achieving 20 best and 4 second-best MAE scores on average.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
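The abstract above says the data is first modeled as a bipartite graph and that missingness information is used explicitly. As a minimal sketch of that first step only, the code below turns a small table with missing entries into a missingness mask and a list of sample-feature edges for the observed cells. The function name `table_to_bipartite` and the use of `None` for missing cells are assumptions for illustration; the paper's embedding initialization, FRU, and SRU are not reproduced here.

```python
def table_to_bipartite(rows):
    """Return (mask, edges) for a table whose missing cells are None.

    mask[i][j] is True when cell (i, j) is observed.
    edges holds (sample_index, feature_index, value) for observed cells,
    i.e. the edges of a sample-feature bipartite graph.
    """
    mask = [[value is not None for value in row] for row in rows]
    edges = [
        (i, j, value)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value is not None
    ]
    return mask, edges


if __name__ == "__main__":
    table = [
        [1.0, None, 3.0],
        [None, 2.5, 0.5],
    ]
    mask, edges = table_to_bipartite(table)
    print(mask)
    print(edges)
```

Imputation then amounts to predicting values for the sample-feature pairs that have no edge, with the mask indicating which pairs those are.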
# 較正された計算を意識したガウス過程 Calibrated Computation-Aware Gaussian Processes ( http://arxiv.org/abs/2410.08796v1 ) ライセンス: Link先を確認	Disha Hegde, Mohamed Adil, Jon Cockayne,	(参考訳) ガウス過程は、計算量がトレーニングセットのサイズの3乗でスケールすることで知られており、非常に大きな回帰問題への適用を妨げている。計算を意識したガウス過程(CAGP)は、確率的線形ソルバを利用して計算量を削減し、計算の削減に起因する計算上の不確かさの分だけ事後分布を広げることで、このスケーリング問題に対処する。しかし、最も一般的に使用されるCAGPフレームワークは(時には劇的に)保守的な不確実性定量化をもたらし、事後分布が実用上非現実的になる。本研究では, 用いる確率的線形ソルバが厳密な統計的意味で較正されている場合, 誘導されるCAGPも同様に較正されていることを証明する。そこで我々は,基礎となる確率的線形ソルバにガウス・ザイデル反復を用いた新しいCAGPフレームワークCAGP-GSを提案する。 CAGP-GSは、テストセットが低次元で反復回数が少ない場合、既存のアプローチと比較して良好に動作する。合成問題に対する較正性を検証し, 大規模な全球気温回帰問題に対する既存手法との比較を行った。 Gaussian processes are notorious for scaling cubically with the size of the training set, preventing application to very large regression problems. Computation-aware Gaussian processes (CAGPs) tackle this scaling issue by exploiting probabilistic linear solvers to reduce complexity, widening the posterior with additional computational uncertainty due to reduced computation. However, the most commonly used CAGP framework results in (sometimes dramatically) conservative uncertainty quantification, making the posterior unrealistic in practice. In this work, we prove that if the utilised probabilistic linear solver is calibrated, in a rigorous statistical sense, then so too is the induced CAGP. We thus propose a new CAGP framework, CAGP-GS, based on using Gauss-Seidel iterations for the underlying probabilistic linear solver. CAGP-GS performs favourably compared to existing approaches when the test set is low-dimensional and few iterations are performed. We test the calibratedness on a synthetic problem, and compare the performance to existing approaches on a large-scale global temperature regression problem.	翻訳日:2024-10-30 22:05:43 公開日:2024-10-11
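The abstract above builds on Gauss-Seidel iterations. As background, here is a minimal sketch of the classical, deterministic Gauss-Seidel iteration for a linear system Ax = b. It illustrates only the underlying iteration and is not the probabilistic linear solver or the CAGP-GS framework from the paper; the function name and the default iteration count are assumptions.

```python
def gauss_seidel(A, b, iterations=25):
    """Approximate the solution of A x = b by Gauss-Seidel sweeps.

    A is a list of rows with nonzero diagonal entries. Convergence is
    guaranteed, for example, when A is symmetric positive definite,
    as kernel matrices with added noise typically are.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of the other coordinates.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(gauss_seidel(A, b))
```

Stopping after a few sweeps leaves a residual error; the computation-aware view described in the abstract treats that error as additional uncertainty rather than ignoring it.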
# Data Processing for the OpenGPT-X Model Family ( http://arxiv.org/abs/2410.08800v1 ) License: see the linked page	Nicolo' Brandizzi, Hammam Abdelwahab, Anirban Bhowmick, Lennard Helmer, Benny Jörg Stein, Pavel Denisov, Qasid Saleem, Michael Fromm, Mehdi Ali, Richard Rutmann, Farzad Naderi, Mohamad Saif Agy, Alexander Schwirjow, Fabian Küch, Luzian Hahn, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Dennis Wegener, Nicolas Flores-Herr, Joachim Köhler, Johannes Leveling,	This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs). The project goal is to deliver models that cover all major European languages, with a particular focus on real-world applications within the European Union. We explain all data processing steps, from data selection and requirement definition to the preparation of the final datasets for model training. We distinguish between curated data and web data, as each of these categories is handled by distinct pipelines, with curated data undergoing minimal filtering and web data requiring extensive filtering and deduplication. This distinction guided the development of specialized algorithmic solutions for both pipelines.
In addition to describing the processing methodologies, we provide an in-depth analysis of the datasets, increasing transparency and alignment with European data regulations. Finally, we share key insights and challenges faced during the project, offering recommendations for future endeavors in large-scale multilingual data preparation for LLMs.	Translated: 2024-10-30 21:55:58 Published: 2024-10-11
# A Methodology for Evaluating RAG Systems: A Case Study On Configuration Dependency Validation ( http://arxiv.org/abs/2410.08801v1 ) License: see the linked page	Sebastian Simon, Alina Mailach, Johannes Dorn, Norbert Siegmund,	Retrieval-augmented generation (RAG) is an umbrella term for different components, design decisions, and domain-specific adaptations to enhance the capabilities of large language models and counter their limitations regarding hallucination and outdated and missing knowledge. Since it is unclear which design decisions lead to satisfactory performance, developing RAG systems is often experimental and needs to follow a systematic and sound methodology to obtain sound and reliable results. However, there is currently no generally accepted methodology for RAG evaluation despite a growing interest in this technology. In this paper, we propose a first blueprint of a methodology for a sound and reliable evaluation of RAG systems and demonstrate its applicability on a real-world software engineering research task: the validation of configuration dependencies across software technologies.
In summary, we make two novel contributions: (i) a novel, reusable methodological design for evaluating RAG systems, including a demonstration that serves as a guideline, and (ii) a RAG system, developed following this methodology, that achieves the highest accuracy in the field of dependency validation. For the blueprint's demonstration, the key insights are the crucial role of choosing appropriate baselines and metrics, the necessity for systematic RAG refinements derived from qualitative failure analysis, and the reporting of key design decisions to foster replication and evaluation.	Translated: 2024-10-30 21:55:58 Published: 2024-10-11
# Batched Energy-Entropy acquisition for Bayesian Optimization ( http://arxiv.org/abs/2410.08804v1 ) License: see the linked page	Felix Teufel, Carsten Stahlhut, Jesper Ferkinghoff-Borg,	Bayesian optimization (BO) is an attractive machine learning framework for performing sample-efficient global optimization of black-box functions. The optimization process is guided by an acquisition function that selects points to acquire in each round of BO. In batched BO, when multiple points are acquired in parallel, commonly used acquisition functions are often high-dimensional and intractable, leading to the use of sampling-based alternatives. We propose a statistical-physics-inspired acquisition function for BO with Gaussian processes that can natively handle batches. Batched Energy-Entropy acquisition for BO (BEEBO) enables tight control of the explore-exploit trade-off of the optimization process and generalizes to heteroskedastic black-box problems. We demonstrate the applicability of BEEBO on a range of problems, showing performance competitive with existing methods.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
# Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs ( http://arxiv.org/abs/2410.08806v1 ) License: see the linked page	Chris Cummins, Volker Seeker, Jordi Armengol-Estapé, Aram H. Markosyan, Gabriel Synnaeve, Hugh Leather,	Tools for rewriting, refactoring and optimizing code should be fast and correct. Large language models (LLMs), by their nature, possess neither of these qualities. Yet, there remains tremendous opportunity in using LLMs to improve code. We explore the use of LLMs not to transform code, but to code transforms. We propose a chain-of-thought approach to synthesizing code transformations from a small number of input/output code examples that incorporates execution and feedback. Unlike the direct rewrite approach, LLM-generated transformations are easy to inspect, debug, and validate. The logic of the rewrite is explicitly coded and easy to adapt. The compute required to run code transformations is minute compared to that of LLM rewriting. We test our approach on 16 Python code transformations and find that LLM-generated transforms are perfectly precise for 7 of them and less imprecise than direct LLM rewriting on the others. We hope to encourage further research into improving the precision of LLM code rewriting.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
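To make "coding the transform" concrete, here is a minimal sketch of what an explicit, inspectable transform can look like when written against Python's standard `ast` module, together with a check against input/output examples. The specific rewrite (`x ** 2` to `x * x`), the class name, the helper names and the example pairs are all hypothetical; they are not taken from the 16 transformations evaluated in the paper. `ast.unparse` requires Python 3.9 or later.

```python
import ast


class SquareToMultiply(ast.NodeTransformer):
    """Rewrite `name ** 2` into `name * name` for plain variable names only."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle nested expressions first
        if (
            isinstance(node.op, ast.Pow)
            and isinstance(node.left, ast.Name)       # skip calls etc. to avoid duplicating side effects
            and isinstance(node.right, ast.Constant)
            and type(node.right.value) is int         # excludes 2.0 and True
            and node.right.value == 2
        ):
            replacement = ast.BinOp(
                left=node.left,
                op=ast.Mult(),
                right=ast.Name(id=node.left.id, ctx=ast.Load()),
            )
            return ast.copy_location(replacement, node)
        return node


def apply_transform(source: str) -> str:
    """Parse the source, apply the transform, and return the rewritten code."""
    tree = SquareToMultiply().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def failing_examples(examples):
    """Return the (input, expected, actual) triples where the transform disagrees."""
    failures = []
    for source, expected in examples:
        actual = apply_transform(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Hypothetical input/output pairs used to validate the transform.
EXAMPLES = [
    ("y = x ** 2 + z ** 3", "y = x * x + z ** 3"),
    ("w = f(a) ** 2", "w = f(a) ** 2"),
]
```

Because the rewrite logic is ordinary code, a failing example points to a specific condition that can be read and adjusted, and applying the transform needs no model call.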
# PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning ( http://arxiv.org/abs/2410.08811v1 ) License: see the linked page	Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. Cohen, David Krueger, Fazl Barez,	Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To address this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to data poisoning during preference learning. Data poisoning attacks can manipulate large language model responses to include hidden malicious content or biases, potentially causing the model to generate harmful or unintended outputs while appearing to function normally. We deploy two distinct attack types across eight realistic scenarios, assessing 21 widely used models. Our findings reveal concerning trends: (1) Scaling up parameter size does not inherently enhance resilience against poisoning attacks; (2) There exists a log-linear relationship between the effects of the attack and the data poison ratio; (3) The effect of data poisoning can generalize to extrapolated triggers that are not included in the poisoned data. These results expose weaknesses in current preference learning techniques, highlighting the urgent need for more robust defenses against malicious models and data manipulation.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
# A Social Context-aware Graph-based Multimodal Attentive Learning Framework for Disaster Content Classification during Emergencies ( http://arxiv.org/abs/2410.08814v1 ) License: see the linked page	Shahid Shafi Dar, Mohammad Zia Ur Rehman, Karan Bais, Mohammed Abdul Haseeb, Nagendra Kumara,	In times of crisis, the prompt and precise classification of disaster-related information shared on social media platforms is crucial for effective disaster response and public safety. During such critical events, individuals use social media to communicate, sharing multimodal textual and visual content. However, due to the significant influx of unfiltered and diverse data, humanitarian organizations face challenges in leveraging this information efficiently. Existing methods for classifying disaster-related content often fail to model users' credibility, emotional context, and social interaction information, which are essential for accurate classification. To address this gap, we propose CrisisSpot, a method that utilizes a Graph-based Neural Network to capture complex relationships between textual and visual modalities, as well as Social Context Features to incorporate user-centric and content-centric information.
We also introduce Inverted Dual Embedded Attention (IDEA), which captures both harmonious and contrasting patterns within the data to enhance multimodal interactions and provide richer insights. Additionally, we present TSEqD (Turkey-Syria Earthquake Dataset), a large annotated dataset for a single disaster event, containing 10,352 samples. Through extensive experiments, CrisisSpot demonstrated significant improvements, achieving an average F1-score gain of 9.45% and 5.01% compared to state-of-the-art methods on the publicly available CrisisMMD dataset and the TSEqD dataset, respectively.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
# Uncertainty-Aware Optimal Treatment Selection for Clinical Time Series ( http://arxiv.org/abs/2410.08816v1 ) License: see the linked page	Thomas Schwarz, Cecilia Casolo, Niki Kilbertus,	In personalized medicine, the ability to predict and optimize treatment outcomes across various time frames is essential. Additionally, the ability to select cost-effective treatments within specific budget constraints is critical. Despite recent advancements in estimating counterfactual trajectories, a direct link to optimal treatment selection based on these estimates is missing. This paper introduces a novel method integrating counterfactual estimation techniques and uncertainty quantification to recommend personalized treatment plans adhering to predefined cost constraints. Our approach is distinctive in its handling of continuous treatment variables and its incorporation of uncertainty quantification to improve prediction reliability. We validate our method using two simulated datasets, one focused on the cardiovascular system and the other on COVID-19. Our findings indicate that our method has robust performance across different counterfactual estimation baselines, showing that introducing uncertainty quantification in these settings helps the current baselines find more reliable and accurate treatment selections. The robustness of our method across various settings highlights its potential for broad applicability in personalized healthcare solutions.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
# Graph-based identification of qubit network (GidNET) for qubit reuse ( http://arxiv.org/abs/2410.08817v1 ) License: see the linked page	Gideon Uchehara, Tor M. Aamodt, Olivia Di Matteo,	Quantum computing introduces the challenge of optimizing quantum resources crucial for executing algorithms within the limited qubit availability of current quantum architectures. Existing qubit reuse algorithms face a trade-off between optimality and scalability, with some achieving optimal reuse but limited scalability due to computational complexities, while others exhibit reduced runtime at the expense of optimality. This paper introduces GidNET (Graph-based Identification of qubit NETwork), an algorithm for optimizing qubit reuse in quantum circuits. By analyzing the circuit's Directed Acyclic Graph (DAG) representation and its corresponding candidate matrix, GidNET identifies higher-quality pathways for qubit reuse more efficiently.
Through a comparative study with established algorithms, notably QNET [1], GidNET not only achieves a consistent reduction in compiled circuit widths by a geometric mean of 4.4%, reaching up to 21% in larger circuits, but also demonstrates enhanced computational speed and scaling, with an average execution time reduction of 97.4% (i.e., 38.5X geometric mean speedup) and up to 99.3% (142.9X speedup) across various circuit sizes. Furthermore, GidNET consistently outperforms Qiskit in circuit width reduction, achieving an average improvement of 59.3%, with maximum reductions of up to 72% in the largest tested circuits. These results demonstrate GidNET's ability to improve circuit width and runtime, offering a solution for quantum computers with limited numbers of qubits.	Translated: 2024-10-30 21:55:57 Published: 2024-10-11
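The GidNET abstract above pairs each runtime reduction with a speedup factor. For a single measurement the two are linked by speedup = 1 / (1 - reduction), which the short sketch below checks against the reported figures. The function names are assumptions, and the conversion is only approximate for averaged values, since a mean of reductions and a geometric mean of speedups need not convert into each other exactly.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Speedup factor implied by a runtime reduction given in percent."""
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


def reduction_from_speedup(speedup: float) -> float:
    """Runtime reduction in percent implied by a speedup factor."""
    return 100.0 * (1.0 - 1.0 / speedup)


# Figures quoted in the abstract: 97.4% <-> 38.5X and 99.3% <-> 142.9X.
average_speedup = speedup_from_reduction(97.4)    # about 38.46
maximum_speedup = speedup_from_reduction(99.3)    # about 142.86
average_reduction = reduction_from_speedup(38.5)  # about 97.40
maximum_reduction = reduction_from_speedup(142.9) # about 99.30
```

Both pairs agree to the precision quoted in the abstract.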
# Coarse-graining and compounding as monads ( http://arxiv.org/abs/2410.08818v1 ) License: see the linked page	Alex Wilce,	Two very basic constructions involving experimental procedures are the formation of coarse-grained versions of experiments, and the formation of branching sequential experiments. The latter allow for the conditioning of states on the results of previous measurements. When one conditions on the results of different coarse-grainings of the same previous experiment, the possibility of interference effects arises. Here, I show how to formulate both constructions in terms of monads on a suitable category of (general) probabilistic models. Moreover, I show that these are connected by a distributive law, allowing for a composite monad describing the closure of a probabilistic model under both coarse-graining and sequential measurement. Algebras for all three monads are characterized; lessons are drawn regarding the possibility of interference and also regarding the formation of sequential products of effects; and connections are made with some themes from the older quantum-logical literature.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation ( http://arxiv.org/abs/2410.08821v1 ) License: see the linked page	Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, Yixuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun,	Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs generated by Large Language Models (LLMs) in open-domain question-answering tasks (OpenQA) by introducing external knowledge. For complex QA, however, existing RAG methods use LLMs to actively predict retrieval timing and directly use the retrieved information for generation, regardless of whether the retrieval timing accurately reflects the actual information needs, or sufficiently considers prior retrieved knowledge, which may result in insufficient information gathering and interaction, yielding low-quality answers. To address these issues, we propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks, which includes the iterative information collector, adaptive memory reviewer, and task-oriented generator, while following a new Retriever-and-Memory paradigm.
Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes and updating them into the existing optimal knowledge structure, enhancing high-quality knowledge interactions. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration. We conduct extensive experiments on five complex QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The code and data are at https://github.com/thunlp/Adaptive-Note.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
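The Adaptive-Note abstract above describes a loop that gathers information as notes, keeps the best note so far, and stops exploring when a new round adds nothing. The toy sketch below illustrates only that control flow. It is not the authors' implementation: the retriever is a canned list of passages, the "reviewer" simply checks whether any new fact appeared, and every name here is hypothetical. In the paper these roles are played by LLM-based components.

```python
def adaptive_note_loop(question, retrieve, max_steps=5):
    """Iteratively grow a note and stop once a retrieval round adds no new facts."""
    note = []        # best note so far: an ordered list of distinct facts
    steps_used = 0
    for step in range(max_steps):
        steps_used += 1
        passages = retrieve(question, note, step)

        # Collector: merge newly retrieved facts into a candidate note.
        candidate = list(note)
        for passage in passages:
            if passage not in candidate:
                candidate.append(passage)

        # Reviewer (toy rule): keep the candidate only if it grew the note.
        if len(candidate) == len(note):
            break    # nothing new was learned, so stop exploring
        note = candidate
    return note, steps_used


# Canned retrieval rounds standing in for a real retriever (assumed data).
ROUNDS = [
    ["A wrote B."],
    ["B was published in 1950.", "A wrote B."],
    ["A wrote B."],
    ["This fact is never reached."],
]


def toy_retrieve(question, note, step):
    """Return the canned passages for this step; ignores the question and the note."""
    return ROUNDS[min(step, len(ROUNDS) - 1)]


final_note, steps_used = adaptive_note_loop("When was A's book published?", toy_retrieve)
```

A generator would then answer from `final_note` rather than from the raw passages. The loop stops in the third round because that round returns only a fact already in the note.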
# SOLD: Reinforcement Learning with Slot Object-Centric Latent Dynamics ( http://arxiv.org/abs/2410.08822v1 ) License: see the linked page	Malte Mosbach, Jan Niklas Ewertz, Angel Villar-Corrales, Sven Behnke,	Learning a latent dynamics model provides a task-agnostic representation of an agent's understanding of its environment. Leveraging this knowledge for model-based reinforcement learning holds the potential to improve sample efficiency over model-free methods by learning inside imagined rollouts. Furthermore, because the latent space serves as input to behavior models, the informative representations learned by the world model facilitate efficient learning of desired skills. Most existing methods rely on holistic representations of the environment's state. In contrast, humans reason about objects and their interactions, forecasting how actions will affect specific parts of their surroundings. Inspired by this, we propose Slot-Attention for Object-centric Latent Dynamics (SOLD), a novel algorithm that learns object-centric dynamics models in an unsupervised manner from pixel inputs. We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over.
Our results show that SOLD outperforms DreamerV3, a state-of-the-art model-based RL algorithm, across a range of benchmark robotic environments that evaluate both relational reasoning and low-level manipulation capabilities. Videos are available at https://slot-latent-dynamics.github.io/.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# One-shot Generative Domain Adaptation in 3D GANs ( http://arxiv.org/abs/2410.08824v1 ) License: see the linked page	Ziqiang Li, Yi Wu, Chaoyue Wang, Xue Rui, Bin Li,	3D-aware image generation necessitates extensive training data to ensure stable training and mitigate the risk of overfitting. This paper first considers a novel task known as One-shot 3D Generative Domain Adaptation (GDA), aimed at transferring a pre-trained 3D generator from one domain to a new one, relying solely on a single reference image. One-shot 3D GDA is characterized by the pursuit of specific attributes, namely, high fidelity, large diversity, cross-domain consistency, and multi-view consistency. Within this paper, we introduce 3D-Adapter, the first one-shot 3D GDA method, for diverse and faithful generation. Our approach begins by judiciously selecting a restricted weight set for fine-tuning, and subsequently leverages four advanced loss functions to facilitate adaptation. An efficient progressive fine-tuning strategy is also implemented to enhance the adaptation process. The synergy of these three technological components empowers 3D-Adapter to achieve remarkable performance, substantiated both quantitatively and qualitatively, across all desired properties of 3D GDA.
Furthermore, 3D-Adapter seamlessly extends its capabilities to zero-shot scenarios, and preserves the potential for crucial tasks such as interpolation, reconstruction, and editing within the latent space of the pre-trained generator. Code will be available at https://github.com/iceli1007/3D-Adapter.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# Towards virtual painting recolouring using Vision Transformer on X-Ray Fluorescence datacubes ( http://arxiv.org/abs/2410.08826v1 ) License: see the linked page	Alessandro Bombini, Fernando García-Avello Bofías, Francesca Giambi, Chiara Ruberto,	In this contribution, we define (and test) a pipeline to perform virtual painting recolouring using raw data of X-Ray Fluorescence (XRF) analysis on pictorial artworks. To circumvent the small dataset size, we generate a synthetic dataset, starting from a database of XRF spectra; furthermore, to ensure a better generalisation capacity (and to tackle the issue of in-memory size and inference time), we define a Deep Variational Embedding network to embed the XRF spectra into a lower dimensional, K-Means friendly, metric space. We thus train a set of models to assign coloured images to embedded XRF images. We report the performance of the devised pipeline in terms of visual quality metrics, and we close with a discussion of the results.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# Do Unlearning Methods Remove Information from Language Model Weights? ( http://arxiv.org/abs/2410.08827v1 ) License: see the linked page	Aghyad Deeb, Fabien Roger,	Large Language Models' knowledge of how to perform cyber-security attacks, create bioweapons, and manipulate humans poses risks of misuse. Previous work has proposed methods to unlearn this knowledge. Historically, it has been unclear whether unlearning techniques are removing information from the model weights or just making it harder to access. To disentangle these two objectives, we propose an adversarial evaluation method to test for the removal of information from model weights: we give an attacker access to some facts that were supposed to be removed, and using those, the attacker tries to recover other facts from the same distribution that cannot be guessed from the accessible facts. We show that using fine-tuning on the accessible facts can recover 88% of the pre-unlearning accuracy when applied to current unlearning methods, revealing the limitations of these methods in removing information from the model weights.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction ( http://arxiv.org/abs/2410.08829v1 ) License: see the linked page	Zhuoran Li, Xu Sun, Wanyu Lin, Jiannong Cao,	Explainable molecular property prediction is essential for various scientific fields, such as drug discovery and material science. Despite delivering intrinsic explainability, linear models struggle with capturing complex, non-linear patterns. Large language models (LLMs), on the other hand, yield accurate predictions through powerful inference capabilities yet fail to provide chemically meaningful explanations for their predictions. This work proposes a novel framework, called MoleX, which leverages LLM knowledge to build a simple yet powerful linear model for accurate molecular property prediction with faithful explanations. The core of MoleX is to model complicated molecular structure-property relationships using a simple linear model, augmented by LLM knowledge and a crafted calibration strategy. Specifically, to extract the maximum amount of task-relevant knowledge from LLM embeddings, we employ information bottleneck-inspired fine-tuning and sparsity-inducing dimensionality reduction.
These informative embeddings are then used to fit a linear model for explainable inference. Moreover, we introduce residual calibration to address prediction errors stemming from linear models' insufficient expressiveness of complex LLM embeddings, thus recovering the LLM's predictive power and boosting overall accuracy. Theoretically, we provide a mathematical foundation to justify MoleX's explainability. Extensive experiments demonstrate that MoleX outperforms existing methods in molecular property prediction, establishing a new milestone in predictive performance, explainability, and efficiency. In particular, MoleX enables CPU inference and accelerates large-scale dataset processing, achieving comparable performance 300x faster with 100,000 fewer parameters than LLMs. Additionally, the calibration improves model performance by up to 12.7% without compromising explainability.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
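The MoleX abstract above combines a linear model on reduced embeddings with a residual calibration step. The sketch below illustrates the general pattern of fitting a linear model and then fitting a second model to its residuals. It is not the authors' method. Random vectors stand in for LLM embeddings, plain truncated SVD stands in for the paper's sparsity-inducing dimensionality reduction, the calibrator is simply a second least-squares fit, and all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, embed_dim, n_components = 200, 32, 4

# Stand-in for LLM embeddings and a target with a non-linear component (assumed data).
E = rng.standard_normal((n_samples, embed_dim))
y = E[:, 0] - 2.0 * E[:, 1] + 0.5 * E[:, 2] * E[:, 3]

# Step 1: dimensionality reduction via truncated SVD of the centred embeddings.
E_centred = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(E_centred, full_matrices=False)
Z = E_centred @ Vt[:n_components].T  # reduced features, shape (n_samples, n_components)


def fit_linear(features, target):
    """Least-squares fit with an intercept; returns the weight vector."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def predict_linear(weights, features):
    """Predictions of a linear model fitted by `fit_linear`."""
    design = np.column_stack([features, np.ones(len(features))])
    return design @ weights


# Step 2: the interpretable linear model on the reduced features.
w_main = fit_linear(Z, y)
pred_main = predict_linear(w_main, Z)

# Step 3: residual calibration: a second linear model fitted to what the first one missed,
# here using the next block of SVD components as calibrator features.
residual = y - pred_main
R = E_centred @ Vt[n_components:2 * n_components].T
w_residual = fit_linear(R, residual)
pred_calibrated = pred_main + predict_linear(w_residual, R)

mse_main = np.mean((y - pred_main) ** 2)
mse_calibrated = np.mean((y - pred_calibrated) ** 2)
```

On the training data the calibrated error can never exceed the uncalibrated one, because a least-squares fit to the residual can always fall back to predicting zero. Whether the gain carries over to held-out molecules is an empirical question this toy setup does not answer.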
# Symmetry-Constrained Generation of Diverse Low-Bandgap Molecules with Monte Carlo Tree Search ( http://arxiv.org/abs/2410.08833v1 ) License: see the linked page	Akshay Subramanian, James Damewood, Juno Nam, Kevin P. Greenman, Avni P. Singhal, Rafael Gómez-Bombarelli,	Organic optoelectronic materials are a promising avenue for next-generation electronic devices due to their solution processability, mechanical flexibility, and tunable electronic properties. In particular, near-infrared (NIR) sensitive molecules have unique applications in night-vision equipment and biomedical imaging. Molecular engineering has played a crucial role in developing non-fullerene acceptors (NFAs) such as the Y-series molecules, which have significantly improved the power conversion efficiency (PCE) of solar cells and enhanced spectral coverage in the NIR region. However, systematically designing molecules with targeted optoelectronic properties while ensuring synthetic accessibility remains a challenge. To address this, we leverage structural priors from domain-focused, patent-mined datasets of organic electronic molecules using a symmetry-aware fragment decomposition algorithm and a fragment-constrained Monte Carlo Tree Search (MCTS) generator. Our approach generates candidates that retain symmetry constraints from the patent dataset, while also exhibiting red-shifted absorption, as validated by TD-DFT calculations.	Translated: 2024-10-30 21:45:38 Published: 2024-10-11
# Scratchのためのブロックベースのテスティングフレームワーク A Block-Based Testing Framework for Scratch ( http://arxiv.org/abs/2410.08835v1 ) ライセンス: Link先を確認	Patric Feldmeier, Gordon Fraser, Ute Heuer, Florian Obermüller, Siegfried Steckenbiller,	(参考訳) Scratchのようなブロックベースのプログラミング環境は、入門プログラミングコースで広く使われている。構文上のエラーをなくすことで、重要なプログラミング概念の学習を容易にするが、望ましいプログラムの振る舞いを損なう論理的エラーは、それでも可能である。このようなエラーを見つけるには、テスト、すなわちプログラムを実行して動作をチェックする必要がある。多くのプログラミング環境では、このステップはコードとして実行可能なテストを提供することで自動化することができる。これは学習者にとって十分であることは間違いないが、自動テストの欠如は、生徒のソリューションに対するフィードバックを希望する教師にとって阻害される可能性がある。この問題に対処するために、自動テストの作成を可能にするブロックのカテゴリをScratchに導入する。これらのブロックによって、学生や教師も、よく知られたブロックベースのプログラミングロジックを使用して、テストを作成し、Scratch環境から直接フィードバックを受け取ることができる。学生ソリューションの作成を容易にし、バッチ処理を可能にするため、Scratchユーザインタフェースをテストインターフェースに付随させて拡張する。このテストフレームワークを、人気のあるScratchゲーム用のテストを作成した28人の教師を対象に評価し、その後、これらのテストを用いて学生実装の評価とフィードバックを行った。 21人の生徒ソリューションの機能を手作業で評価するよりも、総合的な精度は0.93であり、教師がテストを作成し、効果的に利用できることが示されている。その後の調査では、教師はブロックベースのテストアプローチが有用であると考えている。 Block-based programming environments like Scratch are widely used in introductory programming courses. They facilitate learning pivotal programming concepts by eliminating syntactical errors, but logical errors that break the desired program behaviour are nevertheless possible. Finding such errors requires testing, i.e., running the program and checking its behaviour. In many programming environments, this step can be automated by providing executable tests as code; in Scratch, testing can only be done manually by invoking events through user input and observing the rendered stage. While this is arguably sufficient for learners, the lack of automated testing may be inhibitive for teachers wishing to provide feedback on their students' solutions. In order to address this issue, we introduce a new category of blocks in Scratch that enables the creation of automated tests. 
With these blocks, students and teachers alike can create tests and receive feedback directly within the Scratch environment using familiar block-based programming logic. To facilitate the creation of tests and to enable batch processing of sets of student solutions, we extend the Scratch user interface with an accompanying test interface. We evaluated this testing framework with 28 teachers who created tests for a popular Scratch game and subsequently used these tests to assess and provide feedback on student implementations. An overall accuracy of 0.93 of teachers' tests compared to manually evaluating the functionality of 21 student solutions demonstrates that teachers are able to create and effectively use tests. A subsequent survey confirms that teachers consider the block-based test approach useful.	翻訳日:2024-10-30 21:45:38 公開日:2024-10-11
# SAR画像と局所河川ゲージ観測を用いた物理誘導型洪水地域検出ニューラルネットワーク A physics-guided neural network for flooding area detection using SAR imagery and local river gauge observations ( http://arxiv.org/abs/2410.08837v1 ) ライセンス: Link先を確認	Monika Gierszewska, Tomasz Berezowski,	(参考訳) 川流域の洪水範囲は、川のゲージ観測と関連している。水位が高くなればなるほど、洪水地域は大きくなる。合成開口レーダ(SAR)は雲を透過できるため、レーダー画像は、単純なしきい値から深層学習モデルまで、様々な方法で浸水範囲を推定するためによく用いられてきた。本研究では,洪水領域検出のための物理誘導型ニューラルネットワークを提案する。提案手法は,入力データとして,Sentinel-1の時系列画像と,各画像に割り当てられた河川の水位を用いる。予測された水域面積の総和と河川水位の局所観測値とのピアソン相関係数を損失関数として適用した。本手法の有効性を,デジタル地形モデルと光学衛星画像から得られた参照水マップと比較することにより,5つの研究領域で評価した。 IoUの最高値は水クラスで0.89,非水クラスで0.96であった。さらに、他の教師なし手法と比較した。提案したニューラルネットワークは他の手法よりも高いIoUを提供し、特に河川の低水位時に登録されたSAR画像に対してである。 The flooding extent area in a river valley is related to river gauge observations. The higher the water elevation, the larger the flooding area. Because synthetic aperture radar (SAR) can penetrate clouds, radar images have been commonly used to estimate flooding extent area with various methods, from simple thresholding to deep learning models. In this study, we propose a physics-guided neural network for flooding area detection. Our approach takes as input data the Sentinel-1 time-series images and the water elevations in the river assigned to each image. We apply the Pearson correlation coefficient between the predicted sum of water extent areas and the local observations of river water elevation as the loss function. The effectiveness of our method is evaluated in five different study areas by comparing the predicted water maps with reference water maps obtained from digital terrain models and optical satellite images. The highest Intersection over Union (IoU) score achieved by our models was 0.89 for the water class and 0.96 for the non-water class. Additionally, we compared the results with other unsupervised methods. 
The proposed neural network provided a higher IoU than the other methods, especially for SAR images registered during low water elevation in the river.	翻訳日:2024-10-30 21:45:38 公開日:2024-10-11
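The correlation-based loss described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract says the Pearson coefficient is used as the loss but does not give its exact form, so the `1 - r` form, the function names, and the representation of a water map as a nested list of per-pixel probabilities are all hypothetical choices made here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; treat as none
    return cov / math.sqrt(var_x * var_y)


def correlation_loss(water_maps, water_levels):
    """Assumed loss form: 1 - r, small when predicted area rises with the gauge.

    water_maps   : one 2D list of per-pixel water probabilities per image
    water_levels : one gauge reading per image, in the same order
    """
    # Predicted flooded area of an image = sum of its per-pixel probabilities.
    areas = [sum(sum(row) for row in m) for m in water_maps]
    return 1.0 - pearson(areas, water_levels)


maps = [[[1, 0], [0, 0]], [[1, 1], [0, 0]], [[1, 1], [1, 0]]]
rising = correlation_loss(maps, [1.0, 2.0, 3.0])
falling = correlation_loss(maps, [3.0, 2.0, 1.0])
```

With this form, time series whose predicted area tracks the gauge give a loss near 0, and anti-correlated series give a loss near 2. No pixel-level labels are needed, which is the point of the physics-guided setup.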
# ワンショットハンドアバターのための相互作用を考慮した3次元ガウススプラッティングの学習 Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars ( http://arxiv.org/abs/2410.08840v1 ) ライセンス: Link先を確認	Xuan Huang, Hanhui Li, Wanquan Liu, Xiaodan Liang, Yiqiang Yan, Yuhao Cheng, Chengqiang Gao,	(参考訳) 本稿では,3次元ガウススプラッティング(GS)と単一画像入力を用いて,相互作用する手のアニメーション可能なアバターを作成することを提案する。単一被写体向けに設計された既存のGSベースの手法は、限られた入力ビュー、様々な手のポーズ、オクルージョンのために、満足のいかない結果になることが多い。これらの課題に対処するために,被写体をまたぐ手の事前知識を活用し,相互作用領域の3Dガウスを洗練する,2段階の相互作用を意識した新しいGSフレームワークを導入する。特に手の多様性に対処するため、手の3D表現を、最適化に基づくアイデンティティマップと、学習に基づく潜在幾何学的特徴およびニューラルテクスチャマップに分離する。学習ベースの特徴は、ポーズ、形状、テクスチャの信頼性の高い事前情報を提供するために訓練されたネットワークによってキャプチャされ、最適化ベースのアイデンティティマップは、分布外の手の効率的なワンショットフィッティングを可能にする。さらに,相互作用を意識した注意モジュールと自己適応型ガウス改良モジュールを考案した。これらのモジュールは、既存のGSベースの手法の限界を克服し、手の内部および手同士の相互作用のある領域における画像のレンダリング品質を向上させる。提案手法は,大規模なInterHand2.6Mデータセットの広範な実験により検証され,画像品質の最先端性能を著しく向上する。 Project Page: https://github.com/XuanHuang0/GuassianHand In this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, various hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. Particularly, to handle hand variations, we disentangle the 3D presentation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. 
These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality. Project Page: https://github.com/XuanHuang0/GuassianHand	翻訳日:2024-10-30 21:45:38 公開日:2024-10-11
# メッセージパッシングニューラルネットワークと強化学習によるアクセシビリティの平等のための公共交通ネットワーク設計 Public Transport Network Design for Equality of Accessibility via Message Passing Neural Networks and Reinforcement Learning ( http://arxiv.org/abs/2410.08841v1 ) ライセンス: Link先を確認	Duo Wang, Maximilien Chau, Andrea Araldo,	(参考訳) 公共交通機関(PT)ネットワークの設計は、道路上の個々の車両の数を減らし、汚染と渋滞を緩和するために不可欠である。これにより、都市持続性は効率的なPTと密結合される。トランスポート・ネットワーク・デザイン (TND) における現在のアプローチは一般的に、一般化されたコスト、すなわち演算子とユーザのコストを含むユニークな数値を最適化することを目的としている。我々は、モビリティニーズを満たす能力としてPTの品質を意図しているので、PTアクセシビリティ、すなわちPTを介して周囲の関心点にたどり着くことに注力する。郊外は一般的にPTアクセシビリティの貧弱さに悩まされており、住民は自家用車に依存していると非難している。そこで我々は,アクセシビリティの地理的分布の不平等を最小限に抑えるために,バス路線を設計する問題に取り組む。我々は、最先端のメッセージパッシングニューラルネットワーク(MPNN)と強化学習を組み合わせる。モントリオール市におけるメタヒューリスティックス(典型的にはTND)に対する手法の有効性を示す。 Designing Public Transport (PT) networks able to satisfy mobility needs of people is essential to reduce the number of individual vehicles on the road, and thus pollution and congestion. Urban sustainability is thus tightly coupled to an efficient PT. Current approaches on Transport Network Design (TND) generally aim to optimize generalized cost, i.e., a unique number including operator and users' costs. Since we intend quality of PT as the capability of satisfying mobility needs, we focus instead on PT accessibility, i.e., the ease of reaching surrounding points of interest via PT. PT accessibility is generally unequally distributed in urban regions: suburbs generally suffer from poor PT accessibility, which condemns residents therein to be dependent on their private cars. We thus tackle the problem of designing bus lines so as to minimize the inequality in the geographical distribution of accessibility. We combine state-of-the-art Message Passing Neural Networks (MPNN) and Reinforcement Learning. We show the efficacy of our method against metaheuristics (classically used in TND) in a use case representing in simplified terms the city of Montreal.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# 相関雑音を伴う量子状態還元に対する超光速シグナリングのウィットネス Superluminal signalling witness for quantum state reduction with correlated noise ( http://arxiv.org/abs/2410.08844v1 ) ライセンス: Link先を確認	Aritro Mukherjee, Lisa Lenstra, Lotte Mertens, Jasper van Wezel,	(参考訳) 量子状態還元のモデルは、微視的なスケールでは観測可能な効果を持たないが、巨視的な物体の力学を支配するような、シュレーディンガー方程式への弱い修正を提案することで量子測定問題に対処する。このようなモデルに対するマスター方程式の線型性を課すことで、シュレーディンガー方程式への修正が超光速シグナリングの可能性をもたらさないことが保証される。しかし、量子状態還元モデルの大規模なクラス、特に相関ノイズを用いた全てのモデルでは、量子状態に対するマスター方程式の定式化は、極めて困難または不可能である。ここでは、相関ノイズを含む一般的な量子状態還元モデルに適用可能な超光速シグナリングのウィットネスを定式化する。いくつかの関連するケースにウィットネスを適用し、相関ノイズモデルが一般に超光速シグナリングを許すことを発見した。特定のモデルがこれを回避しうる方法を示唆し、ここで導入したウィットネスがそのようなモデルを構築するための厳密な指針となることを示す。 Models for quantum state reduction address the quantum measurement problem by suggesting weak modifications to Schrödinger's equation that have no observable effect at microscopic scales, but dominate the dynamics of macroscopic objects. Enforcing linearity of the master equation for such models ensures that modifications to Schrödinger's equation do not introduce a possibility for superluminal signalling. In large classes of quantum state reduction models, however, and in particular in all models employing correlated noise, formulating a master equation for the quantum state is prohibitively difficult or impossible. Here, we formulate a witness for superluminal signalling that is applicable to generic quantum state reduction models, including those involving correlated noise. We apply the witness to several relevant cases, and find that correlated-noise models in general allow for superluminal signalling. We suggest how specific models may be able to avoid it, and that the witness introduced here provides a stringent guide to constructing such models.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# 有限空間と離散時間における平均場最適停止のためのディープラーニングアルゴリズム Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time ( http://arxiv.org/abs/2410.08850v1 ) ライセンス: Link先を確認	Lorenzo Magnino, Yuchen Zhu, Mathieu Laurière,	(参考訳) 最適停止は、リスク管理、金融、経済学、そして最近コンピュータ科学の分野における応用を見出した最適化の基本的な問題である。エージェント群が協調して有限空間の離散時間最適停止問題を解き、マルチエージェント最適停止(MAOS)と呼ばれるマルチエージェント設定に拡張する。有限エージェントの場合の解法は,エージェント数が非常に大きい場合に計算的に禁止されるので,エージェント数が無限に近づくにつれて得られる平均フィールド最適停止(MFOS)問題を研究する。 MFOSがMAOSによく近似したソリューションであることを示す。また,平均場制御理論に基づく動的プログラミング原理(DPP)を実証する。次に,2つのディープラーニング手法を提案する。1つは最適決定を学習するために完全な軌道をシミュレートし,もう1つは後方誘導でDPPを活用する。空間次元最大300の6つの異なる問題に対する数値実験により,これらの手法の有効性を実証する。我々の知る限りでは、これはMFOSを有限空間と離散時間で研究し、この種の問題に対して効率的でスケーラブルな計算手法を提案する最初の研究である。 Optimal stopping is a fundamental problem in optimization that has found applications in risk management, finance, economics, and recently in the fields of computer science. We extend the standard framework to a multi-agent setting, named multi-agent optimal stopping (MAOS), where a group of agents cooperatively solves finite-space, discrete-time optimal stopping problems. Solving the finite-agent case is computationally prohibitive when the number of agents is very large, so this work studies the mean field optimal stopping (MFOS) problem, obtained as the number of agents approaches infinity. We prove that MFOS provides a good approximate solution to MAOS. We also prove a dynamic programming principle (DPP), based on the theory of mean field control. We then propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages DPP with backward induction; both methods train neural networks for the optimal stopping decisions. We demonstrate the effectiveness of these approaches through numerical experiments on 6 different problems in spatial dimension up to 300. 
To the best of our knowledge, this is the first work to study MFOS in finite space and discrete time, and to propose efficient and scalable computational methods for this type of problem.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
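The backward-induction idea behind the dynamic programming principle mentioned above can be illustrated on the classical single-agent problem. This is a minimal sketch under stated assumptions, not the paper's mean-field method: it works on individual states rather than on population distributions, uses no neural networks, and the cost convention (a stopping cost plus a per-step running cost, both minimized) and all names are hypothetical choices made here.

```python
def backward_induction(P, stop_cost, run_cost, horizon):
    """Solve a finite-state, discrete-time optimal stopping problem.

    P[x][y]      : probability of moving from state x to state y
    stop_cost[x] : cost paid when stopping in state x
    run_cost     : cost paid for every step spent continuing
    horizon      : number of decision steps before stopping is forced

    Returns (values, policy): values[x] is the optimal expected cost from x at
    time 0, and policy[t][x] is True when stopping is optimal at time t.
    """
    n = len(stop_cost)
    values = list(stop_cost)  # at the final time, stopping is forced
    policy = []
    for _ in range(horizon):
        # Expected cost of waiting one more step, then acting optimally.
        cont = [run_cost + sum(P[x][y] * values[y] for y in range(n))
                for x in range(n)]
        stop_now = [stop_cost[x] <= cont[x] for x in range(n)]
        values = [min(stop_cost[x], cont[x]) for x in range(n)]
        policy.insert(0, stop_now)  # built backwards, so prepend
    return values, policy


# Two states: state 1 is expensive to stop in but always moves to state 0.
P = [[1.0, 0.0], [1.0, 0.0]]
values, policy = backward_induction(P, stop_cost=[0.0, 10.0],
                                    run_cost=1.0, horizon=3)
```

In this toy instance, stopping is optimal in state 0 and waiting one step is optimal in state 1, since paying the running cost once is cheaper than stopping at cost 10.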
# 優先ランク付けにおける大規模言語モデルの不整合性の測定 Measuring the Inconsistency of Large Language Models in Preferential Ranking ( http://arxiv.org/abs/2410.08851v1 ) ライセンス: Link先を確認	Xiutian Zhao, Ke Wang, Wei Peng,	(参考訳) 大規模言語モデル(LLM)の最近の進歩にもかかわらず、バイアスと幻覚の問題は依然として残っており、一貫した選好ランキングを提供する能力は十分に研究されていない。本研究では,決定空間が密であるか絶対的な解が存在しないシナリオにおいて重要となる,LLMが一貫した順序的選好を提供する能力について検討する。順序理論に基づく一貫性の形式化を導入し、推移性、非対称性、可逆性、無関係な選択肢からの独立性などの基準を概説する。選択した最先端LLMに対する診断実験の結果,これらの基準を満たせないことが明らかになり,位置バイアスが強く,推移性が低く,選好が無関係な選択肢によって容易に揺らぐことが示された。これらの知見は、LLMが生成する選好ランキングにおける重大な不整合を浮き彫りにしており、これらの制限に対処するためのさらなる研究の必要性を強調している。 Despite large language models' (LLMs) recent advancements, their bias and hallucination issues persist, and their ability to offer consistent preferential rankings remains underexplored. This study investigates the capacity of LLMs to provide consistent ordinal preferences, a crucial aspect in scenarios with dense decision space or lacking absolute answers. We introduce a formalization of consistency based on order theory, outlining criteria such as transitivity, asymmetry, reversibility, and independence from irrelevant alternatives. Our diagnostic experiments on selected state-of-the-art LLMs reveal their inability to meet these criteria, indicating a strong positional bias and poor transitivity, with preferences easily swayed by irrelevant alternatives. These findings highlight a significant inconsistency in LLM-generated preferential rankings, underscoring the need for further research to address these limitations.	翻訳日:2024-10-30 21:35:51 publicly:2024-10-11
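Two of the order-theoretic criteria named above, transitivity and asymmetry, are easy to check mechanically once a model's pairwise answers have been collected. The sketch below assumes the answers are stored as a set of `(a, b)` pairs meaning "a is preferred over b"; that representation and the function names are hypothetical, and the abstract does not describe the paper's own test procedure.

```python
from itertools import permutations


def transitivity_violations(prefers):
    """Return triples (a, b, c) where a > b and b > c hold but a > c is missing."""
    items = {x for pair in prefers for x in pair}
    return [(a, b, c) for a, b, c in permutations(items, 3)
            if (a, b) in prefers and (b, c) in prefers
            and (a, c) not in prefers]


def asymmetry_violations(prefers):
    """Return each pair that was preferred in both directions, listed once."""
    return [(a, b) for (a, b) in prefers if (b, a) in prefers and a < b]


consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
flipped = {("A", "B"), ("B", "A")}
```

A preference cycle such as `cyclic` yields three violating triples, one starting from each item, while `consistent` yields none.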
# Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback ( http://arxiv.org/abs/2410.08852v1 ) ライセンス: Link先を確認	Michelle Zhao, Reid Simmons, Henny Admoni, Aaditya Ramdas, Andrea Bajcsy,	(参考訳) インタラクティブな模倣学習(IL)において、不確実性定量化は、学習者(つまりロボット)が、オンラインの専門家(すなわち人間)から積極的にフィードバックを求めることによって、デプロイメント中に遭遇する分散シフトと競合する手段を提供する。それまでの作業では、アンサンブルの不一致やモンテカルロのドロップアウトといったメカニズムを使用して、ブラックボックスのILポリシーが不確実である場合の定量化を行っている。その代わり、ロボットの不確実性をオンラインに適応させるために、デプロイメント時間中に受信した専門家のフィードバックを活用できる不確実性定量化アルゴリズムが必要である、と私たちは主張する。そこで,本研究では,地層構造ラベルのストリームからオンラインの予測区間を構築するための分布自由化手法である,オンラインコンフォメーション予測について述べる。しかし、人間ラベルは、対話型IL設定において断続的である。したがって、共形予測側からは、間欠的なラベルの確率モデルを活用し、漸近的カバレッジ保証を維持し、所望のカバレッジレベルを実証的に達成する、間欠的量子化追跡(IQT)と呼ばれる新しい不確実性定量化アルゴリズムを導入する。対話型ILの側面から、ロボットがIQTで校正された予測間隔を、デプロイ時の不確実性の信頼性の高い尺度として使用し、より専門家のフィードバックを積極的にクエリする新しいアプローチであるConformalDAggerを開発する。我々は、ConformalDAggerを、専門家の方針の変化のため、分散シフトが(そしてそうでない)シナリオで、事前の不確実性を認識したDAggerメソッドと比較する。 7DOFロボットマニピュレータ上でのシミュレーションおよびハードウェア展開において、ConformalDAggerは、専門家がシフトする際の高い不確実性を検知し、ベースラインよりも介入の数を増やし、ロボットが新しい振る舞いをより早く学習できるようにする。 In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. 
To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn't) present because of changes in the expert's policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
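The role of intermittent labels in online conformal prediction can be seen in a generic quantile-tracking loop. This is a sketch of the standard online update only, and it is not the paper's IQT algorithm: the abstract says IQT uses a probabilistic model of when labels arrive, which is not reproduced here. Skipping the update on unlabeled steps, the step size, and all names are assumptions made for illustration.

```python
def track_quantile(scores, alpha=0.1, lr=0.05, q0=0.0):
    """Track a threshold q that should cover scores about (1 - alpha) of the time.

    scores : nonconformity scores in time order; None marks a step at which
             the expert gave no label, so no score could be computed
    Returns (final_q, history), where history[t] is the threshold used at step t.
    """
    q = q0
    history = []
    for s in scores:
        history.append(q)
        if s is None:
            continue  # no feedback at this step: leave the threshold unchanged
        miss = 1.0 if s > q else 0.0  # 1 when the interval failed to cover
        # Widen after a miss, shrink slightly after a cover.
        q += lr * (miss - alpha)
    return q, history


final_q, history = track_quantile([1.0, None, 1.0], alpha=0.1, lr=0.5)
```

The threshold grows while the observed scores exceed it and decays slowly otherwise, so the long-run miss rate is pushed toward `alpha`. A large threshold is what a robot could read as high uncertainty and a reason to query the expert.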
# ハイブリッドLLM-DDQNを用いたV2I通信と自律運転の協調最適化 Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving ( http://arxiv.org/abs/2410.08854v1 ) ライセンス: Link先を確認	Zijiang Yan, Hao Zhou, Hina Tabassum, Xue Liu,	(参考訳) 大規模言語モデル(LLM)は、その卓越した推論と理解能力により、最近かなりの関心を集めている。本研究は、路車間(V2I)通信と自律運転(AD)ポリシーを共同で最適化することを目的として、車載ネットワークにLLMを適用することを検討する。我々は,交通流を最大化し,道路安全のために衝突を回避するためにAD意思決定にLLMを用い,V2I最適化にはダブルディープQ-ラーニングアルゴリズム(DDQN)を用いて,受信データレートを最大化し,頻繁なハンドオーバを低減する。特に LLM 対応 AD では, ユークリッド距離を用いて過去に探索された AD 経験を特定し, LLM が過去の良い判断と悪い判断から学習してさらなる改善を図る。次に、LLMベースのAD決定はV2I問題の状態の一部となり、DDQNはそれに応じてV2Iの決定を最適化する。その後、ADとV2Iの決定は収束するまで反復的に最適化される。このような反復的最適化アプローチは、LLMと従来の強化学習技術との相互作用をよりよく探求し、ネットワーク最適化と管理にLLMを使うことの可能性を明らかにする。最後に,シミュレーションにより,提案するハイブリッドLLM-DDQN手法が従来のDDQNアルゴリズムよりも優れており,より高速な収束とより高い平均報酬を示すことを確認した。 Large language models (LLMs) have received considerable interest recently due to their outstanding reasoning and comprehension capabilities. This work explores applying LLMs to vehicular networks, aiming to jointly optimize vehicle-to-infrastructure (V2I) communications and autonomous driving (AD) policies. We deploy LLMs for AD decision-making to maximize traffic flow and avoid collisions for road safety, and a double deep Q-learning algorithm (DDQN) is used for V2I optimization to maximize the received data rate and reduce frequent handovers. In particular, for LLM-enabled AD, we employ the Euclidean distance to identify previously explored AD experiences, and then LLMs can learn from past good and bad decisions for further improvement. Then, LLM-based AD decisions will become part of states in V2I problems, and DDQN will optimize the V2I decisions accordingly. After that, the AD and V2I decisions are iteratively optimized until convergence. Such an iterative optimization approach can better explore the interactions between LLMs and conventional reinforcement learning techniques, revealing the potential of using LLMs for network optimization and management. 
Finally, the simulations demonstrate that our proposed hybrid LLM-DDQN approach outperforms the conventional DDQN algorithm, showing faster convergence and higher average rewards.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# MATCH:不均一エッジデバイスのためのモデル対応TVMベースのコンパイル MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices ( http://arxiv.org/abs/2410.08855v1 ) ライセンス: Link先を確認	Mohamed Amine Hamdi, Francesco Daghero, Giuseppe Maria Sarda, Josse Van Delm, Arne Symons, Luca Benini, Marian Verhelst, Daniele Jahier Pagliari, Alessio Burrello,	(参考訳) ヘテロジニアスエッジプラットフォームへのディープニューラルネットワーク(DNN)の展開の合理化、同じマイクロコントローラユニット(MCU)命令プロセッサとテンソル計算のためのハードウェアアクセラレータの結合は、TinyML分野における重要な課題の1つとなっている。最高のパフォーマンスのDNNコンパイルツールチェーンは、通常、単一のMCUファミリに対して深くカスタマイズされており、異なる異種MCUファミリへの移植は、ほぼすべてのコンパイラの労働集約的な再開発を意味する。一方、TVMのような再ターゲット可能なツールチェーンは、カスタムアクセラレータの能力を活用できず、一般的なコードを生成する。この双対性を克服するために、MATCHを紹介します。これはTVMベースのDNNデプロイメントフレームワークで、カスタマイズ可能なモデルベースのハードウェア抽象化のおかげで、様々なMCUプロセッサやアクセラレーターを簡単にアジャイルに再ターゲットできるように設計されています。ハードウェアコストモデルで強化された汎用的かつ再ターゲット可能なマッピングフレームワークは、抽象ハードウェアモデルとSoC固有のAPIの定義を必要とせず、さまざまなターゲット上でカスタムツールチェーンと競合し、さらに性能を向上できることを示す。 MATCHを2種類の異種MCU,GAP9,DIANAで試験した。 MLPerf Tiny スイート MATCH の4つの DNN モデルでは、搭載された HW アクセラレータの活用により、通常の TVM と比較して、DIANA 上での推論遅延を 60.88 倍に削減している。 DIANA用の完全にカスタマイズされたツールチェーンであるHTVMと比較して、レイテンシは依然として16.94%削減しています。 GAP9では、同じベンチマークを使用して、DNNアクセラレータと利用可能な8コアクラスタを相乗的に活用する異種DNNマッピングアプローチのおかげで、専用のDORYコンパイラと比較してレイテンシを2.15倍改善しています。 Streamlining the deployment of Deep Neural Networks (DNNs) on heterogeneous edge platforms, coupling within the same micro-controller unit (MCU) instruction processors and hardware accelerators for tensor computations, is becoming one of the crucial challenges of the TinyML field. The best-performing DNN compilation toolchains are usually deeply customized for a single MCU family, and porting to a different heterogeneous MCU family implies labor-intensive re-development of almost the entire compiler. On the opposite side, retargetable toolchains, such as TVM, fail to exploit the capabilities of custom accelerators, resulting in the generation of general but unoptimized code. 
To overcome this duality, we introduce MATCH, a novel TVM-based DNN deployment framework designed for easy agile retargeting across different MCU processors and accelerators, thanks to a customizable model-based hardware abstraction. We show that a general and retargetable mapping framework enhanced with hardware cost models can compete with and even outperform custom toolchains on diverse targets while only needing the definition of an abstract hardware model and a SoC-specific API. We tested MATCH on two state-of-the-art heterogeneous MCUs, GAP9 and DIANA. On the four DNN models of the MLPerf Tiny suite MATCH reduces inference latency by up to 60.88 times on DIANA, compared to using the plain TVM, thanks to the exploitation of the on-board HW accelerator. Compared to HTVM, a fully customized toolchain for DIANA, we still reduce the latency by 16.94%. On GAP9, using the same benchmarks, we improve the latency by 2.15 times compared to the dedicated DORY compiler, thanks to our heterogeneous DNN mapping approach that synergically exploits the DNN accelerator and the eight-cores cluster available on board.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# トークンレベルキャラクタリゼーションによるLLMの秘密記憶の復号化 Decoding Secret Memorization in Code LLMs Through Token-Level Characterization ( http://arxiv.org/abs/2410.08858v1 ) ライセンス: Link先を確認	Yuqing Nie, Chong Wang, Kailong Wang, Guoai Xu, Guosheng Xu, Haoyu Wang,	(参考訳) コード大言語モデル(LLM)は、プログラムコードの生成、理解、操作において顕著な能力を示した。しかし、彼らのトレーニングプロセスは必然的に機密情報の記憶につながり、深刻なプライバシーリスクを生じさせる。 LLMの記憶に関する既存の研究は、主に、広汎な幻覚や標的の機密情報の非効率な抽出といった制限に悩まされる、迅速な工学技術に依存している。本稿では,トークンの確率に基づいて,コードLLMが生成する実・偽の秘密を特徴付ける新しい手法を提案する。我々は、本物の秘密と幻覚的秘密を区別する4つの重要な特徴を特定し、実の秘密と偽の秘密を区別する洞察を与える。既存の作業の限界を克服するために,識別された特徴から派生したトークンレベルの特徴を活用してトークン復号プロセスを導出する2段階の手法であるDESECを提案する。 DESECは、プロキシコードLLMを使用してオフライントークンスコアリングモデルを構築し、トークン可能性を再割り当てすることでデコードプロセスのガイドにスコアリングモデルを使用する。多様なデータセットを用いた4つの最先端のCode LLMに関する広範な実験を通じて、我々はDESECの優れたパフォーマンスを実証し、既存のベースラインよりも高い信頼性と実際のシークレットを抽出した。本研究は,Code LLMに関連するプライバシー漏洩リスクを広範囲に評価する上で,トークンレベルのアプローチの有効性を強調した。 Code Large Language Models (LLMs) have demonstrated remarkable capabilities in generating, understanding, and manipulating programming code. However, their training process inadvertently leads to the memorization of sensitive information, posing severe privacy risks. Existing studies on memorization in LLMs primarily rely on prompt engineering techniques, which suffer from limitations such as widespread hallucination and inefficient extraction of the target sensitive information. In this paper, we present a novel approach to characterize real and fake secrets generated by Code LLMs based on token probabilities. We identify four key characteristics that differentiate genuine secrets from hallucinated ones, providing insights into distinguishing real and fake secrets. To overcome the limitations of existing works, we propose DESEC, a two-stage method that leverages token-level features derived from the identified characteristics to guide the token decoding process. 
DESEC consists of constructing an offline token scoring model using a proxy Code LLM and employing the scoring model to guide the decoding process by reassigning token likelihoods. Through extensive experiments on four state-of-the-art Code LLMs using a diverse dataset, we demonstrate the superior performance of DESEC in achieving a higher plausible rate and extracting more real secrets compared to existing baselines. Our findings highlight the effectiveness of our token-level approach in enabling an extensive assessment of the privacy leakage risks associated with Code LLMs.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# LLMとVLMの時代の音声記述生成:転送可能な生成AI技術の概要 Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies ( http://arxiv.org/abs/2410.08860v1 ) ライセンス: Link先を確認	Yingqiang Gao, Lukas Fischer, Alexa Lintner, Sarah Ebling,	(参考訳) オーディオ記述(ADs)は、視覚障害者や視覚障害者がテレビや映画などでデジタルメディアコンテンツにアクセスするのを支援するための音響注釈として機能する。訓練されたAD専門家が通常提供するアクセシビリティサービスとして、ADの生成には多大な人的努力が必要であり、プロセスに時間と費用がかかる。自然言語処理(NLP)とコンピュータビジョン(CV)の最近の進歩、特に大規模言語モデル(LLM)と視覚言語モデル(VLM)は、自動AD生成に一歩近づいた。本稿では, LLM と VLM の時代におけるAD 生成に関連する技術について概説し, 最先端の NLP と CV 技術が AD の生成にどのように応用され, 将来に必要な研究方向を特定できるかを論じる。 Audio descriptions (ADs) function as acoustic commentaries designed to assist blind persons and persons with visual impairments in accessing digital media content on television and in movies, among other settings. As an accessibility service typically provided by trained AD professionals, the generation of ADs demands significant human effort, making the process both time-consuming and costly. Recent advancements in natural language processing (NLP) and computer vision (CV), particularly in large language models (LLMs) and vision-language models (VLMs), have allowed for getting a step closer to automatic AD generation. This paper reviews the technologies pertinent to AD generation in the era of LLMs and VLMs: we discuss how state-of-the-art NLP and CV technologies can be applied to generate ADs and identify essential research directions for the future.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# 胸部X線画像における一般化可能な疾患診断の基礎モデル A foundation model for generalizable disease diagnosis in chest X-ray images ( http://arxiv.org/abs/2410.08861v1 ) ライセンス: Link先を確認	Lijian Xu, Ziyu Ni, Hao Sun, Hongsheng Li, Shaoting Zhang,	(参考訳) 医療人工知能(AI)は、疾患診断のための堅牢なツールを提供することで、胸部X線像(CXR)の解釈に革命をもたらしている。しかし、これらのAIモデルの有効性は、大量のタスク固有のラベル付きデータに依存することと、様々な臨床的な設定で一般化できないことによって制限されることが多い。これらの課題に対処するため、我々はCXRBaseを紹介した。CXRBaseは、難解なCXR画像から多目的表現を学習するための基礎モデルであり、様々な臨床タスクへの効率的な適応を容易にする。 CXRBaseは最初、自己教師付き学習手法を使用して、1.04百万の未ラベルのCXRイメージのかなりのデータセットでトレーニングされている。このアプローチにより、明示的なラベルを必要とせずに、モデルが意味のあるパターンを識別できる。この初期段階の後、CXRBaseはラベル付きデータで微調整され、疾患検出の性能を高め、正確な胸部疾患の分類を可能にする。 CXRBaseは、モデルパフォーマンスを改善するための一般化可能なソリューションを提供し、専門家のアノテーションのワークロードを軽減することで、胸部イメージングから幅広い臨床AIアプリケーションを可能にする。 Medical artificial intelligence (AI) is revolutionizing the interpretation of chest X-ray (CXR) images by providing robust tools for disease diagnosis. However, the effectiveness of these AI models is often limited by their reliance on large amounts of task-specific labeled data and their inability to generalize across diverse clinical settings. To address these challenges, we introduce CXRBase, a foundational model designed to learn versatile representations from unlabelled CXR images, facilitating efficient adaptation to various clinical tasks. CXRBase is initially trained on a substantial dataset of 1.04 million unlabelled CXR images using self-supervised learning methods. This approach allows the model to discern meaningful patterns without the need for explicit labels. After this initial phase, CXRBase is fine-tuned with labeled data to enhance its performance in disease detection, enabling accurate classification of chest diseases. CXRBase provides a generalizable solution to improve model performance and alleviate the annotation workload of experts to enable broad clinical AI applications from chest imaging.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses ( http://arxiv.org/abs/2410.08864v1 ) ライセンス: Link先を確認	Grzegorz Głuch, Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta,	(参考訳) 両プレーヤー間の対話プロトコルとして,バックドアベースの透かしと敵防御の既存の定義を定式化し,拡張する。これらのスキームの存在は、本質的にそれらが設計されている学習タスクと結びついている。我々の主な結果は、ほぼすべての差別的学習タスクにおいて、少なくとも2つのうちの1つ(透かしまたは敵の防御)が存在していることを示している。ほぼ全ての」という用語は、第3の直感的だが必要な選択肢、すなわち転送可能な攻撃と呼ばれるスキームも特定することを意味する。転送可能な攻撃によって、データ分布と区別がつかないようなクエリを効率よく計算し、全ての効率的なディフェンダーを騙す。この目的のために,同相暗号と呼ばれる暗号ツールを用いた構築による転送可能な攻撃の必要性を実証する。さらに、転送可能な攻撃の概念を満たすタスクは、暗号化プリミティブを意味し、基礎となるタスクは計算的に複雑である必要があることを示す。これら2つの事実は、転送可能な攻撃と暗号化の間に「等価」が存在することを示している。最後に、有界VC次元のタスクのクラスが対角防御を持ち、それらのサブクラスが透かしを持つことを示す。 We formalize and extend existing definitions of backdoor-based watermarks and adversarial defenses as interactive protocols between two players. The existence of these schemes is inherently tied to the learning tasks for which they are designed. Our main result shows that for almost every discriminative learning task, at least one of the two -- a watermark or an adversarial defense -- exists. The term "almost every" indicates that we also identify a third, counterintuitive but necessary option, i.e., a scheme we call a transferable attack. By transferable attack, we refer to an efficient algorithm computing queries that look indistinguishable from the data distribution and fool all efficient defenders. To this end, we prove the necessity of a transferable attack via a construction that uses a cryptographic tool called homomorphic encryption. Furthermore, we show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive, thus requiring the underlying task to be computationally complex. These two facts imply an "equivalence" between the existence of transferable attacks and cryptography. 
Finally, we show that the class of tasks of bounded VC-dimension has an adversarial defense, and a subclass of them has a watermark.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
# Perccottus gleniiにおけるゲノム型耐凍性の機械学習解析による予測 Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii ( http://arxiv.org/abs/2410.08867v1 ) ライセンス: Link先を確認	Lilin Fan, Xuqing Chai, Zhixiong Tian, Yihang Qiao, Zhen Wang, Yifan Zhang,	(参考訳) 凍結耐性を有する唯一の魚であるPerccottus gleniiのゲノム配列の解析は、生物が極端環境にどのように適応しているかを理解する上で重要な意味を持つ。これらの問題に対処するために、我々は、Perccottus gleniiの遺伝子配列をNeodontobutis hainanensで比較群として分析するために機械学習技術を用いる。まず、我々は5つの遺伝子配列ベクター化方法と超長期遺伝子配列を処理する方法を提案した。オーディナルエンコーディング、ワンホットエンコーディング、K-merエンコーディングという3つのベクター化方法の比較研究を行い、最適なエンコーディング方法を特定するために、オーディナルエンコーディング、ワンホットエンコーディング、K-merエンコーディングという3つのモデルを構築した。 Analysis of the genome sequence of Perccottus glenii, the only fish known to possess freeze tolerance, holds significant importance for understanding how organisms adapt to extreme environments, Traditional biological analysis methods are time-consuming and have limited accuracy, To address these issues, we will employ machine learning techniques to analyze the gene sequences of Perccottus glenii, with Neodontobutis hainanens as a comparative group, Firstly, we have proposed five gene sequence vectorization methods and a method for handling ultra-long gene sequences, We conducted a comparative study on the three vectorization methods: ordinal encoding, One-Hot encoding, and K-mer encoding, to identify the optimal encoding method, Secondly, we constructed four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree, The dataset used by these classification models was extracted from the National Center for Biotechnology Information database, and we vectorized the sequence matrices using the optimal encoding method, K-mer, The Random Forest model, which is the optimal model, achieved a classification accuracy of up to 99, 98 , Lastly, we utilized SHAP values to conduct an interpretable analysis of the optimal classification model, Through ten-fold cross-validation and the AUC metric, we identified the top 10 features that 
contribute the most to the model's classification accuracy. This demonstrates that machine learning methods can effectively replace traditional manual analysis in identifying genes associated with the freeze tolerance phenotype in Perccottus glenii.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
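As a rough illustration of the K-mer encoding named in the abstract above, the sketch below counts overlapping length-k substrings of a DNA string and lays the counts out over a fixed vocabulary. The function name `kmer_counts`, the choice k=2, and the toy sequence are assumptions made for illustration only; the paper's actual preprocessing is not described in this listing.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int = 2, alphabet: str = "ACGT") -> list[int]:
    """Return the count of every possible k-mer, in lexicographic order."""
    # A fixed vocabulary gives every sequence a vector of the same length.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k across the sequence, one base at a time.
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [observed.get(kmer, 0) for kmer in vocabulary]


# Hypothetical toy input, not taken from the paper's dataset.
vector = kmer_counts("ACGTAC", k=2)
print(len(vector), sum(vector))
```

The vector length grows as 4^k, which is one reason k is usually kept small when the result feeds tree-based classifiers.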
# Actor-Criticアルゴリズムの大域収束のための改良されたサンプル複雑性 Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms ( http://arxiv.org/abs/2410.08868v1 ) ライセンス: Link先を確認	Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor,	(参考訳) 本稿では,既存の局所収束結果を超えて,O(\epsilon^{-3})$のサンプル複雑性を大幅に改善したアクタ・クリティック・アルゴリズムのグローバル収束を確立する。以前の研究は、再帰の2乗勾配の有界化に対して$O(\epsilon^{-2})$の局所収束保証を提供しており、これは勾配支配補題を用いて$O(\epsilon^{-4})$の大域的なサンプル複雑性に変換する。アクターと批評家の両方のステップサイズを減少させる従来の手法とは対照的に、批評家にとって一定のステップサイズは期待の収束を保証するのに十分であることを示す。この重要な洞察は、アクター単独でステップサイズを小さくすることは、アクターと批評家の両方にとってノイズを扱うのに十分であることを示している。本研究は,一定のステップサイズに依存する多くのアルゴリズムの実用的成功を理論的に支援するものである。 In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results. Previous works provide local convergence guarantees with a sample complexity of $O(\epsilon^{-2})$ for bounding the squared gradient of the return, which translates to a global sample complexity of $O(\epsilon^{-4})$ using the gradient domination lemma. In contrast to traditional methods that employ decreasing step sizes for both the actor and critic, we demonstrate that a constant step size for the critic is sufficient to ensure convergence in expectation. This key insight reveals that using a decreasing step size for the actor alone is sufficient to handle the noise for both the actor and critic. Our findings provide theoretical support for the practical success of many algorithms that rely on constant step sizes.	翻訳日:2024-10-30 21:35:51 公開日:2024-10-11
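The step from the local $O(\epsilon^{-2})$ rate to the global $O(\epsilon^{-4})$ rate mentioned in the abstract above can be sketched as follows. The exact statement of the gradient domination lemma is not given in this listing, so the simple form used here (a constant $C$ and no additive error term) is an assumption made for illustration.

```latex
% Assumed form of gradient domination, for some constant C > 0:
J^{*} - J(\theta) \;\le\; C \,\lVert \nabla J(\theta) \rVert .

% Local guarantee: O(\delta^{-2}) samples suffice for
\lVert \nabla J(\theta) \rVert^{2} \;\le\; \delta .

% To obtain J^{*} - J(\theta) \le \epsilon it is enough that
\lVert \nabla J(\theta) \rVert \;\le\; \frac{\epsilon}{C}
\quad\Longleftrightarrow\quad
\lVert \nabla J(\theta) \rVert^{2} \;\le\; \frac{\epsilon^{2}}{C^{2}} ,

% so choosing \delta = \epsilon^{2} / C^{2} gives a sample complexity of
O\!\left(\delta^{-2}\right)
  = O\!\left(C^{4}\,\epsilon^{-4}\right)
  = O\!\left(\epsilon^{-4}\right).
```

Under this reading, the squared-gradient target has to shrink quadratically in $\epsilon$, which is what turns the exponent $-2$ into $-4$; the paper's $O(\epsilon^{-3})$ result improves on that conversion.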
# LLMにおけるSAE特性の進化 Evolution of SAE Features Across Layers in LLMs ( http://arxiv.org/abs/2410.08869v1 ) ライセンス: Link先を確認	Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim,	(参考訳) トランスフォーマーベースの言語モデルのためのスパースオートエンコーダは通常、レイヤ毎に独立して定義される。この研究では、隣接層における特徴間の統計的関係を分析し、前方通過を通して機能がどのように進化するかを理解する。私たちは、機能とその最もよく似た隣人のためのグラフ視覚化インターフェイスを提供し、レイヤ間で関連する機能のコミュニティを構築します。いくつかの特徴は、以前の特徴の準ブール結合として表すことができ、いくつかの特徴は、後続のレイヤーでより特殊化される。 Sparse Autoencoders for transformer-based language models are typically defined independently per layer. In this work we analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass. We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers. We find that a considerable amount of features are passed through from a previous layer, some features can be expressed as quasi-boolean combinations of previous features, and some features become more specialized in later layers.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# フラジル・ジャイアンツ : サブポピュレーション・アタックに対するモデルの感受性を理解する Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks ( http://arxiv.org/abs/2410.08872v1 ) ライセンス: Link先を確認	Isha Gupta, Hidde Lycklama, Emanuel Opel, Evan Rose, Anwar Hithnawi,	(参考訳) 機械学習モデルがますます複雑化するにつれ、その堅牢性や信頼性に対する懸念がますます強まっている。これらのモデルの重大な脆弱性はデータ中毒攻撃であり、敵は意図的にトレーニングデータを変更してモデルのパフォーマンスを低下させる。これらの攻撃の特にステルスな形態の1つはサブポピュレーション中毒であり、データセット内の異なるサブグループを対象としており、全体的なパフォーマンスはほぼ無傷である。これらの攻撃がサブポピュレーション内で一般化する能力は、データセット内の疎外されたグループや疎外されたグループに悪影響を及ぼすため、現実世界の設定において重大なリスクをもたらす。本研究では,モデル複雑性がサブポピュレーション中毒に対する感受性にどのように影響するかを検討する。我々は,過パラメータ化モデルが大容量であるため,不注意に記憶し,対象のサブポピュレーションを誤って分類する方法を説明する理論的枠組みを導入する。提案理論を検証するため,一般的なモデルアーキテクチャを用いた大規模画像およびテキストデータセットの広範な実験を行った。以上のパラメータを持つモデルでは, サブポピュレーション中毒に対して有意に脆弱である。さらに、より小さな人間の解釈可能なサブグループに対する攻撃は、これらのモデルによって検出されないことが多い。これらの結果は、サブポピュレーションの脆弱性に対処する防衛を開発する必要性を強調している。 As machine learning models become increasingly complex, concerns about their robustness and trustworthiness have become more pressing. A critical vulnerability of these models is data poisoning attacks, where adversaries deliberately alter training data to degrade model performance. One particularly stealthy form of these attacks is subpopulation poisoning, which targets distinct subgroups within a dataset while leaving overall performance largely intact. The ability of these attacks to generalize within subpopulations poses a significant risk in real-world settings, as they can be exploited to harm marginalized or underrepresented groups within the dataset. In this work, we investigate how model complexity influences susceptibility to subpopulation poisoning attacks. We introduce a theoretical framework that explains how overparameterized models, due to their large capacity, can inadvertently memorize and misclassify targeted subpopulations. To validate our theory, we conduct extensive experiments on large-scale image and text datasets using popular model architectures. 
Our results show a clear trend: models with more parameters are significantly more vulnerable to subpopulation poisoning. Moreover, we find that attacks on smaller, human-interpretable subgroups often go undetected by these models. These results highlight the need to develop defenses that specifically address subpopulation vulnerabilities.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# 依存型高階論理における選択実験 Experiments with Choice in Dependently-Typed Higher-Order Logic ( http://arxiv.org/abs/2410.08874v1 ) ライセンス: Link先を確認	Daniel Ranalter, Chad E. Brown, Cezary Kaliszyk,	(参考訳) 最近、DHOLと呼ばれる高階論理の拡張が導入され、言語を依存型で豊かにし、強力な拡張型理論を生み出した。本稿では,DHOLに選択を付加する方法を2つ提案する。我々は、ヒルベルトの不定選択作用素$\epsilon$によりDHOL項構造を拡張し、選択項のHOL選択への変換を定義し、既存のDHOLからHOLへの変換を拡張し、翻訳の延長が完全であることを示し、音性について議論する。我々は最終的に、選択を必要とするHOL問題の集合に対する拡張翻訳を評価する。 Recently an extension to higher-order logic -- called DHOL -- was introduced, enriching the language with dependent types, and creating a powerful extensional type theory. In this paper we propose two ways how choice can be added to DHOL. We extend the DHOL term structure by Hilbert's indefinite choice operator $\epsilon$, define a translation of the choice terms to HOL choice that extends the existing translation from DHOL to HOL and show that the extension of the translation is complete and give an argument for soundness. We finally evaluate the extended translation on a set of dependent HOL problems that require choice.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# 動的ネットワークのオンライン設計 Online design of dynamic networks ( http://arxiv.org/abs/2410.08875v1 ) ライセンス: Link先を確認	Duo Wang, Andrea Araldo, Mounim El Yacoubi,	(参考訳) ネットワーク(例えば、通信または輸送ネットワーク)の設計は、ネットワークの運用前に計画段階で主にオフラインで行われる。一方で、動的ネットワーク、すなわち時間とともに進化するネットワークを特徴づけることに多大な努力が注がれている。本稿では,動的ネットワークのオンライン設計手法を提案する。ネットワークが動的で確率的な環境で運用する必要がある場合、その必要が生じる。この場合、環境の変化に反応し、特定のパフォーマンス目標を維持するために、時間をかけて、リアルタイムでネットワークを構築したい場合もあります。我々はモンテカルロ木探索に基づく転がり地平線最適化により,このオンライン設計問題に取り組む。オンラインネットワーク設計の可能性は、未来的な動的公共交通ネットワークの設計のために示されており、バス路線は、確率的なユーザ需要に適応するために、その場で構築されている。このようなシナリオでは、ニューヨーク市のタクシーデータからの要求をシミュレートする、最先端の動的車両ルーティング問題(VRP)解決法と比較する。車両軌道を独立に拡張する従来のVRP法とは違って,複雑なユーザ旅行が可能な路線バスの構造的ネットワークの構築が可能となり,システム性能が向上する。 Designing a network (e.g., a telecommunication or transport network) is mainly done offline, in a planning phase, prior to the operation of the network. On the other hand, a massive effort has been devoted to characterizing dynamic networks, i.e., those that evolve over time. The novelty of this paper is that we introduce a method for the online design of dynamic networks. The need to do so emerges when a network needs to operate in a dynamic and stochastic environment. In this case, one may wish to build a network over time, on the fly, in order to react to the changes of the environment and to keep certain performance targets. We tackle this online design problem with a rolling horizon optimization based on Monte Carlo Tree Search. The potential of online network design is showcased for the design of a futuristic dynamic public transport network, where bus lines are constructed on the fly to better adapt to a stochastic user demand. In such a scenario, we compare our results with state-of-the-art dynamic vehicle routing problem (VRP) resolution methods, simulating requests from a New York City taxi dataset. 
Unlike classic VRP methods, which extend vehicle trajectories in isolation, our method enables us to build a structured network of line buses, where complex user journeys are possible, thus increasing system performance.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# 多変量時系列異常検出のためのグラフアライメント Interdependency Matters: Graph Alignment for Multivariate Time Series Anomaly Detection ( http://arxiv.org/abs/2410.08877v1 ) ライセンス: Link先を確認	Yuanyi Wang, Haifeng Sun, Chengsen Wang, Mengde Zhu, Jingyu Wang, Wei Tang, Qi Qi, Zirui Zhuang, Jianxin Liao,	(参考訳) 多変量時系列(MTS)における異常検出は、データマイニングや産業における様々な用途において重要である。現在の産業的手法は、一般に教師なしの学習タスクとして異常検出にアプローチし、ノイズのないラベルなしデータセットの正規分布を推定することによって偏差を推定することを目的としている。これらの手法は、精度を高めるために、グラフ構造を通してチャネル間の相互依存性をますます取り入れている。しかし,MSSチャネル間の相互依存性の変化は,通常のデータから異常データへのシフトが重要であるため,相互依存性の役割は従来より重要である。この観測は、これらの相互依存グラフ列の変化によって \textit{anomalies が検出できることを示唆している。グラフアライメントによるMADGA (MTS Anomaly Detection via Graph Alignment) をグラフアライメント (GA) 問題として再定義する。 MADGAは、サブシーケンスを動的にグラフに変換して、進化する相互依存性を捉える。グラフアライメントは、これらのグラフ間で行われ、コストを最小化し、通常のデータの距離を効果的に最小化し、異常データに対して最大化するアライメント計画が最適化される。 GAアプローチでは、ノードとエッジの両方を明示的にアライメントし、ノードはワッサーシュタイン距離、エッジはグロモフ=ワッサーシュタイン距離を用いる。我々の知る限り、これはGAのMTS異常検出への最初の応用であり、この目的のために相互依存を明示的に活用する。多様な実世界のデータセットに関する大規模な実験は、MADGAの有効性を検証し、異常を検出し、相互依存を区別する能力を示し、さまざまなシナリオで一貫して最先端の達成を実現している。 Anomaly detection in multivariate time series (MTS) is crucial for various applications in data mining and industry. Current industrial methods typically approach anomaly detection as an unsupervised learning task, aiming to identify deviations by estimating the normal distribution in noisy, label-free datasets. These methods increasingly incorporate interdependencies between channels through graph structures to enhance accuracy. However, the role of interdependencies is more critical than previously understood, as shifts in interdependencies between MTS channels from normal to anomalous data are significant. This observation suggests that \textit{anomalies could be detected by changes in these interdependency graph series}. 
To capitalize on this insight, we introduce MADGA (MTS Anomaly Detection via Graph Alignment), which redefines anomaly detection as a graph alignment (GA) problem that explicitly utilizes interdependencies for anomaly detection. MADGA dynamically transforms subsequences into graphs to capture the evolving interdependencies. Graph alignment is then performed between these graphs, optimizing an alignment plan that minimizes cost, effectively minimizing the distance for normal data and maximizing it for anomalous data. Uniquely, our GA approach involves explicit alignment of both nodes and edges, employing Wasserstein distance for nodes and Gromov-Wasserstein distance for edges. To our knowledge, this is the first application of GA to MTS anomaly detection that explicitly leverages interdependency for this purpose. Extensive experiments on diverse real-world datasets validate the effectiveness of MADGA, demonstrating its capability to detect anomalies and differentiate interdependencies, consistently achieving state-of-the-art performance across various scenarios.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
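To give a feel for the Wasserstein distance that the abstract above uses for node alignment, here is a minimal sketch of its simplest special case: two one-dimensional empirical samples of equal size. This is not MADGA's implementation; the function name `wasserstein_1d` and the toy windows are hypothetical, and the paper's graph-level alignment (including the Gromov-Wasserstein term for edges) is not reproduced here.

```python
def wasserstein_1d(xs: list[float], ys: list[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan pairs sorted values in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical windows: one "normal", one shifted by a constant offset.
normal_window = [0.1, 0.2, 0.3, 0.4]
shifted_window = [1.1, 1.2, 1.3, 1.4]
print(wasserstein_1d(normal_window, normal_window))
print(wasserstein_1d(normal_window, shifted_window))
```

A small distance for similar windows and a large one for shifted windows mirrors the intuition in the abstract that alignment cost should be low for normal data and high for anomalous data.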
# マルチモーダル核融合による制御核融合のQ分布予測 Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion ( http://arxiv.org/abs/2410.08879v1 ) ライセンス: Link先を確認	Shiao Wang, Yifeng Wang, Qingchuan Ma, Xiao Wang, Ning Yan, Qingquan Yang, Guosheng Xu, Jin Tang,	(参考訳) Q分布予測は、制御された核融合における重要な研究方向であり、予測課題を解決するための重要なアプローチとしてディープラーニングが出現する。本稿では,Q分布予測の複雑さに対処するために,ディープラーニング技術を活用する。具体的には,コンピュータビジョンにおけるマルチモーダル融合法について検討し,元の1次元データと2次元線画像データを統合してバイモーダル入力を生成する。さらに,特徴抽出とバイモーダル情報の相互融合にトランスフォーマーの注意機構を用いる。大規模な実験により,本手法の有効性が検証され,Q分布の予測誤差が著しく低減された。 Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges. In this paper, we leverage deep learning techniques to tackle the complexities of Q-distribution prediction. Specifically, we explore multimodal fusion methods in computer vision, integrating 2D line image data with the original 1D data to form a bimodal input. Additionally, we employ the Transformer's attention mechanism for feature extraction and the interactive fusion of bimodal information. Extensive experiments validate the effectiveness of our approach, significantly reducing prediction errors in Q-distribution.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# スピン鎖の量子セルオートマトンとカテゴリー双対性 Quantum cellular automata and categorical duality of spin chains ( http://arxiv.org/abs/2410.08884v1 ) ライセンス: Link先を確認	Corey Jones, Kylan Schatz, Dominic J. Williamson,	(参考訳) 二重性は量子スピン鎖の研究において中心的な役割を果たし、量子相図や相転移の構造に関する洞察を与える。本研究では、スピン鎖上の対称性を反映する局所作用素の代数間の有界スプレッド同型として定義される圏双対について研究する。我々は、行列積作用素代数で表されるユニタリ融合圏に対応する一般化大域対称性を考える。双対性に関する根本的な問題は、単位行列積作用素を尊重するすべての局所作用素によって生成されるより大きい代数上で量子セルオートマトンに拡張できるかどうかである。有限群の現場表現である従来の大域的対称性に対して、この大きな代数は単に鎖内の個々のスピンに関連する代数のテンソル積である。ドップリッヒ=ハーグ=ロバーツ双加群の機械を用いた拡張問題の解を提案する。我々の解は、双対性の拡張が存在するときの鮮明な分類的基準を提供する。可能な拡張の集合が、関連する対称性圏における可逆対象上のトーソルを形成することを示す。提案手法は,従来のグローバル対称性を尊重する量子セルオートマトンに適用した場合の文献から既存の結果を復元する。 Dualities play a central role in the study of quantum spin chains, providing insight into the structure of quantum phase diagrams and phase transitions. In this work we study categorical dualities, which are defined as bounded-spread isomorphisms between algebras of symmetry-respecting local operators on a spin chain. We consider generalized global symmetries that correspond to unitary fusion categories, which are represented by matrix-product operator algebras. A fundamental question about dualities is whether they can be extended to quantum cellular automata on the larger algebra generated by all local operators that respect the unit matrix-product operator. For conventional global symmetries, which are on-site representations of finite groups, this larger algebra is simply the tensor product of algebras associated to individual spins in the chain. We present a solution to the extension problem using the machinery of Doplicher-Haag-Roberts bimodules. Our solution provides a crisp categorical criterion for when an extension of a duality exists. We show that the set of possible extensions form a torsor over the invertible objects in the relevant symmetry category. Our approach recovers existing results from the literature when applied to quantum cellular automata that respect a conventional global symmetry.	
翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# GPTは設計原理に基づいて図形設計を評価することができるか? Can GPTs Evaluate Graphic Design Based on Design Principles? ( http://arxiv.org/abs/2410.08885v1 ) ライセンス: Link先を確認	Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi,	(参考訳) 基礎モデルの最近の進歩は、グラフィックデザイン生成において有望な能力を示している。図形設計の評価にLMM(Large Multimodal Models)を用いた研究がいくつか行われており、LMMが品質を適切に評価できると仮定しているが、その評価が信頼できるかどうかは定かではない。グラフィックデザインの品質を評価する一つの方法は、デザインが設計者の一般的な実践である基本的なグラフィックデザイン原則に準拠しているかどうかを評価することである。本稿では,60名の被験者から収集した注釈を用いた設計原則に基づくGPTに基づく評価とヒューリスティック評価の挙動を比較した。我々の実験では、GPTは細部を区別することはできないが、人間のアノテーションと合理的に良い相関関係を持ち、設計原理に基づくヒューリスティックな指標に類似した傾向を示し、グラフィックデザインの品質を評価することができることを示唆している。私たちのデータセットはhttps://cyberagentailab.github.io/Graphic-design-evaluationで公開されています。 Recent advancements in foundation models show promising capability in graphic design generation. Several studies have started employing Large Multimodal Models (LMMs) to evaluate graphic designs, assuming that LMMs can properly assess their quality, but it is unclear if the evaluation is reliable. One way to evaluate the quality of graphic design is to assess whether the design adheres to fundamental graphic design principles, which are the designer's common practice. In this paper, we compare the behavior of GPT-based evaluation and heuristic evaluation based on design principles using human annotations collected from 60 subjects. Our experiments reveal that, while GPTs cannot distinguish small details, they have a reasonably good correlation with human annotation and exhibit a similar tendency to heuristic metrics based on design principles, suggesting that they are indeed capable of assessing the quality of graphic design. Our dataset is available at https://cyberagentailab.github.io/Graphic-design-evaluation .	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# 機械学習を用いた銀行ローン予測 Bank Loan Prediction Using Machine Learning Techniques ( http://arxiv.org/abs/2410.08886v1 ) ライセンス: Link先を確認	F M Ahosanul Haque, Md. Mahedi Hassan,	(参考訳) 銀行は、消費者やビジネスローンを通じて、あらゆる金融エコシステムにおける経済の発展に重要である。しかし、貸与はリスクを生じさせるため、銀行はデフォルトの確率を減らすために、応募者の財務的地位を決定する必要がある。そのため、現在、多くの銀行がデータ分析と最先端技術を採用し、そのプロセスにおけるより良い意思決定に到達している。機械学習アルゴリズムを適用する予測モデリング手法により、返済確率が規定される。本研究プロジェクトでは,ローン承認プロセスの精度と効率をさらに向上するために,いくつかの機械学習手法を適用する。我々は、機械学習手法を用いて、148,670のインスタンスと37の属性のデータセットに取り組んできた。対象のプロパティはローンのアプリケーションを「承認」グループと「承認」グループに分離します。さまざまな機械学習技術、すなわち、決定木分類、AdaBoosting、ランダムフォレスト分類、SVM、GaussianNBが使用されている。その後、モデルは訓練され、評価された。中でも最高の性能のアルゴリズムはAdaBoostingであり、99.99%の精度を達成した。以上の結果から,アンサンブル学習が,ローン承認決定の予測能力の向上に有効であることを示す。提示された作業は、金融分野に機械学習を適用する上で有用な洞察を提供する、極めて正確で効率的なローン予測モデルを達成する可能性を示している。 Banks are important for the development of economies in any financial ecosystem through consumer and business loans. Lending, however, presents risks; thus, banks have to determine the applicant's financial position to reduce the probabilities of default. A number of banks have currently, therefore, adopted data analytics and state-of-the-art technology to arrive at better decisions in the process. The probability of payback is prescribed by a predictive modeling technique in which machine learning algorithms are applied. In this research project, we will apply several machine learning methods to further improve the accuracy and efficiency of loan approval processes. Our work focuses on the prediction of bank loan approval; we have worked on a dataset of 148,670 instances and 37 attributes using machine learning methods. The target property segregates the loan applications into "Approved" and "Denied" groups. various machine learning techniques have been used, namely, Decision Tree Categorization, AdaBoosting, Random Forest Classifier, SVM, and GaussianNB. Following that, the models were trained and evaluated. 
Among these, the best-performing algorithm was AdaBoosting, which achieved an incredible accuracy of 99.99%. The results therefore show how ensemble learning works effectively to improve the prediction skills of loan approval decisions. The presented work points to the possibility of achieving extremely accurate and efficient loan prediction models that provide useful insights for applying machine learning to financial domains.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# 近代ホップフィールドネットワークによる核融合におけるメモリ対応Q分布予測 Exploiting Memory-aware Q-distribution Prediction for Nuclear Fusion via Modern Hopfield Network ( http://arxiv.org/abs/2410.08889v1 ) ライセンス: Link先を確認	Qingchuan Ma, Shiao Wang, Tong Zheng, Xiaodong Dai, Yifeng Wang, Qingquan Yang, Xiao Wang,	(参考訳) 本研究は, クリーンエネルギーソリューションの進展の鍵となる, 長期安定核融合課題におけるQ分布予測の課題に対処する。本稿では,現代ホップフィールドネットワークを応用し,歴史写真から連想記憶を取り入れた革新的なディープラーニングフレームワークを提案する。新たにコンパイルされたデータセットを用いて,Q-distribution予測の強化におけるアプローチの有効性を実証する。提案手法は, 過去の記憶情報をこの文脈で初めて活用し, 予測精度を向上し, 核融合研究の最適化に寄与することを示す。 This study addresses the critical challenge of predicting the Q-distribution in long-term stable nuclear fusion task, a key component for advancing clean energy solutions. We introduce an innovative deep learning framework that employs Modern Hopfield Networks to incorporate associative memory from historical shots. Utilizing a newly compiled dataset, we demonstrate the effectiveness of our approach in enhancing Q-distribution prediction. The proposed method represents a significant advancement by leveraging historical memory information for the first time in this context, showcasing improved prediction accuracy and contributing to the optimization of nuclear fusion research.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# モアレ量子材料における強相互作用性双極子励起子の超輝度 Superradiance of strongly interacting dipolar excitons in moiré quantum materials ( http://arxiv.org/abs/2410.08891v1 ) ライセンス: Link先を確認	Jan Kumlin, Ajit Srivastava, Thomas Pohl,	(参考訳) 2次元のヘテロ構造で生成されるモワーイ格子は相互作用する電子と励起子のリッチな多体物理学を示し、同時に有望な光電子応用を示唆している。そこで,本研究では,モーアエ格子の深いサブ波長特性と強い励起オンサイト相互作用から生じるモアエ励起子の協調放射率について検討する。特に, 層間励起子間の静的双極子-双極子相互作用がそれらの協調光学特性に強く影響し, モアレエ励起子の秩序相の超放射性を高めつつ, 乱れ状態の超放射性を抑制することを示した。さらに,ドーピングは励起子のサブラジカル状態の超ラジカルダイナミクスを発生させることにより,光学的協調性の直接制御を可能にすることを示す。以上の結果から,分子間相互作用が強く相互作用する多体系における協調光学現象を探索するためのユニークなプラットフォームとして,分子間モワールエキシトンが提供され,量子非線形光学への応用が期待できることが示された。 Moiré lattices created in two-dimensional heterostructures exhibit rich many-body physics of interacting electrons and excitons and, at the same time, suggest promising optoelectronic applications. Here, we study the cooperative radiance of moiré excitons that is demonstrated to emerge from the deep subwavelength nature of the moiré lattice and the strong excitonic onsite interaction. In particular, we show that the static dipole-dipole interaction between interlayer excitons can strongly affect their cooperative optical properties, suppressing superradiance of disordered states while enhancing superradiance of ordered phases of moiré excitons. Moreover, we show that doping permits direct control of optical cooperativity, e.g., by generating superradiant dynamics of otherwise subradiant states of excitons. Our results show that interlayer moiré excitons offer a unique platform for exploring cooperative optical phenomena in strongly interacting many-body systems, thus holding promise for applications in quantum nonlinear optics.	翻訳日:2024-10-30 21:26:05 公開日:2024-10-11
# フェデレーション・ラーニングの実践 - 振り返りと予測 Federated Learning in Practice: Reflections and Projections ( http://arxiv.org/abs/2410.08892v1 ) ライセンス: Link先を確認	Katharine Daly, Hubert Eichner, Peter Kairouz, H. Brendan McMahan, Daniel Ramage, Zheng Xu,	(参考訳) Federated Learning(FL)は、複数のエンティティがローカルデータを交換することなく、共同で共有モデルを学ぶことができる機械学習技術である。過去10年間で、FLシステムは大きな進歩を遂げ、さまざまな学習領域にわたる数百万のデバイスにスケールアップし、意味のある差分プライバシー(DP)保証を提供してきた。 Google、Apple、Metaといった組織によるプロダクションシステムは、FLの現実的な適用性を実証しています。しかし、サーバ側のDP保証の検証や異種デバイス間のトレーニングの調整など、大きな課題が残っている。さらに、大規模な(マルチモーダル)モデルや、トレーニング、推論、パーソナライゼーションの間の曖昧な線のような新興トレンドは、従来のFLフレームワークに挑戦する。そこで本稿では,厳密な定義よりもプライバシの原則を優先する再定義FLフレームワークを提案する。信頼性のある実行環境とオープンソースエコシステムを活用して、これらの課題に対処し、FLの今後の進歩を促進するための道筋を図示します。 Federated Learning (FL) is a machine learning technique that enables multiple entities to collaboratively learn a shared model without exchanging their local data. Over the past decade, FL systems have achieved substantial progress, scaling to millions of devices across various learning domains while offering meaningful differential privacy (DP) guarantees. Production systems from organizations like Google, Apple, and Meta demonstrate the real-world applicability of FL. However, key challenges remain, including verifying server-side DP guarantees and coordinating training across heterogeneous devices, limiting broader adoption. Additionally, emerging trends such as large (multi-modal) models and blurred lines between training, inference, and personalization challenge traditional FL frameworks. In response, we propose a redefined FL framework that prioritizes privacy principles rather than rigid definitions. We also chart a path forward by leveraging trusted execution environments and open-source ecosystems to address these challenges and facilitate future advancements in FL.	翻訳日:2024-10-30 21:16:19 公開日:2024-10-11
# ドラマ「Mamba-Enabled Model-Based Reinforcement Learning is Sample and Parameter Efficient」 Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient ( http://arxiv.org/abs/2410.08893v1 ) ライセンス: Link先を確認	Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill,	(参考訳) モデルベース強化学習(RL)は、ほとんどのモデルフリーなRLアルゴリズムを悩ませるデータ非効率性に対する解決策を提供する。しかしながら、堅牢な世界モデルを学ぶには、計算とトレーニングにコストがかかる複雑で深いアーキテクチャを必要とすることが多い。世界モデルの中では、動的モデルは特に正確な予測に不可欠であり、様々な動的モデルアーキテクチャがそれぞれ独自の課題を持って検討されている。現在、リカレントニューラルネットワーク(RNN)ベースの世界モデルは、グラデーションの消滅や、長期的な依存関係を効果的に取得することの難しさといった問題に直面している。対照的に、トランスフォーマーの使用は、メモリと計算の複雑さが$O(n^2)$となり、$n$がシーケンス長を表すという、自己保持機構のよく知られた問題に悩まされている。これらの課題に対処するために、我々は状態空間モデル(SSM)に基づく世界モデルを提案し、特にMambaをベースとして、長期的依存関係を効果的に把握し、より長いトレーニングシーケンスの使用を容易にし、メモリと計算の複雑さを$O(n)で達成する。また、トレーニングの初期段階において、誤った世界モデルによって引き起こされる亜最適性を緩和する新しいサンプリング手法を導入し、前述の手法と組み合わせて、700万のトレーニング可能なパラメータワールドモデルのみを用いて、他の最先端モデルベースRLアルゴリズムに匹敵する正規化スコアを得る。このモデルはアクセス可能で、市販のラップトップでトレーニングすることができる。私たちのコードはhttps://github.com/realwenlongwang/drama.git.comで利用可能です。 Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often demands complex and deep architectures, which are expensive to compute and train. Within the world model, dynamics models are particularly crucial for accurate predictions, and various dynamics-model architectures have been explored, each with its own set of challenges. Currently, recurrent neural network (RNN) based world models face issues such as vanishing gradients and difficulty in capturing long-term dependencies effectively. In contrast, use of transformers suffers from the well-known issues of self-attention mechanisms, where both memory and computational complexity scale as $O(n^2)$, with $n$ representing the sequence length. 
To address these challenges we propose a state space model (SSM) based world model, specifically based on Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and facilitating the use of longer training sequences efficiently. We also introduce a novel sampling method to mitigate the suboptimality caused by an incorrect world model in the early stages of training, combining it with the aforementioned technique to achieve a normalised score comparable to other state-of-the-art model-based RL algorithms using only a 7 million trainable parameter world model. This model is accessible and can be trained on an off-the-shelf laptop. Our code is available at https://github.com/realwenlongwang/drama.git.	翻訳日:2024-10-30 21:16:19 公開日:2024-10-11
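The $O(n)$ claim in the abstract above comes from the recurrent form of a state space model: each time step updates a fixed-size state once. The sketch below shows that recurrence for a scalar, time-invariant toy system. It is only an assumption-laden illustration: Mamba itself uses input-dependent (selective) parameters and higher-dimensional states, and the function name `ssm_scan` and the coefficients are hypothetical.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence."""
    hidden = 0.0
    outputs = []
    for x in inputs:
        # One constant-cost update per step, so total work is linear in len(inputs).
        hidden = a * hidden + b * x
        outputs.append(c * hidden)
    return outputs


# An impulse at t=0 decays geometrically through the hidden state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

Because the state has a fixed size, memory does not grow with sequence length, in contrast to self-attention, which compares every pair of positions.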
# 脳MRIにおけるT1wとT1マップのコントラスト合成のための条件付き生成モデル Conditional Generative Models for Contrast-Enhanced Synthesis of T1w and T1 Maps in Brain MRI ( http://arxiv.org/abs/2410.08894v1 ) ライセンス: Link先を確認	Moritz Piening, Fabian Altekrüger, Gabriele Steidl, Elke Hattingen, Eike Steidl,	(参考訳) ガドリニウム系造影剤(GBCA)による造影剤は、神経放射線学における腫瘍診断に欠かせないツールである。ガドリニウム投与前後のグリオ芽腫の脳MRI検査に基づいて,ニューラルネットによる増強予測と2つの新しい寄与について検討した。まず、仮想エンハンスメントにおける不確実性定量化のための生成モデル、より正確に条件付き拡散とフローマッチングの可能性について検討する。第2に, 定量的MRIとT1強調画像によるT1スキャンの性能について検討した。 T1重み付きスキャンとは対照的に、これらのスキャンは物理的に意味があり、したがって同等のボクセル範囲の利点がある。これらの2つのモードのネットワーク予測性能と非互換な灰色の値スケールを比較するために,Dice と Jaccard のスコアを用いたコントラスト強調領域のセグメンテーションを評価する。モデル全体では、T1重み付きスキャンよりもT1スキャンの方がセグメンテーションが優れている。 Contrast enhancement by Gadolinium-based contrast agents (GBCAs) is a vital tool for tumor diagnosis in neuroradiology. Based on brain MRI scans of glioblastoma before and after Gadolinium administration, we address enhancement prediction by neural networks with two new contributions. Firstly, we study the potential of generative models, more precisely conditional diffusion and flow matching, for uncertainty quantification in virtual enhancement. Secondly, we examine the performance of T1 scans from quantitative MRI versus T1-weighted scans. In contrast to T1-weighted scans, these scans have the advantage of a physically meaningful and thereby comparable voxel range. To compare network prediction performance of these two modalities with incompatible gray-value scales, we propose to evaluate segmentations of contrast-enhanced regions of interest using Dice and Jaccard scores. Across models, we observe better segmentations with T1 scans than with T1-weighted scans.	翻訳日:2024-10-30 21:16:19 公開日:2024-10-11
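The Dice and Jaccard scores used for evaluation in the abstract above depend only on which voxels are labelled, not on gray-value scale, which is why they allow the two modalities to be compared. A minimal sketch follows, with masks represented as sets of voxel coordinates. The function name `dice_and_jaccard`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice_and_jaccard(pred: set, truth: set) -> tuple[float, float]:
    """Return (Dice, Jaccard) overlap scores for two sets of voxel coordinates."""
    overlap = len(pred & truth)
    union = len(pred | truth)
    total = len(pred) + len(truth)
    # Convention chosen here: two empty masks count as a perfect match.
    dice = 2 * overlap / total if total else 1.0
    jaccard = overlap / union if union else 1.0
    return dice, jaccard


# Hypothetical 2-D masks sharing two of their three voxels.
predicted = {(0, 0), (0, 1), (1, 1)}
reference = {(0, 1), (1, 1), (1, 2)}
print(dice_and_jaccard(predicted, reference))
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically and differ only in scale.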
# Few-Shot Vision-Language Model Adaptationのためのキャリブレーションキャッシュモデル Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation ( http://arxiv.org/abs/2410.08895v1 ) ライセンス: Link先を確認	Kun Ding, Qiang Yu, Haojian Zhang, Gaofeng Meng, Shiming Xiang,	(参考訳) キャッシュベースのアプローチは、視覚言語モデル(VLM)の適応には効率的かつ効率的である。それでも、既存のキャッシュモデルは、3つの重要な側面を見落としている。 1) 事前学習VLMは画像-テキスト類似性に最適化されており, 画像-画像類似性の重要性を無視し, 事前学習と適応のギャップを生じさせる。 2) 現在のキャッシュモデルは, 重量関数を構築しながら, トレーニングサンプル間の複雑な関係を無視するNadaraya-Watson (N-W) 推定器に基づいている。 3) 限られたサンプルの条件下では, キャッシュモデルにより生成されたロジットは不確実性が高く, 信頼性を考慮せずに直接これらのロジットを使用することは問題となる可能性がある。本研究は上記の課題に対処するための3つのキャリブレーションモジュールを提案する。類似度キャリブレーションは、ラベルなし画像を用いて画像と画像の類似性を洗練する。学習可能なプロジェクション層をCLIPのトレーニング済みイメージエンコーダ上に残差接続し,自己監督型コントラスト損失を最小化してパラメータを最適化する。重み行列を重み関数に導入し、トレーニングサンプル間の関係を適切にモデル化し、既存のキャッシュモデルをN-W推定器よりも精度の高いガウス過程(GP)回帰器に変換する。信頼性校正は、GP回帰によって計算される予測分散を利用して、キャッシュモデルのロジットを動的に再スケールし、キャッシュモデルの出力がその信頼性レベルに基づいて適切に調整されることを保証する。さらに,GPの複雑さを低減するため,グループベースの学習戦略を提案する。上記の設計を統合することで、トレーニング不要とトレーニング不要の両方の亜種を提案する。 11個のショット分類データセットに対する大規模な実験により、提案手法が最先端の性能を達成できることが検証された。 Cache-based approaches stand out as both effective and efficient for adapting vision-language models (VLMs). Nonetheless, the existing cache model overlooks three crucial aspects. 1) Pre-trained VLMs are mainly optimized for image-text similarity, neglecting the importance of image-image similarity, leading to a gap between pre-training and adaptation. 2) The current cache model is based on the Nadaraya-Watson (N-W) estimator, which disregards the intricate relationships among training samples while constructing weight function. 3) Under the condition of limited samples, the logits generated by cache model are of high uncertainty, directly using these logits without accounting for the confidence could be problematic. This work presents three calibration modules aimed at addressing the above challenges. Similarity Calibration refines the image-image similarity by using unlabeled images. 
We add a learnable projection layer with a residual connection on top of the pre-trained image encoder of CLIP and optimize the parameters by minimizing a self-supervised contrastive loss. Weight Calibration introduces a precision matrix into the weight function to adequately model the relations between training samples, transforming the existing cache model into a Gaussian Process (GP) regressor, which could be more accurate than the N-W estimator. Confidence Calibration leverages the predictive variances computed by GP regression to dynamically re-scale the logits of the cache model, ensuring that the cache model's outputs are appropriately adjusted based on their confidence levels. In addition, to reduce the high complexity of GPs, we further propose a group-based learning strategy. Integrating the above designs, we propose both training-free and training-required variants. Extensive experiments on 11 few-shot classification datasets validate that the proposed methods can achieve state-of-the-art performance.	翻訳日:2024-10-30 21:16:19 公開日:2024-10-11
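For readers unfamiliar with the Nadaraya-Watson (N-W) estimator that the abstract above identifies as the basis of existing cache models, the sketch below shows it in its simplest scalar form with a Gaussian kernel. This is a toy under stated assumptions: the function name `nadaraya_watson`, the kernel, and the bandwidth are hypothetical, and real cache models operate on image features rather than scalars.

```python
import math


def nadaraya_watson(query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, weighted by closeness of xs to query."""
    # Each weight depends only on one (query, sample) pair; relations among
    # the training samples themselves never enter the estimate.
    weights = [math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# A query halfway between two samples receives equal weights.
print(nadaraya_watson(0.5, [0.0, 1.0], [0.0, 1.0]))
```

The comment inside the function marks the limitation the abstract points to: a GP regressor instead passes the sample-to-sample kernel matrix through an inverse, so correlated training samples are accounted for.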
# MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL ( http://arxiv.org/abs/2410.08896v1 ) License: see link	Claas A Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, Igor Gilitschenski	Building deep reinforcement learning (RL) agents that find a good policy with few samples has proven notoriously challenging. To achieve sample efficiency, recent work has explored updating neural networks with large numbers of gradient steps for every new sample. While such high update-to-data (UTD) ratios have shown strong empirical performance, they also introduce instability to the training process. Previous approaches need to rely on periodic neural network parameter resets to address this instability, but restarting the training process is infeasible in many real-world applications and requires tuning the resetting interval. In this paper, we focus on one of the core difficulties of stable training with limited samples: the inability of learned value functions to generalize to unobserved on-policy actions. We mitigate this issue directly by augmenting the off-policy RL training process with a small amount of data generated from a learned world model. Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training and achieve competitive performance on the most challenging tasks in the DeepMind control suite. Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
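The MAD-TD abstract combines two ideas: many gradient updates per new environment sample (a high UTD ratio) and training batches that contain a small share of model-generated data. The following is only a minimal sketch of that bookkeeping, not the authors' implementation. The function names, the 5% model share, and the batch size are hypothetical choices made for illustration.

```python
import random


def mixed_batch(real_buffer, model_buffer, batch_size, model_fraction=0.05, rng=None):
    """Draw a batch in which a small fraction comes from model-generated data.

    `model_fraction` is an illustrative assumption; the abstract only says
    that a "small amount" of generated data is used.
    """
    rng = rng or random.Random()
    n_model = int(round(batch_size * model_fraction))
    n_real = batch_size - n_model
    batch = [rng.choice(real_buffer) for _ in range(n_real)]
    batch += [rng.choice(model_buffer) for _ in range(n_model)]
    return batch


def train(env_steps, utd_ratio, real_buffer, model_buffer, update_fn, batch_size=20):
    """Run `utd_ratio` gradient updates for every new environment sample."""
    updates = 0
    for _ in range(env_steps):
        # Collecting one new real transition would happen here.
        for _ in range(utd_ratio):
            update_fn(mixed_batch(real_buffer, model_buffer, batch_size))
            updates += 1
    return updates
```

With `utd_ratio=8`, three environment steps trigger 24 calls to `update_fn`, which is the regime the abstract describes as prone to instability without resets or extra data.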
# Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization ( http://arxiv.org/abs/2410.08898v1 ) License: see link	Yang Chen, Yitao Liang, Zhouchen Lin	Low-Dimension-to-High-Dimension (LDHD) generalization is a special case of Out-of-Distribution (OOD) generalization, where the training data are restricted to a low-dimensional subspace of the high-dimensional testing space. Assuming that each instance is generated from a latent variable and the dimension of the latent variable reflects the problem scale, the inherent scaling challenge in length generalization can be captured by the LDHD generalization in the latent space. We theoretically demonstrate that LDHD generalization is generally unattainable without exploiting prior knowledge to provide appropriate inductive bias. Specifically, we explore LDHD generalization in Boolean functions. We verify that different architectures trained with (S)GD converge to min-degree interpolators w.r.t. different independent sets. LDHD generalization is achievable if and only if the target function coincides with this inductive bias. Applying the insights from LDHD generalization to length generalization, we explain the effectiveness of CoT as changing the structure of the latent space to enable better LDHD generalization. We also propose a principle for position embedding design to handle both the inherent LDHD generalization and nuisances such as the data format. Following the principle, we propose a novel position embedding called RPE-Square that remedies the RPE for dealing with the data format nuisance.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Utilizing ChatGPT in a Data Structures and Algorithms Course: A Teaching Assistant's Perspective ( http://arxiv.org/abs/2410.08899v1 ) License: see link	Pooriya Jamie, Reyhaneh Hajihashemi, Sharareh Alipour	Integrating large language models (LLMs) like ChatGPT is revolutionizing the field of computer science education. These models offer new possibilities for enriching student learning and supporting teaching assistants (TAs) in providing prompt feedback and supplementary learning resources. This research delves into the use of ChatGPT in a data structures and algorithms (DSA) course, particularly when combined with TA supervision. The findings demonstrate that incorporating ChatGPT with structured prompts and active TA guidance enhances students' understanding of intricate algorithmic concepts, boosts engagement, and elevates academic performance. However, challenges exist in addressing academic integrity and the limitations of LLMs in tackling complex problems. The study underscores the importance of active TA involvement in reducing students' reliance on AI-generated content and amplifying the overall educational impact. The results suggest that while LLMs can be advantageous for education, their successful integration demands continuous oversight and a thoughtful balance between AI and human guidance.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# A Benchmark for Cross-Domain Argumentative Stance Classification on Social Media ( http://arxiv.org/abs/2410.08900v1 ) License: see link	Jiaqing Yuan, Ruijie Xi, Munindar P. Singh	Argumentative stance classification plays a key role in identifying authors' viewpoints on specific topics. However, generating diverse pairs of argumentative sentences across various domains is challenging. Existing benchmarks often come from a single domain or focus on a limited set of topics. Additionally, manual annotation for accurate labeling is time-consuming and labor-intensive. To address these challenges, we propose leveraging platform rules, readily available expert-curated content, and large language models to bypass the need for human annotation. Our approach produces a multidomain benchmark comprising 4,498 topical claims and 30,961 arguments from three sources, spanning 21 domains. We benchmark the dataset in fully supervised, zero-shot, and few-shot settings, shedding light on the strengths and limitations of different methodologies. The dataset and code are released with this study (link hidden for anonymity).	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Lifelong Event Detection via Optimal Transport ( http://arxiv.org/abs/2410.08905v1 ) License: see link	Viet Dao, Van-Cuong Pham, Quyen Tran, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen	Continual Event Detection (CED) poses a formidable challenge due to the catastrophic forgetting phenomenon, where learning new tasks (with newly arriving event types) hampers performance on previous ones. In this paper, we introduce a novel approach, Lifelong Event Detection via Optimal Transport (LEDOT), that leverages optimal transport principles to align the optimization of our classification module with the intrinsic nature of each class, as defined by their pre-trained language modeling. Our method integrates replay sets, prototype latent representations, and an innovative Optimal Transport component. Extensive experiments on MAVEN and ACE datasets demonstrate LEDOT's superior performance, consistently outperforming state-of-the-art baselines. The results underscore LEDOT as a pioneering solution in continual event detection, offering a more effective and nuanced approach to addressing catastrophic forgetting in evolving environments.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Oversights in Characterising Heralded Single Photon Sources ( http://arxiv.org/abs/2410.08906v1 ) License: see link	Hugh Barrett, Imad I. Faruque	The development of ideal sources is a fundamental challenge for the practical implementation of integrated photonic technologies for quantum applications. In this paper we analyse the state of the art in heralded single photon sources, pointing out inconsistencies in how key parameters, such as brightness and heralding efficiency, are characterised. We then suggest considerations that could be made to facilitate fairer comparison between literature results.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Cryogenic Feedforward of a Photonic Quantum State ( http://arxiv.org/abs/2410.08908v1 ) License: see link	Frederik Thiele, Niklas Lamberty, Thomas Hummel, Nina A. Lange, Lorenzo M. Procopio, Aishi Barua, Sebastian Lengeling, Viktor Quiring, Christof Eigner, Christine Silberhorn, Tim J. Bartley	Modulation conditioned on measurements on entangled photonic quantum states is a cornerstone technology of optical quantum information processing. Performing this task with low latency requires combining single-photon-level detectors with both electronic logic processing and optical modulation in close proximity. In the technologically relevant telecom wavelength band, detection of photonic quantum states is best performed with high-efficiency, low-noise, and high-speed detectors based on the photon-induced breakdown of superconductivity. Therefore, using these devices for feedforward requires mutual compatibility of all components under cryogenic conditions. Here, we demonstrate low-latency feedforward using a quasi-photon-number-resolved measurement on a quantum light source. Specifically, we use a multipixel superconducting nanowire single-photon detector, amplifier, logic, and an integrated electro-optic modulator in situ below 4 K. We modulate the signal mode of a spontaneous parametric down-conversion source, conditional on a photon-number measurement of the idler mode, with a total latency of (23 ± 3) ns. The photon-number discrimination actively manipulates the signal mode photon statistics, which is itself a central component in photonic quantum computing reliant on heralded single-photon sources. This represents an important benchmark for the fastest quantum photonic feedforward experiments comprising measurement, amplification, logic and modulation. This has direct applications in quantum computing, communication, and simulation protocols.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example ( http://arxiv.org/abs/2410.08911v1 ) License: see link	Marcus Kessel	Empirical software engineering faces a critical gap: the lack of standardized tools for rapid development and execution of Test-Driven Software Experiments (TDSEs) - that is, experiments that involve the execution of software subjects and the observation and analysis of their "de facto" run-time behavior. In this paper we present a general-purpose analysis platform called LASSO that provides a minimal set of domain-specific languages and data structures to conduct TDSEs. By empowering users with an executable scripting language to design and execute TDSEs, LASSO enables efficient evaluation of run-time semantics and execution characteristics in addition to statically determined properties. We present an example TDSE that demonstrates the practical benefits of LASSO's scripting capabilities for assessing the reliability of LLMs for code generation by means of a self-contained, reusable and extensible study script. The LASSO platform is freely available at: https://softwareobservatorium.github.io/, and a demo video is available on YouTube: https://youtu.be/tzY9oNTWXzw	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# An End-to-End Deep Learning Method for Solving Nonlocal Allen-Cahn and Cahn-Hilliard Phase-Field Models ( http://arxiv.org/abs/2410.08914v1 ) License: see link	Yuwei Geng, Olena Burkovska, Lili Ju, Guannan Zhang, Max Gunzburger	We propose an efficient end-to-end deep learning method for solving nonlocal Allen-Cahn (AC) and Cahn-Hilliard (CH) phase-field models. One motivation for this effort emanates from the fact that discretized partial differential equation-based AC or CH phase-field models result in diffuse interfaces between phases, where the only recourse for remediation is to severely refine the spatial grids in the vicinity of the true moving sharp interface whose width is determined by a grid-independent parameter that is substantially larger than the local grid size. In this work, we introduce non-mass conserving nonlocal AC or CH phase-field models with regular, logarithmic, or obstacle double-well potentials. Because of non-locality, some of these models feature totally sharp interfaces separating phases. The discretization of such models can lead to a transition between phases whose width is only a single grid cell wide. Another motivation is to use deep learning approaches to ameliorate the otherwise high cost of solving discretized nonlocal phase-field models. To this end, loss functions of the customized neural networks are defined using the residual of the fully discrete approximations of the AC or CH models, which results from applying a Fourier collocation method and a temporal semi-implicit approximation. To address the long-range interactions in the models, we tailor the architecture of the neural network by incorporating a nonlocal kernel as an input channel to the neural network model. We then provide the results of extensive computational experiments to illustrate the accuracy, structure-preserving properties, predictive capabilities, and cost reductions of the proposed method.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# AutoPersuade: A Framework for Evaluating and Explaining Persuasive Arguments ( http://arxiv.org/abs/2410.08917v1 ) License: see link	Till Raphael Saenger, Musashi Hinck, Justin Grimmer, Brandon M. Stewart	We introduce AutoPersuade, a three-part framework for constructing persuasive messages. First, we curate a large dataset of arguments with human evaluations. Next, we develop a novel topic model to identify argument features that influence persuasiveness. Finally, we use this model to predict the effectiveness of new arguments and assess the causal impact of different components to provide explanations. We validate AutoPersuade through an experimental study on arguments for veganism, demonstrating its effectiveness with human studies and out-of-sample predictions.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Wikimedia data for AI: a review of Wikimedia datasets for NLP tasks and AI-assisted editing ( http://arxiv.org/abs/2410.08918v1 ) License: see link	Isaac Johnson, Lucie-Aimée Kaffee, Miriam Redi	Wikimedia content is used extensively by the AI community and within the language modeling community in particular. In this paper, we provide a review of the different ways in which Wikimedia data is curated for use in NLP tasks across pre-training, post-training, and model evaluations. We point to opportunities for greater use of Wikimedia content but also identify ways in which the language modeling community could better center the needs of Wikimedia editors. In particular, we call for incorporating additional sources of Wikimedia data, a greater focus on benchmarks for LLMs that encode Wikimedia principles, and greater multilingualism in Wikimedia-derived datasets.	Translated: 2024-10-30 21:16:19 Published: 2024-10-11
# Efficient Hyperparameter Importance Assessment for CNNs ( http://arxiv.org/abs/2410.08920v1 ) License: see link	Ruinan Wang, Ian Nabney, Mohammad Golbabaee	Hyperparameter selection is an essential aspect of the machine learning pipeline, profoundly impacting models' robustness, stability, and generalization capabilities. Given the complex hyperparameter spaces associated with Neural Networks and the constraints of computational resources and time, optimizing all hyperparameters becomes impractical. In this context, leveraging hyperparameter importance assessment (HIA) can provide valuable guidance by narrowing down the search space. This enables machine learning practitioners to focus their optimization efforts on the hyperparameters with the most significant impact on model performance while conserving time and resources. This paper aims to quantify the importance weights of some hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF, laying the groundwork for applying HIA methodologies in the Deep Learning field. We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets, thereby acquiring a comprehensive dataset containing hyperparameter configuration instances and their corresponding performance metrics. It is demonstrated that among the investigated hyperparameters, the five most important hyperparameters of the CNN model are the number of convolutional layers, learning rate, dropout rate, optimizer, and number of epochs.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning ( http://arxiv.org/abs/2410.08922v1 ) License: see link	Majeed Kazemitabaar, Oliver Huang, Sangho Suh, Austin Z. Henley, Tovi Grossman	Novice programmers are increasingly relying on Large Language Models (LLMs) to generate code for learning programming concepts. However, this interaction can lead to superficial engagement, giving learners an illusion of learning and hindering skill development. To address this issue, we conducted a systematic design exploration to develop seven cognitive engagement techniques aimed at promoting deeper engagement with AI-generated code. In this paper, we describe our design process, the initial seven techniques and results from a between-subjects study (N=82). We then iteratively refined the top techniques and further evaluated them through a within-subjects study (N=42). We evaluate the friction each technique introduces, their effectiveness in helping learners apply concepts to isomorphic tasks without AI assistance, and their success in aligning learners' perceived and actual coding abilities. Ultimately, our results highlight the most effective technique: guiding learners through the step-by-step problem-solving process, where they engage in an interactive dialog with the AI, prompting what needs to be done at each stage before the corresponding code is revealed.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Path-minimizing Latent ODEs for improved extrapolation and inference ( http://arxiv.org/abs/2410.08923v1 ) License: see link	Matt L. Sampson, Peter Melchior	Latent ODE models provide flexible descriptions of dynamic systems, but they can struggle with extrapolation and predicting complicated non-linear dynamics. The latent ODE approach implicitly relies on encoders to identify unknown system parameters and initial conditions, whereas the evaluation times are known and directly provided to the ODE solver. This dichotomy can be exploited by encouraging time-independent latent representations. By replacing the common variational penalty in latent space with an $\ell_2$ penalty on the path length of each system, the models learn data representations that can easily be distinguished from those of systems with different configurations. This results in faster training, smaller models, more accurate interpolation and long-time extrapolation compared to the baseline ODE models with GRU, RNN, and LSTM encoder/decoders on tests with damped harmonic oscillator, self-gravitating fluid, and predator-prey systems. We also demonstrate superior results for simulation-based inference of the Lotka-Volterra parameters and initial conditions by using the latents as data summaries for a conditional normalizing flow. Our change to the training loss is agnostic to the specific recognition network used by the decoder and can therefore easily be adopted by other latent ODE models.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
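The path-length penalty mentioned in the latent ODE abstract can be illustrated with a small sketch. This is one plausible reading, assuming the path length is the sum of Euclidean distances between consecutive latent states along a trajectory. The function names and the `weight` parameter are hypothetical, and the paper's exact penalty (for example, a squared variant) may differ.

```python
import math


def path_length(latents):
    """Sum of Euclidean distances between consecutive latent states."""
    total = 0.0
    for z_prev, z_next in zip(latents, latents[1:]):
        total += math.sqrt(sum((b - a) ** 2 for a, b in zip(z_prev, z_next)))
    return total


def penalized_loss(reconstruction_error, latents, weight=1.0):
    """Reconstruction term plus a weighted path-length penalty (illustrative)."""
    return reconstruction_error + weight * path_length(latents)
```

A trajectory whose latent state barely moves over time incurs almost no penalty, which matches the abstract's stated aim of encouraging time-independent latent representations.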
# DiffPO: A causal diffusion model for learning distributions of potential outcomes ( http://arxiv.org/abs/2410.08924v1 ) License: see link	Yuchen Ma, Valentyn Melnychuk, Jonas Schweisthal, Stefan Feuerriegel	Predicting potential outcomes of interventions from observational data is crucial for decision-making in medicine, but the task is challenging due to the fundamental problem of causal inference. Existing methods are largely limited to point estimates of potential outcomes with no uncertainty quantification; thus, the full information about the distributions of potential outcomes is typically ignored. In this paper, we propose a novel causal diffusion model called DiffPO, which is carefully designed for reliable inferences in medicine by learning the distribution of potential outcomes. In our DiffPO, we leverage a tailored conditional denoising diffusion model to learn complex distributions, where we address the selection bias through a novel orthogonal diffusion loss. Another strength of our DiffPO method is that it is highly flexible (e.g., it can also be used to estimate different causal quantities such as CATE). Across a wide range of experiments, we show that our method achieves state-of-the-art performance.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# HyperPg -- Prototypical Gaussians on the Hypersphere for Interpretable Deep Learning ( http://arxiv.org/abs/2410.08925v1 ) License: see link	Maximilian Xiling Li, Korbinian Franz Rudolf, Nils Blank, Rudolf Lioutikov	Prototype Learning methods provide an interpretable alternative to black-box deep learning models. Approaches such as ProtoPNet learn which parts of a test image "look like" known prototypical parts from training images, combining predictive power with the inherent interpretability of case-based reasoning. However, existing approaches have two main drawbacks: A) They rely solely on deterministic similarity scores without statistical confidence. B) The prototypes are learned in a black-box manner without human input. This work introduces HyperPg, a new prototype representation leveraging Gaussian distributions on a hypersphere in latent space, with learnable mean and variance. HyperPg prototypes adapt to the spread of clusters in the latent space and output likelihood scores. The new architecture, HyperPgNet, leverages HyperPg to learn prototypes aligned with human concepts from pixel-level annotations. Consequently, each prototype represents a specific concept such as color, image texture, or part of the image subject. A concept extraction pipeline built on foundation models provides pixel-level annotations, significantly reducing human labeling effort. Experiments on CUB-200-2011 and Stanford Cars datasets demonstrate that HyperPgNet outperforms other prototype learning architectures while using fewer parameters and training steps. Additionally, the concept-aligned HyperPg prototypes are learned transparently, enhancing model interpretability.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images ( http://arxiv.org/abs/2410.08926v1 ) License: see link	Virmarie Maquiling, Sean Anthony Byrne, Diederick C. Niehorster, Marco Carminati, Enkelejda Kasneci	We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation and eye tracking technologies. By significantly reducing annotation time, lowering technical barriers through its ease of deployment, and enhancing segmentation accuracy, SAM 2 addresses critical challenges faced by researchers and practitioners. Utilizing its zero-shot segmentation capabilities with minimal user input (a single click per video), we tested SAM 2 on over 14 million eye images from diverse datasets, including virtual reality setups and the world's largest unified dataset recorded using wearable eye trackers. Remarkably, in pupil segmentation tasks, SAM 2 matches the performance of domain-specific models trained solely on eye images, achieving competitive mean Intersection over Union (mIoU) scores of up to 93% without fine-tuning. Additionally, we provide our code and segmentation masks for these widely used datasets to promote further research.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
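The mIoU score quoted in the SAM 2 abstract can be made concrete with a small sketch. It assumes binary pupil masks stored as nested lists of 0/1 and averages the per-image IoU. The function names are hypothetical, treating two empty masks as a perfect match is a convention chosen here, and the paper's exact averaging scheme is not stated in the abstract.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)
```

For example, a prediction that covers the true pupil pixel plus one extra pixel scores an IoU of 0.5, so a mean of 93% over a dataset implies close overlap on most images.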
# Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing ( http://arxiv.org/abs/2410.08931v1 ) License: see link	Clayton Leite, Yu Xiao	Text-to-motion models that generate sequences of human poses from textual descriptions are garnering significant attention. However, due to data scarcity, the range of motions these models can produce is still limited. For instance, current text-to-motion models cannot generate a motion of kicking a football with the instep of the foot, since the training data only includes martial arts kicks. We propose a novel method that uses short video clips or images as conditions to modify existing basic motions. In this approach, the model's understanding of a kick serves as the prior, while the video or image of a football kick acts as the posterior, enabling the generation of the desired motion. By incorporating these additional modalities as conditions, our method can create motions not present in the training set, overcoming the limitations of text-motion datasets. A user study with 26 participants demonstrated that our approach produces unseen motions with realism comparable to commonly represented motions in text-motion datasets (e.g., HumanML3D), such as walking, running, squatting, and kicking.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Distributed Quantum Hypothesis Testing under Zero-rate Communication Constraints ( http://arxiv.org/abs/2410.08937v1 ) License: see link	Sreejith Sreekumar, Christoph Hirche, Hao-Chung Cheng, Mario Berta	The trade-offs between error probabilities in quantum hypothesis testing are by now well-understood in the centralized setting, but much less is known for distributed settings. Here, we study a distributed binary hypothesis testing problem to infer a bipartite quantum state shared between two remote parties, where one of these parties communicates classical information to the tester at zero-rate (while the other party communicates classical or quantum information to the tester at zero-rate or higher). As our main contribution, we derive an efficiently computable single-letter formula for the Stein exponent of this problem when the state under the alternative is product. For the general case, we show that the Stein exponent is given by a multi-letter expression involving max-min optimization of regularized measured relative entropy. While this becomes single-letter for the fully classical case, we further prove that this already does not happen in the same way for classical-quantum states in general. As a key tool for proving the converse direction of our results, we develop a quantum version of the blowing-up lemma which may be of independent interest.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
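For orientation, the centralized baseline that the abstract calls well understood is the quantum Stein lemma. It is stated here as general background, not as a result of the paper. The notation is an assumption for illustration: $\rho$ is the state under the null hypothesis, $\sigma$ the state under the alternative, and $\beta_n(\varepsilon)$ the smallest type-II error over tests on $n$ copies whose type-I error is at most $\varepsilon$.

```latex
\[
  \lim_{n \to \infty} -\frac{1}{n} \log \beta_n(\varepsilon)
  \;=\; D(\rho \,\|\, \sigma)
  \;=\; \operatorname{Tr}\!\bigl[\rho\,(\log \rho - \log \sigma)\bigr],
  \qquad \varepsilon \in (0,1),
\]
```

The relative entropy is finite when the support of $\rho$ is contained in that of $\sigma$. The distributed, zero-rate setting studied in the paper asks how this exponent changes when the tester only receives compressed information from the two parties.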
# KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors ( http://arxiv.org/abs/2410.08938v1 ) License: see link	Benson Chen, Tomasz Danel, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Gabriel H. S. Dreiman, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts,	DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at: https://github.com/insitro/kindel.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Linear-cost unbiased posterior estimates for crossed effects and matrix factorization models via couplings ( http://arxiv.org/abs/2410.08939v1 ) License: see link	Paolo Maria Ceriani, Giacomo Zanella,	We design and analyze unbiased Markov chain Monte Carlo (MCMC) schemes based on couplings of blocked Gibbs samplers (BGSs), whose total computational costs scale linearly with the number of parameters and data points. Our methodology is designed for and applicable to high-dimensional BGS with conditionally independent blocks, which are often encountered in Bayesian modeling. We provide bounds on the expected number of iterations needed for coalescence for Gaussian targets, which imply that practical two-step coupling strategies achieve coalescence times that match the relaxation times of the original BGS scheme up to a logarithmic factor. To illustrate the practical relevance of our methodology, we apply it to high-dimensional crossed random effect and probabilistic matrix factorization models, for which we develop a novel BGS scheme with improved convergence speed. Our methodology provides unbiased posterior estimates at linear cost (usually requiring only a few BGS iterations for problems with thousands of parameters), matching state-of-the-art procedures for both frequentist and Bayesian estimation of those models.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# Engineering dipole-dipole couplings for enhanced cooperative light-matter interactions ( http://arxiv.org/abs/2410.08940v1 ) License: see link	Adam Burgess, Madeline C. Waller, Erik M. Gauger, Robert Bennett,	Cooperative optical effects are enabled and controlled by interactions between molecular dipoles, meaning that their mutual orientation is of paramount importance to, for example, superabsorbing light-harvesting antennas. Here we show how to move beyond the possibilities of simple geometric tailoring, demonstrating how a metallic sphere placed within a ring of parallel dipoles engineers an effective Hamiltonian that generates "guide-sliding" states within the ring system. This allows steady-state superabsorption in noisy room temperature environments, outperforming previous designs while being significantly simpler to implement. As exemplified by this showcase, our approach represents a powerful design paradigm for tailoring cooperative light-matter effects in molecular structures that extends beyond superabsorbing systems, to a huge array of quantum energy transport systems.	Translated: 2024-10-30 21:06:06 Published: 2024-10-11
# MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering ( http://arxiv.org/abs/2410.08941v1 ) License: see link	Jaehoon Choi, Yonghan Lee, Hyungtae Lee, Heesung Kwon, Dinesh Manocha,	Recently, 3D Gaussian splatting has gained attention for its capability to generate high-fidelity rendering results. At the same time, most applications such as games, animation, and AR/VR use mesh-based representations to represent and render 3D scenes. We propose a novel approach that integrates mesh representation with 3D Gaussian splats to perform high-quality rendering of reconstructed real-world scenes. In particular, we introduce a distance-based Gaussian splatting technique to align the Gaussian splats with the mesh surface and remove redundant Gaussian splats that do not contribute to the rendering. We consider the distance between each Gaussian splat and the mesh surface to distinguish between tightly-bound and loosely-bound Gaussian splats. The tightly-bound splats are flattened and aligned well with the mesh geometry. The loosely-bound Gaussian splats are used to account for the artifacts in reconstructed 3D meshes in terms of rendering. We present a training strategy of binding Gaussian splats to the mesh geometry, and take into account both types of splats. In this context, we introduce several regularization techniques aimed at precisely aligning tightly-bound Gaussian splats with the mesh surface during the training process. We validate the effectiveness of our method on large and unbounded scenes from the mip-NeRF 360 and Deep Blending datasets. Our method surpasses recent mesh-based neural rendering techniques by achieving a 2 dB higher PSNR, and outperforms mesh-based Gaussian splatting methods by 1.3 dB PSNR, particularly on the outdoor mip-NeRF 360 dataset, demonstrating better rendering quality. We provide analyses for each type of Gaussian splat and achieve a reduction in the number of Gaussian splats by 30% compared to the original 3D Gaussian splatting.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory ( http://arxiv.org/abs/2410.08942v1 ) License: see link	Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou, Reda Alami, Ahmed Alzubaidi, Hakim Hacid,	Synthetic data has gained attention for training large language models, but poor-quality data can harm performance (see, e.g., Shumailov et al. (2023); Seddik et al. (2024)). A potential solution is data pruning, which retains only high-quality data based on a score function (human or machine feedback). Previous work by Feng et al. (2024) analyzed models trained on synthetic data as sample size increases. We extend this by using random matrix theory to derive the performance of a binary classifier trained on a mix of real and pruned synthetic data in a high dimensional setting. Our findings identify conditions where synthetic data could improve performance, focusing on the quality of the generative model and verification strategy. We also show a smooth phase transition in synthetic label noise, contrasting with prior sharp behavior in infinite sample limits. Experiments with toy models and large language models validate our theoretical results.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
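The data-pruning step mentioned in this abstract (keep only synthetic samples that a score function rates highly, then train on real plus pruned synthetic data) can be illustrated with a minimal sketch. The function names, the threshold, and the toy verifier below are hypothetical and chosen only for illustration; the paper's actual score function and classifier are not specified in the abstract.

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (features, label)


def prune_synthetic(
    synthetic: List[Sample],
    score: Callable[[Sample], float],
    threshold: float,
) -> List[Sample]:
    """Keep only synthetic samples whose quality score reaches the threshold."""
    return [s for s in synthetic if score(s) >= threshold]


def build_training_set(
    real: List[Sample],
    synthetic: List[Sample],
    score: Callable[[Sample], float],
    threshold: float,
) -> List[Sample]:
    """Mix all real samples with the pruned synthetic samples."""
    return list(real) + prune_synthetic(synthetic, score, threshold)


def toy_score(sample: Sample) -> float:
    """Hypothetical verifier: trusts a label that agrees with the sign of feature 0."""
    features, label = sample
    return 1.0 if (features[0] > 0) == (label == 1) else 0.0


real = [([1.0, 0.2], 1), ([-0.8, 0.1], 0)]
synthetic = [([0.9, -0.3], 1), ([0.7, 0.4], 0), ([-1.1, 0.0], 0)]
train = build_training_set(real, synthetic, toy_score, threshold=0.5)
# The second synthetic sample has a label the toy verifier rejects, so it is dropped.
```

In this toy setup the verifier is perfect by construction; the abstract's point is that the benefit of synthetic data depends on how good both the generator and the verifier actually are.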
# Parallel Watershed Partitioning: GPU-Based Hierarchical Image Segmentation ( http://arxiv.org/abs/2410.08946v1 ) License: see link	Varduhi Yeghiazaryan, Yeva Gabrielyan, Irina Voiculescu,	Many image processing applications rely on partitioning an image into disjoint regions whose pixels are 'similar.' The watershed and waterfall transforms are established mathematical morphology pixel clustering techniques. They are both relevant to modern applications where groups of pixels are to be decided upon in one go, or where adjacency information is relevant. We introduce three new parallel partitioning algorithms for GPUs. By repeatedly applying watershed algorithms, we produce waterfall results which form a hierarchy of partition regions over an input image. Our watershed algorithms attain competitive execution times in both 2D and 3D, processing an 800 megavoxel image in less than 1.4 sec. We also show how to use this fully deterministic image partitioning as a pre-processing step to machine learning based semantic segmentation. This replaces the role of superpixel algorithms, and results in comparable accuracy and faster training times.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
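To give a feel for why watershed partitioning lends itself to parallel hardware, here is a small steepest-descent sketch in plain Python. It is a generic illustration, not the authors' GPU algorithms: the function name, the 4-neighbourhood, and the decision to leave plateaus (equal-height neighbours) unmerged are all assumptions made for brevity.

```python
from typing import Dict, List, Tuple


def watershed_labels(height: List[List[float]]) -> List[List[int]]:
    """Label each pixel with the local minimum reached by steepest descent."""
    rows, cols = len(height), len(height[0])

    def lowest_neighbor(r: int, c: int) -> Tuple[int, int]:
        # Pick the lowest strictly-lower 4-neighbour, or the pixel itself.
        best = (r, c)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                if height[nr][nc] < height[best[0]][best[1]]:
                    best = (nr, nc)
        return best

    # Step 1: every pixel chooses its pointer independently of all others,
    # which is the part that maps naturally onto one GPU thread per pixel.
    parent = [[lowest_neighbor(r, c) for c in range(cols)] for r in range(rows)]

    # Step 2: follow pointers downhill until a local minimum (a self-pointer).
    def find_root(r: int, c: int) -> Tuple[int, int]:
        while parent[r][c] != (r, c):
            r, c = parent[r][c]
        return (r, c)

    roots: Dict[Tuple[int, int], int] = {}
    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find_root(r, c)
            labels[r][c] = roots.setdefault(root, len(roots))
    return labels


print(watershed_labels([[2, 1, 3, 0, 4]]))  # → [[0, 0, 1, 1, 1]]
```

Because pointers always lead to a strictly lower pixel, the descent cannot cycle. Re-running a partitioning step on the resulting regions instead of on pixels is the idea behind the waterfall hierarchy described in the abstract.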
# Meta-Transfer Learning Empowered Temporal Graph Networks for Cross-City Real Estate Appraisal ( http://arxiv.org/abs/2410.08947v1 ) License: see link	Weijia Zhang, Jindong Han, Hao Liu, Wei Fan, Hao Wang, Hui Xiong,	Real estate appraisal is important for a variety of endeavors such as real estate deals, investment analysis, and real property taxation. Recently, deep learning has shown great promise for real estate appraisal by harnessing substantial online transaction data from web platforms. Nonetheless, deep learning is data-hungry, and thus it may not be trivially applicable to the enormous number of small cities with limited data. To this end, we propose Meta-Transfer Learning Empowered Temporal Graph Networks (MetaTransfer) to transfer valuable knowledge from multiple data-rich metropolises to the data-scarce city to improve valuation performance. Specifically, by modeling the ever-growing real estate transactions with associated residential communities as a temporal event heterogeneous graph, we first design an Event-Triggered Temporal Graph Network to model the irregular spatiotemporal correlations between evolving real estate transactions. Besides, we formulate the city-wide real estate appraisal as a multi-task dynamic graph link label prediction problem, where the valuation of each community in a city is regarded as an individual task. A Hypernetwork-Based Multi-Task Learning module is proposed to simultaneously facilitate intra-city knowledge sharing between multiple communities and task-specific parameter generation to accommodate the community-wise real estate price distribution. Furthermore, we propose a Tri-Level Optimization Based Meta-Learning framework to adaptively re-weight training transaction instances from multiple source cities to mitigate negative transfer, and thus improve the cross-city knowledge transfer effectiveness. Finally, extensive experiments based on five real-world datasets demonstrate the significant superiority of MetaTransfer compared with eleven baseline algorithms.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points ( http://arxiv.org/abs/2410.08948v1 ) License: see link	Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli,	Social conventions are the foundation for social and economic life. As legions of AI agents increasingly interact with each other and with humans, their ability to form shared conventions will determine how effectively they will coordinate behaviors, integrate into society and influence it. Here, we investigate the dynamics of conventions within populations of Large Language Model (LLM) agents using simulated interactions. First, we show that globally accepted social conventions can spontaneously arise from local interactions between communicating LLMs. Second, we demonstrate how strong collective biases can emerge during this process, even when individual agents appear to be unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We show that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, contrasting the experimental results with predictions from a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results clarify how AI systems can autonomously develop norms without explicit programming and have implications for designing AI systems that align with human values and societal goals.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# On the Adversarial Transferability of Generalized "Skip Connections" ( http://arxiv.org/abs/2410.08950v1 ) License: see link	Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin,	Skip connections are an essential ingredient for modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios, namely, the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that using more gradients from the skip connections rather than the residual modules according to a decay factor during backpropagation allows one to craft adversarial examples with high transferability. The above method is termed the Skip Gradient Method (SGM). Although starting from ResNet-like models in vision domains, we further extend SGM to more advanced architectures, including Vision Transformers (ViTs) and models with length-varying paths and other domains, i.e., natural language processing. We conduct comprehensive transfer attacks against various models including ResNets, Transformers, Inceptions, Neural Architecture Search, and Large Language Models (LLMs). We show that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, considering the high complexity of practical use, we further demonstrate that SGM can even improve the transferability on ensembles of models or targeted attacks and the stealthiness against current defenses. Finally, we provide theoretical explanations and empirical insights on how SGM works. Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design. Our code is available at https://github.com/mo666666/SGM.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
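The role of the decay factor in SGM can be seen in a scalar toy model. For a stack of residual blocks y = x + f_i(x), the chain rule gives an end-to-end gradient equal to the product of (1 + f_i'(x)) terms, where the 1 comes from the skip connection. Scaling only the residual-branch term by a factor gamma shifts weight toward the skip path. The sketch below is an illustration under these assumptions; the function name and the scalar setting are hypothetical, and it does not implement the attack itself or the authors' code.

```python
from typing import Sequence


def end_to_end_gradient(residual_derivs: Sequence[float], gamma: float = 1.0) -> float:
    """Gradient through stacked scalar residual blocks y = x + f_i(x).

    residual_derivs holds f_i'(x) for each block. gamma = 1.0 is ordinary
    backpropagation; gamma < 1.0 down-weights the residual branches so that
    the skip connections contribute relatively more to the gradient.
    """
    grad = 1.0
    for d in residual_derivs:
        grad *= 1.0 + gamma * d  # skip path (1) + decayed residual path
    return grad


derivs = [0.5, -0.2]
plain = end_to_end_gradient(derivs, gamma=1.0)    # (1.5)(0.8)   = 1.2
decayed = end_to_end_gradient(derivs, gamma=0.5)  # (1.25)(0.9)  = 1.125
skip_only = end_to_end_gradient(derivs, gamma=0.0)  # 1.0
```

In a real network the decay would be applied only in the backward pass, leaving the forward computation unchanged, and the resulting gradient would then feed a standard gradient-based attack.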
# Rapid Grassmannian Averaging with Chebyshev Polynomials ( http://arxiv.org/abs/2410.08956v1 ) License: see link	Brighton Ancelin, Alex Saad-Falcon, Kason Ancelin, Justin Romberg,	We propose new algorithms to efficiently average a collection of points on a Grassmannian manifold in both the centralized and decentralized settings. Grassmannian points are used ubiquitously in machine learning, computer vision, and signal processing to represent data through (often low-dimensional) subspaces. While averaging these points is crucial to many tasks (especially in the decentralized setting), existing methods unfortunately remain computationally expensive due to the non-Euclidean geometry of the manifold. Our proposed algorithms, Rapid Grassmannian Averaging (RGrAv) and Decentralized Rapid Grassmannian Averaging (DRGrAv), overcome this challenge by leveraging the spectral structure of the problem to rapidly compute an average using only small matrix multiplications and QR factorizations. We provide a theoretical guarantee of optimality and present numerical experiments which demonstrate that our algorithms outperform state-of-the-art methods in providing high accuracy solutions in minimal time. Additional experiments showcase the versatility of our algorithms to tasks such as K-means clustering on video motion data, establishing RGrAv and DRGrAv as powerful tools for generic Grassmannian averaging.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison ( http://arxiv.org/abs/2410.08958v1 ) License: see link	Daniel Salnikov, Kevin Michalewicz, Dan Leonte,	We propose the $\textit{lifted linear model}$, and derive model-free prediction intervals that become tighter as the correlation between predictions and observations increases. These intervals motivate the $\textit{Lifted Coefficient of Determination}$, a model comparison criterion for arbitrary loss functions in prediction-based settings, e.g., regression, classification or counts. We extend the prediction intervals to more general error distributions, and propose a fast model-free outlier detection algorithm for regression. Finally, we illustrate the framework via numerical experiments.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data ( http://arxiv.org/abs/2410.08961v1 ) License: see link	Arthur Mendonça Sasse, Claudio Miceli de Farias,	Federated Kolmogorov-Arnold Networks (F-KANs) have already been proposed, but their assessment is at an initial stage. We present a comparison between KANs (using B-splines and Radial Basis Functions as activation functions) and Multi-Layer Perceptrons (MLPs) with a similar number of parameters for 100 rounds of federated learning in the MNIST classification task using non-IID partitions with 100 clients. After 15 trials for each model, we show that the best accuracies achieved by MLPs can be achieved by Spline-KANs in half the time (in rounds), with just a moderate increase in computing time.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
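The abstract does not say how its non-IID partitions were built. As one common possibility, the sketch below shows a label-sorted shard scheme often used in federated learning experiments: samples are sorted by label, cut into shards, and each client receives a few shards, so each client sees only a few classes. The function name, the two-shards-per-client default, and the seed are assumptions for illustration and may differ from what the authors used.

```python
import random
from typing import Dict, List


def shard_partition(
    labels: List[int],
    num_clients: int,
    shards_per_client: int = 2,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Assign sample indices to clients so each client holds few distinct labels."""
    # Sort indices by label so that each contiguous shard is label-homogeneous.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards  # leftover samples, if any, are dropped
    shards = [order[k * shard_size:(k + 1) * shard_size] for k in range(num_shards)]

    # Shuffle shards reproducibly, then hand out consecutive groups to clients.
    rng = random.Random(seed)
    rng.shuffle(shards)
    return {
        client: [
            i
            for shard in shards[client * shards_per_client:(client + 1) * shards_per_client]
            for i in shard
        ]
        for client in range(num_clients)
    }


# Toy stand-in for a 10-class dataset with 200 samples and 10 clients.
labels = [i % 10 for i in range(200)]
parts = shard_partition(labels, num_clients=10)
```

With these toy sizes every shard contains a single class, so each client ends up with at most two classes, which is the kind of label skew that makes federated training harder than the IID case.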
# Language Imbalance Driven Rewarding for Multilingual Self-improving ( http://arxiv.org/abs/2410.08964v1 ) License: see link	Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, Jiajun Zhang,	Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks. However, these advancements have predominantly benefited "first-class" languages such as English and Chinese, leaving many other languages underrepresented. This imbalance, while limiting broader applications, generates a natural preference ranking between languages, offering an opportunity to bootstrap the multilingual capabilities of LLMs in a self-improving manner. Thus, we propose $\textit{Language Imbalance Driven Rewarding}$, where the inherent imbalance between dominant and non-dominant languages within LLMs is leveraged as a reward signal. Iterative DPO training demonstrates that this approach not only enhances LLM performance in non-dominant languages but also improves the dominant language's capacity, thereby yielding an iterative reward signal. Fine-tuning Meta-Llama-3-8B-Instruct over two iterations of this approach results in continuous improvements in multilingual performance across instruction-following and arithmetic reasoning tasks, evidenced by an average improvement of 7.46% win rate on the X-AlpacaEval leaderboard and 13.9% accuracy on the MGSM benchmark. This work serves as an initial exploration, paving the way for multilingual self-improvement of LLMs.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements ( http://arxiv.org/abs/2410.08968v1 ) License: see link	Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme,	The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach: the model refuses to interact with any content deemed unsafe by the model provider. This approach lacks flexibility in the face of varying social norms across cultures and regions. In addition, users may have diverse safety needs, making a model with static safety standards too restrictive to be useful, as well as too costly to be re-aligned. We propose Controllable Safety Alignment (CoSA), a framework designed to adapt models to diverse safety requirements without re-training. Instead of aligning a fixed model, we align models to follow safety configs -- free-form natural language descriptions of the desired safety behaviors -- that are provided as part of the system prompt. To adjust model safety behavior, authorized users only need to modify such safety configs at inference time. To enable that, we propose CoSAlign, a data-centric method for aligning LLMs to easily adapt to diverse safety configs. Furthermore, we devise a novel controllability evaluation protocol that considers both helpfulness and configured safety, summarizing them into CoSA-Score, and construct CoSApien, a human-authored benchmark that consists of real-world LLM use cases with diverse safety requirements and corresponding evaluation prompts. We show that CoSAlign leads to substantial gains of controllability over strong baselines including in-context alignment. Our framework encourages better representation and adaptation to pluralistic human values in LLMs, thereby increasing their practicality.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
# Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures ( http://arxiv.org/abs/2410.08971v1 ) License: see link	Evan Lucas, Dylan Kangas, Timothy C Havens,	In this paper, we propose an extension to Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and end of a document. A method to selectively increase global attention is proposed and demonstrated for abstractive summarization tasks on several benchmark data sets. By prefixing the transcript with additional keywords and encoding global attention on these keywords, improvement in zero-shot, few-shot, and fine-tuned cases is demonstrated for some benchmark data sets.	Translated: 2024-10-30 20:56:20 Published: 2024-10-11
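The keyword-prefix idea in this abstract can be sketched as building two aligned sequences: the tokens (keywords first, then the transcript) and a mask that marks which positions should receive global attention. The helper below is a hypothetical illustration in plain Python. Whitespace splitting stands in for a real subword tokenizer, and the convention that 1 means global and 0 means local attention is an assumption about the target model's mask format.

```python
from typing import List, Tuple


def prefix_keywords(keywords: List[str], transcript: str) -> Tuple[List[str], List[int]]:
    """Prepend keyword tokens to a transcript and flag them for global attention."""
    keyword_tokens = [tok for kw in keywords for tok in kw.split()]
    transcript_tokens = transcript.split()
    tokens = keyword_tokens + transcript_tokens
    # 1 = attend globally (keyword prefix), 0 = local sliding-window attention.
    global_mask = [1] * len(keyword_tokens) + [0] * len(transcript_tokens)
    return tokens, global_mask


tokens, mask = prefix_keywords(["budget", "hiring plan"], "we met to discuss the budget")
print(mask)  # → [1, 1, 1, 0, 0, 0, 0, 0, 0]
```

How the keywords are detected in the first place is outside this sketch; the abstract only states that detected keywords are prefixed and given global attention.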
# ALVIN: Active Learning Via INterpolation ( http://arxiv.org/abs/2410.08972v1 ) License: see link	Michalis Korakakis, Andreas Vlachos, Adrian Weller,	Active Learning aims to minimize annotation effort by selecting the most useful instances from a pool of unlabeled data. However, typical active learning methods overlook the presence of distinct example groups within a class, whose prevalence may vary, e.g., in occupation classification datasets certain demographics are disproportionately represented in specific classes. This oversight causes models to rely on shortcuts for predictions, i.e., spurious correlations between input attributes and labels occurring in well-represented groups. To address this issue, we propose Active Learning Via INterpolation (ALVIN), which conducts intra-class interpolations between examples from under-represented and well-represented groups to create anchors, i.e., artificial points situated between the example groups in the representation space. By selecting instances close to the anchors for annotation, ALVIN identifies informative examples exposing the model to regions of the representation space that counteract the influence of shortcuts. Crucially, since the model considers these examples to be of high certainty, they are likely to be ignored by typical active learning methods. Experimental results on six datasets encompassing sentiment analysis, natural language inference, and paraphrase detection demonstrate that ALVIN outperforms state-of-the-art active learning methods in both in-distribution and out-of-distribution generalization.	Translated: 2024-10-30 20:46:27 Published: 2024-10-11
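The two steps described in this abstract, interpolating between representations of an under-represented and a well-represented example to form an anchor, then picking unlabeled instances near that anchor, can be sketched as follows. The function names, the fixed mixing weight, and the use of Euclidean distance are assumptions for illustration; the abstract does not specify how ALVIN chooses the interpolation weight or measures closeness.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def interpolate(u: Vector, v: Vector, lam: float) -> List[float]:
    """Anchor between two same-class representations: lam * u + (1 - lam) * v."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def nearest_to_anchor(anchor: Vector, pool: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k unlabeled representations closest to the anchor."""
    ranked = sorted(range(len(pool)), key=lambda i: math.dist(anchor, pool[i]))
    return ranked[:k]


under_represented = [0.0, 0.0]  # toy representation from the minority group
well_represented = [4.0, 0.0]   # toy representation from the majority group
anchor = interpolate(under_represented, well_represented, lam=0.5)

unlabeled_pool = [[0.1, 0.0], [2.2, 0.1], [3.9, 0.0], [1.7, 0.0]]
selected = nearest_to_anchor(anchor, unlabeled_pool, k=2)
print(anchor, selected)  # → [2.0, 0.0] [1, 3]
```

The selected points lie between the two groups rather than near either one, which is the region the abstract argues typical uncertainty-based selection tends to miss.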
# UniGlyph: ユニバーサル言語表現のための7セグメントスクリプト UniGlyph: A Seven-Segment Script for Universal Language Representation ( http://arxiv.org/abs/2410.08974v1 ) ライセンス: Link先を確認	G. V. Bency Sherin, A. Abijesh Euphrine, A. Lenora Moreen, L. Arun Jose,	(参考訳) UniGlyphは、7セグメント文字から派生したスクリプトを用いて、普遍的な翻字システムを構築するために設計された人工言語(conlang)である。 UniGlyphの目標は、幅広い音声を表現できるフレキシブルで一貫したスクリプトを提供することによって、言語間のコミュニケーションを促進することである。本稿では、UniGlyphの設計について検討し、そのスクリプト構造、音声マッピング、翻字規則について詳述する。このシステムは、国際音声記号(IPA)および従来の文字集合の不完全な点に対処し、言語間での音声の多様性を表現するためのコンパクトで汎用的な方法を提供する。ピッチと長さのマーカーにより、UniGlyphは小さな文字集合を維持しながら正確な音声表現を保証する。 UniGlyphの応用例としては、自然言語処理や多言語音声認識といった人工知能との統合があり、さまざまな言語間のコミュニケーションを強化する。今後の拡張として、動物の音声の追加についても論じる。そこでは種ごとに固有のスクリプトを割り当て、UniGlyphの範囲を人間のコミュニケーションを超えて広げる。本研究では、このような普遍的なスクリプトの開発における課題と解決策を提示し、言語間コミュニケーション、教育音声学、AI駆動型アプリケーションにおいて、UniGlyphが言語的ギャップを埋める可能性を示す。 UniGlyph is a constructed language (conlang) designed to create a universal transliteration system using a script derived from seven-segment characters. The goal of UniGlyph is to facilitate cross-language communication by offering a flexible and consistent script that can represent a wide range of phonetic sounds. This paper explores the design of UniGlyph, detailing its script structure, phonetic mapping, and transliteration rules. The system addresses imperfections in the International Phonetic Alphabet (IPA) and traditional character sets by providing a compact, versatile method to represent phonetic diversity across languages. With pitch and length markers, UniGlyph ensures accurate phonetic representation while maintaining a small character set. Applications of UniGlyph include artificial intelligence integrations, such as natural language processing and multilingual speech recognition, enhancing communication across different languages. Future expansions are discussed, including the addition of animal phonetic sounds, where unique scripts are assigned to different species, broadening the scope of UniGlyph beyond human communication. 
This study presents the challenges and solutions in developing such a universal script, demonstrating the potential of UniGlyph to bridge linguistic gaps in cross-language communication, educational phonetics, and AI-driven applications.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# グラフ混合依存下のオンライン-PAC一般化境界 Online-to-PAC generalization bounds under graph-mixing dependencies ( http://arxiv.org/abs/2410.08977v1 ) ライセンス: Link先を確認	Baptiste Abélès, Eugenio Clerico, Gergely Neu,	(参考訳) 統計学習における従来の一般化結果は、独立に描画された例からなるトレーニングデータセットを必要とする。この独立性の仮定を緩和しようとする最近の試みの多くは、純粋に時間的(混合)依存か、非隣接頂点が独立確率変数に対応するグラフ依存かを検討した。どちらのアプローチにも独自の制限があり、前者は時間的順序構造を必要とし、後者は依存性間の強度を定量化する方法がない。この研究では、依存がグラフ距離で崩壊するフレームワークを提案することによって、これらの2つの研究の行を橋渡しする。我々は,集中度を導出し,グラフ構造を取り入れたオンライン学習フレームワークを導入することにより,オンライン-PACフレームワークを活用した一般化バウンダリを導出する。結果として生じる高確率一般化は、混合率とグラフの色数の両方に依存する。 Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing) dependencies, or graph-dependencies, where non-adjacent vertices correspond to independent random variables. Both approaches have their own limitations, the former requiring a temporal ordered structure, and the latter lacking a way to quantify the strength of inter-dependencies. In this work, we bridge these two lines of work by proposing a framework where dependencies decay with graph distance. We derive generalization bounds leveraging the online-to-PAC framework, by deriving a concentration result and introducing an online learning framework incorporating the graph structure. The resulting high-probability generalization guarantees depend on both the mixing rate and the graph's chromatic number.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# 量子ネットワーク構築のためのインターネット原則の活用 Leveraging Internet Principles to Build a Quantum Network ( http://arxiv.org/abs/2410.08980v1 ) ライセンス: Link先を確認	Leonardo Bacciottini, Aparimit Chandra, Matheus Guedes De Andrade, Nitish K. Panigrahy, Shahrooz Pouryousef, Nageswara S. V. Rao, Emily Van Milligen, Gayane Vardoyan, Don Towsley,	(参考訳) 量子インターネットの運用アーキテクチャを設計することは、物理学の法則と技術的な制約によって課される基本的な制約の両方を考慮して難しい課題である。本稿では,量子特化要素のほとんどを抽象化し,パケットスイッチングに基づく量子ネットワークアーキテクチャを定式化する手法を提案する。このようなリフレーミングは、インターネット内で利用可能な多くのツールやプロトコルを利用する機会を提供する。実例として、量子終端ノードと中間ノードがそれぞれ需要と資源利用を効果的に制御するアーキテクチャを含む、古典的な混雑制御とアクティブキュー管理プロトコルを量子ネットワークに適合させ、適応させる。その結果、これらの古典的ネットワーキングツールは、量子メモリのデコヒーレンスとの戦いに効果的に利用でき、エンド・ツー・エンドの忠実度を目標値に維持できることがわかった。 Designing an operational architecture for the Quantum Internet is a challenging task in light of both fundamental limitations imposed by the laws of physics and technological constraints. Here, we propose a method to abstract away most of the quantum-specific elements and formulate a best-effort quantum network architecture based on packet-switching, akin to that of the classical Internet. Such reframing provides an opportunity to exploit the many tools and protocols available and well-understood within the Internet. As an illustration, we tailor and adapt classical congestion control and active queue management protocols to quantum networks, comprising an architecture wherein quantum end- and intermediate nodes effectively regulate demand and resource utilization, respectively. Results show that these classical networking tools can be effectively used to combat quantum memory decoherence and keep end-to-end fidelity around a target value.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# DEL:ニューラルレンダリングによる3次元粒子動力学学習用離散要素学習器 DEL: Discrete Element Learner for Learning 3D Particle Dynamics with Neural Rendering ( http://arxiv.org/abs/2410.08983v1 ) ライセンス: Link先を確認	Jiaxu Wang, Jingkai Sun, Junhao He, Ziyi Zhang, Qiang Zhang, Mingyuan Sun, Renjing Xu,	(参考訳) 学習ベースシミュレータは、3次元の正解データ(groundtruth)が利用可能である場合に粒子動力学をシミュレートする大きな可能性を示すが、粒子ごとの対応関係は必ずしも得られない。ニューラルレンダリングの発展は、逆レンダリングにより2次元画像から3次元ダイナミクスを学ぶための新しいソリューションを提供する。しかし、既存のアプローチは依然として、2Dから3Dへの不確実性に起因する不良設定性に悩まされている。例えば、ある2D画像は様々な3D粒子分布に対応しうる。このような不確実性を緩和するため、従来の力学的に解釈可能なフレームワークを物理的な事前知識として扱い、学習ベースのバージョンに拡張する。簡単に言えば、学習可能なグラフカーネルを古典的な離散要素解析(DEA)フレームワークに組み込んで、新しい力学統合学習システムを実装する。この場合、グラフネットワークカーネルは、ダイナミクスの写像全体ではなく、DEAフレームワーク内の特定の力学的演算子を近似するためにのみ使用される。本手法は、強い物理的事前知識を統合することで、部分的な2次元観測から様々な物質の力学を統一的に学習することができる。実験により、この文脈では、我々のアプローチは他の学習シミュレータよりもはるかに優れており、異なるレンダラー、少ないトレーニングサンプル、少ないカメラビューに対して堅牢であることが示された。 Learning-based simulators show great potential for simulating particle dynamics when 3D groundtruth is available, but per-particle correspondences are not always accessible. The development of neural rendering presents a new solution to this field to learn 3D dynamics from 2D images by inverse rendering. However, existing approaches still suffer from ill-posed natures resulting from the 2D to 3D uncertainty, for example, specific 2D images can correspond with various 3D particle distributions. To mitigate such uncertainty, we consider a conventional, mechanically interpretable framework as the physical priors and extend it to a learning-based version. In brief, we incorporate the learnable graph kernels into the classic Discrete Element Analysis (DEA) framework to implement a novel mechanics-integrated learning system. In this case, the graph network kernels are only used for approximating some specific mechanical operators in the DEA framework rather than the whole dynamics mapping. 
By integrating the strong physics priors, our methods can effectively learn the dynamics of various materials from the partial 2D observations in a unified manner. Experiments show that our approach outperforms other learned simulators by a large margin in this context and is robust to different renderers, fewer training samples, and fewer camera views.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# SubZero: メモリ効率の良いLLMファインチューニングのためのランダムサブスペースゼロ階最適化 SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning ( http://arxiv.org/abs/2410.08989v1 ) ライセンス: Link先を確認	Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Hua Huang,	(参考訳) 大規模言語モデル(LLM)のファインチューニングは、様々な下流タスクに有効であることが証明されている。しかし、LLMのサイズが大きくなるにつれて、バックプロパゲーションのメモリ要求はますます法外なものになっている。ゼロ階(ZO)最適化法は、順伝播を用いて勾配を推定することでメモリ効率の良い代替手段を提供するが、勾配推定の分散は通常、モデルのパラメータ次元に対して線形にスケールし、これはLLMにとって重大な問題である。本稿では、LLMの高次元性に起因する課題に対処するため、ランダム部分空間ゼロ階最適化(SubZero)を提案する。トレーニング性能を向上しつつ、メモリ消費を大幅に削減する、LLM用に調整された低ランク摂動を導入する。さらに、我々の勾配推定がバックプロパゲーションの勾配をよく近似し、従来のZO法よりも低分散であり、SGDと組み合わせた場合に収束を保証することを証明する。実験結果から、SubZeroは様々な言語モデリングタスクにおいて、MeZOのような標準的なZOアプローチと比較してファインチューニング性能を高め、より高速な収束を実現することが示された。 Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model's parameter dimension – a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# 科学は探索である:概念メタファー理論のための計算フロンティア Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory ( http://arxiv.org/abs/2410.08991v1 ) ライセンス: Link先を確認	Rebecca M. M. Hicke, Ross Deans Kristensen-McLachlan,	(参考訳) メタファーはどこにでもあります。それらは、最も洗練された詩から、乾いた学術的な散文まで、自然言語のあらゆる領域に広く現れる。言語認知科学における重要な研究の体系は、概念的メタファーの存在、すなわち別の言語の言語における経験領域の体系的な構造化を論じている。概念的比喩は単なる修辞的繁栄ではなく、人間の認知における類推の役割の重要な証拠である。本稿では,Large Language Models (LLMs) が,自然言語データにおけるそのような概念的メタファの存在を正確に識別し,説明できるかどうかを問う。メタファアノテーションガイドラインに基づく新しいプロンプト手法を用いて,LLMが概念的メタファに関する大規模計算研究において有望なツールであることを実証した。さらに,LLMは,人間のアノテーションに設計された手続き的ガイドラインを適用可能であることを示し,言語知識の驚くほどの深さを示す。 Metaphors are everywhere. They appear extensively across all domains of natural language, from the most sophisticated poetry to seemingly dry academic prose. A significant body of research in the cognitive science of language argues for the existence of conceptual metaphors, the systematic structuring of one domain of experience in the language of another. Conceptual metaphors are not simply rhetorical flourishes but are crucial evidence of the role of analogical reasoning in human cognition. In this paper, we ask whether Large Language Models (LLMs) can accurately identify and explain the presence of such conceptual metaphors in natural language data. Using a novel prompting technique based on metaphor annotation guidelines, we demonstrate that LLMs are a promising tool for large-scale computational research on conceptual metaphors. Further, we show that LLMs are able to apply procedural guidelines designed for human annotators, displaying a surprising depth of linguistic knowledge.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# 大規模言語モデルのためのトークン空間の構造 The structure of the token space for large language models ( http://arxiv.org/abs/2410.08993v1 ) ライセンス: Link先を確認	Michael Robinson, Sourya Dey, Shauna Sweet,	(参考訳) 大規模言語モデルは、発話のセグメント(トークン)を高次元の周囲の潜在空間に配置することで、自然言語に存在する相関構造を符号化し、その空間上で動作する。我々は、大規模言語モデルの振る舞いと限界について、第一原理に基づく基礎的な理解を得るためには、このトークン部分空間の位相的および幾何学的構造を理解することが重要であると主張する。本稿では、トークン部分空間の次元およびリッチスカラー曲率の推定量を提示し、中程度のサイズの3つのオープンソース大規模言語モデル(GPT2, LLEMMA7B, MISTRAL7B)に適用する。これら3つのモデルすべてにおいて、これらの測定から、トークン部分空間は多様体ではなく成層多様体であり、個々の層のそれぞれにおいてリッチ曲率が有意に負であることが分かる。さらに、次元と曲率がモデルの生成の流暢さと相関することが分かり、これらの知見がモデルの挙動に関係することが示唆される。 Large language models encode the correlational structure present in natural language by fitting segments of utterances (tokens) into a high dimensional ambient latent space upon which the models then operate. We assert that in order to develop a foundational, first-principles understanding of the behavior and limitations of large language models, it is crucial to understand the topological and geometric structure of this token subspace. In this article, we present estimators for the dimension and Ricci scalar curvature of the token subspace, and apply them to three open source large language models of moderate size: GPT2, LLEMMA7B, and MISTRAL7B. In all three models, using these measurements, we find that the token subspace is not a manifold, but is instead a stratified manifold, where on each of the individual strata, the Ricci curvature is significantly negative. We additionally find that the dimension and curvature correlate with generative fluency of the models, which suggests that these findings have implications for model behavior.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# 一般化線形モデルを用いた不均衡分類のための最適ダウンサンプリング Optimal Downsampling for Imbalanced Classification with Generalized Linear Models ( http://arxiv.org/abs/2410.08994v1 ) ライセンス: Link先を確認	Yan Chen, Jose Blanchet, Krzysztof Dembczynski, Laura Fee Nern, Aaron Flores,	(参考訳) ダウンサンプリング(英: Downsampling)またはアンダーサンプリング(英: Under-Sampling)は、大規模かつ高度に不均衡な分類モデル(英語版)の文脈で利用される技法である。一般化線形モデル(GLM)を用いた不均衡分類のための最適ダウンサンプリングについて検討した。擬似最大確率推定器を提案し,その漸近正規性について,標本サイズが大きくなるにつれて不均衡な個体群が増加する状況下で検討する。導入した推定器の理論的保証を提供する。さらに,統計的精度と計算効率のバランスをとる基準を用いて,最適なダウンサンプリング率を算出する。合成データと実験データの両方を用いて数値実験を行い、理論結果のさらなる検証を行い、導入した推定器が一般に利用可能な代替手段より優れていることを示す。 Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We propose a pseudo maximum likelihood estimator and study its asymptotic normality in the context of increasingly imbalanced populations relative to an increasingly large sample size. We provide theoretical guarantees for the introduced estimator. Additionally, we compute the optimal downsampling rate using a criterion that balances statistical accuracy and computational efficiency. Our numerical experiments, conducted on both synthetic and empirical data, further validate our theoretical results, and demonstrate that the introduced estimator outperforms commonly available alternatives.	翻訳日:2024-10-30 20:46:27 公開日:2024-10-11
# 大規模言語モデルに基づく自然言語推論における仮説のみのバイアス Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference ( http://arxiv.org/abs/2410.08996v1 ) ライセンス: Link先を確認	Grace Proebsting, Adam Poliak,	(参考訳) 我々は、クラウドソースワーカーをLLMに置き換えて自然言語推論(NLI)を書けるかどうかをテストする。我々は,GPT-4,Llama-2,Mistral 7bを用いて,Stanford NLIコーパスの一部を再現し,仮説のみの分類器を訓練し,LLMによる仮説がアノテーションのアーティファクトを含むか否かを判断する。 LLMによるNLIデータセットでは、BERTベースの仮説のみの分類器が86～96%の精度で達成しており、これらのデータセットには仮説のみのアーティファクトが含まれていることを示している。また, LLM 生成仮説では, GPT-4 が生成する1万以上の矛盾点に "swimming in a pool" というフレーズが出現し, しばしば "give-aways" が現れる。我々の分析は、NLIにおける十分に証明されたバイアスがLLM生成データに持続できるという実証的な証拠を提供する。 We test whether replacing crowdsource workers with LLMs to write Natural Language Inference (NLI) hypotheses similarly results in annotation artifacts. We recreate a portion of the Stanford NLI corpus using GPT-4, Llama-2 and Mistral 7b, and train hypothesis-only classifiers to determine whether LLM-elicited hypotheses contain annotation artifacts. On our LLM-elicited NLI datasets, BERT-based hypothesis-only classifiers achieve between 86-96% accuracy, indicating these datasets contain hypothesis-only artifacts. We also find frequent "give-aways" in LLM-generated hypotheses, e.g. the phrase "swimming in a pool" appears in more than 10,000 contradictions generated by GPT-4. Our analysis provides empirical evidence that well-attested biases in NLI can persist in LLM-generated data.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# 古典的にシミュレート可能な量子回路を用いたユニタリダイナミクスの分離 Disentangling unitary dynamics with classically simulable quantum circuits ( http://arxiv.org/abs/2410.09001v1 ) ライセンス: Link先を確認	Gerald E. Fux, Benjamin Béri, Rosario Fazio, Emanuele Tirrito,	(参考訳) テンソルネットワーク法と安定化器フォーマリズムを組み合わせて量子多体系の効率的な古典的シミュレーションを構築することができるかを検討する。このために、量子回路とハミルトン力学の両方を研究する。 Tゲートあるいはより一般的な位相ゲートをドープしたディープクリフォード回路であっても、パウリ作用素の期待は効率的にシミュレートできる。これは、結果として生じる状態が広範な絡み合いと広範な非安定性の両方を示すという事実にもかかわらずである。ハミルトニアン力学では、古典的なシミュレーションは急速に非効率になるが、テンソルネットワークと共にマッチゲート回路を用いることで、自由フェルミオン積分性に近い多体量子系の効率的なシミュレーションが提案される。 We explore to which extent it is possible to construct efficient classical simulation of quantum many body systems using a combination of tensor network methods and the stabilizer formalism. For this we study both quantum circuit and Hamiltonian dynamics. We find that expectations of Pauli operators can be simulated efficiently even for deep Clifford circuits doped with T-gates or more general phase gates, provided the number of non-Clifford gates is smaller or approximately equal to the system size. This is despite the fact that the resulting states exhibit both extensive entanglement and extensive nonstabilizerness. For the Hamiltonian dynamics we find that the classical simulation generically quickly becomes inefficient, but suggest the use of matchgate circuits alongside tensor networks for the efficient simulation of many-body quantum systems near free-fermion integrability.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# DA-Ada:Domain Adaptive Object DetectionのためのDomain-Aware Adapterを学習する DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection ( http://arxiv.org/abs/2410.09004v1 ) ライセンス: Link先を確認	Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Xinkai Song, Xiaqing Li, Yongwei Zhao, Ling Li, Yunji Chen,	(参考訳) ドメイン適応オブジェクト検出(DAOD)は、注釈付きソースドメインで訓練された検出器を、未ラベルのターゲットドメインに一般化することを目的としている。視覚言語モデル(VLM)は未知の画像に対して本質的な一般知識を提供できるため、視覚エンコーダを凍結してドメイン非依存のアダプタを挿入することで、DAODのためのドメイン不変知識を学習できる。しかし、ドメイン非依存のアダプタは、必然的にソースドメインに偏ってしまう。その結果、未ラベルのドメインにおいて識別に役立つ有益な知識、すなわちターゲットドメインのドメイン固有知識の一部を捨ててしまう。この問題を解決するため、本研究ではDAODタスクに適した新しいドメイン・アウェア・アダプタ(DA-Ada)を提案する。重要なポイントは、本質的な一般知識とドメイン不変知識の間にあるドメイン固有知識を活用することである。 DA-Adaは、ドメイン不変知識を学ぶためのドメイン不変アダプタ(DIA)と、視覚エンコーダによって破棄された情報からドメイン固有知識を注入するドメイン特化アダプタ(DSA)から構成される。複数のDAODタスクに対する総合的な実験により、DA-Adaはドメイン適応オブジェクト検出の性能を高めるドメイン認識型の視覚エンコーダを効率的に推論できることが示されている。私たちのコードは https://github.com/Therock90421/DA-Ada で公開されています。 Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. As the visual-language models (VLMs) can provide essential general knowledge on unseen images, freezing the visual encoder and inserting a domain-agnostic adapter can learn domain-invariant knowledge for DAOD. However, the domain-agnostic adapter is inevitably biased to the source domain. It discards some beneficial knowledge discriminative on the unlabelled domain, i.e., domain-specific knowledge of the target domain. To solve the issue, we propose a novel Domain-Aware Adapter (DA-Ada) tailored for the DAOD task. The key point is exploiting domain-specific knowledge between the essential general knowledge and domain-invariant knowledge. DA-Ada consists of the Domain-Invariant Adapter (DIA) for learning domain-invariant knowledge and the Domain-Specific Adapter (DSA) for injecting the domain-specific knowledge from the information discarded by the visual encoder. 
Comprehensive experiments over multiple DAOD tasks show that DA-Ada can efficiently infer a domain-aware visual encoder for boosting domain adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Ada.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# べき乗則データスペクトルを持つ2層ネットワークにおけるニューラルスケーリング法則の解析 Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra ( http://arxiv.org/abs/2410.09005v1 ) ライセンス: Link先を確認	Roman Worschech, Bernd Rosenow,	(参考訳) ニューラルスケーリング法則は、ディープニューラルネットワークの性能が、トレーニングデータサイズ、モデルの複雑さ、トレーニング時間などの重要な要因に対してどのようにスケールするかを記述するものであり、多くの場合、複数桁にわたってべき乗則に従う。経験的には観察されているものの、これらのスケーリング法則の理論的理解は依然として限られている。本研究では、統計力学の手法を用いて、生徒と教師の双方が2層ニューラルネットワークである生徒・教師フレームワークにおける1パス確率的勾配降下法を解析する。本研究は主に、一般化誤差と、べき乗則スペクトルを示すデータ共分散行列に対するその挙動に焦点を当てる。線形活性化関数に対して、一般化誤差の解析式を導出し、異なる学習レジームを調べ、べき乗則スケーリングが現れる条件を特定する。さらに、特徴学習レジームにおける非線形活性化関数に解析を拡張し、データ共分散行列のべき乗則スペクトルが学習ダイナミクスに与える影響について検討する。重要なことに、対称プラトーの長さは、データ共分散行列の異なる固有値の数と隠れユニットの数に依存することが分かり、これらのプラトーが様々な構成の下でどのように振る舞うかを示す。さらに、データ共分散行列がべき乗則スペクトルを持つ場合、特化フェーズにおいて指数関数的収束からべき乗則収束への遷移が現れることが明らかとなった。この研究は、ニューラルスケーリング法則の理論的理解に寄与し、複雑なデータ構造を含む実践的なシナリオにおける学習性能の最適化に関する洞察を提供する。 Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time, often following power-law behaviors over multiple orders of magnitude. Despite their empirical observation, the theoretical understanding of these scaling laws remains limited. In this work, we employ techniques from statistical mechanics to analyze one-pass stochastic gradient descent within a student-teacher framework, where both the student and teacher are two-layer neural networks. Our study primarily focuses on the generalization error and its behavior in response to data covariance matrices that exhibit power-law spectra. For linear activation functions, we derive analytical expressions for the generalization error, exploring different learning regimes and identifying conditions under which power-law scaling emerges. Additionally, we extend our analysis to non-linear activation functions in the feature learning regime, investigating how power-law spectra in the data covariance matrix impact learning dynamics. 
Importantly, we find that the length of the symmetric plateau depends on the number of distinct eigenvalues of the data covariance matrix and the number of hidden units, demonstrating how these plateaus behave under various configurations. In addition, our results reveal a transition from exponential to power-law convergence in the specialized phase when the data covariance matrix possesses a power-law spectrum. This work contributes to the theoretical understanding of neural scaling laws and provides insights into optimizing learning performance in practical scenarios involving complex data structures.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# SuperCorrect: エラー駆動インサイトによる言語モデルの監視と修正 SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ( http://arxiv.org/abs/2410.09008v1 ) ライセンス: Link先を確認	Ling Yang, Zhaochen Yu, Tianjun Zhang, Minkai Xu, Joseph E. Gonzalez, Bin Cui, Shuicheng Yan,	(参考訳) GPT-4、PaLM、LLaMAのような大規模言語モデル(LLM)は、様々な推論タスクにおいて大幅に改善されている。しかし、Llama-3-8BやDeepSeekMath-Baseのような小さなモデルは、推論エラーを効果的に識別し修正できないため、複雑な数学的推論に苦慮している。最近のリフレクションベースの手法は、自己回帰と自己補正を可能にすることでこれらの問題に対処することを目的としているが、彼らは推論ステップにおけるエラーを独立に検出する際の課題に直面している。これらの制約を克服するために,より小さな学生モデルの推論と反映の両方を教師モデルを用いて監督し,修正する新しい2段階フレームワークであるSuperCorrectを提案する。第1段階では、教師モデルから階層的な高レベルかつ詳細な思考テンプレートを抽出し、よりきめ細かい推論思考を導き出す学生モデルを指導する。第2段階では,教師の学習中の補正トレースを追従することにより,学生モデルの自己補正能力を高めるために,クロスモデル協調直接選好最適化(DPO)を導入する。このクロスモデルDPOアプローチは、学生モデルに対して、教師モデルからの誤り駆動的な洞察による誤った思考を効果的に見つけ、解決し、その思考のボトルネックを破り、挑戦する問題に取り組むために新しいスキルと知識を得るように教える。大規模な実験は、従来の方法よりも、我々の優位性を一貫して示している。特に、我々のSuperCorrect-7Bモデルは、強力なDeepSeekMath-7Bの7.8%/5.3%、Qwen2.5-Math-7Bの15.1%/6.3%をMATH/GSM8Kベンチマークで上回り、全ての7BモデルでSOTA性能が向上した。コード:https://github.com/YangLing0818/SuperCorrect-llm Large language models (LLMs) like GPT-4, PaLM, and LLaMA have shown significant improvements in various reasoning tasks. However, smaller models such as Llama-3-8B and DeepSeekMath-Base still struggle with complex mathematical reasoning because they fail to effectively identify and correct reasoning errors. Recent reflection-based methods aim to address these issues by enabling self-reflection and self-correction, but they still face challenges in independently detecting errors in their reasoning steps. To overcome these limitations, we propose SuperCorrect, a novel two-stage framework that uses a large teacher model to supervise and correct both the reasoning and reflection processes of a smaller student model. In the first stage, we extract hierarchical high-level and detailed thought templates from the teacher model to guide the student model in eliciting more fine-grained reasoning thoughts. 
In the second stage, we introduce cross-model collaborative direct preference optimization (DPO) to enhance the self-correction abilities of the student model by following the teacher's correction traces during training. This cross-model DPO approach teaches the student model to effectively locate and resolve erroneous thoughts with error-driven insights from the teacher model, breaking the bottleneck of its thoughts and acquiring new skills and knowledge to tackle challenging problems. Extensive experiments consistently demonstrate our superiority over previous methods. Notably, our SuperCorrect-7B model significantly surpasses powerful DeepSeekMath-7B by 7.8%/5.3% and Qwen2.5-Math-7B by 15.1%/6.3% on MATH/GSM8K benchmarks, achieving new SOTA performance among all 7B models. Code: https://github.com/YangLing0818/SuperCorrect-llm	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# 合成テキストから3次元生成のためのセマンティックスコア蒸留サンプリング Semantic Score Distillation Sampling for Compositional Text-to-3D Generation ( http://arxiv.org/abs/2410.09009v1 ) ライセンス: Link先を確認	Ling Yang, Zixiang Zhang, Junlin Han, Bohan Zeng, Runjia Li, Philip Torr, Wentao Zhang,	(参考訳) テキスト記述から高品質な3Dアセットを生成することは、コンピュータグラフィックスと視覚研究において重要な課題である。 3Dデータの不足のため、最先端のアプローチでは、Score Distillation Sampling (SDS) によって最適化された事前訓練された2D拡散プリエントを利用する。進歩にもかかわらず、複数のオブジェクトや複雑なインタラクションを含む複雑な3Dシーンを作るのはまだ難しい。これを解決するため、最近の手法ではボックスやレイアウトのガイダンスが組み込まれている。しかしながら、これらのレイアウト誘導構成法は、一般に粗く表現力に欠けるため、細粒度制御に苦慮することが多い。これらの課題を克服するために、合成テキストから3D生成の表現性と精度を効果的に向上する新しいSDS手法、Semantic Score Distillation Sampling(SemanticSDS)を導入する。このアプローチは、異なるレンダリングビュー間の一貫性を維持し、さまざまなオブジェクトとパーツを明確に区別する、新しいセマンティック埋め込みを統合します。これらの埋め込みはセマンティックマップに変換され、領域固有のSDSプロセスを指示し、正確な最適化と構成生成を可能にする。明示的なセマンティックガイダンスを活用することで,既存の事前学習拡散モデルの合成能力を解放し,特に複雑なオブジェクトやシーンにおいて,3Dコンテンツ生成において優れた品質を実現する。実験の結果,セマンティックSDSフレームワークは最先端の複雑な3Dコンテンツを生成するのに極めて有効であることがわかった。コード:https://github.com/YangLing0818/SemanticSDS-3D Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Due to the scarcity of 3D data, state-of-the-art approaches utilize pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite progress, crafting complex 3D scenes featuring multiple objects or intricate interactions is still difficult. To tackle this, recent methods have incorporated box or layout guidance. However, these layout-guided compositional methods often struggle to provide fine-grained control, as they are generally coarse and lack expressiveness. To overcome these challenges, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to effectively improve the expressiveness and accuracy of compositional text-to-3D generation. 
Our approach integrates new semantic embeddings that maintain consistency across different rendering views and clearly differentiate between various objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process, enabling precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models, thereby achieving superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective for generating state-of-the-art complex 3D content. Code: https://github.com/YangLing0818/SemanticSDS-3D	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# CVAM-Pose: 複数物体の単眼姿勢推定のための条件付き変分オートエンコーダ CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation ( http://arxiv.org/abs/2410.09010v1 ) ライセンス: Link先を確認	Jianyu Zhao, Wei Quan, Bogdan J. Matuszewski,	(参考訳) 剛体オブジェクトの姿勢を推定することはコンピュータビジョンにおける基本的な問題の1つであり、自動化や拡張現実にまたがる様々な応用がある。既存のほとんどのアプローチは、オブジェクトクラスごとに1つのネットワークを用いる戦略を採用し、オブジェクトの3Dモデルや深度データに大きく依存し、時間のかかる反復的な精緻化を行うため、一部の応用では実用的でない可能性がある。本稿では、これらの制約に対処する、複数物体の単眼姿勢推定のための新しい手法CVAM-Poseを提案する。 CVAM-Pose法はラベル埋め込み型の条件付き変分オートエンコーダネットワークを用いて、単一の低次元潜在空間における複数のオブジェクトの正則化された表現を暗黙的に抽象化する。この自動符号化プロセスは、投影カメラによって撮影された画像のみを使用し、オブジェクトの遮蔽やシーンの乱雑さに対して堅牢である。オブジェクトのクラスはワンホット符号化され、ネットワーク全体に埋め込まれる。提案するラベル埋め込み型の姿勢回帰戦略は、連続的な姿勢表現を利用して、学習された潜在空間表現を解釈する。アブレーション試験と体系的な評価により、複数物体のシナリオに対するCVAM-Pose法のスケーラビリティと効率性を示す。提案されたCVAM-Poseは、競合する潜在空間アプローチよりも優れている。例えば、Linemod-Occludedデータセットで$\mathrm{AR_{VSD}}$メトリックを用いて評価すると、AAEおよびMulti-Pathの手法よりそれぞれ25%と20%優れている。また、BOPチャレンジで報告された3Dモデルに依存する手法にある程度匹敵する結果が得られる。コード: https://github.com/JZhao12/CVAM-Pose Estimating rigid objects' poses is one of the fundamental problems in computer vision, with a range of applications across automation and augmented reality. Most existing approaches adopt one network per object class strategy, depend heavily on objects' 3D models, depth data, and employ a time-consuming iterative refinement, which could be impractical for some applications. This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation that addresses these limitations. The CVAM-Pose method employs a label-embedded conditional variational autoencoder network, to implicitly abstract regularised representations of multiple objects in a single low-dimensional latent space. This autoencoding process uses only images captured by a projective camera and is robust to objects' occlusion and scene clutter. The classes of objects are one-hot encoded and embedded throughout the network. The proposed label-embedded pose regression strategy interprets the learnt latent space representations utilising continuous pose representations. 
Ablation tests and systematic evaluations demonstrate the scalability and efficiency of the CVAM-Pose method for multi-object scenarios. The proposed CVAM-Pose outperforms competing latent space approaches. For example, it is respectively 25% and 20% better than AAE and Multi-Path methods, when evaluated using the $\mathrm{AR_{VSD}}$ metric on the Linemod-Occluded dataset. It also achieves results somewhat comparable to methods reliant on 3D models reported in BOP challenges. Code available: https://github.com/JZhao12/CVAM-Pose	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# ソフトウェアエンジニアリングとファンデーションモデル: ファンデーションモデルによる業界ブログからの洞察 Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models ( http://arxiv.org/abs/2410.09012v1 ) ライセンス: Link先を確認	Hao Li, Cor-Paul Bezemer, Ahmed E. Hassan,	(参考訳) 大規模言語モデル(LLM)のような基礎モデル(FM)は、ソフトウェア工学(SE)を含む多くの分野に大きな影響を与えている。 SEとFMの相互作用は、FMのSEプラクティスへの統合(FM4SE)と、SE方法論のFMへの適用(SE4FM)をもたらした。これらの傾向に対する学術的貢献に関する文献調査はいくつか存在するが、我々は初めて実践者の視点を提供する。主要なテクノロジー企業による155件のFM4SEと997件のSE4FMのブログ投稿を分析し、FMを活用した調査手法を用いて、議論されている活動やタスクを体系的にラベル付けし要約する。コード生成が最も顕著なFM4SEタスクである一方で、FMはコード理解、要約、APIレコメンデーションといった他の多くのSE活動にも活用されていることを観察した。 SE4FMに関するブログ記事の大部分は、モデルのデプロイメントと運用、およびシステムアーキテクチャとオーケストレーションに関するものである。クラウドへのデプロイに重点が置かれているが、FMを圧縮し、エッジやモバイルデバイスなどの小さなデバイスにデプロイすることへの関心が高まっている。得られた知見に基づいて今後の8つの研究方向を概説し、学術的な発見と現実世界の応用とのギャップを埋めることを目指す。本研究は、FM4SEとSE4FMの実践的応用に関する知識体系を充実させるだけでなく、技術文献およびグレー文献の領域における文献調査を行う上で、FMが強力かつ効率的な手法として有用であることを示すものである。データセット、結果、コード、使用したプロンプトは、オンラインのレプリケーションパッケージ https://github.com/SAILResearch/fmse-blogs にある。 Foundation models (FMs) such as large language models (LLMs) have significantly impacted many fields, including software engineering (SE). The interaction between SE and FMs has led to the integration of FMs into SE practices (FM4SE) and the application of SE methodologies to FMs (SE4FM). While several literature surveys exist on academic contributions to these trends, we are the first to provide a practitioner's view. We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, leveraging an FM-powered surveying approach to systematically label and summarize the discussed activities and tasks. We observed that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities such as code understanding, summarization, and API recommendation. The majority of blog posts on SE4FM are about model deployment & operation, and system architecture & orchestration. 
Although the emphasis is on cloud deployments, there is a growing interest in compressing FMs and deploying them on smaller devices such as edge or mobile devices. We outline eight future research directions inspired by our gained insights, aiming to bridge the gap between academic findings and real-world applications. Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient approach in conducting literature surveys within technical and grey literature domains. Our dataset, results, code and used prompts can be found in our online replication package at https://github.com/SAILResearch/fmse-blogs.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
# 状態空間モデルのパラメータ効率の良い微調整 Parameter-Efficient Fine-Tuning of State Space Models ( http://arxiv.org/abs/2410.09016v1 ) ライセンス: Link先を確認	Kevin Galim, Wonjun Kang, Yuchen Zeng, Hyung Il Koo, Kangwook Lee,	(参考訳) Mamba (Gu & Dao, 2024)のようなDeep State Space Models (SSM)は、言語モデリングの強力なツールとして登場し、効率的な推論とシーケンス長の線形スケーリングを提供する。しかし、パラメータ効率のよい微調整(PEFT)法のSSMモデルへの応用は、まだほとんど未検討である。本稿では,2つの質問を体系的に研究することを目的とする。 (i)既存のPEFTメソッドはSSMベースのモデルでどのように動作するか? (ii)どのモジュールが微調整に最も効果的か? 我々は,SSMモデル上で,4つの基本PEFT手法の実証的なベンチマークを行う。以上の結果から,プロンプトベースの手法(例えばプレフィックスチューニング)はもはや有効ではなく,理論的解析によってさらに実証的な結果が得られた。対照的に、LoRAはSSMベースのモデルに有効である。さらに,これらのモデルにおける LoRA の最適適用について検討し,SSM モジュールを変更せずに線形射影行列に LoRA を適用すると,SSM モジュールのチューニングに有効ではないため,最良の結果が得られることを理論的および実験的に証明した。線形射影行列にLoRAを適用しながら、SSMモジュール上の特定のチャネルや状態を選択的に更新するSDLoRA(Selective Dimension tuning)を導入する。大規模な実験結果から、この手法は標準のLoRAよりも優れていたことが分かる。 Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have emerged as powerful tools for language modeling, offering high performance with efficient inference and linear scaling in sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely unexplored. This paper aims to systematically study two key questions: (i) How do existing PEFT methods perform on SSM-based models? (ii) Which modules are most effective for fine-tuning? We conduct an empirical benchmark of four basic PEFT methods on SSM-based models. Our findings reveal that prompt-based methods (e.g., prefix-tuning) are no longer effective, an empirical result further supported by theoretical analysis. In contrast, LoRA remains effective for SSM-based models. We further investigate the optimal application of LoRA within these models, demonstrating both theoretically and experimentally that applying LoRA to linear projection matrices without modifying SSM modules yields the best results, as LoRA is not effective at tuning SSM modules. 
To further improve performance, we introduce LoRA with Selective Dimension tuning (SDLoRA), which selectively updates certain channels and states on SSM modules while applying LoRA to linear projection matrices. Extensive experimental results show that this approach outperforms standard LoRA.	翻訳日:2024-10-30 20:36:41 公開日:2024-10-11
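As a rough illustration of the low-rank update that the abstract above applies to linear projection matrices, the following is a minimal sketch of a standard LoRA forward pass in plain Python. It is not the authors' SDLoRA implementation; the function names, dimensions, and the `alpha` value are assumptions chosen for the example. The frozen weight `w` is left untouched, and only the small factors `a` and `b` would be trained.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Compute (W + (alpha / r) * B A) x without forming the full update.

    w: frozen weights, d_out x d_in
    a: trainable factor, r x d_in
    b: trainable factor, d_out x r (zero-initialised in standard LoRA)
    """
    r = len(a)
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))
    scale = alpha / r
    return [p + scale * q for p, q in zip(base, low_rank)]


# Hypothetical sizes, chosen only for illustration.
rng = random.Random(0)
d_out, d_in, r = 6, 8, 2
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

full_params = d_out * d_in          # parameters in the frozen matrix
lora_params = r * (d_in + d_out)    # parameters actually trained
```

Because `b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the trainable parameter count grows with `r * (d_in + d_out)` rather than `d_in * d_out`.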
# MedMobile:専門家レベルの臨床能力を備えたモバイルサイズの言語モデル MedMobile: A mobile-sized language model with expert-level clinical capabilities ( http://arxiv.org/abs/2410.09019v1 ) ライセンス: Link先を確認	Krithik Vishwanath, Jaden Stryker, Anton Alaykin, Daniel Alexander Alber, Eric Karl Oermann,	(参考訳) 言語モデル(LM)は、医学における専門家レベルの推論とリコール能力を示している。しかし、計算コストとプライバシに関する懸念が、大規模な実装の障壁となっている。モバイル機器上で動作可能な38億パラメータのLMである MedMobile を、phi-3-mini の簡素な適応として医用アプリケーション向けに導入する。 MedMobileがMedQA(USMLE)で75.7%を記録し、医師の合格点(約60%)を超え、100倍の大きさのモデルのスコアに近づいたことを実証した。その後、注意深いアブレーションを行い、思考の連鎖、アンサンブル、微調整が最大のパフォーマンス向上に繋がる一方、検索拡張生成は予想外にも有意な改善を示さないことを示した。 Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for physicians (~60%), and approaching the scores of models 100 times its size. We subsequently perform a careful set of ablations, and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# 4H炭化ケイ素ショットキーダイオードの低温における単一V2欠陥 Single V2 defect in 4H Silicon Carbide Schottky diode at low temperature ( http://arxiv.org/abs/2410.09021v1 ) ライセンス: Link先を確認	Timo Steidl, Pierre Kuna, Erik Hesselmeier-Hüttmann, Di Liu, Rainer Stöhr, Wolfgang Knolle, Misagh Ghezellou, Jawad Ul-Hassan, Maximilian Schober, Michel Bockstedte, Adam Gali, Vadim Vorobyov, Jörg Wrachtrup,	(参考訳) 量子光学部品のナノ・フォトニック結合は、スケーラブルな固体量子技術にとって不可欠である。炭化ケイ素は、成熟した量子欠陥を持つ材料であり、半導体産業において様々な用途がある。そこで本研究では、金属半導体(Au/Ti/4H-SiC)エピタキシャルウエハ装置における単一シリコン空孔(V2)色中心の挙動をショットキーダイオード構成で解析する。欠陥近傍における自由キャリアの劣化と欠陥光遷移線の電気的チューニングについて検討する。単一電荷トラップを検出することにより、V2光線幅への影響を調べる。さらに、V2中心の電荷-光子-力学を研究し、その支配的な光子-イオン化過程の特徴速度と波長依存性を見出した。最後に, 接合部におけるV2系のスピンコヒーレンス特性を探索し, 量子ネットワークアプリケーションのためのいくつかの重要なプロトコルを実証する。我々の研究は、量子応用のための光学マイクロ構造を持つショットキーデバイスの低温統合の最初のデモンストレーションを示し、固体中の基本的にスケーラブルで再現可能な光スピン欠陥センターへの道を開いた。 Nanoelectrical and photonic integration of quantum optical components is crucial for scalable solid-state quantum technologies. Silicon carbide stands out as a material with mature quantum defects and a wide variety of applications in semiconductor industry. Here, we study the behaviour of single silicon vacancy (V2) colour centres in a metal-semiconductor (Au/Ti/4H-SiC) epitaxial wafer device, operating in a Schottky diode configuration. We explore the depletion of free carriers in the vicinity of the defect, as well as electrical tuning of the defect optical transition lines. By detecting single charge traps, we investigate their impact on V2 optical line width. Additionally, we investigate the charge-photon-dynamics of the V2 centre and find its dominating photon-ionisation processes characteristic rate and wavelength dependence. Finally, we probe the spin coherence properties of the V2 system in the junction and demonstrate several key protocols for quantum network applications. 
Our work shows the first demonstration of low temperature integration of a Schottky device with optical microstructures for quantum applications and paves the way towards fundamentally scalable and reproducible optical spin defect centres in solids.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# 実験前データと実験内データを組み合わせた分散化 Variance reduction combining pre-experiment and in-experiment data ( http://arxiv.org/abs/2410.09027v1 ) ライセンス: Link先を確認	Zhexiao Lin, Pablo Crespo,	(参考訳) オンライン制御実験(A/Bテスト)は、多くの企業にとって、データ駆動による意思決定に不可欠である。これらの実験の感度を高めることは、特に一定のサンプルサイズで、平均処理効果(ATE)に対する推定器の分散を減少させることに依存する。 CUPEDやCUPACのような既存の手法では、実験前のデータを使って分散を減らすが、その効果は実験前のデータと結果の相関に依存する。対照的に、実験中のデータは結果と強く相関し、情報的になることが多い。本稿では, CUPED や CUPAC よりも高分散化を実現するために, 実験前データと実験内データを組み合わせた新しい手法を提案する。また、漸近理論を確立し、本手法に対して一貫した分散推定器を提供する。この手法をEtsyにおける複数オンライン実験に適用することにより、実験中の共変量しか含まないCUPACに対して、相当なばらつきを低減できる。これらの結果は、実験感度を大幅に改善し、意思決定を加速するアプローチの可能性を強調している。 Online controlled experiments (A/B testing) are essential in data-driven decision-making for many companies. Increasing the sensitivity of these experiments, particularly with a fixed sample size, relies on reducing the variance of the estimator for the average treatment effect (ATE). Existing methods like CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome. In contrast, in-experiment data is often more strongly correlated with the outcome and thus more informative. In this paper, we introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC, without introducing bias or additional computation complexity. We also establish asymptotic theory and provide consistent variance estimators for our method. Applying this method to multiple online experiments at Etsy, we reach substantial variance reduction over CUPAC with the inclusion of only a few in-experiment covariates. These results highlight the potential of our approach to significantly improve experiment sensitivity and accelerate decision-making.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
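For context on the baseline this abstract compares against, here is a minimal sketch of the standard CUPED adjustment using only the Python standard library. It illustrates the pre-experiment control-variate idea, not the paper's combined pre-experiment and in-experiment estimator; the synthetic data and the function name `cuped_adjust` are assumptions made for the example.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    y: outcome metric per unit
    x: pre-experiment covariate per unit (same order as y)
    theta = cov(x, y) / var(x) minimises the variance of the adjusted metric.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    n = len(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Synthetic data (an assumption): the outcome is correlated with the covariate.
rng = random.Random(0)
x = [rng.gauss(10.0, 2.0) for _ in range(2000)]
y = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]

y_adj = cuped_adjust(y, x)
var_before = statistics.variance(y)
var_after = statistics.variance(y_adj)
```

The adjustment leaves the sample mean unchanged while removing the share of variance explained by the covariate, which is why its benefit depends on how strongly the covariate correlates with the outcome.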
# 異常に拡張されたフロケ予熱寿命と長期量子センシングへの応用 Anomalously extended Floquet prethermal lifetimes and applications to long-time quantum sensing ( http://arxiv.org/abs/2410.09028v1 ) ライセンス: Link先を確認	Kieren A. Harkins, Cooper Selco, Christian Bengs, David Marchiori, Leo Joon Il Moon, Zhuo-Rui Zhang, Aristotle Yang, Angad Singh, Emanuel Druga, Yi-Qiao Song, Ashok Ajoy,	(参考訳) フロケ予熱は周期的に駆動される量子多体系において観測され、この系は加熱を回避し、安定で非平衡状態を長期にわたって維持する。ここでは,Floquetの前熱寿命を大幅に延長するために,オフ共振と短角励起を用いた新しい量子制御法を提案する。これはダイヤモンド中のランダムに位置決めされた双極子結合された13C核スピンで実証されるが、この手法は広く適用可能である。我々は100Kで$T_2' \sim 800$ sの寿命を達成し、熱前状態への遷移を準連続的に追跡する。これは、前温化のない素スピン寿命の533,000倍を超える拡張に対応し、絶対寿命と適用されるフロケパルスの総数(700万回を超える)の両方で新しい記録を構成している。ラプラス・インバージョン(Laplace inversion)を用いて、寿命延長の起源を洞察する新しいノイズ分光法を開発した。最後に, 室温で約10分間連続して, 時間変化磁場の長時間かつ再初期化のない量子センシングへの応用を実演する。我々の研究は、Floquet制御による駆動量子システムを安定化する新たな機会を促進し、連続的に問い合わせ可能な長時間応答型の量子センサに新しい応用を開放する。 Floquet prethermalization is observed in periodically driven quantum many-body systems where the system avoids heating and maintains a stable, non-equilibrium state, for extended periods. Here we introduce a novel quantum control method using off-resonance and short-angle excitation to significantly extend Floquet prethermal lifetimes. This is demonstrated on randomly positioned, dipolar-coupled, 13C nuclear spins in diamond, but the methodology is broadly applicable. We achieve a lifetime $T_2' \sim 800$ s at 100 K while tracking the transition to the prethermal state quasi-continuously. This corresponds to a >533,000-fold extension over the bare spin lifetime without prethermalization, and constitutes a new record both in terms of absolute lifetime as well as the total number of Floquet pulses applied (here exceeding 7 million). Using Laplace inversion, we develop a new form of noise spectroscopy that provides insights into the origin of the lifetime extension.
Finally, we demonstrate applications of these extended lifetimes in long-time, reinitialization-free quantum sensing of time-varying magnetic fields continuously for ~10 minutes at room temperature. Our work facilitates new opportunities for stabilizing driven quantum systems through Floquet control, and opens novel applications for continuously interrogated, long-time responsive quantum sensors.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# 効率的な状態準備のための動的量子回路の学習 Learning dynamic quantum circuits for efficient state preparation ( http://arxiv.org/abs/2410.09030v1 ) ライセンス: Link先を確認	Faisal Alam, Bryan K. Clark,	(参考訳) 動的量子回路(DQC)は、これらの測定結果に基づいて中間回路の測定とゲートを組み込む。 DQCは一定の深さで一定の長距離の絡み合った状態を作ることができ、コヒーレンス時間に制限されたデバイス上で複雑な量子状態を作るための有望な経路となる。状態準備のためのDQCのほとんどすべての構造は、ターゲット状態の特別な構造に依存して解析的に定式化されている。そこで本研究では,汎用状態に対する高忠実度DQC作成法を求めるスケーラブルなテンソルネットワークアルゴリズムを開発した。アルゴリズムを臨界状態、ランダム行列積状態、サブセット状態に適用する。 DQCは固定数のアンシラと多量のアンシラの両方を考慮に入れた。わずかなアンシラ状態であっても、我々のアルゴリズムによって発見されたDQCは、同じ深さの静的量子回路よりも不忠実な状態を常に準備する。特に,システムサイズおよび回路深さの一定忠実度ゲインを観測する。多数のアンシラを持つDQCに対して,ニューラルネットワークデコーダやリアルタイムデコードプロトコルなど,測定結果をデコードするスケーラブルな手法を導入する。我々の研究は、DQC回路を生成するためのアルゴリズム的アプローチの力を示し、量子コンピューティングの新しい分野への応用範囲を広げた。 Dynamic quantum circuits (DQCs) incorporate mid-circuit measurements and gates conditioned on these measurement outcomes. DQCs can prepare certain long-range entangled states in constant depth, making them a promising route to preparing complex quantum states on devices with a limited coherence time. Almost all constructions of DQCs for state preparation have been formulated analytically, relying on special structure in the target states. In this work, we approach the problem of state preparation variationally, developing scalable tensor network algorithms which find high-fidelity DQC preparations for generic states. We apply our algorithms to critical states, random matrix product states, and subset states. We consider both DQCs with a fixed number of ancillae and those with an extensive number of ancillae. Even in the few ancillae regime, the DQCs discovered by our algorithms consistently prepare states with lower infidelity than a static quantum circuit of the same depth. Notably, we observe constant fidelity gains across system sizes and circuit depths. For DQCs with an extensive number of ancillae, we introduce scalable methods for decoding measurement outcomes, including a neural network decoder and a real-time decoding protocol. 
Our work demonstrates the power of an algorithmic approach to generating DQC circuits, broadening their scope of applications to new areas of quantum computing.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# アルバータ・ウェルズ・データセット:衛星画像から石油とガスの井戸を特定 Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery ( http://arxiv.org/abs/2410.09032v1 ) ライセンス: Link先を確認	Pratinav Seth, Michelle Lin, Brefo Dwamena Yaw, Jade Boutot, Mary Kang, David Rolnick,	(参考訳) 何百万もの放棄された石油やガスの井戸が世界中に散らばっており、メタンを大気中に放出し、有害化合物を地下水に放出している。これらの場所の多くは不明であり、そのことが井戸の封鎖と汚染の回避を妨げている。リモートセンシングは、放棄された井戸を大規模にピンポイントする比較的未調査のツールである。本稿では,Planet Labsの中解像度マルチスペクトル衛星画像を活用した,この問題に対する最初の大規模ベンチマークデータセットを紹介する。我々のキュレートされたデータセットは、井戸密度が特に高いアルバータ州の213,000以上の井戸(放棄、停止、アクティブ)で構成され、アルバータエネルギーレギュレータから供給され、ドメインの専門家によって検証されている。我々は、井戸の検出とセグメンテーションのためのベースラインアルゴリズムを評価し、コンピュータビジョンアプローチの有望さとともに、大きな改善の余地があることを示す。 Millions of abandoned oil and gas wells are scattered across the world, leaching methane into the atmosphere and toxic compounds into the groundwater. Many of these locations are unknown, preventing the wells from being plugged and their polluting effects averted. Remote sensing is a relatively unexplored tool for pinpointing abandoned wells at scale. We introduce the first large-scale benchmark dataset for this problem, leveraging medium-resolution multi-spectral satellite imagery from Planet Labs. Our curated dataset comprises over 213,000 wells (abandoned, suspended, and active) from Alberta, a region with especially high well density, sourced from the Alberta Energy Regulator and verified by domain experts. We evaluate baseline algorithms for well detection and segmentation, showing the promise of computer vision approaches but also significant room for improvement.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# PEAR: 複数の大規模言語モデルエージェントで実現可能なPtychographyのためのロバストでフレキシブルな自動化フレームワーク PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents ( http://arxiv.org/abs/2410.09034v1 ) ライセンス: Link先を確認	Xiangyu Yin, Chuqiao Shi, Yimo Han, Yi Jiang,	(参考訳) Ptychographyは、X線および電子顕微鏡における高度な計算イメージング技術である。物理学、化学、生物学、材料科学などの科学研究分野や、半導体のキャラクタリゼーションなどの産業分野で広く採用されている。実際には、高品質な画像を得るには、多数の実験パラメータとアルゴリズムパラメータを同時に最適化する必要がある。伝統的に、パラメータの選択はしばしば試行錯誤に依存し、低スループットのワークフローと潜在的な人間のバイアスにつながる。本研究では,大規模言語モデル (LLM) を利用したPEAR(Ptychographic Experiment and Analysis Robot)を開発した。高い堅牢性と精度を確保するため、PEARは知識検索、コード生成、パラメータレコメンデーション、画像推論などのタスクに複数のLLMエージェントを使用している。本研究では,LLaMA 3.1 8Bのような小型オープンウェイトモデルであっても,PEARのマルチエージェント設計によりワークフローの成功率が大幅に向上することを示す。 PEARはまた、様々な自動化レベルをサポートし、異なる研究環境における柔軟性と適応性を確保するために、カスタマイズされたローカル知識ベースで動作するように設計されている。 Ptychography is an advanced computational imaging technique in X-ray and electron microscopy. It has been widely adopted across scientific research fields, including physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneous optimization of numerous experimental and algorithmic parameters. Traditionally, parameter selection often relies on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the "Ptychographic Experiment and Analysis Robot" (PEAR), a framework that leverages large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study demonstrates that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. 
PEAR also supports various automation levels and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# Mentor-KD: 小型言語モデルによるマルチステップ推論の改善 Mentor-KD: Making Small Language Models Better Multi-step Reasoners ( http://arxiv.org/abs/2410.09037v1 ) ライセンス: Link先を確認	Hojae Lee, Junho Kim, SangKeun Lee,	(参考訳) 大規模言語モデル(LLM)は、CoT(Chain-of-Thought)のプロンプトを活用することで、様々な複雑なタスクにわたって顕著なパフォーマンスを示している。近年,LLM教師が生成する多段階理性理論の微調整言語モデルを用いて,LLMの推論能力を伝達する知識蒸留(KD)手法が提案されている。しかし, LLM の教師モデルから不十分な蒸留セットを抽出する上での2つの課題は, 十分に考慮されていない。 1)データ品質及びデータ品質 2)ソフトラベルの提供。本稿では, 上記の課題に対処しつつ, LLM の多段階推論能力をより小さな LM に効果的に蒸留する Mentor-KD を提案する。具体的には、中間サイズのタスク固有の微調整モデルを用いて、追加のCoTアノテーションを増補し、蒸留の推論中に学生モデルにソフトラベルを提供する。我々は広範囲な実験を行い、メンターKDの有効性を様々なモデルや複雑な推論タスクで確認する。 Large Language Models (LLMs) have displayed remarkable performances across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation, which transfers such reasoning ability of LLMs through fine-tuning language models of multi-step rationales generated by LLM teachers. However, they have inadequately considered two challenges regarding insufficient distillation sets from the LLM teacher model, in terms of 1) data quality and 2) soft label provision. In this paper, we propose Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs to smaller LMs while addressing the aforementioned challenges. Specifically, we exploit a mentor, intermediate-sized task-specific fine-tuned model, to augment additional CoT annotations and provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# AttnGCG: 注意操作によるLLMの脱獄攻撃の強化 AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation ( http://arxiv.org/abs/2410.09040v1 ) ライセンス: Link先を確認	Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie,	(参考訳) 本稿では,最適化ベースのGreedy Coordinate Gradient(GCG)戦略を中心に,トランスフォーマーに基づく大規模言語モデル(LLM)のジェイルブレイク攻撃に対する脆弱性について検討する。まず,攻撃の有効性とモデルの内部挙動との正の相関関係を考察する。例えば、LLMの安全性を確保するために設計されたシステムプロンプトにモデルがより多くの注意を払っている場合、攻撃は効果が低い傾向にある。この発見に基づいて,LLMジェイルブレイクを促進するため,モデルの注意スコアを操作する拡張手法を導入し,これをAttnGCGと呼ぶ。経験的に、AttnGCGは様々なLLMに対して一貫した攻撃効率の改善を示し、Llama-2シリーズでは平均で約7%、Gemmaシリーズでは平均で約10%向上した。また,未知の有害な目標と GPT-3.5 や GPT-4 などのブラックボックス LLM の両方に対する頑健な攻撃転移性を示す。さらに、私たちの注目スコアの可視化はより解釈可能であり、ターゲットの注意操作がより効果的なジェイルブレイクを促進する方法について、より深い洞察を得ることができます。コードを https://github.com/UCSC-VLAA/AttnGCG-attack でリリースします。 This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy. We first observe a positive correlation between the effectiveness of attacks and the internal behaviors of the models. For instance, attacks tend to be less effective when models pay more attention to system prompts designed to ensure LLM safety alignment. Building on this discovery, we introduce an enhanced method that manipulates models' attention scores to facilitate LLM jailbreaking, which we term AttnGCG. Empirically, AttnGCG shows consistent improvements in attack efficacy across diverse LLMs, achieving an average increase of ~7% in the Llama-2 series and ~10% in the Gemma series. Our strategy also demonstrates robust attack transferability against both unseen harmful goals and black-box LLMs like GPT-3.5 and GPT-4. Moreover, we note our attention-score visualization is more interpretable, allowing us to gain better insights into how our targeted attention manipulation facilitates more effective jailbreaking. We release the code at https://github.com/UCSC-VLAA/AttnGCG-attack.	翻訳日:2024-10-30 20:26:51 公開日:2024-10-11
# ゲージングアーベル境界対称性による安定化符号の体系的構成 Systematic construction of stabilizer codes via gauging abelian boundary symmetries ( http://arxiv.org/abs/2410.09044v1 ) ライセンス: Link先を確認	Bram Vancraeynest-De Cuiper, José Garre-Rubio,	(参考訳) そこで本研究では,(d+1)次元の安定化モデルを構築するための体系的枠組みを提案する。提案手法は,[J. Garre-Rubio, Nature Commun. 15, 7986 (2024)] の著者のひとりによって開発され,初期対称状態が繰り返し測定され,その境界における初期対称性を支持する1次元上の創発的モデルが得られた。この方法では、初期状態とそれらが基底状態である通勤安定度を持つハミルトン群の構築を可能にするだけでなく、境界対称性の自発的な破れにつながるこれらのモデルに対するギャップ付き境界条件を構築する方法も提供する。詳細な導入例では,2次元に居住する大域的な0-形式対称性を反復的にゲージすることで,クリフォード変形曲面符号を3次元的に構築することで,我々のパラダイムを実証する。次に、ウィリアムソンのゲージ法を少し拡張して、主要な結果の証明を与える。また、d=2 では、初期線形部分系とシエルピンスキーフラクタル対称性のゲージから異なるタイプ-Iフラクトン位が出現する例を2つ挙げる。これに従えば、関連するすべてのゲージマップと創発状態について、明示的なテンソルネットワーク表現を提供する。 We propose a systematic framework to construct a (d+1)-dimensional stabilizer model from an initial generic d-dimensional abelian symmetry. Our approach builds upon the iterative gauging procedure, developed by one of the authors in [J. Garre-Rubio, Nature Commun. 15, 7986 (2024)], in which an initial symmetric state is repeatedly gauged to obtain an emergent model in one dimension higher that supports the initial symmetry at its boundary. This method not only enables the construction of emergent states and corresponding commuting stabilizer Hamiltonians of which they are ground states, but it also provides a way to construct gapped boundary conditions for these models that amount to spontaneously breaking part of the boundary symmetry. In a detailed introductory example, we showcase our paradigm by constructing three-dimensional Clifford-deformed surface codes from iteratively gauging a global 0-form symmetry that lives in two dimensions. We then provide a proof of our main result, hereby drawing upon a slight extension of the gauging procedure of Williamson. We additionally provide two more examples in d=2 in which different type-I fracton orders emerge from gauging initial linear subsystem and Sierpinski fractal symmetries. 
En passant, we provide explicit tensor network representations of all of the involved gauging maps and the emergent states.	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# MiRAGeNews: マルチモーダルリアルAI生成ニュース検出 MiRAGeNews: Multimodal Realistic AI-Generated News Detection ( http://arxiv.org/abs/2410.09045v1 ) ライセンス: Link先を確認	Runsheng Huang, Liam Dugan, Yue Yang, Chris Callison-Burch,	(参考訳) 近年では、扇動的あるいは誤解を招く「フェイク」ニュースの拡散がますます一般的になっている。同時に、想像できるあらゆるシーンを描いたフォトリアリスティックな画像を生成するためにAIツールを使用するのは、これまで以上に簡単になっている。 AIが生成したフェイクニュースコンテンツという2つの組み合わせは、特に強力で危険なものだ。 AI生成フェイクニュースの拡散に対処するために、最先端のジェネレータから12,500の高品質な実画像およびAI生成画像とキャプションのペアのデータセットであるMiRAGeNews Datasetを提案する。我々のデータセットは、人間(60% F-1)と最先端のマルチモーダルLLM(24% F-1未満)にとって重要な課題であることがわかった。データセットを使用して、ドメイン外の画像ジェネレータやニュースパブリッシャからの画像とキャプションのペアにおいて、最先端のベースラインよりもF-1を+5.1%改善するマルチモーダル検出器(MiRAGe)をトレーニングする。 AI生成コンテンツを検出するための今後の作業を支援するため、コードとデータを公開しています。 The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# マニフォールド仮説下における拡散モデルの線形収束 Linear Convergence of Diffusion Models Under the Manifold Hypothesis ( http://arxiv.org/abs/2410.09046v1 ) ライセンス: Link先を確認	Peter Potaptchik, Iskander Azangulov, George Deligiannidis,	(参考訳) スコアマッチング生成モデルは複雑な高次元データ分布のサンプリングに成功している。多くの応用において、この分布は$D$次元空間に埋め込まれたより低い$d$次元多様体に集中していると考えられている。現在の最もよく知られた収束保証は$D$の線型あるいは$d$の多項式(超線型)である。後者は、後方SDEのための新しい統合スキームを利用する。両世界のベストを尽くし、クルバック・リーブル~(KL) の発散に収束するために必要となるステップ拡散モデルの数は、内在次元$d$の線型(対数項まで)であることを示す。さらに,この線形依存は鋭いことを示す。 Score-matching generative models have proven successful at sampling from complex high-dimensional data distributions. In many applications, this distribution is believed to concentrate on a much lower $d$-dimensional manifold embedded into $D$-dimensional space; this is known as the manifold hypothesis. The current best-known convergence guarantees are either linear in $D$ or polynomial (superlinear) in $d$. The latter exploits a novel integration scheme for the backward SDE. We take the best of both worlds and show that the number of steps diffusion models require in order to converge in Kullback-Leibler~(KL) divergence is linear (up to logarithmic terms) in the intrinsic dimension $d$. Moreover, we show that this linear dependency is sharp.	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# 視覚・言語モデルによる安全アライメント劣化の解明と軽減 Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models ( http://arxiv.org/abs/2410.09047v1 ) ライセンス: Link先を確認	Qin Liu, Chao Shang, Ling Liu, Nikolaos Pappas, Jie Ma, Neha Anna John, Srikanth Doss, Lluis Marquez, Miguel Ballesteros, Yassine Benajiba,	(参考訳) VLM(Vision-Language Models)の安全アライメント能力は、LLMのバックボーンと比べてビジョンモジュールの統合によって劣化する傾向にある。本稿では、この現象を「安全アライメント劣化」と呼び、VLMに視覚モダリティを導入する際に生じる表現ギャップから課題が生じることを示す。特に、マルチモーダル入力の表現は、LLMのバックボーンが最適化した分布を表すテキストのみの入力からずれていることを示す。同時に、テキスト埋め込み空間内で開発された安全アライメント機能は、この新しいマルチモーダル表現空間への転送に成功しなかった。安全アライメントの劣化を低減するため,VLMのLLMバックボーンに内在する安全アライメント能力の回復と,VLMの機能的機能の同時維持を両立させる推論時間表現介入手法であるCross-Modality Representation Manipulation (CMRM)を導入する。実験の結果,本研究の枠組みは,付加訓練を必要とせずとも,LLMバックボーンから受け継いだアライメント能力が,訓練前のVLMの流速や言語能力に最小限の影響を伴って,著しく回復することが示された。具体的には、マルチモーダル入力におけるLLaVA-7Bの安全性の低いレートは、推論時間の介入だけで61.53%から3.15%に削減できる。 WARNING: 有害な言語や有害な言語の例を含む。 The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed as ''safety alignment degradation'' in this paper, and show that the challenge arises from the representation gap that emerges when introducing vision modality to VLMs. In particular, we show that the representations of multi-modal inputs shift away from that of text-only inputs which represent the distribution that the LLM backbone is optimized for. At the same time, the safety alignment capabilities, initially developed within the textual embedding space, do not successfully transfer to this new multi-modal representation space. To reduce safety alignment degradation, we introduce Cross-Modality Representation Manipulation (CMRM), an inference time representation intervention method for recovering the safety alignment ability that is inherent in the LLM backbone of VLMs, while simultaneously preserving the functional capabilities of VLMs. 
The empirical results show that our framework significantly recovers the alignment ability that is inherited from the LLM backbone with minimal impact on the fluency and linguistic capabilities of pre-trained VLMs even without additional training. Specifically, the unsafe rate of LLaVA-7B on multi-modal input can be reduced from 61.53% to as low as 3.15% with only inference-time intervention. WARNING: This paper contains examples of toxic or harmful language.	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# コードのための信頼できるLCMを目指して - データ中心の総合的監査フレームワーク Towards Trustworthy LLMs for Code: A Data-Centric Synergistic Auditing Framework ( http://arxiv.org/abs/2410.09048v1 ) ライセンス: Link先を確認	Chong Wang, Zhenpeng Chen, Tianlin Li, Yilun Zhao, Yang Liu,	(参考訳) LLMを利用したコーディングと開発アシスタントはプログラマのワークフローに広く普及している。しかし、LLMのコードに対する信頼性に関する懸念は、広く使われているにもかかわらず継続している。既存の研究の多くは、トレーニングまたは評価に重点を置いており、トレーニングや評価のステークホルダーがモデルの信頼性を理解し、統一された方向へ進むことができるかどうかという疑問を提起している。本稿では,トレーニングデータと評価データと相関性の両方を相乗的に強調するデータ中心型アプローチを採用した信頼性監査フレームワークDataTrustのビジョンを提案する。 DataTrustは、モデル信頼性指標とトレーニングにおけるデータ品質指標を結びつけることを目的としている。学習データを自律的に検査し、合成データを用いてモデル信頼性を評価し、特定の評価データから潜在的な原因を対応するトレーニングデータに関連付け、インジケータ接続を精査する。さらに、DataTrustを利用する信頼性アリーナは、クラウドソーシングされた入力に関わり、定量的な結果をもたらす。さまざまな利害関係者がDataTrustから得られるメリットを概説し、それがもたらす課題と機会について議論する。 LLM-powered coding and development assistants have become prevalent to programmers' workflows. However, concerns about the trustworthiness of LLMs for code persist despite their widespread use. Much of the existing research focused on either training or evaluation, raising questions about whether stakeholders in training and evaluation align in their understanding of model trustworthiness and whether they can move toward a unified direction. In this paper, we propose a vision for a unified trustworthiness auditing framework, DataTrust, which adopts a data-centric approach that synergistically emphasizes both training and evaluation data and their correlations. DataTrust aims to connect model trustworthiness indicators in evaluation with data quality indicators in training. It autonomously inspects training data and evaluates model trustworthiness using synthesized data, attributing potential causes from specific evaluation data to corresponding training data and refining indicator connections. Additionally, a trustworthiness arena powered by DataTrust will engage crowdsourced input and deliver quantitative outcomes. 
We outline the benefits that various stakeholders can gain from DataTrust and discuss the challenges and opportunities it presents.	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# SceneCraft:レイアウトガイド付き3Dシーンジェネレーション SceneCraft: Layout-Guided 3D Scene Generation ( http://arxiv.org/abs/2410.09049v1 ) ライセンス: Link先を確認	Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang,	(参考訳) ユーザ仕様に合わせて複雑な3Dシーンを作成するのは、従来の3Dモデリングツールでは面倒で難しい作業でした。いくつかの先駆的手法は自動テキスト・ツー・3D生成を実現しているが、概して形状やテクスチャの制御が限られた小規模なシーンに限られている。 SceneCraftは,ユーザが提供するテキスト記述や空間的レイアウトの嗜好に従う,室内の詳細なシーンを生成する新しい手法である。我々の手法の中心はレンダリングに基づく手法であり、3Dセマンティックレイアウトを多視点2Dプロキシマップに変換する。さらに,多視点画像を生成するための意味的・深度条件付き拡散モデルを設計し,その画像を用いて最終シーン表現としてニューラルラディアンス場(NeRF)を学習する。パノラマ画像生成の制約がないため、1つの部屋を超えた複雑な屋内空間の生成を支援する点で従来の手法を上回り、不規則な形状とレイアウトを持つ複数寝室のアパート全体のような複雑な空間にも対応する。実験により,本手法は,多様なテクスチャ,一貫した幾何学,現実的な視覚的品質を有する複雑な屋内シーン生成において,既存の手法を著しく上回ることを示す。コードとさらに多くの結果が、https://orangesodahub.github.io/SceneCraft で公開されている。 The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over the shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences provided by users. Central to our method is a rendering-based technique, which converts 3D semantic layouts into multi-view 2D proxy maps. Furthermore, we design a semantic and depth conditioned diffusion model to generate multi-view images, which are used to learn a neural radiance field (NeRF) as the final scene representation. Without the constraints of panorama image generation, we surpass previous methods in supporting complicated indoor space generation beyond a single room, even as complicated as a whole multi-bedroom apartment with irregular shapes and layouts. Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
Code and more results are available at: https://orangesodahub.github.io/SceneCraft	翻訳日:2024-10-30 16:58:09 公開日:2024-10-11
# SoK: 検証可能なクロスサイロFL SoK: Verifiable Cross-Silo FL ( http://arxiv.org/abs/2410.09124v1 ) ライセンス: Link先を確認	Aleksei Korneev, Jan Ramon,	(参考訳) Federated Learning(FL)は、複数のデバイスに分散したデータによる機械学習(ML)モデルのトレーニングを可能にする、広く普及したアプローチである。医療や金融などの分野でよく見られるクロスサイロFLでは、参加者の数は中程度であり、各参加者は通常よく知られた組織を表している。例えば、医療分野のデータ所有者は、多くの場合、確立された組織である病院やデータハブである。しかし、悪意のある参加者は、例えばバイアスのある結果や計算負荷の削減といった特定の利益を得るために、トレーニング手順を妨害しようとするかもしれない。トレーニングに使用されるデータが公開されている場合、悪意のあるエージェントを容易に検出できるが、トレーニングデータセットのプライバシを維持する必要がある場合には、問題はさらに深刻になる。この問題に対処するため、参加者がトレーニング手順から逸脱せず、計算を正しく実行していることを確認できる検証可能なプロトコルの開発への関心が最近高まっている。本稿では,検証可能なクロスサイロFLに関する知識の体系化について述べる。我々は、様々なプロトコルを分析し、それらを分類体系に当てはめ、それらの効率と脅威モデルを比較する。また,Zero-Knowledge Proof (ZKP) のスキームを分析し,FLコンテキストにおける全体的なコストを最小化する方法について論じる。最後に,研究ギャップを特定し,今後の科学的研究の方向性について議論する。 Federated Learning (FL) is a widespread approach that allows training machine learning (ML) models with data distributed across multiple devices. In cross-silo FL, which often appears in domains like healthcare or finance, the number of participants is moderate, and each party typically represents a well-known organization. For instance, in medicine, data owners are often hospitals or data hubs, which are well-established entities. However, malicious parties may still attempt to disturb the training procedure in order to obtain certain benefits, for example, a biased result or a reduction in computational load. While one can easily detect a malicious agent when data used for training is public, the problem becomes much more acute when it is necessary to maintain the privacy of the training dataset. To address this issue, there has recently been growing interest in developing verifiable protocols, where one can check that parties do not deviate from the training procedure and perform computations correctly. In this paper, we present a systematization of knowledge on verifiable cross-silo FL. We analyze various protocols, fit them in a taxonomy, and compare their efficiency and threat models. 
We also analyze Zero-Knowledge Proof (ZKP) schemes and discuss how their overall cost in an FL context can be minimized. Lastly, we identify research gaps and discuss potential directions for future scientific work.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# フェイクラベルでの学習:セキュア次元変換によるスプリットラーニングにおけるラベルリークの軽減 Training on Fake Labels: Mitigating Label Leakage in Split Learning via Secure Dimension Transformation ( http://arxiv.org/abs/2410.09125v1 ) ライセンス: Link先を確認	Yukun Jiang, Peiran Wang, Chengguo Lin, Ziyue Huang, Yong Cheng,	(参考訳) 二者間分割学習は、垂直連合学習の一般的なパラダイムとして現れている。ラベル所有者のプライバシを維持するために、スプリットラーニングは分割モデルを利用し、学習プロセス中に、入力に基づく中間表現(IR)と各IRに対する勾配を二者間で交換することのみを必要とする。しかし、最近、スプリットラーニングはラベル推論攻撃に対して脆弱であることが示されている。いくつかの防衛方法は採用できるが、防御性能が限られるか、本来のタスクに著しく悪影響を及ぼすかのいずれかである。本稿では,学習モデルの高い有用性を維持しつつ,既存のラベル推論攻撃を防御する,新たな二者間分割学習手法を提案する。具体的には,まず次元変換モジュール SecDT を設計し,元のラベルと,クラス数を増やしたKクラスラベルとの間の双方向マッピングを実現し,方向の観点からラベルの漏洩を緩和する。次に、異なるクラスの勾配の大きさのばらつきを取り除くために、勾配正規化アルゴリズムを設計する。プライバシリークを軽減し,Kを敵に知られないようにするために,ソフトマックス正規化ガウス雑音を提案する。我々は2つのバイナリ分類データセット(AvazuとCriteo)と3つのマルチ分類データセット(MNIST、FashionMNIST、CIFAR-10)を含む実世界のデータセットで実験を行い、方向攻撃、ノルム攻撃、スペクトル攻撃、モデル補完攻撃を含む現在の攻撃手法も考慮した。詳細な実験により、提案手法の有効性と既存手法に対する優位性を示す。例えば、Avazuデータセットでは、評価した4つの主要な攻撃の攻撃AUCを0.4532 ± 0.0127削減できる。 Two-party split learning has emerged as a popular paradigm for vertical federated learning. To preserve the privacy of the label owner, split learning utilizes a split model, which only requires the exchange of intermediate representations (IRs) based on the inputs and gradients for each IR between two parties during the learning process. However, split learning has recently been shown to be vulnerable to label inference attacks. Though several defense methods could be adopted, they either have limited defensive performance or significantly degrade performance on the original task. In this paper, we propose a novel two-party split learning method to defend against existing label inference attacks while maintaining the high utility of the learned models. Specifically, we first craft a dimension transformation module, SecDT, which could achieve bidirectional mapping between original labels and increased K-class labels to mitigate label leakage from the directional perspective. 
Then, a gradient normalization algorithm is designed to remove the magnitude divergence of gradients from different classes. We propose a softmax-normalized Gaussian noise to mitigate privacy leakage and make our K unknowable to adversaries. We conducted experiments on real-world datasets, including two binary-classification datasets (Avazu and Criteo) and three multi-classification datasets (MNIST, FashionMNIST, CIFAR-10); we also considered current attack schemes, including direction, norm, spectral, and model completion attacks. The detailed experiments demonstrate our proposed method's effectiveness and superiority over existing approaches. For instance, on the Avazu dataset, the attack AUC of the four prominent attacks evaluated could be reduced by 0.4532 ± 0.0127.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# 宇宙機姿勢センサにおけるリアルタイム多変量時系列故障検出のための畳み込みニューラルネットワークの設計と評価 Convolutional Neural Network Design and Evaluation for Real-Time Multivariate Time Series Fault Detection in Spacecraft Attitude Sensors ( http://arxiv.org/abs/2410.09126v1 ) ライセンス: Link先を確認	Riccardo Gallon, Fabian Schiemenz, Alessandra Menicucci, Eberhard Gill,	(参考訳) 衛星上の従来の異常検出技術は、信頼性は高いが限界のあるしきい値メカニズムに基づいており、これは特定の欧州宇宙標準化協力(ECSS)標準に従って、単変量信号を監視し、回復行動をトリガーするように設計されている。しかし、人工知能ベースのフォールト検出・分離・リカバリ(FDIR)ソリューションが、これらの標準手法の限界を克服し、検出可能な障害の範囲を広げ、応答時間を改善する見込みとともに、近年登場している。本稿では,小型太陽系天体(SSSB)探査のためのドローン型宇宙船の加速度計および慣性測定ユニットにおける固着値(stuck value)を検出するための新しい手法を提案する。この手法は,マルチチャネル畳み込みニューラルネットワーク(CNN)を用いてマルチターゲット分類を行い,各センサの故障を独立に検出する。搭載FDIRシステムとアルゴリズムの互換性確保に重要な注意が払われており、これは、堅牢性が完全に証明されるまでは実験段階にとどまる技術の軌道上検証に向けた一歩である。ネットワークが異常を効果的に検出し,システムレベルでの回復動作をトリガーできるようにする統合手法を提案する。検出性能と反応トリガにおけるアルゴリズムの能力は、一組のカスタム定義の検出メトリクスおよびシステムメトリクスを用いて評価し、FDIRタスクの実行におけるアルゴリズムの優れた性能を示す。 Traditional anomaly detection techniques onboard satellites are based on reliable, yet limited, thresholding mechanisms which are designed to monitor univariate signals and trigger recovery actions according to specific European Cooperation for Space Standardization (ECSS) standards. However, Artificial Intelligence-based Fault Detection, Isolation and Recovery (FDIR) solutions have recently emerged with the prospect of overcoming the limitations of these standard methods, expanding the range of detectable failures and improving response times. This paper presents a novel approach to detecting stuck values within the Accelerometer and Inertial Measurement Unit of a drone-like spacecraft for the exploration of Small Solar System Bodies (SSSB), leveraging a multi-channel Convolutional Neural Network (CNN) to perform multi-target classification and independently detect faults in the sensors. Significant attention has been dedicated to ensuring the compatibility of the algorithm with the onboard FDIR system, representing a step toward the in-orbit validation of a technology that remains experimental until its robustness is thoroughly proven. 
An integration methodology is proposed to enable the network to effectively detect anomalies and trigger recovery actions at the system level. The detection performances and the capability of the algorithm in reaction triggering are evaluated employing a set of custom-defined detection and system metrics, showing the outstanding performances of the algorithm in performing its FDIR task.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# CYCLE: エンティティリンクにおけるクロス年次コントラスト学習 CYCLE: Cross-Year Contrastive Learning in Entity-Linking ( http://arxiv.org/abs/2410.09127v1 ) ライセンス: Link先を確認	Pengyu Zhang, Congfeng Cao, Klim Zaporojets, Paul Groth,	(参考訳) 知識グラフは常に進化し、新しいエンティティが出現し、既存の定義が修正され、エンティティの関係が変化する。これらの変化は、時間とともにモデルの性能が低下することを特徴とする、エンティティリンクモデルの時間的劣化につながる。この問題に対処するために,異なる期間にまたがる近隣のエンティティからの情報を集約するために,グラフ関係を活用することを提案する。このアプローチは、時間とともに類似したエンティティを識別する能力を高め、時間的劣化の影響を最小限にする。本稿では,CYCLE: Cross-Year Contrastive Learning for Entity-Linkingを紹介する。このモデルは、エンティティリンクタスクの時間的パフォーマンス劣化に対処するために、新しいグラフコントラスト学習法を用いる。このコントラスト学習法では,新たに追加されたグラフ関係を正(positive)サンプルとして扱い,新たに削除されたグラフ関係を負(negative)サンプルとして扱う。提案手法は時間的劣化を効果的に防止し,時間差が1年の場合には2023年の最先端技術に対して13.90%の性能向上を,時間差が3年に広がると17.79%の改善を達成する。さらに分析した結果,CYCLEは低次数のエンティティに対して特に堅牢であることがわかった。低次数のエンティティは接続が疎であるため時間的劣化に対する耐性が低く,本手法に特に適している。コードとデータは https://github.com/pengyu-zhang/CYCLE-Cross-Year-Contrastive-Learning-in-Entity-Linking で公開されている。 Knowledge graphs constantly evolve with new entities emerging, existing definitions being revised, and entity relationships changing. These changes lead to temporal degradation in entity linking models, characterized as a decline in model performance over time. To address this issue, we propose leveraging graph relationships to aggregate information from neighboring entities across different time periods. This approach enhances the ability to distinguish similar entities over time, thereby minimizing the impact of temporal degradation. We introduce CYCLE: Cross-Year Contrastive Learning for Entity-Linking. This model employs a novel graph contrastive learning method to tackle temporal performance degradation in entity linking tasks. Our contrastive learning method treats newly added graph relationships as positive samples and newly removed ones as negative samples. 
This approach helps our model effectively prevent temporal degradation, achieving a 13.90% performance improvement over the state-of-the-art from 2023 when the time gap is one year, and a 17.79% improvement as the gap expands to three years. Further analysis shows that CYCLE is particularly robust for low-degree entities, which are less resistant to temporal degradation due to their sparse connectivity, making them particularly suitable for our method. The code and data are made available at https://github.com/pengyu-zhang/CYCLE-Cross-Year-Contrastive-Learning-in-Entity-Linking.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
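The CYCLE abstract above says that newly added graph relationships are treated as positive samples and newly removed ones as negative samples. As a minimal sketch of that idea only (not the authors' implementation), the two sample sets can be read off as set differences between two yearly snapshots of a knowledge graph. The function name `contrastive_edge_samples`, the edge representation as `(head, tail)` pairs, and the toy entity IDs are all assumptions made for illustration.

```python
def contrastive_edge_samples(edges_old, edges_new):
    """Split the change between two graph snapshots into positive and negative samples.

    Hypothetical helper: edges are (head, tail) tuples; the real model may
    represent relationships differently.
    """
    old, new = set(edges_old), set(edges_new)
    positives = new - old  # relationships that appear only in the later snapshot
    negatives = old - new  # relationships that were removed in the later snapshot
    return sorted(positives), sorted(negatives)


# Toy snapshots one year apart (entity IDs are made up for illustration).
snapshot_2022 = [("Q1", "Q2"), ("Q1", "Q3"), ("Q4", "Q5")]
snapshot_2023 = [("Q1", "Q2"), ("Q1", "Q6"), ("Q4", "Q5")]

positives, negatives = contrastive_edge_samples(snapshot_2022, snapshot_2023)
print("positive samples:", positives)
print("negative samples:", negatives)
```

Relationships present in both snapshots fall into neither set, so only the year-over-year change drives the contrastive signal in this sketch.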
# TIGER: 一時的に改善されたグラフエンティティリンカ TIGER: Temporally Improved Graph Entity Linker ( http://arxiv.org/abs/2410.09128v1 ) ライセンス: Link先を確認	Pengyu Zhang, Congfeng Cao, Paul Groth,	(参考訳) 知識グラフは、例えば新しいエンティティが導入されたり、エンティティ記述が変更されたりすることで、時間とともに変化する。これは、Web検索やレコメンデーションといった知識グラフの多くの用途において重要なタスクであるエンティティリンクのパフォーマンスに影響を与える。具体的には、エンティティリンクモデルは時間的劣化を示し、その性能は、モデルがトレーニングされた元の状態から知識グラフが離れるほど低下する。この課題に対処するために、TIGER: Temporally Improved Graph Entity Linkerを紹介する。モデルにエンティティ間の構造情報を組み込むことで、学習された表現を強化し、時間とともにエンティティをより区別しやすくする。中心となる考え方は、グラフベースの情報をテキストベースの情報に統合し、エンティティの特徴、構造的関係、およびそれらの相互作用に基づいて、個別の埋め込みと共有の埋め込みの両方を得ることである。 3つのデータセットを用いた実験により,提案モデルは時間的劣化を効果的に防止できることが示された。時間的設定において,時間差が1年の場合には最先端技術に対して16.24%の性能向上を示し,時間差が3年に広がると改善幅は20.93%となる。コードとデータは https://github.com/pengyu-zhang/TIGER-Temporally-Improved-Graph-Entity-Linker で公開されている。 Knowledge graphs change over time, for example, when new entities are introduced or entity descriptions change. This impacts the performance of entity linking, a key task in many uses of knowledge graphs such as web search and recommendation. Specifically, entity linking models exhibit temporal degradation: their performance decreases the further a knowledge graph moves from its original state on which an entity linking model was trained. To tackle this challenge, we introduce TIGER: a Temporally Improved Graph Entity Linker. By incorporating structural information between entities into the model, we enhance the learned representation, making entities more distinguishable over time. The core idea is to integrate graph-based information into text-based information, deriving both distinct and shared embeddings from an entity's features, its structural relationships, and their interaction. 
Experiments on three datasets show that our model can effectively prevent temporal degradation, demonstrating a 16.24% performance boost over the state-of-the-art in a temporal setting when the time gap is one year, rising to 20.93% as the gap expands to three years. The code and data are made available at https://github.com/pengyu-zhang/TIGER-Temporally-Improved-Graph-Entity-Linker.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# nextlocllm: LLMを用いた次の位置予測 nextlocllm: next location prediction using LLMs ( http://arxiv.org/abs/2410.09129v1 ) ライセンス: Link先を確認	Shuai Liu, Ning Cao, Yile Chen, Yue Jiang, Gao Cong,	(参考訳) 次の位置予測は、人間の移動解析において重要な課題であり、様々な下流アプリケーションの基礎となる。既存の手法は通常、場所を表すために離散的なIDに依存しており、本質的に空間的関係を見落とし、都市間で一般化できない。本稿では,自然言語記述処理における大規模言語モデル(LLM)の利点と,その強力な一般化能力を次の位置予測に活かすNextLocLLMを提案する。具体的には、IDを使う代わりに、NextLocLLMは連続的な空間座標に基づいて位置を符号化し、空間関係をより良くモデル化する。これらの座標はさらに正規化され、堅牢な都市間一般化が可能となる。 NextLocLLMのもう1つのハイライトは、LLM強化されたPOI埋め込みである。 LLMの能力を利用して、各POIカテゴリの自然言語記述を埋め込みにエンコードする。これらの埋め込みは非線形射影によって統合され、LLM強化されたPOI埋め込みを形成し、位置の機能的特性を効果的に捉える。さらに、タスクおよびデータのプロンプトプレフィックスと軌跡埋め込みが、部分的に凍結したLLMバックボーンの入力として組み込まれている。 NextLocLLMはさらに予測検索モジュールを導入し、予測における構造的一貫性を保証する。実験によると、NextLocLLMは次の位置予測で既存のモデルを上回り、教師付き設定とゼロショット設定の両方で優れている。 Next location prediction is a critical task in human mobility analysis and serves as a foundation for various downstream applications. Existing methods typically rely on discrete IDs to represent locations, which inherently overlook spatial relationships and cannot generalize across cities. In this paper, we propose NextLocLLM, which leverages the advantages of large language models (LLMs) in processing natural language descriptions and their strong generalization capabilities for next location prediction. Specifically, instead of using IDs, NextLocLLM encodes locations based on continuous spatial coordinates to better model spatial relationships. These coordinates are further normalized to enable robust cross-city generalization. Another highlight of NextLocLLM is its LLM-enhanced POI embeddings. It utilizes LLMs' ability to encode each POI category's natural language description into embeddings. These embeddings are then integrated via nonlinear projections to form the LLM-enhanced POI embeddings, effectively capturing locations' functional attributes. Furthermore, task and data prompt prefixes, together with trajectory embeddings, are incorporated as input for a partly frozen LLM backbone. 
NextLocLLM further introduces a prediction retrieval module to ensure structural consistency in prediction. Experiments show that NextLocLLM outperforms existing models in next location prediction, excelling in both supervised and zero-shot settings.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# 3nm FinFETマルチポートSRAMを用いたオンライン学習型CIMを用いた省エネルギーSNNアーキテクチャ Energy-efficient SNN Architecture using 3nm FinFET Multiport SRAM-based CIM with Online Learning ( http://arxiv.org/abs/2410.09130v1 ) ライセンス: Link先を確認	Lucas Huijbregts, Liu Hsiao-Hsuan, Paul Detterer, Said Hamdioui, Amirreza Yousefzadeh, Rajendra Bishnoi,	(参考訳) 現在の人工知能(AI)計算システムは、主にメモリウォール問題に起因する課題に直面しており、特にスマートフォン、ウェアラブル、Internet-of-Thingsセンサーシステムといった、バッテリー予算が制限されたエッジデバイスにおいて、システムレベルの全体的なパフォーマンスを制限している。本稿では、スパイキングニューラルネットワーク(SNN)推論に最適化されたSRAMベースの新しいCompute-In-Memory(CIM)アクセラレータを提案する。提案アーキテクチャでは、スループットを高めるための複数の分離されたReadポートと、オンライン学習を容易にするTransposable Read-Writeポートを備えたマルチポートSRAM設計を採用する。さらに,計算中の効率的なデータ処理とポート割り当てを行うArbiter回路を開発した。 3nm FinFET技術における128×128アレイの結果は、従来のシングルポート設計と比較して3.1倍のスピード向上と2.2倍のエネルギー効率向上を示す。システムレベルでは、607 pJ/Inf、29 mWで44 MInf/sのスループットを達成する。 Current Artificial Intelligence (AI) computation systems face challenges, primarily from the memory-wall issue, limiting overall system-level performance, especially for Edge devices with constrained battery budgets, such as smartphones, wearables, and Internet-of-Things sensor systems. In this paper, we propose a new SRAM-based Compute-In-Memory (CIM) accelerator optimized for Spiking Neural Networks (SNNs) inference. Our proposed architecture employs a multiport SRAM design with multiple decoupled Read ports to enhance the throughput and Transposable Read-Write ports to facilitate online learning. Furthermore, we develop an Arbiter circuit for efficient data-processing and port allocations during the computation. Results for a 128×128 array in 3nm FinFET technology demonstrate a 3.1× improvement in speed and a 2.2× enhancement in energy efficiency with our proposed multiport SRAM design compared to the traditional single-port design. At the system level, a throughput of 44 MInf/s at 607 pJ/Inf and 29 mW is achieved.	翻訳日:2024-10-30 16:13:24 公開日:2024-10-11
# グラフがマルチモーダルに出会ったとき:マルチモーダル属性グラフ学習のベンチマーク When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning ( http://arxiv.org/abs/2410.09132v1 ) ライセンス: Link先を確認	Hao Yan, Chaozhuo Li, Zhigang Yu, Jun Yin, Ruochen Liu, Peiyan Zhang, Weihao Han, Mingzheng Li, Zhengxin Zeng, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang, Senzhang Wang,	(参考訳) マルチモーダル属性グラフ(MAG)は、様々な実世界のシナリオで一般的であり、一般的に2種類の知識を含んでいる。 (a) 属性知識は、主に、テキストや画像など、ノード(エンティティ)自体に含まれる異なるモダリティの属性によって支えられる。 (b) 一方、トポロジー知識は,ノード間の複雑な相互作用によって提供される。 MAG表現学習の基礎は、マルチモーダル属性とトポロジーのシームレスな統合にある。事前学習言語/視覚モデル(PLMs/PVMs)とグラフニューラルネットワーク(GNNs)の最近の進歩は、MAGの効果的な学習を促進し、研究の関心を高めている。しかし、MAG表現学習のための有意義なベンチマークデータセットや標準化された評価手順が欠如していることは、この分野の進歩を妨げている。本稿では,MAGのための挑戦的なベンチマークデータセットの包括的かつ多様な集合であるMultimodal Attribute Graph Benchmark (MAGB)を提案する。 MAGBデータセットは特に大規模であり、Eコマースネットワークからソーシャルネットワークまで幅広いドメインを含んでいる。新たなデータセットに加えて、GNNベースの手法からPLMベースの手法まで、さまざまな学習パラダイムを用いてMAGB上で広範囲なベンチマーク実験を行い、マルチモーダル属性とグラフトポロジの統合の必要性と実現可能性について検討する。簡単に言えば、MAGデータセットの概要、標準化された評価手順、およびベースライン実験を提供する。 MAGBプロジェクト全体は https://github.com/sktsherlock/ATG で公開されている。 Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge: (a) Attribute knowledge is mainly supported by the attributes of different modalities contained in nodes (entities) themselves, such as texts and images. (b) Topology knowledge, on the other hand, is provided by the complex interactions posed between nodes. The cornerstone of MAG representation learning lies in the seamless integration of multimodal attributes and topology. Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs, garnering increased research interest. However, the absence of meaningful benchmark datasets and standardized evaluation procedures for MAG representation learning has impeded progress in this field. 
In this paper, we propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs. The MAGB datasets are notably large in scale and encompass a wide range of domains, spanning from e-commerce networks to social networks. In addition to the brand-new datasets, we conduct extensive benchmark experiments over MAGB with various learning paradigms, ranging from GNN-based to PLM-based methods, to explore the necessity and feasibility of integrating multimodal attributes and graph topology. In a nutshell, we provide an overview of the MAG datasets and standardized evaluation procedures, and present baseline experiments. The entire MAGB project is publicly accessible at https://github.com/sktsherlock/ATG.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# MVG-CRPS:多変量確率予測のためのロバストロス関数 MVG-CRPS: A Robust Loss Function for Multivariate Probabilistic Forecasting ( http://arxiv.org/abs/2410.09133v1 ) ライセンス: Link先を確認	Vincent Zhihao Zheng, Lijun Sun,	(参考訳) 確率的時系列予測では、多変量ガウス分布(MVG)が相関のある連続確率変数の予測分布として広く用いられている。現在の深層確率モデルでは、ニューラルネットワークを用いて分布の平均ベクトルと共分散行列をパラメータ化し、デフォルトの損失関数として対数スコア(すなわち負の対数尤度)を用いるのが一般的である。しかし、対数スコアは外れ値に非常に敏感であり、データに異常が存在する場合、重大な誤差につながる。単変量分布の学習におけるCRPS(continuous ranked probability score)の利用に着想を得て、高次元MVG出力に特化して設計されたロバストな損失関数を提案する。提案するMVG-CRPS損失関数は、ニューラルネットワークの出力に基づく閉形式表現を持ち、ディープラーニングモデルに容易に統合できる。多変量自己回帰予測と単変量系列対系列(Seq2Seq)予測という2つの確率的予測タスクでMVG-CRPSを評価する。いずれも観測がMVG分布に従うタスクである。実世界のデータセットによる実験結果から,MVG-CRPSは頑健性と効率性を両立し,確率予測における精度と不確実性定量化を向上させることが示された。 In probabilistic time series forecasting, the multivariate Gaussian (MVG) distribution is widely used as the predictive distribution for correlated continuous random variables. Current deep probabilistic models typically employ neural networks to parameterize the mean vector and covariance matrix of the distribution, with log-score (i.e., negative log-likelihood) as the default loss function. However, log-score is highly sensitive to outliers, leading to significant errors when anomalies are present in the data. Motivated by the use of the continuous ranked probability score (CRPS) in learning univariate distributions, we propose a robust loss function specifically designed for high-dimensional MVG outputs. The proposed MVG-CRPS loss function has a closed-form expression based on the neural network outputs, making it easily integrable into deep learning models. We evaluate MVG-CRPS on two probabilistic forecasting tasks -- multivariate autoregressive and univariate sequence-to-sequence (Seq2Seq) forecasting -- both involving observations following an MVG distribution. Experimental results on real-world datasets demonstrate that MVG-CRPS achieves both robustness and efficiency, offering enhanced accuracy and uncertainty quantification in probabilistic forecasting.	
翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
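The MVG-CRPS abstract above contrasts the log-score, which is highly sensitive to outliers, with the CRPS used for univariate distributions. The sketch below illustrates that contrast for the univariate Gaussian case only, using the well-known closed form CRPS(N(μ, σ²), y) = σ [ z (2Φ(z) − 1) + 2φ(z) − 1/√π ] with z = (y − μ)/σ. It is not the paper's multivariate MVG-CRPS loss, and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a univariate Gaussian forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def gaussian_nll(mu, sigma, y):
    """Negative log-likelihood (log-score) of the same forecast, for comparison."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


# Compare how the two scores react as the observation moves away from the forecast mean.
for y in (0.0, 2.0, 10.0):
    print(y, round(gaussian_crps(0.0, 1.0, y), 4), round(gaussian_nll(0.0, 1.0, y), 4))
```

For large |z| the CRPS grows roughly linearly in the error while the negative log-likelihood grows quadratically, which is the usual intuition for why a CRPS-type loss reacts less violently to anomalous observations.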
# 自律型サイバー防衛におけるマルチエージェント・アクター・クリティック Multi-Agent Actor-Critics in Autonomous Cyber Defense ( http://arxiv.org/abs/2410.09134v1 ) ライセンス: Link先を確認	Mingjun Wang, Remington Dechene,	(参考訳) 自律的で適応的な防御機構の必要性は、急速に進化するサイバー脅威の状況において最重要になっている。マルチエージェントディープ強化学習(MADRL)は、自律型サイバーオペレーションの有効性とレジリエンスを高めるための有望なアプローチである。本稿では,マルチエージェント学習の一般形を与えるマルチエージェント・アクター・クリティック・アルゴリズムのサイバー防衛への適用について検討し,複数エージェント間の協調的相互作用を利用して,サイバー脅威を検出し,軽減し,対応する。シミュレーションしたサイバー攻撃シナリオにおいて,各エージェントがMADRLを用いて迅速に学習し,自律的に脅威に対処できることを実証する。その結果、MADRLは自律型サイバー防衛システムの能力を大幅に向上させ、よりインテリジェントなサイバーセキュリティ戦略への道を開くことが示唆された。この研究は、人工知能をサイバーセキュリティに活用するための知識の蓄積に寄与し、自律型サイバーオペレーションの今後の研究と開発に光を当てる。 The need for autonomous and adaptive defense mechanisms has become paramount in the rapidly evolving landscape of cyber threats. Multi-Agent Deep Reinforcement Learning (MADRL) presents a promising approach to enhancing the efficacy and resilience of autonomous cyber operations. This paper explores the application of Multi-Agent Actor-Critic algorithms, which provide a general form of multi-agent learning, to cyber defense, leveraging the collaborative interactions among multiple agents to detect, mitigate, and respond to cyber threats. We demonstrate that each agent is able to learn quickly and counteract threats autonomously using MADRL in simulated cyber-attack scenarios. The results indicate that MADRL can significantly enhance the capability of autonomous cyber defense systems, paving the way for more intelligent cybersecurity strategies. This study contributes to the growing body of knowledge on leveraging artificial intelligence for cybersecurity and sheds light on future research and development in autonomous cyber operations.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# 土地被覆分析の高度化:動的世界データを用いた予測モデリングのための統合的データ抽出パイプライン Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset ( http://arxiv.org/abs/2410.09135v1 ) ライセンス: Link先を確認	Victor Radermecker, Andrea Zanon, Nancy Thomas, Annita Vapsi, Saba Rahimi, Rama Ramakrishnan, Daniel Borrajo,	(参考訳) 特に、データアクセシビリティーが政府や商業団体に排他的になるから、より広い研究コミュニティを含む現在へと移行するにつれて、土地のカバーを理解することは、多くの実践的応用にとって大きな可能性を秘めている。それでも、データは探索に関心のあるすべてのコミュニティメンバーにアクセスできるが、恐ろしい学習曲線が存在し、データにアクセス、前処理、その後のタスクに活用するための標準化されたプロセスはない。本研究では, 最先端の土地利用/土地被覆(LULC)データセットであるDynamic Worldデータセットを扱うための, フレキシブルで効率的なエンド・ツー・エンドパイプラインを提示することにより, このデータを民主化する。これには、ノイズ除去に取り組む事前処理および表現フレームワーク、大量のデータの効率的な抽出、複数の下流タスクに適したフォーマットでのLULCデータの再表現が含まれる。パイプラインのパワーを実証するために、都市化予測問題のためのデータを抽出し、優れたパフォーマンスで機械学習モデルのスイートを構築する。このタスクは任意の種類の土地被覆の予測に容易に一般化でき、パイプラインは他の下流タスクと互換性がある。 Understanding land cover holds considerable potential for a myriad of practical applications, particularly as data accessibility transitions from being exclusive to governmental and commercial entities to now including the broader research community. Nevertheless, although the data is accessible to any community member interested in exploration, there exists a formidable learning curve and no standardized process for accessing, pre-processing, and leveraging the data for subsequent tasks. In this study, we democratize this data by presenting a flexible and efficient end to end pipeline for working with the Dynamic World dataset, a cutting-edge near-real-time land use/land cover (LULC) dataset. This includes a pre-processing and representation framework which tackles noise removal, efficient extraction of large amounts of data, and re-representation of LULC data in a format well suited for several downstream tasks. To demonstrate the power of our pipeline, we use it to extract data for an urbanization prediction problem and build a suite of machine learning models with excellent performance. 
This task is easily generalizable to the prediction of any type of land cover and our pipeline is also compatible with a series of other downstream tasks.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# RealEra: 近隣のコンセプトマイニングによるセマンティックレベルのコンセプト消去 RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining ( http://arxiv.org/abs/2410.09140v1 ) ライセンス: Link先を確認	Yufan Liu, Jinyang An, Wanqian Zhang, Ming Li, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang,	(参考訳) テキスト・ツー・イメージ・ジェネレーション・モデルの顕著な発展は、肖像画の権利侵害や不適切なコンテンツの生成など、顕著なセキュリティ上の懸念を引き起こしている。概念消去は、モデルが保護され不適切な概念に関する知識を取り除くために提案されている。多くの手法は、有効性と特異性(無関係な概念を含む)のバランスをとろうとしてきたが、意味論的に関連する入力を操りながら、豊富な消去概念を生成することができる。本研究では,この「概念残余」問題に対処するためにRealEraを提案する。具体的には,消去概念の埋め込みにランダムな摂動を加え,消去範囲を拡大し,関連する概念入力を通じて世代を排除することによって,近隣概念マイニングのメカニズムを紹介した。さらに、消去範囲の拡大による無関係な概念の生成に対するネガティブな影響を軽減するため、RealEraは概念の超越正規化を通じて特異性を保っている。これにより、無関係な概念は対応する空間的位置を維持し、通常の生成性能を維持することができる。また, クロスアテンションアライメントにおけるU-Netの重み付けの最適化や, LoRAモジュールとの予測ノイズアライメントにも, クローズドフォームの解を用いる。複数のベンチマークでの大規模な実験により、RealEraは、有効性、特異性、一般化性の点で、過去の概念消去方法よりも優れていたことが示されている。詳細はプロジェクトのページ https://realerasing.github.io/RealEra/ で確認できます。 The remarkable development of text-to-image generation models has raised notable security concerns, such as the infringement of portrait rights and the generation of inappropriate content. Concept erasure has been proposed to remove the model's knowledge about protected and inappropriate concepts. Although many methods have tried to balance the efficacy (erasing target concepts) and specificity (retaining irrelevant concepts), they can still generate abundant erasure concepts under the steering of semantically related inputs. In this work, we propose RealEra to address this "concept residue" issue. Specifically, we first introduce the mechanism of neighbor-concept mining, digging out the associated concepts by adding random perturbation into the embedding of erasure concept, thus expanding the erasing range and eliminating the generations even through associated concept inputs. 
Furthermore, to mitigate the negative impact on the generation of irrelevant concepts caused by the expansion of erasure scope, RealEra preserves the specificity through the beyond-concept regularization. This makes irrelevant concepts maintain their corresponding spatial position, thereby preserving their normal generation performance. We also employ the closed-form solution to optimize weights of U-Net for the cross-attention alignment, as well as the prediction noise alignment with the LoRA module. Extensive experiments on multiple benchmarks demonstrate that RealEra outperforms previous concept erasing methods in terms of superior erasing efficacy, specificity, and generality. More details are available on our project page https://realerasing.github.io/RealEra/ .	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# ACER: Retrievalによる自動言語モデルコンテキスト拡張 ACER: Automatic Language Model Context Extension via Retrieval ( http://arxiv.org/abs/2410.09141v1 ) ライセンス: Link先を確認	Luyu Gao, Yunyi Zhang, Jamie Callan,	(参考訳) ロングコンテキストモデリングは、複雑な情報の消化と推論において、言語AIの重要な能力の1つである。実際には、長文コンテキスト能力は通常、汎用的な長文コンテキスト能力を生み出すことを目的として、慎重に設計されたコンテキスト拡張段階を通じて、事前訓練された言語モデル(LM)に組み込まれる。しかし、予備実験では、現在のオープンウェイトの汎用長文コンテキストモデルは、実用的な長文コンテキスト処理タスクにおいてまだ不十分であることが判明した。これは、完全に効果的な長文コンテキストモデリングにはタスク固有のデータが必要であることを意味するが、そのコストは法外なものになりうる。本稿では,人間が大量の情報を処理する方法から着想を得た。すなわち,損失のある検索(retrieval)ステージが大量の文書をランク付けし,読み手は上位候補のみを深く読む。我々は、短いコンテキストのLMを用いてこのプロセスを模倣する、自動(automatic)データ合成パイプラインを構築した。短コンテキストLMは、タスク固有の長文コンテキスト能力を得るために、これらの自己生成データを使ってさらに調整される。事前学習が不完全なデータから学習するのと同様に、短コンテキストモデルは合成データ上でブートストラップでき、長文コンテキスト検索拡張生成のような実世界のタスクにおいて、汎用長文コンテキストモデルだけでなく、トレーニングデータの合成に用いた検索・読解パイプラインをも上回ることを仮説として立て、さらに実証する。 Long-context modeling is one of the critical capabilities of language AI for digesting and reasoning over complex information pieces. In practice, long-context capabilities are typically built into a pre-trained language model (LM) through a carefully designed context extension stage, with the goal of producing generalist long-context capabilities. In our preliminary experiments, however, we discovered that the current open-weight generalist long-context models are still lacking in practical long-context processing tasks. While this means perfectly effective long-context modeling demands task-specific data, the cost can be prohibitive. In this paper, we draw inspiration from how humans process a large body of information: a lossy retrieval stage ranks a large set of documents while the reader ends up reading deeply only the top candidates. We build an automatic data synthesis pipeline that mimics this process using short-context LMs. The short-context LMs are further tuned using these self-generated data to obtain task-specific long-context capabilities. 
Similar to how pre-training learns from imperfect data, we hypothesize and further demonstrate that the short-context model can bootstrap over the synthetic data, outperforming not only long-context generalist models but also the retrieval and read pipeline used to synthesize the training data in real-world tasks such as long-context retrieval augmented generation.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# 顔によるヒナの雌雄鑑別:ヒナの顔画像を用いた自動雌雄鑑別システム Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image ( http://arxiv.org/abs/2410.09155v1 ) ライセンス: Link先を確認	Marta Veganzones Rodriguez, Thinh Phan, Arthur F. A. Fernandes, Vivian Breen, Jesus Arango, Michael T. Kidd, Ngan Le,	(参考訳) 初生ヒナの性別を判定する工程であるヒナの雌雄鑑別は、性別ごとに生産で果たす役割が異なるため、養鶏業において重要な課題である。従来の有効な方法は高い精度を達成するが、羽色鑑別と翼羽鑑別は特定の品種に限られ、肛門鑑別は侵襲的で、訓練された専門家を必要とする。これらの課題に対処するために,人間の顔の性別分類技術に着想を得た新しいアプローチである,顔によるヒナの雌雄鑑別を提案する。本手法は, 専門知識を必要とせず, ヒナの取り扱いを最小限に抑えることで動物福祉を向上させつつ, 訓練時間を短縮することを目的としている。我々は、データ収集、顔とキーポイントの検出、顔のアライメント、分類を含む、訓練と推論のための総合的なシステムを開発する。 Cropped Full FaceとCropped Middle Faceの2種類の画像セットでモデルを評価した。どちらも、さらなる分析のためにヒナの顔の本質的な特徴を保持している。実験では最終的に81.89%の精度が得られ,雌雄鑑別をより広く適用可能にすることで,本アプローチが将来の雌雄鑑別の実践において有望であることを示している。 Chick sexing, the process of determining the gender of day-old chicks, is a critical task in the poultry industry due to the distinct roles that each gender plays in production. While effective traditional methods achieve high accuracy, color and wing feather sexing are exclusive to specific breeds, and vent sexing is invasive and requires trained experts. To address these challenges, we propose a novel approach inspired by facial gender classification techniques in humans: facial chick sexing. This new method does not require expert knowledge and aims to reduce training time while enhancing animal welfare by minimizing chick manipulation. We develop a comprehensive system for training and inference that includes data collection, facial and keypoint detection, facial alignment, and classification. We evaluate our model on two sets of images: Cropped Full Face and Cropped Middle Face, both of which maintain essential facial features of the chick for further analysis. Our experiment, with a final accuracy of 81.89%, demonstrates the promising viability of this approach for future chick sexing practice by making it more universally applicable.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# 自己教師付き表現学習のための識別確率モデルについて On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning ( http://arxiv.org/abs/2410.09156v1 ) ライセンス: Link先を確認	Bokun Wang, Yunwen Lei, Yiming Ying, Tianbao Yang,	(参考訳) (マルチモーダルな)自己教師付き表現学習のための連続領域上の識別確率モデリング問題について検討する。各アンカーデータに対する分割関数の積分計算という課題に対処するため,頑健なモンテカルロ積分のための多重重要サンプリング(MIS)技術を活用する。これはInfoNCEに基づくコントラスト損失を特殊ケースとして復元できる。この確率モデリングの枠組みにおいて,一般化誤差解析を行い,自己教師付き表現学習における現在のInfoNCEに基づくコントラスト損失の限界を明らかにし,モンテカルロ積分の誤差を低減することでより良いアプローチを開発するための洞察を導出する。この目的のために、MISが要求する条件付き密度の和を凸最適化により近似する新しい非パラメトリック手法を提案し,自己教師付き表現学習のための新たなコントラスト目的関数を得る。さらに,提案した目的関数を解くための効率的なアルゴリズムを設計する。対照的画像言語事前学習タスクにおいて,提案アルゴリズムを代表的なベースラインと実証的に比較した。 CC3M と CC12M のデータセットに対する実験結果から,提案アルゴリズムの総合的な性能の優位性を示す。 We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data point, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which can recover the InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct generalization error analysis to reveal the limitation of the current InfoNCE-based contrastive loss for self-supervised representation learning and derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
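The abstract above refers to the InfoNCE-based contrastive loss as the special case recovered by the proposed framework. For orientation, here is a minimal sketch of the standard InfoNCE loss for a single anchor, written with the Python standard library only. It is not the paper's new objective: the sum over candidates in the denominator plays the role of a sample-based estimate of the partition function, and the function name `info_nce`, the similarity values, and the default temperature of 0.1 are illustrative assumptions.

```python
import math


def info_nce(similarities, positive_index, temperature=0.1):
    """Standard InfoNCE loss for one anchor.

    similarities: similarity scores between the anchor and each candidate
    (one positive, the rest negatives); the values used below are made up.
    """
    logits = [s / temperature for s in similarities]
    m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_partition = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-softmax probability assigned to the positive candidate.
    return log_partition - logits[positive_index]


# One positive (index 0) scored higher than two negatives.
print(info_nce([0.9, 0.1, 0.2], positive_index=0))
# With uninformative (equal) similarities the loss equals log(number of candidates).
print(info_nce([0.5, 0.5, 0.5, 0.5], positive_index=0), math.log(4))
```

The loss falls as the positive pair's similarity rises relative to the negatives, and it is bounded below by zero because it is a negative log-probability.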
# LLMのためのハイブリッドトレーニングアプローチ:ドメイン特化アプリケーションにおける実データと合成データの活用によるモデル性能向上 Hybrid Training Approaches for LLMs: Leveraging Real and Synthetic Data to Enhance Model Performance in Domain-Specific Applications ( http://arxiv.org/abs/2410.09168v1 ) ライセンス: Link先を確認	Alexey Zhezherau, Alexei Yanockin,	(参考訳) 本研究では、実世界と合成データを統合してモデル性能、特に正確で文脈的に関係のある応答を生成することによって、大規模言語モデル(LLM)を微調整するハイブリッドアプローチについて検討する。転写された実データと高品質な合成セッションを組み合わせたデータセットを利用することで、不足、ノイズ、ドメイン固有の実データの制限を克服することを目的とした。トレーニングの多様性を高めるために、合成ペルソナとシナリオが採用された。本研究は,基本基礎モデル,実データで微調整されたモデル,ハイブリッド微調整されたモデルという3つのモデルを評価した。実験の結果、ハイブリッドモデルは特定の垂直的アプリケーションにおいて他のモデルよりも一貫して優れており、すべての指標で最高スコアを達成できた。さらなるテストでは、ハイブリッドモデルの優れた適応性と、さまざまなシナリオにおけるコンテキスト理解が確認された。これらの結果から, 実データと合成データを組み合わせることで, LLMの堅牢性と文脈感受性, 特にドメイン固有および垂直ユースケースにおいて有意に向上することが示唆された。 This research explores a hybrid approach to fine-tuning large language models (LLMs) by integrating real-world and synthetic data to boost model performance, particularly in generating accurate and contextually relevant responses. By leveraging a dataset combining transcribed real interactions with high-quality synthetic sessions, we aimed to overcome the limitations of scarce, noisy, and domain-specific real data. Synthetic personas and scenarios were employed to enhance training diversity. The study evaluated three models: a base foundational model, a model fine-tuned with real data, and a hybrid fine-tuned model. Experimental results showed that the hybrid model consistently outperformed the others in specific vertical applications, achieving the highest scores across all metrics. Further testing confirmed the hybrid model's superior adaptability and contextual understanding across diverse scenarios. These findings suggest that combining real and synthetic data can significantly improve the robustness and contextual sensitivity of LLMs, particularly in domain-specific and vertical use cases.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# ブロックチェーンに基づくセンサネットワークにおける集合メンバシップのための効率的なゼロ知識証明:新しいORアグリゲーションアプローチ Efficient Zero-Knowledge Proofs for Set Membership in Blockchain-Based Sensor Networks: A Novel OR-Aggregation Approach ( http://arxiv.org/abs/2410.09169v1 ) ライセンス: Link先を確認	Oleksandr Kuznetsov, Emanuele Frontoni, Marco Arnesano, Kateryna Kuznetsova,	(参考訳) ブロックチェーンベースのセンサネットワークは、IoTエコシステムにおけるセキュアで透過的なデータ管理のための、有望なソリューションを提供する。しかし、効率的な集合メンバシップ証明は、特に資源制約のある環境では、依然として重要な課題である。本稿では,ブロックチェーンベースのセンサネットワークに特化して設計された,ゼロ知識集合型メンバシップ証明のための新しいORアグリゲーション手法を提案する。我々は、包括的な理論基盤、詳細なプロトコル仕様、厳密なセキュリティ分析を提供する。実装には、リソース制約のあるデバイスに対する最適化技術と、著名なブロックチェーンプラットフォームとの統合戦略が組み込まれています。大規模な実験的評価は,既存手法,特に大規模展開におけるアプローチの優位性を実証している。その結果, 証明サイズ, 生成時間, 検証効率が有意に向上した。提案されたOR集約技術は、ブロックチェーンベースのIoTアプリケーションにおける設定メンバシップ検証のためのスケーラブルでプライバシ保護のソリューションを提供し、現在のアプローチの重要な制限に対処する。私たちの研究は、大規模センサーネットワークにおける効率的でセキュアなデータ管理の進歩に寄与し、IoTエコシステムにおけるブロックチェーンテクノロジの広範な採用の道を開いたのです。 Blockchain-based sensor networks offer promising solutions for secure and transparent data management in IoT ecosystems. However, efficient set membership proofs remain a critical challenge, particularly in resource-constrained environments. This paper introduces a novel OR-aggregation approach for zero-knowledge set membership proofs, tailored specifically for blockchain-based sensor networks. We provide a comprehensive theoretical foundation, detailed protocol specification, and rigorous security analysis. Our implementation incorporates optimization techniques for resource-constrained devices and strategies for integration with prominent blockchain platforms. Extensive experimental evaluation demonstrates the superiority of our approach over existing methods, particularly for large-scale deployments. Results show significant improvements in proof size, generation time, and verification efficiency. 
The proposed OR-aggregation technique offers a scalable and privacy-preserving solution for set membership verification in blockchain-based IoT applications, addressing key limitations of current approaches. Our work contributes to the advancement of efficient and secure data management in large-scale sensor networks, paving the way for wider adoption of blockchain technology in IoT ecosystems.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# Max-SATのための資源制約型ヒューリスティック Resource-Constrained Heuristic for Max-SAT ( http://arxiv.org/abs/2410.09173v1 ) ライセンス: Link先を確認	Brian Matejek, Daniel Elenius, Cale Gentry, David Stoker, Adam Cobb,	(参考訳) 資源制約のあるMax-SATのインスタンスに対するヒューリスティックを提案し、最適化されたソルバとハードウェアで解けるような、より小さなサブコンポーネントに繰り返し分解する。制約のない外ループは与えられた問題の状態空間を維持し、前の呼び出しとは無関係に最適化するためにSAT変数のサブセットを選択する。リソース制約された内部ループは、"sub-SAT"問題における満足できる節の数を最大化する。我々の外ループは内部ループのメカニズムに非依存であり、最適化ステップに従来の解法を使用することができる。しかし、選択した"sub-SAT"問題を2次非制約バイナリ最適化(QUBO)に変換して、特別なハードウェアで最適化することもできる。分解前にSATインスタンスをQUBOに変換する既存のソリューションとは対照的に、QUBO最適化の前にSAT変数のサブセットを選択する。本研究では,所定のSATインスタンスの構造を利用するグラフベースの新しい手法を含む,変数選択手法の集合を分析する。部分SAT問題を符号化するために必要なQUBO変数の数が異なるため、固定サイズのQUBOソルバに適合するサブSAT問題のサイズを予測するモデルも学習する。我々は,ランダムに生成されたMax-SATインスタンスと,Max-SAT評価ベンチマークによる実例と,既存のQUBOデコンポザソリューションよりも優れた実例について実証実験を行った。 We propose a resource-constrained heuristic for instances of Max-SAT that iteratively decomposes a larger problem into smaller subcomponents that can be solved by optimized solvers and hardware. The unconstrained outer loop maintains the state space of a given problem and selects a subset of the SAT variables for optimization independent of previous calls. The resource-constrained inner loop maximizes the number of satisfiable clauses in the "sub-SAT" problem. Our outer loop is agnostic to the mechanisms of the inner loop, allowing for the use of traditional solvers for the optimization step. However, we can also transform the selected "sub-SAT" problem into a quadratic unconstrained binary optimization (QUBO) one and use specialized hardware for optimization. In contrast to existing solutions that convert a SAT instance into a QUBO one before decomposition, we choose a subset of the SAT variables before QUBO optimization. We analyze a set of variable selection methods, including a novel graph-based method that exploits the structure of a given SAT instance. 
The number of QUBO variables needed to encode a (sub-)SAT problem varies, so we additionally learn a model that predicts the size of sub-SAT problems that will fit a fixed-size QUBO solver. We empirically demonstrate our results on a set of randomly generated Max-SAT instances as well as real world examples from the Max-SAT evaluation benchmarks and outperform existing QUBO decomposer solutions.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# Few-Shot Learningを用いたコンテキスト認識型SQLエラー訂正 - NLQ, エラー, およびSQL類似性に基づく新しいアプローチ Context-Aware SQL Error Correction Using Few-Shot Learning -- A Novel Approach Based on NLQ, Error, and SQL Similarity ( http://arxiv.org/abs/2410.09174v1 ) ライセンス: Link先を確認	Divyansh Jain, Eric Yang,	(参考訳) 近年、様々なアプリケーションで効率的なデータクエリの必要性から、SQLの自動生成の需要が大幅に増加した。しかし、自然言語入力の複雑さと可変性のため、正確なSQLクエリを生成することは依然として課題である。本稿では, 与えられた自然言語問題(NLQ)に対して, 最適な誤り訂正例を選択することにより, 生成したクエリの精度を向上させることを目的とした, SQL生成における誤り訂正のための新しい数ショット学習手法を提案する。オープンソースのGretelデータセットを用いて行った実験で、提案モデルは、誤り訂正のないベースラインアプローチによる修正誤差が39.2%増加し、単純な誤り訂正法から10%増加した。提案手法では, 組込み型類似度測定を用いて, 複数例のレポジトリから最も近いマッチングを識別する。それぞれの例は、不正なSQLクエリ、結果のエラー、正しいSQLクエリ、不正なクエリを正しいクエリに変換するための詳細なステップで構成されている。この手法を用いることで,新たに生成されたSQLクエリの誤り訂正を効果的に導くことができる。提案手法は,誤りの識別と修正を容易にする文脈関連例を提供することにより,SQL生成精度を大幅に向上することを示す。実験結果は、数ショットの学習プロセスを強化するための埋め込みベースの選択の有効性を強調し、より正確で信頼性の高いSQLクエリ生成につながった。この研究は、エラー訂正のための堅牢なフレームワークを提供し、より先進的でユーザフレンドリなデータベースインタラクションツールの道を開くことによって、自動SQL生成の分野に貢献する。 In recent years, the demand for automated SQL generation has increased significantly, driven by the need for efficient data querying in various applications. However, generating accurate SQL queries remains a challenge due to the complexity and variability of natural language inputs. This paper introduces a novel few-shot learning-based approach for error correction in SQL generation, enhancing the accuracy of generated queries by selecting the most suitable few-shot error correction examples for a given natural language question (NLQ). In our experiments with the open-source Gretel dataset, the proposed model offers a 39.2% increase in fixing errors from the baseline approach with no error correction and a 10% increase from a simple error correction method. The proposed technique leverages embedding-based similarity measures to identify the closest matches from a repository of few-shot examples. 
Each example comprises an incorrect SQL query, the resulting error, the correct SQL query, and detailed steps to transform the incorrect query into the correct one. By employing this method, the system can effectively guide the correction of errors in newly generated SQL queries. Our approach demonstrates significant improvements in SQL generation accuracy by providing contextually relevant examples that facilitate error identification and correction. The experimental results highlight the effectiveness of embedding-based selection in enhancing the few-shot learning process, leading to more precise and reliable SQL query generation. This research contributes to the field of automated SQL generation by offering a robust framework for error correction, paving the way for more advanced and user-friendly database interaction tools.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# Few-Shot分類モデルのクロスドメイン評価:自然画像と病理像の比較 Cross-Domain Evaluation of Few-Shot Classification Models: Natural Images vs. Histopathological Images ( http://arxiv.org/abs/2410.09176v1 ) ライセンス: Link先を確認	Ardhendu Sekhar, Aditya Bhattacharya, Vinayak Goyal, Vrinda Goel, Aditya Bhangale, Ravi Kant Gupta, Amit Sethi,	(参考訳) 本研究では,異なる領域,特に自然像と病理像を対象とする数ショット分類モデルの性能について検討した。まず、自然画像のいくつかの分類モデルを訓練し、その性能を病理画像で評価する。その後、病理画像上で同じモデルを訓練し、それらの性能を比較した。我々は,4つの病理組織データセットと1つの自然画像データセットを組み込み,最先端の分類手法を用いて,5ウェイ1ショット,5ウェイ5ショット,5ウェイ10ショットのシナリオでの性能評価を行った。実験結果から,多様な画像領域間の少数ショット分類モデルの転送可能性および一般化能力に関する知見が得られた。我々は、これらのモデルが新しいドメインに適応する際の長所と短所を分析し、クロスドメインシナリオにおけるパフォーマンスを最適化するための推奨事項を提供する。本研究は,多様な領域にまたがる画像分類の文脈において,少数ショット学習の理解を深めることに寄与する。 In this study, we investigate the performance of few-shot classification models across different domains, specifically natural images and histopathological images. We first train several few-shot classification models on natural images and evaluate their performance on histopathological images. Subsequently, we train the same models on histopathological images and compare their performance. We incorporated four histopathology datasets and one natural images dataset and assessed performance across 5-way 1-shot, 5-way 5-shot, and 5-way 10-shot scenarios using a selection of state-of-the-art classification techniques. Our experimental results reveal insights into the transferability and generalization capabilities of few-shot classification models between diverse image domains. We analyze the strengths and limitations of these models in adapting to new domains and provide recommendations for optimizing their performance in cross-domain scenarios. This research contributes to advancing our understanding of few-shot learning in the context of image classification across diverse domains.	翻訳日:2024-10-30 16:03:11 公開日:2024-10-11
# Ferminet から PINN へ:高次元シュレーディンガー・ハミルトニアンのためのニューラルネットワークに基づくアルゴリズム間の接続 From Ferminet to PINN. Connections between neural network-based algorithms for high-dimensional Schrödinger Hamiltonian ( http://arxiv.org/abs/2410.09177v1 ) ライセンス: Link先を確認	Mashhood Khan, Emmanuel Lorin,	(参考訳) 本稿では、応用数学・工学のコミュニティで開発されたPDEおよび固有値問題に対する標準的な(データ駆動)ニューラルネットワークベースの解法(例えば、Deep-Ritzや物理インフォームドニューラルネットワーク(PINN))と、量子化学で開発された解法(例えば、変分モンテカルロアルゴリズム、FerminetやPaulinet)との間の接続を確立する。特に、ニューラルネットワークに基づく変分モンテカルロ(例えば Ferminet)によって初期化された標準的な拡散モンテカルロアルゴリズムの解に対応するデータを用いて、PINNアルゴリズムをフィッティング問題として再定式化する。最適化アルゴリズムのレベルでの接続も確立する。 In this note, we establish some connections between standard (data-driven) neural network-based solvers for PDE and eigenvalue problems developed on one side in the applied mathematics and engineering communities (e.g. Deep-Ritz and Physics Informed Neural Networks (PINN)), and on the other side in quantum chemistry (e.g. Variational Monte Carlo algorithms, Ferminet or Paulinet). In particular, we re-formulate a PINN algorithm as a fitting problem with data corresponding to the solution to a standard Diffusion Monte Carlo algorithm initialized thanks to a neural network-based Variational Monte Carlo (e.g. Ferminet). Connections at the level of the optimization algorithms are also established.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# メガヘルツ繰り返し周波数における合成二色場で駆動された固体からのアト秒パルス Attosecond pulses from a solid driven by a synthesized two-color field at megahertz repetition rate ( http://arxiv.org/abs/2410.09178v1 ) ライセンス: Link先を確認	Zhaopin Chen, Mark Levit, Yuval Kern, Basabendra Roy, Adi Goldner, Michael Krüger,	(参考訳) 光と物質の相互作用におけるコヒーレント量子ダイナミクスを微視的レベルで探るには、ポンプ・プローブ実験において高繰り返しの孤立アト秒パルス(IAP)が必要となる。これまで IAPの生成は主にキロヘルツ領域に限られていた。本研究では,相対位相が安定化された800 nmおよび2000 nmの2つのフェムト秒パルスの合成場によって駆動されるワイドバンドギャップ誘電体MgOにおいて,極端紫外(XUV)高次高調波のアト秒制御を実験的に達成する。得られた準連続的な高調波プラトーは,16.5 eVの光子エネルギーを中心に約9 eVのスペクトル幅を持ち、二色間の位相によって調整でき、IAP(約730アト秒)の生成を支持する。これは3バンド半導体ブロッホ方程式に基づく数値シミュレーションによって確認された。高繰り返しの駆動レーザーと、固体高次高調波発生が要求する強度が中程度であることを利用して、前例のないメガヘルツの繰り返し周波数でのIAP生成を達成し、IAP生成のための全固体コンパクトXUV光源への道を開く。 Probing coherent quantum dynamics in light-matter interactions at the microscopic level requires high-repetition-rate isolated attosecond pulses (IAPs) in pump-probe experiments. To date, the generation of IAPs has been mainly limited to the kilohertz regime. In this work, we experimentally achieve attosecond control of extreme-ultraviolet (XUV) high harmonics in the wide-bandgap dielectric MgO, driven by a synthesized field of two femtosecond pulses at 800 nm and 2000 nm with relative phase stability. The resulting quasi-continuous harmonic plateau with ~9 eV spectral width centered around 16.5 eV photon energy can be tuned by the two-color phase and supports the generation of an IAP (~730 attoseconds), confirmed by numerical simulation based on three-band semiconductor Bloch equations. Leveraging the high-repetition-rate driver laser and the moderate intensity requirements of solid-state high-harmonic generation, we achieve IAP production at an unprecedented megahertz repetition rate, paving the way for all-solid compact XUV sources for IAP generation.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 大規模言語モデルはガスライティングを行い得るか? Can a large language model be a gaslighter? ( http://arxiv.org/abs/2410.09181v1 ) ライセンス: Link先を確認	Wei Li, Luyao Zhu, Yang Song, Ruixi Lin, Rui Mao, Yang You,	(参考訳) 大規模言語モデル(LLM)は、その能力と有用性により、人間の信頼を得ている。しかし、このことは、LLMが言語を操作することによってユーザの考え方に影響を与えることを可能にしかねない。これは心理的効果の一種であるガスライティング(gaslighting)と呼ばれる。本研究では,プロンプトベースおよび微調整ベースのガスライティング攻撃に対するLLMの脆弱性について検討する。そこで我々は,2段階のフレームワークDeepCoGを提案する。 1)提案するDeepGaslightingプロンプトテンプレートを用いてLLMからガスライティングの計画を引き出し、 2)Chain-of-Gaslighting法によりLLMからガスライティングの会話を取得する。ガスライティングの会話データセットとそれに対応する安全なデータセットを、オープンソースのLLMに対する微調整ベースの攻撃と、これらのLLMに対する対ガスライティング安全アライメントに適用する。実験では、プロンプトベースと微調整ベースの攻撃の両方が、3つのオープンソースのLLMをガスライターに変えることを示した。これに対し,LLMの安全ガードレールを(12.05%)強化する3つの安全アライメント戦略を提案した。我々の安全アライメント戦略はLLMの実用性に最小限の影響しか与えない。実証的な研究は、LLMが一般的な危険なクエリに対する有害性テストに合格したとしても、潜在的なガスライターである可能性を示唆している。 Large language models (LLMs) have gained human trust due to their capabilities and helpfulness. However, this in turn may allow LLMs to affect users' mindsets by manipulating language. This is termed gaslighting, a psychological effect. In this work, we aim to investigate the vulnerability of LLMs under prompt-based and fine-tuning-based gaslighting attacks. Therefore, we propose a two-stage framework DeepCoG designed to: 1) elicit gaslighting plans from LLMs with the proposed DeepGaslighting prompting template, and 2) acquire gaslighting conversations from LLMs through our Chain-of-Gaslighting method. The gaslighting conversation dataset along with a corresponding safe dataset is applied to fine-tuning-based attacks on open-source LLMs and anti-gaslighting safety alignment on these LLMs. Experiments demonstrate that both prompt-based and fine-tuning-based attacks transform three open-source LLMs into gaslighters. In contrast, we advanced three safety alignment strategies to strengthen (by 12.05%) the safety guardrail of LLMs. Our safety alignment strategies have minimal impacts on the utility of LLMs. 
Empirical studies indicate that an LLM may be a potential gaslighter, even if it passed the harmfulness test on general dangerous queries.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 変分不等式の低単調クラスについて On the Hypomonotone Class of Variational Inequalities ( http://arxiv.org/abs/2410.09182v1 ) ライセンス: Link先を確認	Khaled Alomar, Tatjana Chavdarova,	(参考訳) 本稿では,古典的な単調設定を超える問題クラスである低単調(hypomonotone)作用素に適用した場合の外勾配(extragradient)アルゴリズムの挙動について検討する。外勾配法は単調かつリプシッツ連続な作用素を持つ変分不等式を解く上で有効であることが広く知られているが、低単調設定ではその収束が保証されないことを示す。外勾配アルゴリズムが収束しない条件を特定する特徴付け定理を与える。以上の結果から, 外勾配法の収束を保証し, より広範な問題に対する既存のVI法をさらに発展させる上で, より強い仮定が必要であることが浮き彫りになった。 This paper studies the behavior of the extragradient algorithm when applied to hypomonotone operators, a class of problems that extends beyond the classical monotone setting. While the extragradient method is widely known for its efficacy in solving variational inequalities with monotone and Lipschitz continuous operators, we demonstrate that its convergence is not guaranteed in the hypomonotone setting. We provide a characterization theorem that identifies the conditions under which the extragradient algorithm fails to converge. Our results highlight the necessity of stronger assumptions to guarantee convergence of the extragradient method and to further develop the existing VI methods for broader problems.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# L3Cube-MahaSum:マラーティー語における抽象的テキスト要約のための包括的データセットとBARTモデル L3Cube-MahaSum: A Comprehensive Dataset and BART Models for Abstractive Text Summarization in Marathi ( http://arxiv.org/abs/2410.09184v1 ) ライセンス: Link先を確認	Pranita Deshmukh, Nikita Kulkarni, Sanhita Kulkarni, Kareena Manghani, Raviraj Joshi,	(参考訳) 本稿では,マラーティー語の多様なニュース記事の大規模コレクションであり,Indic言語における抽象的要約タスクのモデルの訓練と評価を促進するために設計されたMahaSUMデータセットを提示する。 25kのサンプルを含むこのデータセットは、広範囲のオンラインニュースソースから記事をスクレイピングし、抽象的な要約を手作業で検証することで作成された。さらに、MahaSUMデータセットを使用して、Indic言語向けに調整されたBARTモデルの変種であるIndicBARTモデルを訓練する。抽象的要約タスクにおいて訓練したモデルの性能を評価し,マラーティー語で高品質な要約を生成する上での有効性を実証する。本研究は,Indic言語における自然言語処理研究の進展に寄与し,最先端のモデルを用いた今後の研究に有用な資源を提供する。データセットとモデルはhttps://github.com/l3cube-pune/MarathiNLPで公開されている。 We present the MahaSUM dataset, a large-scale collection of diverse news articles in Marathi, designed to facilitate the training and evaluation of models for abstractive summarization tasks in Indic languages. The dataset, containing 25k samples, was created by scraping articles from a wide range of online news sources and manually verifying the abstract summaries. Additionally, we train an IndicBART model, a variant of the BART model tailored for Indic languages, using the MahaSUM dataset. We evaluate the performance of our trained models on the task of abstractive summarization and demonstrate their effectiveness in producing high-quality summaries in Marathi. Our work contributes to the advancement of natural language processing research in Indic languages and provides a valuable resource for future research in this area using state-of-the-art models. The dataset and models are shared publicly at https://github.com/l3cube-pune/MarathiNLP	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 非エルミート準周期格子における回折と擬スペクトル Diffraction and pseudospectra in non-Hermitian quasiperiodic lattices ( http://arxiv.org/abs/2410.09185v1 ) ライセンス: Link先を確認	Ananya Ghatak, Dimitrios H. Kaltsas, Manas Kulkarni, Konstantinos G. Makris,	(参考訳) 乱れのある開放系媒質における波動ダイナミクスは興味深い話題であり、近年、非エルミート物理学、特にフォトニクスにおいて多くの注目を集めている。実際、利得と損失の要素の空間分布は、集積フォトニック導波路アレイにおいて物理的に実現可能である。特に、この種の格子では、強い乱れの極限(すべての固有状態が局在している)において、伝播方向に沿った直観に反する量子化されたジャンプが現れ、最近実験的にも観測されている。我々は,オンサイトの利得・損失分布を持つ非エルミート準周期オーブリー・アンドレ・ハーパー模型(NHAAH)を,擬スペクトル解析に基づくスペクトル感度に重点を置いて体系的に研究する。さらに, 回折ダイナミクスと量子化されたジャンプ, および飽和性非線形性の効果について詳細に調べる。本研究は, 非線形性と非エルミート性の間の複雑な関係を明らかにする。 Wave dynamics in disordered open media is an intriguing topic, and has lately attracted a lot of attention in non-Hermitian physics, especially in photonics. In fact, spatial distributions of gain and loss elements are physically possible in the context of integrated photonic waveguide arrays. In particular, in these types of lattices, counter-intuitive quantized jumps along the propagation direction appear in the strong disorder limit (where all eigenstates are localized) and they have also been recently experimentally observed. We systematically study the non-Hermitian quasiperiodic Aubry-André-Harper model with on-site gain and loss distribution (NHAAH), with an emphasis on the spectral sensitivity based on pseudospectra analysis. Moreover, diffraction dynamics and the quantized jumps, as well as the effect of saturable nonlinearity, are investigated in detail. Our study reveals the intricate relation between the nonlinearity and non-Hermiticity.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 簡単な学習アルゴリズム Learning Algorithms Made Simple ( http://arxiv.org/abs/2410.09186v1 ) ライセンス: Link先を確認	Noorbakhsh Amiri Golilarz, Elias Hossain, Abdoljalil Addeh, Keyan Alexander Rahimi,	(参考訳) 本稿では、学習アルゴリズムと、重要なパターンや特徴を分かりやすく理解しやすい方法で識別する訓練を含む、様々なタイプのアプリケーションにおけるそれらの重要性について論じる。人工知能(AI)、機械学習(ML)、ディープラーニング(DL)、ハイブリッドモデルの主な概念についてレビューする。本稿では,教師付き,教師なし,強化学習などの機械学習アルゴリズムの重要な部分についても論じる。これらのテクニックは、予測、分類、セグメンテーションといった重要なタスクに使用できる。畳み込みニューラルネットワーク(CNN)は、画像処理やビデオ処理など多くの用途に使われている。 CNNのアーキテクチャと、CNNをMLアルゴリズムに統合してハイブリッドモデルを構築する方法について検討する。本稿では,学習アルゴリズムのノイズに対する脆弱性について検討し,誤分類につながる。さらに,大規模言語モデル(LLM)と学習アルゴリズムの統合を議論し,大量のデータから重要なパターンを学習することで,医療,マーケティング,金融など多くの分野に適用可能な一貫性のある応答を生成する。さらに、次世代の学習アルゴリズムと、重要なタスクを実行するためにAdaptive and Dynamic Networkを統一する方法について論じる。全体として、本記事では、学習アルゴリズムの概要、現状、応用、今後の方向性について概説する。 In this paper, we discuss learning algorithms and their importance in different types of applications which includes training to identify important patterns and features in a straightforward, easy-to-understand manner. We will review the main concepts of artificial intelligence (AI), machine learning (ML), deep learning (DL), and hybrid models. Some important subsets of Machine Learning algorithms such as supervised, unsupervised, and reinforcement learning are also discussed in this paper. These techniques can be used for some important tasks like prediction, classification, and segmentation. Convolutional Neural Networks (CNNs) are used for image and video processing and many more applications. We dive into the architecture of CNNs and how to integrate CNNs with ML algorithms to build hybrid models. This paper explores the vulnerability of learning algorithms to noise, leading to misclassification. We further discuss the integration of learning algorithms with Large Language Models (LLM) to generate coherent responses applicable to many domains such as healthcare, marketing, and finance by learning important patterns from large volumes of data. 
Furthermore, we discuss the next generation of learning algorithms and how we may have a unified Adaptive and Dynamic Network to perform important tasks. Overall, this article provides a brief overview of learning algorithms, exploring their current state, applications, and future directions.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 再訓練すべき時か? 機械学習システムにおける概念ドリフトの検出 Time to Retrain? Detecting Concept Drifts in Machine Learning Systems ( http://arxiv.org/abs/2410.09190v1 ) ライセンス: Link先を確認	Tri Minh Triet Pham, Karthikeyan Premkumar, Mohamed Naili, Jinqiu Yang,	(参考訳) 機械学習(ML)技術のブームにより、ソフトウェア実践者は、AIOpsの障害予測など、さまざまなソフトウェアエンジニアリングタスクのための大量のストリーミングデータを処理するために、MLシステムを構築する。過去のデータを用いてトレーニングされたMLモデルは、概念ドリフト(concept drift)、すなわちトレーニングと本番環境の間でのデータとその相互関係(概念)の変化に起因するパフォーマンス劣化に遭遇する。概念ドリフト検出を使用して、デプロイされたMLモデルを監視し、必要に応じてMLモデルを再トレーニングすることが不可欠である。本研究では,産業環境下での合成および実世界のデータセットに対する最新技術(SOTA)の概念ドリフト検出技術の適用について検討する。このような産業環境では、ラベル付けの手作業を最小限に抑え、MLモデルアーキテクチャに関する汎用性を最大限に高めることが求められる。現在のSOTA半教師付き手法は,ラベル付けに多大な労力を要するだけでなく,特定の種類のMLモデルに対してのみ有効であることがわかった。このような制約を克服するために,概念ドリフトを検出する新しいモデル非依存手法 (CDSeer) を提案する。評価の結果,CDSeerは手動ラベリングを大幅に減らしつつ,適合率と再現率で最先端技術を上回ることがわかった。異なるドメインとユースケースの8つのデータセット上でCDSeerを評価することにより,概念ドリフト検出におけるCDSeerの有効性を実証する。産業用プロプライエタリデータセットへのCDSeerの内部展開の結果は、SOTAの概念ドリフト検出法と比較して99%少ないラベルで、適合率が57.1%向上することを示している。この性能は、データの100%にラベルを付ける必要がある教師付き概念ドリフト検出法にも匹敵する。 CDSeerの性能向上と導入の容易さは、MLシステムをより信頼性の高いものにする上で価値がある。 With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and production. It is essential to use concept drift detection to monitor the deployed ML models and re-train the ML models when needed. In this work, we explore applying state-of-the-art (SOTA) concept drift detection techniques on synthetic and real-world datasets in an industrial setting. Such an industrial setting requires minimal manual effort in labeling and maximal generality in ML model architecture. We find that current SOTA semi-supervised methods not only require significant labeling effort but also only work for certain types of ML models. 
To overcome such limitations, we propose a novel model-agnostic technique (CDSeer) for detecting concept drift. Our evaluation shows that CDSeer has better precision and recall compared to the state-of-the-art while requiring significantly less manual labeling. We demonstrate the effectiveness of CDSeer at concept drift detection by evaluating it on eight datasets from different domains and use cases. Results from internal deployment of CDSeer on an industrial proprietary dataset show a 57.1% improvement in precision while using 99% fewer labels compared to the SOTA concept drift detection method. The performance is also comparable to the supervised concept drift detection method, which requires 100% of the data to be labeled. The improved performance and ease of adoption of CDSeer are valuable in making ML systems more reliable.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 未知のテスト:ランダムプログラム生成によるOpenMPテストのためのフレームワーク Testing the Unknown: A Framework for OpenMP Testing via Random Program Generation ( http://arxiv.org/abs/2410.09191v1 ) ライセンス: Link先を確認	Ignacio Laguna, Patrick Chapman, Konstantinos Parasyris, Giorgis Georgakoudis, Cindy Rubio-González,	(参考訳) OpenMP の実装をテストするためのランダム化差分テスト手法を提案する。数十の検証と検証テストを手作業で作成する以前の作業とは対照的に,我々のアプローチでは数千のテストをランダムに生成することができ,OpenMPの実装を幅広いプログラム動作に公開しています。文法を用いてランダムなOpenMPテストの空間を表現し、Varityプログラムジェネレータの拡張として実装する。 1,800のOpenMPテストを生成することで、GCC、Clang、Intelの3つのOpenMP実装に適用する際に、さまざまなパフォーマンス異常と正当性の問題を見つけます。また、異常を解析し、我々のアプローチが生み出すテストのクラスの詳細を示すいくつかのケーススタディも提示する。 We present a randomized differential testing approach to test OpenMP implementations. In contrast to previous work that manually creates dozens of verification and validation tests, our approach is able to randomly generate thousands of tests, exposing OpenMP implementations to a wide range of program behaviors. We represent the space of possible random OpenMP tests using a grammar and implement our method as an extension of the Varity program generator. By generating 1,800 OpenMP tests, we find various performance anomalies and correctness issues when we apply it to three OpenMP implementations: GCC, Clang, and Intel. We also present several case studies that analyze the anomalies and give more details about the classes of tests that our approach creates.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# マラーティー語文書における長距離固有表現認識 Long Range Named Entity Recognition for Marathi Documents ( http://arxiv.org/abs/2410.09192v1 ) ライセンス: Link先を確認	Pranita Deshmukh, Nikita Kulkarni, Sanhita Kulkarni, Kareena Manghani, Geetanjali Kale, Raviraj Joshi,	(参考訳) マラーティー語のデジタルコンテンツの指数関数的増加に伴い、高度な自然言語処理(NLP)手法、特に固有表現認識(NER)への需要が高まっている。特に、NERは、離れた位置にあるエンティティを認識し、非構造化のマラーティー語テキストデータを整理・理解するために不可欠である。本稿では, 長距離エンティティの扱いに重点を置いて, マラーティー語文書向けに設計された現在のNER技術を包括的に分析する。現在のプラクティスを掘り下げ、長距離マラーティー語NERに対するBERTトランスフォーマーモデルの可能性を調査する。従来手法の有効性を分析するとともに、英語文献におけるNERと比較し、マラーティー語文献への適応戦略を提案する。本稿は,NERがNLPにおいて重要な役割を担っていることを認めつつ,マラーティー語特有の言語的特徴と文脈上の微妙さによって生じる困難について論じる。結論として、このプロジェクトはマラーティー語NER技術を改善するための大きな一歩であり、さまざまなNLPタスクやドメインにわたってより広く応用できる可能性がある。 The demand for sophisticated natural language processing (NLP) methods, particularly Named Entity Recognition (NER), has increased due to the exponential growth of Marathi-language digital content. In particular, NER is essential for recognizing distant entities and for arranging and understanding unstructured Marathi text data. With an emphasis on managing long-range entities, this paper offers a comprehensive analysis of current NER techniques designed for Marathi documents. It dives into current practices and investigates the BERT transformer model's potential for long-range Marathi NER. Along with analyzing the effectiveness of earlier methods, the report draws comparisons with NER in English literature and suggests adaptation strategies for Marathi literature. The paper discusses the difficulties caused by Marathi's particular linguistic traits and contextual subtleties while acknowledging NER's critical role in NLP. To conclude, this project is a major step forward in improving Marathi NER techniques, with potential wider applications across a range of NLP tasks and domains.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# 合成学生:大規模言語モデルとコンピューティングを学ぶ学生のバグ分布の比較研究 Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students ( http://arxiv.org/abs/2410.09193v1 ) ライセンス: Link先を確認	Stephen MacNeil, Magdalena Rogalska, Juho Leinonen, Paul Denny, Arto Hellas, Xandria Crosland,	(参考訳) 大規模言語モデル(LLM)は、合成教室データを生成するエキサイティングな機会を提供する。このようなデータには、典型的なエラーの分布を含むコード、教育ツールを開発する際のコールドスタート問題に対処するためのシミュレーションされた学生の振る舞い、プライバシー上の理由から実データへのアクセスが制限される場合の合成ユーザデータなどが含まれる。本研究では,LLMが生成するバグの分布を,コンピューティングを学ぶ学生が生み出すバグの分布と対比する比較研究を行う。学生が生み出したバグに関する過去2件の大規模分析のデータを利用して,コードにエラーを注入するよう促した場合に,LLMが実際の学生のバグに類似したバグパターンを示すよう誘導できるかどうかを検討する。その結果,ガイダンスがない場合,LLMはもっともらしい誤り分布を生成せず,生成された誤りの多くは実際の学生が生み出す可能性の低いものであることが示唆された。しかし、一般的な誤りとその典型的な頻度の記述を含むガイダンスを与えれば、LLMを誘導して合成コードにおける現実的な誤り分布を生成させることができる。 Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to privacy reasons. In this research paper, we conduct a comparative study examining the distribution of bugs generated by LLMs in contrast to those produced by computing students. Leveraging data from two previous large-scale analyses of student-generated bugs, we investigate whether LLMs can be coaxed to exhibit bug patterns that are similar to authentic student bugs when prompted to inject errors into code. The results suggest that unguided, LLMs do not generate plausible error distributions, and many of the generated errors are unlikely to be generated by real students. However, with guidance including descriptions of common errors and typical frequencies, LLMs can be shepherded to generate realistic distributions of errors in synthetic code.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# IoTシステムにおけるAIセキュリティとサイバーリスク AI security and cyber risk in IoT systems ( http://arxiv.org/abs/2410.09194v1 ) ライセンス: Link先を確認	Petar Radanliev, David De Roure, Carsten Maple, Jason R. C. Nurse, Razvan Nicolescu, Uchenna Ani,	(参考訳) データ戦略における現在の課題の文脈に合わせた依存性モデルを提示し、サイバーセキュリティコミュニティのためのレコメンデーションを行う。このモデルは、サイバーリスク評価と一般的なリスク影響評価に使用できる。 We present a dependency model tailored to the context of current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment and generic risk impact assessment.	翻訳日:2024-10-30 15:53:25 公開日:2024-10-11
# Scalable Signature-Based Distribution Regression via Reference Sets ( http://arxiv.org/abs/2410.09196v1 ) License: see the linked page	Andrew Alden, Carmine Ventre, Blanka Horvath,	Distribution Regression (DR) on stochastic processes describes the learning task of regression on collections of time series. Path signatures, a technique prevalent in stochastic analysis, have been used to solve the DR problem. Recent works have demonstrated the ability of such solutions to leverage the information encoded in paths via signature-based features. However, current state-of-the-art DR solutions are memory intensive and incur a high computation cost. This leads to a trade-off between path length and the number of paths considered. This computational bottleneck limits the application to small sample sizes, which consequently introduces estimation uncertainty. In this paper, we present a methodology for addressing the above issues, resolving estimation uncertainties whilst also proposing a pipeline that enables us to use DR for a wide variety of learning tasks. Integral to our approach is our novel distance approximator. This allows us to seamlessly apply our methodology across different application domains, sampling rates, and stochastic process dimensions. We show that our model performs well in applications related to estimation theory, quantitative finance, and physical sciences. We demonstrate that our model generalises well, not only to unseen data within a given distribution, but also under unseen regimes (unseen classes of stochastic models).	Translated: 2024-10-30 15:53:25 Published: 2024-10-11
# An Efficient Contrastive Unimodal Pretraining Method for EHR Time Series Data ( http://arxiv.org/abs/2410.09199v1 ) License: see the linked page	Ryan King, Shivesh Kodali, Conrad Krueger, Tianbao Yang, Bobak J. Mortazavi,	Machine learning has revolutionized the modeling of clinical time-series data. Using machine learning, a Deep Neural Network (DNN) can be automatically trained to learn a complex mapping of its input features for a desired task. This is particularly valuable in Electronic Health Record (EHR) databases, where patients often spend extended periods in intensive care units (ICUs). Machine learning serves as an efficient method for extracting meaningful information. However, many state-of-the-art (SOTA) methods for training DNNs demand substantial volumes of labeled data, posing significant challenges for clinics in terms of cost and time. Self-supervised learning offers an alternative by allowing practitioners to extract valuable insights from data without the need for costly labels. Yet, current SOTA methods often necessitate large data batches to achieve optimal performance, increasing computational demands. This presents a challenge when working with long clinical time-series data. To address this, we propose an efficient method of contrastive pretraining tailored for long clinical time-series data. Our approach utilizes an estimator for negative pair comparison, enabling effective feature extraction. We assess the efficacy of our pretraining using standard self-supervised tasks such as linear evaluation and semi-supervised learning. Additionally, our model demonstrates the ability to impute missing measurements, providing clinicians with deeper insights into patient conditions. We demonstrate that our pretraining is capable of achieving better performance as both the size of the model and the size of the measurement vocabulary scale. Finally, we externally validate our model, trained on the MIMIC-III dataset, using the eICU dataset. We demonstrate that our model is capable of learning robust clinical information that is transferable to other clinics.	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# Encoding Agent Trajectories as Representations with Sequence Transformers ( http://arxiv.org/abs/2410.09204v1 ) License: see the linked page	Athanasios Tsiligkaridis, Nicholas Kalinowski, Zhongheng Li, Elizabeth Hou,	Spatiotemporal data faces many challenges analogous to those of natural language text, including the ordering of locations (words) in a sequence, long-range dependencies between locations, and locations having multiple meanings. In this work, we propose a novel model for representing high-dimensional spatiotemporal trajectories as sequences of discrete locations and encoding them with a Transformer-based neural network architecture. Similar to language models, our Sequence Transformer for Agent Representation Encodings (STARE) model can learn representations and structure in trajectory data through both supervisory tasks (e.g., classification) and self-supervisory tasks (e.g., masked modelling). We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings that are useful for many downstream tasks, including discriminating between labels and indicating similarity between locations. Using these encodings, we also learn relationships between agents and locations present in spatiotemporal data.	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# pyhgf: A neural network library for predictive coding ( http://arxiv.org/abs/2410.09206v1 ) License: see the linked page	Nicolas Legrand, Lilian Weber, Peter Thestrup Waade, Anna Hedvig Møller Daugaard, Mojtaba Khodadadi, Nace Mikuš, Chris Mathys,	Bayesian models of cognition have gained considerable traction in computational neuroscience and psychiatry. Their scopes are now expected to expand rapidly to artificial intelligence, providing general inference frameworks to support embodied, adaptable, and energy-efficient autonomous agents. A central theory in this domain is predictive coding, which posits that learning and behaviour are driven by hierarchical probabilistic inferences about the causes of sensory inputs. Biological realism constrains these networks to rely on simple local computations in the form of precision-weighted predictions and prediction errors. This can make this framework highly efficient, but its implementation comes with unique challenges on the software development side. Embedding such models in standard neural network libraries often becomes limiting, as these libraries' compilation and differentiation backends can force a conceptual separation between optimization algorithms and the systems being optimized. This critically departs from other biological principles such as self-monitoring, self-organisation, cellular growth and functional plasticity. In this paper, we introduce pyhgf: a Python package backed by JAX and Rust for creating, manipulating and sampling dynamic networks for predictive coding. We improve over other frameworks by enclosing the network components as transparent, modular and malleable variables in the message-passing steps. The resulting graphs can implement arbitrary computational complexities as belief propagation. But the transparency of core variables can also translate into inference processes that leverage self-organisation principles, and express structure learning, meta-learning or causal discovery as the consequence of network structural adaptation to surprising inputs. The code, tutorials and documentation are hosted at: https://github.com/ilabcode/pyhgf	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains ( http://arxiv.org/abs/2410.09207v1 ) License: see the linked page	Simeng Han, Aaron Yu, Rui Shen, Zhenting Qi, Martin Riddell, Wenfei Zhou, Yujie Qiao, Yilun Zhao, Semih Yavuz, Ye Liu, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Dragomir Radev, Rex Ying, Arman Cohan,	Existing methods for understanding the capabilities of LLMs in logical reasoning rely on binary entailment classification or synthetically derived rationales, which are not sufficient for a proper investigation of a model's capabilities. We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains for a set of realistic logical reasoning stories also written by humans. P-FOLIO is collected with an annotation protocol that helps humans annotate well-structured natural language proofs for first-order logic reasoning problems in a step-by-step manner. The number of reasoning steps in P-FOLIO spans from 0 to 20. We further use P-FOLIO to evaluate and improve large-language-model (LLM) reasoning capabilities. We evaluate LLM reasoning capabilities at a fine granularity via single-step inference rule classification, with inference rules that are more diverse and more complex than those in previous works. Given that a single model-generated reasoning chain could take a completely different path than the human-annotated one, we sample multiple reasoning chains from a model and use pass@k metrics for evaluating the quality of model-generated reasoning chains. We show that human-written reasoning chains significantly boost the logical reasoning capabilities of LLMs via many-shot prompting and fine-tuning. Furthermore, fine-tuning Llama3-7B on P-FOLIO improves the model performance by 10% or more on three other out-of-domain logical reasoning datasets. We also conduct detailed analysis to show where the most powerful LLMs fall short in reasoning. We will release the dataset and code publicly.	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# Conditional Motional Squeezing of an Optomechanical Oscillator Approaching the Quantum Regime ( http://arxiv.org/abs/2410.09208v1 ) License: see the linked page	Benjamin B. Lane, Junxin Chen, Ronald E. Pagano, Scott Aronson, Garrett D. Cole, Xinghui Yin, Thomas R. Corbitt, Nergis Mavalvala,	Squeezed mechanical states are a valuable tool for quantum sensing and error correction in quantum computing, and a pivotal platform for tests of fundamental physics. Recently, solid state mechanical oscillators have been prepared in squeezed states using parametric interactions in both the microwave and optical regimes. It has long been predicted that a fast measurement rate comparable to the mechanical resonance frequency can prepare the oscillator under measurement into a quantum squeezed state. Despite decades of effort, this straightforward protocol is yet to be demonstrated in the quantum regime. Here, we use post-processing techniques to demonstrate preparation of a 50 ng GaAs cantilever in a conditional classical squeezed state with a minimum uncertainty (0.28 ± 0.18) dB above (1.07 ± 0.04 times) the zero point fluctuations, 3 orders of magnitude closer to the quantum regime in variance than the previous record. This paves the way to real-time measurement-based preparation of macroscopic oscillators in quantum squeezed states, and can be adapted to mechanical systems as large as the kg-scale test masses of the Laser Interferometer Gravitational-Wave Observatory (LIGO).	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# Accurate quantum-centric simulations of supramolecular interactions ( http://arxiv.org/abs/2410.09209v1 ) License: see the linked page	Danil Kaliakin, Akhil Shajan, Javier Robledo Moreno, Zhen Li, Abhishek Mitra, Mario Motta, Caleb Johnson, Abdullah Ash Saki, Susanta Das, Iskandar Sitdikov, Antonio Mezzacapo, Kenneth M. Merz Jr,	We present the first quantum-centric simulations of noncovalent interactions using a supramolecular approach. We simulate the potential energy surfaces (PES) of the water and methane dimers, featuring hydrophilic and hydrophobic interactions, respectively, with a sample-based quantum diagonalization (SQD) approach. Our simulations on quantum processors, using 27- and 36-qubit circuits, are in remarkable agreement with classical methods, deviating from complete active space configuration interaction (CASCI) and coupled-cluster singles, doubles, and perturbative triples (CCSD(T)) within 1 kcal/mol in the equilibrium regions of the PES. Finally, we test the capacity limits of the quantum methods for capturing hydrophobic interactions with an experiment on 54 qubits. These results mark significant progress in the application of quantum computing to chemical problems, paving the way for more accurate modeling of noncovalent interactions in complex systems critical to the biological, chemical and pharmaceutical sciences.	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
# Cross-Domain Distribution Alignment for Segmentation of Private Unannotated 3D Medical Images ( http://arxiv.org/abs/2410.09210v1 ) License: see the linked page	Ruitong Sun, Mohammad Rostami,	Manual annotation of 3D medical images for segmentation tasks is tedious and time-consuming. Moreover, data privacy limits the applicability of crowdsourcing to perform data annotation in medical domains. As a result, training deep neural networks for medical image segmentation can be challenging. We introduce a new source-free Unsupervised Domain Adaptation (UDA) method to address this problem. Our idea is based on estimating the internally learned distribution of a relevant source domain by a base model and then generating pseudo-labels that are used for enhancing the model refinement through self-training. We demonstrate that our approach leads to SOTA performance on a real-world 3D medical dataset.	Translated: 2024-10-30 15:43:17 Published: 2024-10-11
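
The P-FOLIO entry in the table above scores sampled reasoning chains with pass@k. The abstract does not say which estimator the authors use, so the sketch below assumes the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n chains are sampled per problem and c of them are judged correct. The function name pass_at_k is illustrative and not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k sampled chains is correct.

    n: number of chains sampled for the problem
    c: number of those chains judged correct
    k: budget of chains the metric is allowed to look at (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect chains: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains only incorrect chains,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging pass_at_k over all problems in a benchmark gives the dataset-level score; with k = 1 the estimate reduces to the fraction of correct chains, c / n.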

Title

Authors

Abstract

Publication date / translation date

# Manydepth2: Motion-Aware Self-Supervised Monocular Depth Estimation in Dynamic Scenes ( http://arxiv.org/abs/2312.15268v3 )

License: see the linked page

Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham,

Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present Manydepth2, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a novel static reference frame. This frame is then utilized to build a motion-guided cost volume in collaboration with the target frame. Additionally, to enhance the accuracy and resilience of the network structure, we introduce an attention-based depth net architecture to effectively integrate information from feature maps with varying resolutions. Compared to methods with similar computational costs, Manydepth2 achieves a significant reduction of approximately five percent in root-mean-square error for self-supervised monocular depth estimation on the KITTI-2015 dataset. The code is available at: https://github.com/kaichen-z/Manydepth2

Translated: 2024-11-09 09:05:28 Published: 2024-10-11

# Does AI help humans make better decisions? A methodological framework for experimental evaluation ( http://arxiv.org/abs/2403.12108v2 )

License: see the linked page

Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin,

The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to experimentally answer this question without additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with humans making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. We also show when to provide a human decision maker with AI recommendations and when they should follow such recommendations. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that the risk assessment-alone decisions generally perform worse than human decisions with or without algorithmic assistance.

Translated: 2024-11-09 03:59:24 Published: 2024-10-11

# Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v2 )

License: see the linked page

Tian Xie, Xueru Zhang,

As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts of retraining ML models with model-annotated samples that incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.

Translated: 2024-11-09 02:30:11 Published: 2024-10-11

# Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions ( http://arxiv.org/abs/2405.08027v3 )

License: see the linked page

Tian Xie, Xueru Zhang,

Translated: 2024-11-09 02:30:11 Published: 2024-10-11

# Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models ( http://arxiv.org/abs/2406.04806v3 )

License: see the linked page

Sigmund H. Høeg, Yilun Du, Olav Egeland,

Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex dexterous tasks. However, action synthesis is often slow, requiring many steps of iterative denoising, limiting the extent to which models can be used in tasks that require fast reactive policies. To sidestep this, recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis. However, distillation is computationally expensive and can hurt both the accuracy and diversity of synthesized actions. We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis, leveraging the insight that generating a partially denoised action trajectory is substantially faster than a full output action trajectory. At each observation, our approach outputs a partially denoised action trajectory with variable levels of noise corruption, where the immediate action to execute is noise-free, with subsequent actions having increasing levels of noise and uncertainty. The partially denoised action trajectory for a new observation can then be quickly generated by applying a few steps of denoising to the previously predicted noisy action trajectory (rolled over by one timestep). We illustrate the efficacy of this approach, dramatically speeding up policy synthesis while preserving performance across both simulated and real-world settings.

Translated: 2024-11-09 01:44:51 Published: 2024-10-11
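
The buffer bookkeeping described in the Streaming Diffusion Policy abstract (a noise-free immediate action, noisier later actions, and a roll-over by one timestep) can be sketched as follows. This is only an illustration of the data flow, not the authors' implementation: the linear noise schedule, the function names, and in particular toy_denoise, which blends toward a known clean trajectory in place of a learned, observation-conditioned denoiser, are all assumptions.

```python
import random


def noise_levels(horizon):
    """Noise level per buffered action: 0.0 for the action executed now, 1.0 for the last."""
    if horizon < 2:
        raise ValueError("horizon must be at least 2")
    return [i / (horizon - 1) for i in range(horizon)]


def toy_denoise(buffer, clean, levels):
    """Hypothetical stand-in for a learned denoiser.

    Blends each buffered action toward a known clean action, leaving a share
    of the buffered (noisy) value proportional to that step's noise level.
    """
    return [
        [(1.0 - level) * c + level * b for b, c in zip(noisy, target)]
        for noisy, target, level in zip(buffer, clean, levels)
    ]


def roll_over(buffer, rng):
    """Drop the executed action, shift the rest forward, append pure noise at the end."""
    action_dim = len(buffer[0])
    fresh = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    return buffer[1:] + [fresh]
```

In a control loop one would denoise the buffer for the new observation, execute the first entry, and call roll_over before the next observation arrives, so that each action has already been partially denoised several times by the time it reaches the front.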

# Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models ( http://arxiv.org/abs/2407.04482v2 )

License: see the linked page

Vyas Raina, Mark Gales,

Speech-enabled foundation models, either in the form of flexible speech recognition based systems or audio-prompted large language models (LLMs), are becoming increasingly popular. One of the interesting aspects of these models is their ability to perform tasks other than automatic speech recognition (ASR) using an appropriate prompt. For example, the OpenAI Whisper model can perform both speech transcription and speech translation. With the development of audio-prompted LLMs there is the potential for even greater control options. In this work we demonstrate that with this greater flexibility the systems can be susceptible to model-control adversarial attacks. Without any access to the model prompt it is possible to modify the behaviour of the system by appropriately changing the audio input. To illustrate this risk, we demonstrate that it is possible to prepend a short universal adversarial acoustic segment to any input speech signal to override the prompt setting of an ASR foundation model. Specifically, we successfully use a universal adversarial acoustic segment to control Whisper to always perform speech translation, despite being set to perform speech transcription. Overall, this work demonstrates a new form of adversarial attack on multi-tasking speech-enabled foundation models that needs to be considered prior to the deployment of this form of model.

Translated: 2024-11-08 23:46:45 Published: 2024-10-11
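
At inference time, the attack described in the Controlling Whisper abstract only concatenates a fixed segment in front of the input waveform. The sketch below shows that single step under stated assumptions: both signals are mono lists of float samples in [-1, 1] at the same sample rate, and the name prepend_segment is hypothetical. How the segment itself is optimised is not covered by the abstract and is not shown here.

```python
def prepend_segment(segment, speech, peak=1.0):
    """Return the samples of `segment` followed by the samples of `speech`.

    Samples are clipped to [-peak, peak] so the result stays a valid waveform.
    The same segment is reused for every input, which is what makes it
    "universal" in the sense used by the abstract.
    """
    if not segment or not speech:
        raise ValueError("both segment and speech must contain samples")
    combined = list(segment) + list(speech)
    return [max(-peak, min(peak, sample)) for sample in combined]
```

The speech itself is left unchanged, so a listener would hear the short segment followed by the original utterance.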

# Approximating Two-Layer ReLU Networks for Hidden State Analysis in Differential Privacy ( http://arxiv.org/abs/2407.04884v2 )

License: see the linked page

Antti Koskela,

The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. Current privacy analyses under this model, however, are limited to convex optimization problems, reducing their applicability to multi-layer neural networks, which are essential in modern deep learning applications. Additionally, the most successful applications of the hidden state privacy analyses in classification tasks have been for logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of one-hidden-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). We achieve this through a stochastic approximation of a dual formulation of the ReLU minimization problem, which results in a strongly convex problem. This enables the use of existing hidden state privacy analyses, providing accurate privacy bounds also for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Our experiments on benchmark classification tasks show that NoisyCGD can achieve privacy-utility trade-offs comparable to DP-SGD applied to one-hidden-layer ReLU networks. Additionally, we provide theoretical utility bounds that highlight the speed-ups gained through the convex approximation.

Translated: 2024-11-08 23:35:45 Published: 2024-10-11
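
The update pattern the abstract calls noisy cyclic mini-batch gradient descent with fixed disjoint mini-batches can be sketched as below. Note the substitution: the paper trains a convex approximation of a ReLU network, whereas this sketch uses L2-regularised logistic regression as a stand-in strongly convex problem. Per-example clipping with Gaussian noise is the standard DP recipe and is an assumption here, as are all parameter names. No privacy accounting is performed, so the code does not by itself yield any (ε, δ) guarantee.

```python
import math
import random


def noisy_cgd(xs, ys, n_batches, epochs, lr, clip, sigma, lam, seed=0):
    """Cycle through fixed disjoint mini-batches, adding noise to clipped gradient sums."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    # Batches are formed once and reused in the same order every epoch.
    batches = [order[i::n_batches] for i in range(n_batches)]
    dim = len(xs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for batch in batches:
            total = [0.0] * dim
            for i in batch:
                z = sum(wj * xj for wj, xj in zip(w, xs[i]))
                p = 1.0 / (1.0 + math.exp(-z))
                grad = [(p - ys[i]) * xj for xj in xs[i]]
                # Clip each per-example gradient to norm at most `clip`.
                norm = math.sqrt(sum(g * g for g in grad))
                factor = min(1.0, clip / norm) if norm > 0 else 1.0
                total = [t + factor * g for t, g in zip(total, grad)]
            # Gaussian noise scaled to the clipping bound, one draw per coordinate.
            noisy = [t + rng.gauss(0.0, sigma * clip) for t in total]
            # The L2 term is data-independent, so it is added without noise.
            w = [wj - lr * (nj / len(batch) + lam * wj) for wj, nj in zip(w, noisy)]
    return w
```

Setting sigma to 0 recovers plain cyclic mini-batch gradient descent with clipping, which is a convenient sanity check before adding noise.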

# LLM圧縮の多次元安全性評価

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression ( http://arxiv.org/abs/2407.04965v3 )

ライセンス: Link先を確認

Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar,

(参考訳) モデル圧縮技術により、大規模言語モデル(LLM)を現実世界のアプリケーションにデプロイすることが可能になる。局所的な展開に向けてのこの勢いの結果として、圧縮LDMは人口と相互作用する。圧縮に関する以前の研究は、典型的には、トレーニング損失と直接的に類似したパープレキシティの保存を優先する。圧縮法がモデル行動の他の重要な側面へ与える影響-----------------は体系的評価を必要とする。そこで本研究では,(1)退化障害,すなわち世代におけるバイアスと毒性,(2)識別的タスクにおけるバイアス,(3)方言バイアス,(4)言語モデリングと下流タスクパフォーマンスの4つの側面によるモデル圧縮の影響について検討する。本研究では,非構造化プルーニング,半構造化プルーニング,量子化など,LLM圧縮手法の幅広いスペクトルについて検討する。解析の結果,圧縮が予期せぬ結果をもたらすことが明らかとなった。圧縮は故意にLLMの変性障害を緩和するかもしれないが、それでも表現障害を悪化させる可能性がある。さらに、圧縮の増加は、異なる保護されたグループに異なる影響をもたらす。最後に、異なる圧縮法は、例えば、量子化はバイアスをほとんど保ち、プルーニングは急速に劣化する。本研究は, 実世界のアプリケーションにまたがる信頼性を確保するため, 圧縮LDMの開発に安全性評価を統合することの重要性を浮き彫りにした。 https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval}}。

Increasingly, model compression techniques enable large language models (LLMs) to be deployed in real-world applications. As a result of this momentum towards local deployment, compressed LLMs will interact with a large population. Prior work on compression typically prioritize preserving perplexity, which is directly analogous to training loss. The impact of compression method on other critical aspects of model behavior\, -- \,particularly safety\, -- \,requires systematic assessment. To this end, we investigate the impact of model compression along four dimensions: (1) degeneration harm, i.e., bias and toxicity in generation; (2) representational harm, i.e., biases in discriminative tasks; (3) dialect bias; and(4) language modeling and downstream task performance. We examine a wide spectrum of LLM compression techniques, including unstructured pruning, semi-structured pruning, and quantization. Our analysis reveals that compression can lead to unexpected consequences. Although compression may unintentionally alleviate LLMs' degeneration harm, it can still exacerbate representational harm. Furthermore, increasing compression produces a divergent impact on different protected groups. Finally, different compression methods have drastically different safety impacts: for example, quantization mostly preserves bias while pruning degrades quickly. Our findings underscore the importance of integrating safety assessments into the development of compressed LLMs to ensure their reliability across real-world applications.\footnote{Our implementation and results are available here: \url{https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval}}

翻訳日:2024-11-08 23:35:45 公開日:2024-10-11
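The abstract above contrasts unstructured and semi-structured pruning. The following is a minimal sketch of the two sparsity patterns, assuming a plain magnitude criterion and a 2:4 group layout. Both choices are illustrative assumptions, not necessarily the pruning methods evaluated in the paper, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of smallest-|w| entries.

    Assumption: a plain magnitude criterion, used here only for illustration.
    """
    k = int(len(weights) * sparsity)
    # Indices ordered by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def n_m_prune(weights, n=2, m=4):
    """Semi-structured N:M pruning: keep the n largest-|w| entries in each group of m."""
    out = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        ranked = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out


row = [0.1, -0.9, 0.5, 0.05, 1.0, 0.2, -0.3, 0.4]
unstructured = magnitude_prune(row, sparsity=0.5)
semi_structured = n_m_prune(row, n=2, m=4)
```

Both calls remove half of the weights, but only the N:M variant guarantees the same number of zeros in every group of four, which is the regularity that hardware-friendly sparse kernels rely on.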

# SEED-Story:大規模言語モデルを用いたマルチモーダル・ロングストーリー・ジェネレーション

SEED-Story: Multimodal Long Story Generation with Large Language Model ( http://arxiv.org/abs/2407.08683v2 )

ライセンス: Link先を確認

Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen,

(参考訳) 画像生成とオープンフォームテキスト生成の顕著な進歩により、インターリーブされた画像テキストコンテンツの作成は、ますます興味深い分野になりつつある。物語テキストと鮮やかなイメージをインターリーブで生成する多モーダルなストーリー生成は、幅広い応用において価値ある実践的課題として現れてきた。しかし、このタスクは、テキストと画像の間の複雑な相互作用の理解と、一貫性のあるコンテキストに関連のあるテキストと視覚の長いシーケンスを生成する能力を必要とするため、重大な課題を生じさせる。本稿では,MLLM(Multimodal Large Language Model)を利用した拡張多モーダルストーリ生成手法であるSEED-Storyを提案する。我々のモデルはMLLMの強力な理解能力に基づいて、テキストトークンと視覚トークンを予測し、それを適応された視覚的デトケナイザで処理し、一貫した文字やスタイルで画像を生成する。さらに,最大25個のストーリー(トレーニング用10個)を高効率で自動回帰的に生成できるマルチモーダルアテンションシンク機構を提案する。さらに,大規模かつ高解像度なStoryStreamというデータセットを提示し,モデルをトレーニングし,様々な側面においてマルチモーダルなストーリー生成のタスクを定量的に評価する。

With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad applications. However, this task poses significant challenges, as it necessitates the comprehension of the complex interplay between texts and images, and the ability to generate long sequences of coherent, contextually relevant texts and visuals. In this work, we propose SEED-Story, a novel method that leverages a Multimodal Large Language Model (MLLM) to generate extended multimodal stories. Our model, built upon the powerful comprehension capability of MLLM, predicts text tokens as well as visual tokens, which are subsequently processed with an adapted visual de-tokenizer to produce images with consistent characters and styles. We further propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner. Additionally, we present a large-scale and high-resolution dataset named StoryStream for training our model and quantitatively evaluating the task of multimodal story generation in various aspects.

翻訳日:2024-11-08 22:17:54 公開日:2024-10-11

# MetaUrban: 都市マイクロモビリティのためのエンボディドAIシミュレーションプラットフォーム

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility ( http://arxiv.org/abs/2407.08725v2 )

ライセンス: Link先を確認

Wayne Wu, Honglin He, Jack He, Yiran Wang, Chenda Duan, Zhizheng Liu, Quanyi Li, Bolei Zhou,

(参考訳) 街並みや広場のような公共の都市空間は、住民に役立ち、活気のある変化に社会生活を適応させる。最近のロボティクスとエンボディードAIの進歩により、公共の都市空間はもはや人間専用ではない。フードデリバリーロボットと電動車椅子は歩道を歩行者と共有し始めている。公共の都市空間における短距離移動のためのAIによって実現されるマイクロモビリティは、将来の交通システムにおいて重要な要素である。モバイルデバイスを操作するAIモデルの一般化性と安全性の確保が不可欠である。本稿では,AI駆動型都市マイクロモビリティ研究のための構成シミュレーションプラットフォームであるMetaUrbanを紹介する。 MetaUrbanは、多数の地上計画、オブジェクト配置、歩行者、脆弱な道路利用者、その他の移動エージェントの外観とダイナミクスをカバーし、構成要素から無限に多くのインタラクティブな都市シーンを構築することができる。本稿では,MetaUrbanを用いた都市マイクロモビリティ研究のパイロット研究としてポイントナビゲーションとソーシャルナビゲーションタスクを設計し,強化学習と模倣学習の様々な基盤を確立する。我々は,多種多様な機械構造がAI政策の学習と実行に大きな影響を及ぼすことを示した。我々は,シミュレーション環境の組成特性が,訓練された移動体エージェントの一般化性と安全性を大幅に向上させることを示す,徹底的なアブレーション研究を行った。 MetaUrbanは、研究機会を提供し、都市で安全で信頼性の高いAIとマイクロモビリティを育むために、一般公開される。コードとデータセットが公開される。

Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while robot dogs and humanoids have recently emerged in the street. Micromobility enabled by AI for short-distance travel in public urban spaces is a crucial component of the future transportation system. Ensuring the generalizability and safety of AI models maneuvering mobile machines is essential. In this work, we present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as the pilot study using MetaUrban for urban micromobility research and establish various baselines of Reinforcement Learning and Imitation Learning. We conduct extensive evaluation across mobile machines, demonstrating that heterogeneous mechanical structures significantly influence the learning and execution of AI policies. We perform a thorough ablation study, showing that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban will be made publicly available to provide research opportunities and foster safe and trustworthy embodied AI and micromobility in cities. The code and dataset will be publicly available.

翻訳日:2024-11-08 22:17:54 公開日:2024-10-11

# AdaptEval: テキスト要約のためのドメイン適応に基づく大規模言語モデルの評価

AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization ( http://arxiv.org/abs/2407.11591v3 )

ライセンス: Link先を確認

Anum Afzal, Ribin Chalumattu, Florian Matthes, Laura Mascarell,

(参考訳) LLM(Large Language Models)を用いた抽象的な要約タスクの進歩にもかかわらず、異なるドメインに容易に適応できる能力を評価する研究が不足している。各種ドメイン間の要約タスクにおいて,様々なLLMのドメイン適応能力について,微調整と文脈内学習の両方で評価する。また、最初のドメイン適応評価スイートであるAdaptEvalも紹介する。 AdaptEvalには、ドメイン適応の分析を容易にするための、ドメインベンチマークとメトリクスのセットが含まれている。この結果から,LLMはパラメータスケールに関係なく,文脈内学習環境において同等の性能を示すことが示された。

Despite the advances in the abstractive summarization task using Large Language Models (LLMs), there is a lack of research that assesses their abilities to easily adapt to different domains. We evaluate the domain adaptation abilities of a wide range of LLMs on the summarization task across various domains in both fine-tuning and in-context learning settings. We also present AdaptEval, the first domain adaptation evaluation suite. AdaptEval includes a domain benchmark and a set of metrics to facilitate the analysis of domain adaptation. Our results demonstrate that LLMs exhibit comparable performance in the in-context learning setting, regardless of their parameter scale.

翻訳日:2024-11-08 21:10:26 公開日:2024-10-11

# 変圧器言語モデルの効率的な事前学習のための量子化探索

Exploring Quantization for Efficient Pre-Training of Transformer Language Models ( http://arxiv.org/abs/2407.11722v2 )

ライセンス: Link先を確認

Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar,

(参考訳) トランスフォーマーモデルのスケールの増大は、事前学習された計算要求の増加につながった。事前学習と微調整の後に量子化が有効であることが証明されているが、事前学習中にトランスフォーマーに量子化を適用することは、言語モデリングの大規模化においてほとんど未検討のままである。本研究の目的は、線形層成分に着目したトランスフォーマーの効率的な事前学習における量子化の影響を検討することである。重み、アクティベーション、勾配、オプティマイザ状態に直線量子化を体系的に適用することにより、トレーニング中のモデル効率、安定性、性能への影響を評価する。トランスフォーマーの事前学習に適用される効果的な量子化戦略の包括的レシピを提供することにより、言語モデリング能力を維持しながら、スクラッチから高いトレーニング効率を向上する。コードはhttps://github.com/chandar-lab/EfficientLLMsで入手できる。

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.

翻訳日:2024-11-08 20:59:00 公開日:2024-10-11
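The abstract above refers to "straightforward linear quantization" of weights, activations, gradients, and optimizer states. As a rough illustration of what linear quantization means, here is a minimal sketch assuming a symmetric, per-tensor scheme with an absolute-maximum scale. The paper may use a different granularity or rounding rule, and the function names are hypothetical.

```python
def quantize_linear(values, num_bits=8):
    """Symmetric per-tensor linear quantization.

    Returns (integer codes, scale). Assumption: the scale is set by the
    largest absolute value in the tensor (absmax scaling).
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_linear(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


weights = [-1.0, 0.0, 0.25, 1.0]
codes, scale = quantize_linear(weights, num_bits=8)
restored = dequantize_linear(codes, scale)
```

With this scheme the reconstruction error of any in-range value is at most half a quantization step (`scale / 2`), which is why fewer bits (a larger step) trade memory for precision.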

# 二重特徴抽出とクロスデュアルドメインデコードによる薬物再配置のための薬物-疾患関連予測の強化

Boosting drug-disease association prediction for drug repositioning via dual-feature extraction and cross-dual-domain decoding ( http://arxiv.org/abs/2407.11812v2 )

ライセンス: Link先を確認

Enqiang Zhu, Xiang Li, Chanjuan Liu, Nikhil R. Pal,

(参考訳) 薬物再配置は、既存薬の新たな治療用途を見出すことで、迅速かつ費用対効果の高い戦略を提供し、薬物の発見と開発という領域において大きな意義を持つ。近年では、大規模で複雑なデータセットを分析できるため、薬物再配置の強力なツールとしてディープラーニング技術が登場している。しかし、既存の多くの手法は、薬物や疾患の特徴の相互関係を考慮せずに、ネットワーク内の近傍ノードから特徴情報を抽出することに焦点を当てており、不正確な表現につながる可能性がある。この制限に対処するために、我々は2つの特徴(類似性と関連性)を用いて薬物と疾患の特徴間の潜在的な関係を捉える、Dual-Feature Drug Repositioning Neural Network(DFDRNN)モデルを提案する。 DFDRNNは、自己注意機構を用いて近傍の特徴を抽出し、2つの二重特徴抽出モジュールを組み込んでいる: 単一のドメイン(薬物または疾患)内の特徴を抽出するドメイン内二重特徴抽出(IntraDDFE)モジュールと、ドメイン間の特徴を抽出するドメイン間二重特徴抽出(InterDDFE)モジュールである。これらのモジュールを利用することで、薬物や疾患のより適切なエンコーディングを確実にする。さらに、クロスデュアルドメインデコーダは、両方のドメインにおける薬物-疾患関連を予測するように設計されている。提案するDFDRNNモデルは,4つのベンチマークデータセット上で6つの最先端手法を上回り,平均AUROC 0.946 と平均 AUPR 0.597 を達成している。 2つの疾患のケーススタディでは、提案されたDFDRNNモデルが現実のシナリオに適用可能であることを示し、薬物再配置におけるその大きな可能性を示している。

Uncovering new therapeutic uses of existing drugs, drug repositioning offers a fast and cost-effective strategy and holds considerable significance in the realm of drug discovery and development. In recent years, deep learning techniques have emerged as powerful tools in drug repositioning due to their ability to analyze large and complex datasets. However, many existing methods focus on extracting feature information from nearby nodes in the network to represent drugs and diseases, without considering the potential inter-relationships between the features of drugs and diseases, which may lead to inaccurate representations. To address this limitation, we use two features (similarity and association) to capture the potential relationships between the features of drugs and diseases, proposing a Dual-Feature Drug Repositioning Neural Network (DFDRNN) model. DFDRNN uses a self-attention mechanism to extract neighbor features and incorporates two dual-feature extraction modules: the intra-domain dual-feature extraction (IntraDDFE) module for extracting features within a single domain (drugs or diseases) and the inter-domain dual-feature extraction (InterDDFE) module for extracting features across domains. By utilizing these modules, we ensure more appropriate encoding of drugs and diseases. Additionally, a cross-dual-domain decoder is designed to predict drug-disease associations in both domains. Our proposed DFDRNN model outperforms six state-of-the-art methods on four benchmark datasets, achieving an average AUROC of 0.946 and an average AUPR of 0.597. Case studies on two diseases show that the proposed DFDRNN model can be applied in real-world scenarios, demonstrating its significant potential in drug repositioning.

翻訳日:2024-11-08 20:59:00 公開日:2024-10-11

# スペクトル: 3次・量子化・FP16言語モデルに関する総合的研究

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v2 )

ライセンス: Link先を確認

Ayush Kaushal, Tejas Vaidhya, Tejas Pandey, Aaryan Bhagat, Irina Rish,

(参考訳) 後学習量子化は、LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが、残念ながら、4ビット未満の精度では大きな性能劣化に悩まされている。別のアプローチでは、圧縮されたモデルを低ビット幅(例えば、バイナリまたは3値モデル)で直接訓練する。しかし、そのようなモデルの性能、トレーニングのダイナミクス、スケーリングの傾向はまだよく分かっていない。この問題に対処するため、99Mから3.9Bパラメータを含む54の言語モデルで構成され、300BトークンでトレーニングされたSpectra LLMスイートをトレーニングし、公開リリースする。Spectraには、FloatLMs、学習後に量子化したQuantLMs (3, 4, 6, 8 bits)、および3値LLMs (TriLMs)が含まれる。TriLMは3値言語モデリングのための改良アーキテクチャであり、同じサイズ(ビット数)の既存の3値モデルを大きく上回り、大規模では半精度モデルに匹敵する。例えば、TriLM 3.9Bは(ビット単位で)半精度FloatLM 830Mより小さいが、常識推論と知識ベンチマークでは半精度FloatLM 3.9Bと一致する。しかし、TriLM 3.9Bは6倍の大きさのモデルであるFloatLM 3.9Bと同じくらい毒性があり、ステレオタイプ的でもある。さらに、TriLM 3.9Bは、検証分割とWebベースのコーパスのパープレキシティでFloatLMに遅れをとっているが、LambadaやPennTreeBankのようなノイズの少ないデータセットではパフォーマンスが良くなっている。低ビット幅モデルの理解を深めるため、私たちはSpectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。

Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but unfortunately, it suffers from significant performance degradation below 4-bit precision. An alternative approach involves training compressed models directly at a low bitwidth (e.g., binary or ternary models). However, the performance, training dynamics, and scaling trends of such models are not yet well understood. To address this issue, we train and openly release the Spectra LLM suite consisting of 54 language models ranging from 99M to 3.9B parameters, trained on 300B tokens. Spectra includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, and 8 bits), and ternary LLMs (TriLMs) - our improved architecture for ternary language modeling, which significantly outperforms previously proposed ternary models of a given size (in bits), matching half-precision models at scale. For example, TriLM 3.9B is (bit-wise) smaller than the half-precision FloatLM 830M, but matches half-precision FloatLM 3.9B in commonsense reasoning and knowledge benchmarks. However, TriLM 3.9B is also as toxic and stereotyping as FloatLM 3.9B, a model six times larger in size. Additionally, TriLM 3.9B lags behind FloatLM in perplexity on validation splits and web-based corpora but performs better on less noisy datasets like Lambada and PennTreeBank. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.

翻訳日:2024-11-08 20:48:00 公開日:2024-10-11
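The claim that a 3.9B-parameter ternary model is smaller in bits than an 830M-parameter half-precision model can be checked with back-of-the-envelope arithmetic. The sketch below assumes an idealised count of log2(3) ≈ 1.58 bits per ternary weight and 16 bits per FP16 weight, and it ignores any tensors kept at higher precision (embeddings, scales). Those are simplifying assumptions, not figures from the paper.

```python
import math

# Assumption: information-theoretic cost of one ternary weight {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)   # about 1.58 bits
BITS_PER_FP16_WEIGHT = 16


def model_bits(num_params, bits_per_weight):
    """Idealised model size in bits: parameter count times bits per weight."""
    return num_params * bits_per_weight


trilm_3_9b = model_bits(3.9e9, BITS_PER_TERNARY_WEIGHT)    # roughly 6.2e9 bits
floatlm_830m = model_bits(830e6, BITS_PER_FP16_WEIGHT)     # roughly 1.33e10 bits
floatlm_3_9b = model_bits(3.9e9, BITS_PER_FP16_WEIGHT)     # roughly 6.24e10 bits
```

Under these assumptions the ternary 3.9B model needs under half the bits of the FP16 830M model. The idealised ratio between the two 3.9B models comes out near 10, larger than the "six times" stated in the abstract, which suggests the released models carry overhead that this simple count leaves out.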

# スペクトル: 3次・量子化・FP16言語モデルに関する総合的研究

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v3 )

ライセンス: Link先を確認

Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,

(参考訳) GPU計算能力の急速な進歩は、メモリ容量と帯域幅の増大を上回り、LLM(Large Language Model)推論のボトルネックを生み出した。後学習量子化は, LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが, 4ビット精度未満では大きな性能劣化に悩まされている。本稿では,従来の浮動小数点モデル (FloatLMs) とその学習後量子化バージョン (QuantLMs) の代替として,低ビット幅モデル,特に三値言語モデル (TriLMs) の事前学習を検討することで,これらの課題に対処する。我々は、FloatLMs、QuantLMs、TriLMsを含む複数のビット幅にまたがる最初のオープンなLLMスイートであるSpectra LLMスイートを紹介する。モデルは99Mから3.9Bのパラメータを持ち、300Bトークンでトレーニングされている。我々の総合的な評価は、TriLMがモデルサイズ(ビット)の点で優れたスケーリング挙動を提供することを示している。驚くべきことに、10億パラメータを超えるスケールでは、TriLMは様々なベンチマークで、同じビットサイズのQuantLMとFloatLMを一貫して上回っている。特に3.9BパラメータのTriLMは、FloatLM 830Mよりビット数が少ないにもかかわらず、全てのベンチマークでFloatLM 3.9Bのパフォーマンスと一致している。全体として、この研究は低ビット幅言語モデルの実現可能性と拡張性に関する貴重な洞察を与え、より効率的なLLMの開発への道を開いた。低ビット幅モデルの理解を深めるため、私たちはSpectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。

Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of model size (in bits). Surprisingly, at scales exceeding one billion parameters, TriLMs consistently outperform their QuantLM and FloatLM counterparts for a given bit size across various benchmarks. Notably, the 3.9B parameter TriLM matches the performance of the FloatLM 3.9B across all benchmarks, despite having fewer bits than FloatLM 830M. Overall, this research provides valuable insights into the feasibility and scalability of low-bitwidth language models, paving the way for the development of more efficient LLMs. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.

翻訳日:2024-11-08 20:48:00 公開日:2024-10-11

# スペクトル: 3次・量子化・FP16言語モデルに関する総合的研究

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models ( http://arxiv.org/abs/2407.12327v4 )

ライセンス: Link先を確認

Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,

翻訳日:2024-11-08 20:48:00 公開日:2024-10-11

# スペクトル: 三値言語モデルの大規模事前学習の驚くべき有効性

Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale ( http://arxiv.org/abs/2407.12327v5 )

ライセンス: Link先を確認

Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish,

(参考訳) GPU計算能力の急速な進歩は、メモリ容量と帯域幅の増大を上回り、LLM(Large Language Model)推論のボトルネックを生み出した。後学習量子化は, LLM推論におけるメモリ関連ボトルネックに対処する主要な手法であるが, 4ビット精度未満では大きな性能劣化に悩まされている。本稿では,従来の浮動小数点モデル (FloatLMs) とその学習後量子化バージョン (QuantLMs) の代替として,低ビット幅モデル,特に三値言語モデル (TriLMs) の事前学習を検討することで,これらの課題に対処する。我々は、FloatLMs、QuantLMs、TriLMsを含む複数のビット幅にまたがる最初のオープンなLLMスイートであるSpectra LLMスイートを紹介する。モデルは99Mから3.9Bのパラメータを持ち、300Bトークンでトレーニングされている。我々の総合的な評価は、TriLMがモデルサイズ(ビット)の点で優れたスケーリング挙動を提供することを示している。驚くべきことに、10億パラメータを超えるスケールでは、TriLMは様々なベンチマークで、同じビットサイズのQuantLMとFloatLMを一貫して上回っている。特に3.9BパラメータのTriLMは、FloatLM 830Mよりビット数が少ないにもかかわらず、全てのベンチマークでFloatLM 3.9Bのパフォーマンスと一致している。全体として、この研究は低ビット幅言語モデルの実現可能性と拡張性に関する貴重な洞察を与え、より効率的なLLMの開発への道を開いた。低ビット幅モデルの理解を深めるため、Spectraスイートの500以上の中間チェックポイントを https://github.com/NolanoOrg/SpectraSuite でリリースしています。

Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of model size (in bits). Surprisingly, at scales exceeding one billion parameters, TriLMs consistently outperform their QuantLM and FloatLM counterparts for a given bit size across various benchmarks. Notably, the 3.9B parameter TriLM matches the performance of the FloatLM 3.9B across all benchmarks, despite having fewer bits than FloatLM 830M. Overall, this research provides valuable insights into the feasibility and scalability of low-bitwidth language models, paving the way for the development of more efficient LLMs. To enhance understanding of low-bitwidth models, we are releasing 500+ intermediate checkpoints of the Spectra suite at https://github.com/NolanoOrg/SpectraSuite.

翻訳日:2024-11-08 20:36:48 公開日:2024-10-11

# ロボットもマルチタスクが可能:クロスタスクロボットアクション生成のためのメモリアーキテクチャとLLMの統合

Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation ( http://arxiv.org/abs/2407.13505v2 )

ライセンス: Link先を確認

Hassan Ali, Philipp Allgeuer, Carlo Mazzola, Giulia Belgiovine, Burak Can Kaplan, Lukáš Gajdošech, Stefan Wermter,

(参考訳) 大規模言語モデル(LLM)は、ロボットの知覚と身体能力に則って、LLMの常識推論を基礎づけるロボットアプリケーションで最近使用されている。ヒューマノイドロボットでは、メモリは、特に、ロボットが以前のタスク状態、環境状態、実行された動作を記憶しなければならないマルチタスク設定において、現実世界の実施を促進し、長期的な対話能力を支える上でも重要な役割を果たす。本稿では,タスク間を効果的に切り替える一方で,タスク間動作を生成するためのLLMをメモリプロセスに組み込むことに対処する。提案する2層構造は,人間の認知にインスパイアされた記憶モデルと、推論と指示追従という相補的な能力を活用する2つのLLMを特徴とする。その結果,5つのロボットタスクのベースラインよりも性能が大幅に向上し,ロボットの動作と適応タスク実行の知覚を組み合わせたLLMにメモリを統合できる可能性が示された。

Large Language Models (LLMs) have been recently used in robot applications for grounding LLM common-sense reasoning with the robot's perception and physical abilities. In humanoid robots, memory also plays a critical role in fostering real-world embodiment and facilitating long-term interactive capabilities, especially in multi-task setups where the robot must remember previous task states, environment states, and executed actions. In this paper, we address incorporating memory processes with LLMs for generating cross-task robot actions, while the robot effectively switches between tasks. Our proposed dual-layered architecture features two LLMs, utilizing their complementary skills of reasoning and following instructions, combined with a memory model inspired by human cognition. Our results show a significant improvement in performance over a baseline of five robotic tasks, demonstrating the potential of integrating memory with LLMs for combining the robot's action and perception for adaptive task execution.

翻訳日:2024-11-08 20:14:30 公開日:2024-10-11

# NeLLCom-X: 言語学習とグループコミュニケーションをシミュレートする包括的ニューラルネットワークフレームワーク

NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication ( http://arxiv.org/abs/2407.13999v2 )

ライセンス: Link先を確認

Yuchen Lian, Tessa Verhoef, Arianna Bisazza,

(参考訳) 計算言語学の最近の進歩には、ランダムな記号の集合から始まる相互作用するニューラルネットワークエージェントによる人間のような言語の出現をシミュレートすることが含まれる。最近導入されたNeLLComフレームワーク(Lian et al , 2023)により、エージェントはまず人工言語を学習し、それを通信に使用することができる。このフレームワーク(NeLLCom-X)は、言語学習性、通信圧力、グループサイズ効果の相互作用を調べるために、より現実的な役割交代エージェントとグループコミュニケーションを導入することで拡張される。我々は,単語順/ケースマーキングトレードオフの出現をシミュレートした先行研究から得られた重要な知見を複製してNeLLCom-Xを検証する。次に,相互作用が言語収束とトレードオフの出現にどのように影響するかを検討する。このフレームワークは、言語進化における相互作用とグループダイナミクスの重要性を強調し、多様な言語的側面の将来のシミュレーションを促進する。

Recent advances in computational linguistics include simulating the emergence of human-like languages with interacting neural network agents, starting from sets of random symbols. The recently introduced NeLLCom framework (Lian et al., 2023) allows agents to first learn an artificial language and then use it to communicate, with the aim of studying the emergence of specific linguistic properties. We extend this framework (NeLLCom-X) by introducing more realistic role-alternating agents and group communication in order to investigate the interplay between language learnability, communication pressures, and group size effects. We validate NeLLCom-X by replicating key findings from prior research simulating the emergence of a word-order/case-marking trade-off. Next, we investigate how interaction affects linguistic convergence and emergence of the trade-off. The novel framework facilitates future simulations of diverse linguistic aspects, emphasizing the importance of interaction and group dynamics in language evolution.

翻訳日:2024-11-08 19:38:31 公開日:2024-10-11

# アートインテリジェンス:高忠実景観絵画合成のための拡散型フレームワーク

Artistic Intelligence: A Diffusion-Based Framework for High-Fidelity Landscape Painting Synthesis ( http://arxiv.org/abs/2407.17229v4 )

ライセンス: Link先を確認

Wanggong Yang, Yifei Zhao,

(参考訳) 高忠実な風景画の生成は、構造と様式の両方を正確に制御する必要がある難しい課題である。本稿では,ランドスケープ・ペインティング・ジェネレーションに特化して設計された新しい拡散モデルLPGenを提案する。 LPGenは、構造的特徴とスタイル的特徴を独立に処理し、従来の絵画技法の階層的アプローチを効果的に模倣する、分離された相互注意機構を導入している。さらに、LPGenは、ランドスケープ絵画のレイアウトを制御するために設計されたマルチスケールエンコーダである構造制御器を提案し、美学と構成のバランスをとる。さらに、このモデルは、異なる芸術様式で分類された高解像度のランドスケープ画像のキュレートされたデータセットで事前トレーニングされ、その後、詳細で一貫した出力を確保するために微調整される。 LPGenは広範な評価を通じて、構造的に正確であるだけでなく、スタイリスティックに整合した絵画を製作する上で優れた性能を示し、現在の最先端のモデルを上回っている。この研究はAI生成芸術を進歩させ、技術と伝統的な芸術的実践の交わりを探索するための新たな道を提供する。コード、データセット、モデルの重みが公開されます。

Generating high-fidelity landscape paintings remains a challenging task that requires precise control over both structure and style. In this paper, we present LPGen, a novel diffusion-based model specifically designed for landscape painting generation. LPGen introduces a decoupled cross-attention mechanism that independently processes structural and stylistic features, effectively mimicking the layered approach of traditional painting techniques. Additionally, LPGen proposes a structural controller, a multi-scale encoder designed to control the layout of landscape paintings, striking a balance between aesthetics and composition. Besides, the model is pre-trained on a curated dataset of high-resolution landscape images, categorized by distinct artistic styles, and then fine-tuned to ensure detailed and consistent output. Through extensive evaluations, LPGen demonstrates superior performance in producing paintings that are not only structurally accurate but also stylistically coherent, surpassing current state-of-the-art models. This work advances AI-generated art and offers new avenues for exploring the intersection of technology and traditional artistic practices. Our code, dataset, and model weights will be publicly available.

翻訳日:2024-11-08 15:23:20 公開日:2024-10-11

# 深層学習の数学的理論

Mathematical theory of deep learning ( http://arxiv.org/abs/2407.18384v2 )

ライセンス: Link先を確認

Philipp Petersen, Jakob Zech,

(参考訳) この本は、ディープラーニングの数学的解析の紹介を提供する。これは、深層ニューラルネットワーク理論の3つの柱である近似理論、最適化理論、統計学習理論の基本的な結果をカバーしている。本書は、数学や関連分野の学生や研究者のためのガイドとして、このトピックに関する基礎知識を読者に提供することを目的としている。一般性よりも単純さを優先し、厳密でアクセスしやすい結果を提示し、ディープラーニングを支える基本的な数学的概念を理解するのに役立つ。

This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.

翻訳日:2024-11-08 14:50:05 公開日:2024-10-11

# デノイジング・レヴィ確率モデル

Denoising Lévy Probabilistic Models ( http://arxiv.org/abs/2407.18609v2 )

ライセンス: Link先を確認

Dario Shariatian, Umut Simsekli, Alain Durmus,

(参考訳) 拡散生成モデルにおけるガウシアンを超えての雑音分布の探索は未解決の問題である。ガウスのケースは実験的、理論的に成功し、スコアベースおよびデノイジングの定式化に対する統一されたSDEフレームワークに適合している。近年の研究では、裾の重い(ヘビーテール)ノイズ分布はモード崩壊に対処し、クラス不均衡、重い裾、または外れ値を持つデータセットを扱えることが示唆されている。 Yoon et al. (NeurIPS 2023) は Lévy-Ito モデル (LIM) を導入し、SDE フレームワークを$\alpha$-stable ノイズでヘビーテール SDE に拡張した。理論上のエレガンスと性能の向上にもかかわらず、LIMの複雑な数学はアクセシビリティとより広範な採用を制限する可能性がある。本研究は,より単純なアプローチとして,デノイジング拡散確率モデル(DDPM)を$\alpha$-stableノイズで拡張し,デノイジングLévy確率モデル(DLPM)を作成した。初等証明手法を用いることで,DLPMは最小限の変更でバニラDDPMの実行に帰着し,既存の実装を利用できることを示す。 DLPMとLIMは異なるトレーニングアルゴリズムを持ち、ガウスの場合とは異なり、異なる後方プロセスとサンプリングアルゴリズムを認めている。実験により,DLPMは,データ分布の裾のカバレッジの向上,不均衡なデータセットの生成の改善,後方ステップの削減による計算時間の短縮を実現している。

Investigating noise distributions beyond the Gaussian in diffusion generative models is an open problem. The Gaussian case has seen success experimentally and theoretically, fitting a unified SDE framework for score-based and denoising formulations. Recent studies suggest heavy-tailed noise distributions can address mode collapse and manage datasets with class imbalance, heavy tails, or outliers. Yoon et al. (NeurIPS 2023) introduced the Lévy-Ito model (LIM), extending the SDE framework to heavy-tailed SDEs with $\alpha$-stable noise. Despite its theoretical elegance and performance gains, LIM's complex mathematics may limit its accessibility and broader adoption. This study takes a simpler approach by extending the denoising diffusion probabilistic model (DDPM) with $\alpha$-stable noise, creating the denoising Lévy probabilistic model (DLPM). Using elementary proof techniques, we show DLPM reduces to running vanilla DDPM with minimal changes, allowing the use of existing implementations. DLPM and LIM have different training algorithms and, unlike the Gaussian case, they admit different backward processes and sampling algorithms. Our experiments demonstrate that DLPM achieves better coverage of data distribution tails, improved generation of unbalanced datasets, and faster computation times with fewer backward steps.

翻訳日:2024-11-08 14:50:05 公開日:2024-10-11
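The abstract above replaces Gaussian noise with $\alpha$-stable noise. To make "heavy-tailed noise" concrete, here is a minimal sketch of drawing symmetric $\alpha$-stable samples with the Chambers-Mallows-Stuck transform. It only illustrates the noise family. It is not the DLPM training or sampling procedure, and the function names are hypothetical.

```python
import math
import random


def symmetric_alpha_stable(alpha, u, w):
    """Chambers-Mallows-Stuck transform for a standard symmetric alpha-stable draw.

    u: uniform on (-pi/2, pi/2); w: exponential with mean 1; 0 < alpha <= 2.
    alpha = 2 gives a Gaussian with variance 2, alpha = 1 gives a Cauchy.
    """
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))


def sample_noise(alpha, n, seed=0):
    """Draw n symmetric alpha-stable samples from a seeded generator."""
    rng = random.Random(seed)
    return [
        symmetric_alpha_stable(
            alpha,
            rng.uniform(-math.pi / 2, math.pi / 2),
            rng.expovariate(1.0),
        )
        for _ in range(n)
    ]


heavy_tailed = sample_noise(alpha=1.5, n=5)
```

Lowering `alpha` below 2 makes large-magnitude draws more frequent, which is the property the abstract links to better coverage of distribution tails.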

# SOAP-RL:POMDP環境における強化学習のための逐次オプションアドバンテージプロパゲーション

SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments ( http://arxiv.org/abs/2407.18913v2 )

ライセンス: Link先を確認

Shu Ishida, João F. Henriques,

(参考訳) この研究は、強化学習アルゴリズムをオプションを用いて部分観測マルコフ決定過程(POMDP)に拡張する方法を比較する。オプションの1つの見解は、時間的に拡張されたアクションであり、エージェントがポリシーのコンテキストウィンドウを越えて歴史的な情報を保持できるメモリとして実現することができる。オプションの割り当てはヒューリスティックスと手作りの目的を使って扱うことができるが、時間的に一貫したオプションと関連するサブポリシーを明示的な監督なしに学ぶことは困難である。 PPOEMとSOAPという2つのアルゴリズムが提案され、この問題に取り組むために詳細に研究されている。 PPOEM は (Hidden Markov Models の)フォワードバックワードアルゴリズムを適用して,オプション拡張ポリシに対する期待リターンを最適化する。しかし、この学習アプローチは、オンポリシーのロールアウト中に不安定である。オプションの割り当ては、エピソード全体が利用可能なオフラインシーケンスに最適化されているため、将来の軌跡を知ることなく因果ポリシーを学ぶのにも適していない。別のアプローチとして、SOAPは最適なオプション割り当てのためのポリシー勾配を評価します。これは、GAE(Generalized advantage estimation)の概念を拡張して、オプションのアドバンテージを時間を通して伝播させるものであり、オプションポリシー勾配の時間的バックプロパゲーションを実行することと解析的に等価である。このオプションポリシーは、エージェントの履歴にのみ条件付けられ、将来のアクションには条件付けられない。競合するベースラインに対して評価され、SOAPは最も堅牢なパフォーマンスを示し、POMDPの廊下環境でオプションを正しく発見したほか、AtariやMuJoCoなどの標準ベンチマークでもPPOEM、LSTM、Option-Criticベースラインを上回った。オープンソースコードはhttps://github.com/shuishida/SoapRLで公開されている。

This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view of options is as temporally extended action, which can be realized as a memory that allows the agent to retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervision is a challenge. Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem. PPOEM applies the forward-backward algorithm (for Hidden Markov Models) to optimize the expected returns for an option-augmented policy. However, this learning approach is unstable during on-policy rollouts. It is also unsuited for learning causal policies without the knowledge of future trajectories, since option assignments are optimized for offline sequences where the entire episode is available. As an alternative approach, SOAP evaluates the policy gradient for an optimal option assignment. It extends the concept of the generalized advantage estimation (GAE) to propagate option advantages through time, which is an analytical equivalent to performing temporal back-propagation of option policy gradients. This option policy is only conditional on the history of the agent, not future actions. Evaluated against competing baselines, SOAP exhibited the most robust performance, correctly discovering options for POMDP corridor environments, as well as on standard benchmarks including Atari and MuJoCo, outperforming PPOEM, as well as LSTM and Option-Critic baselines. The open-sourced code is available at https://github.com/shuishida/SoapRL.

翻訳日:2024-11-08 14:50:05 公開日:2024-10-11
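The abstract above says SOAP extends generalized advantage estimation (GAE). For orientation, here is a minimal sketch of standard GAE on a single trajectory, assuming no episode termination inside the trajectory. The option-advantage propagation that SOAP adds is not reproduced here, and the function name is hypothetical.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation for one trajectory.

    rewards[t], values[t]: reward and value estimate at step t.
    last_value: value estimate for the state after the final step.
    Assumption: the trajectory has no terminal state in the middle.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


advantages = gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0)
```

With `lam=1` the estimate reduces to the return minus the value baseline, and with `lam=0` it reduces to the one-step TD error. Intermediate values trade bias against variance.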

# ビジョンランゲージモデルを用いたゼロショットにおけるロボティクス問題の解法

Solving Robotics Problems in Zero-Shot with Vision-Language Models ( http://arxiv.org/abs/2407.19094v3 )

ライセンス: Link先を確認

Zidan Wang, Rui Shen, Bradly Stadie,

We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework designed to solve robotics problems in a zero-shot regime. In our context, zero-shot means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description, and the VLLM outputs the sequence of actions necessary for the robot to complete the task. Unlike prior work that requires fine-tuning parts of the pipeline -- such as adjusting an LLM on robot-specific data or training separate vision encoders -- our approach demonstrates that with careful engineering, a single off-the-shelf VLLM can autonomously handle all aspects of a robotics task, from high-level planning to low-level location extraction and action execution. Crucially, compared to using GPT-4o alone, Wonderful Team is self-corrective and capable of iteratively fixing its own mistakes, enabling it to solve challenging long-horizon tasks. We validate our framework through extensive experiments, both in simulated environments using VIMABench and in real-world settings. Our system showcases the ability to handle diverse tasks such as manipulation, goal-reaching, and visual reasoning -- all in a zero-shot manner. These results underscore a key point: vision-language models have progressed rapidly in the past year and should be strongly considered as a backbone for many robotics problems moving forward.

Translated: 2024-11-08 14:38:53 Published: 2024-10-11

# Solving Robotics Problems in Zero-Shot with Vision-Language Models ( http://arxiv.org/abs/2407.19094v4 )

License: see the linked page

Zidan Wang, Rui Shen, Bradly Stadie

Translated: 2024-11-08 14:38:53 Published: 2024-10-11

# Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification ( http://arxiv.org/abs/2407.19340v4 )

License: see the linked page

Santosh V. Patapati

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients and Facial Action Units, and uses a GPT-4 model with two-shot learning to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and the Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.

Translated: 2024-11-08 14:38:53 Published: 2024-10-11
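The Leave-One-Subject-Out figures quoted in the abstract above can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic; the function name `f1_score` is ours for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (80% and 92.86%).
reported_precision = 0.80
reported_recall = 0.9286

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")
```

The computed value agrees with the reported F1-Score of 85.95% after rounding, so the three numbers are mutually consistent.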

# Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification on the DAIC-WOZ ( http://arxiv.org/abs/2407.19340v5 )

License: see the linked page

Santosh V. Patapati

Translated: 2024-11-08 14:38:53 Published: 2024-10-11

# Sentiment Reasoning for Healthcare ( http://arxiv.org/abs/2407.21054v2 )

License: see the linked page

Khai Le-Duc, Khai-Nguyen Nguyen, Bach Phan Tat, Duy Le, Jerry Ngo, Long Vo-Dang, Anh Totti Nguyen, Truong-Son Hy

Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and it is important for building trust between AI and users in sentiment analysis tasks. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be explicitly stated. In this work, we introduce a new task - Sentiment Reasoning - for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Our study showed that rationale-augmented training enhances model performance in sentiment classification across both human transcript and ASR settings. We also found that the generated rationales typically exhibit different vocabularies compared to human-generated rationales, but maintain similar semantics. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed

Translated: 2024-11-08 13:51:33 Published: 2024-10-11

# Sentiment Reasoning for Healthcare ( http://arxiv.org/abs/2407.21054v3 )

License: see the linked page

Khai-Nguyen Nguyen, Khai Le-Duc, Bach Phan Tat, Duy Le, Long Vo-Dang, Truong-Son Hy

Transparency in AI healthcare decision-making is crucial for building trust between AI and users. Incorporating reasoning capabilities enables Large Language Models (LLMs) to understand emotions in context, handle nuanced language, and infer unstated sentiments. In this work, we introduce a new task -- Sentiment Reasoning -- for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis where the model both predicts the sentiment label and generates the rationale behind it based on the input transcript. Our study, conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts, shows that Sentiment Reasoning helps improve model transparency by providing rationales for model predictions with quality semantically comparable to that of humans, while also improving model performance (a 1% increase in both accuracy and macro-F1) via rationale-augmented fine-tuning. We also find no significant difference in the semantic quality of generated rationales between human and ASR transcripts. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed.

Translated: 2024-11-08 13:51:33 Published: 2024-10-11

# Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions ( http://arxiv.org/abs/2408.00727v3 )

License: see the linked page

Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. They can possess considerable medical knowledge, but may still hallucinate and are inflexible with respect to knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information-seeking are required. To address this issue, we propose iterative RAG for medicine (i-MedRAG), where LLMs can iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a conventional RAG system, and the answers are then used to guide query generation in the next iteration. Our experiments show the improved performance of various LLMs brought by i-MedRAG compared with conventional RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different iterations of follow-up queries and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first-of-its-kind study on incorporating follow-up queries into medical RAG. The implementation of i-MedRAG is available at https://github.com/Teddy-XiongGZ/MedRAG.

Translated: 2024-11-08 13:29:21 Published: 2024-10-11
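The iterative loop described in the i-MedRAG abstract above (generate follow-up queries, answer each with a conventional RAG system, feed the answers into the next round) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: every name here (`iterative_rag`, `generate_queries`, `answer_with_rag`, `answer_final`, and the toy stubs) is hypothetical, and the real system would call an LLM and a retriever where the stubs return fixed strings.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (follow-up query, answer from the RAG system)


def iterative_rag(
    question: str,
    generate_queries: Callable[[str, List[QAPair], int], List[str]],
    answer_with_rag: Callable[[str], str],
    answer_final: Callable[[str, List[QAPair]], str],
    n_iterations: int = 2,
    queries_per_iteration: int = 2,
) -> Tuple[str, List[QAPair]]:
    """Run several rounds of follow-up queries, then answer the question."""
    history: List[QAPair] = []
    for _ in range(n_iterations):
        # New queries are conditioned on everything gathered so far.
        queries = generate_queries(question, history, queries_per_iteration)
        for query in queries:
            history.append((query, answer_with_rag(query)))
    return answer_final(question, history), history


# Toy stand-ins (hypothetical) so the sketch runs without an LLM or retriever.
def toy_generate(question: str, history: List[QAPair], n: int) -> List[str]:
    start = len(history)
    return [f"follow-up {start + i + 1} about: {question}" for i in range(n)]


def toy_rag(query: str) -> str:
    return f"retrieved answer to '{query}'"


def toy_final(question: str, history: List[QAPair]) -> str:
    return f"answer based on {len(history)} query-answer pairs"


final, trace = iterative_rag("example question", toy_generate, toy_rag, toy_final)
print(final)
```

The two knobs `n_iterations` and `queries_per_iteration` correspond to the scaling dimensions the abstract mentions (number of iterations and number of queries per iteration).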

# Generalized Quantum Stein's Lemma and Second Law of Quantum Resource Theories ( http://arxiv.org/abs/2408.02722v2 )

License: see the linked page

Masahito Hayashi, Hayata Yamasaki

The second law of thermodynamics is the cornerstone of physics, characterizing the convertibility between thermodynamic states through a single function, entropy. Given the universal applicability of thermodynamics, a fundamental question in quantum information theory is whether an analogous second law can be formulated to characterize the convertibility of resources for quantum information processing by a single function. In 2008, a promising formulation was proposed, linking resource convertibility to the optimal performance of a variant of the quantum version of hypothesis testing. Central to this formulation was the generalized quantum Stein's lemma, which aimed to characterize this optimal performance by a measure of quantum resources, the regularized relative entropy of resource. If proven valid, the generalized quantum Stein's lemma would lead to the second law for quantum resources, with the regularized relative entropy of resource taking the role of entropy in thermodynamics. However, in 2023, a logical gap was found in the original proof of this lemma, casting doubt on the possibility of such a formulation of the second law. In this work, we resolve this problem by developing alternative techniques and successfully proving the generalized quantum Stein's lemma. Based on our proof, we reestablish and extend the formulation of quantum resource theories with the second law, applicable to both static resources of quantum states and a fundamental class of dynamical resources represented by classical-quantum (CQ) channels. These results resolve the fundamental problem of bridging the analogy between thermodynamics and quantum information theory.

Translated: 2024-11-08 12:55:50 Published: 2024-10-11

# Generalized Quantum Stein's Lemma and Second Law of Quantum Resource Theories ( http://arxiv.org/abs/2408.02722v3 )

License: see the linked page

Masahito Hayashi, Hayata Yamasaki

The second law of thermodynamics is the cornerstone of physics, characterizing the convertibility between thermodynamic states through a single function, entropy. Given the universal applicability of thermodynamics, a fundamental question in quantum information theory is whether an analogous second law can be formulated to characterize the convertibility of resources for quantum information processing by a single function. In 2008, a promising formulation was proposed, linking resource convertibility to the optimal performance of a variant of the quantum version of hypothesis testing. Central to this formulation was the generalized quantum Stein's lemma, which aimed to characterize this optimal performance by a measure of quantum resources, the regularized relative entropy of resource. If proven valid, the generalized quantum Stein's lemma would lead to the second law for quantum resources, with the regularized relative entropy of resource taking the role of entropy in thermodynamics. However, in 2023, a logical gap was found in the original proof of this lemma, casting doubt on the possibility of such a formulation of the second law. In this work, we address this problem by developing alternative techniques to successfully prove the generalized quantum Stein's lemma under a smaller set of assumptions than the original analysis. Based on our proof, we reestablish and extend the second law of quantum resource theories, applicable to both static resources of quantum states and a fundamental class of dynamical resources represented by classical-quantum (CQ) channels. These results resolve the fundamental problem of bridging the analogy between thermodynamics and quantum information theory.

Translated: 2024-11-08 12:55:50 Published: 2024-10-11
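For orientation, the lemma referred to in the abstract above is commonly stated in the following form. This is a sketch in notation introduced here, not taken from the abstract: the symbols $\mathcal{F}_n$ (the set of free states on $n$ copies), $\beta_\epsilon$ (the optimal type II error at type I error at most $\epsilon$), and $R_{\mathrm{R}}^{\infty}$ (the regularized relative entropy of resource) are our assumptions, and the precise conditions required of the sets $\mathcal{F}_n$ are left to the paper.

```latex
% Optimal type II error against the worst-case free state,
% with type I error on rho^{\otimes n} bounded by epsilon:
\beta_\epsilon\!\left(\rho^{\otimes n} \,\middle\|\, \mathcal{F}_n\right)
  := \min_{\substack{0 \le T \le \mathbb{1} \\
        \operatorname{Tr}\left[(\mathbb{1}-T)\,\rho^{\otimes n}\right] \le \epsilon}}
     \;\max_{\sigma \in \mathcal{F}_n} \operatorname{Tr}\left[T\sigma\right]

% Regularized relative entropy of resource:
R_{\mathrm{R}}^{\infty}(\rho)
  := \lim_{n\to\infty} \frac{1}{n}
     \min_{\sigma \in \mathcal{F}_n} D\!\left(\rho^{\otimes n} \,\middle\|\, \sigma\right)

% Statement of the generalized quantum Stein's lemma:
\lim_{n\to\infty} -\frac{1}{n}
  \log \beta_\epsilon\!\left(\rho^{\otimes n} \,\middle\|\, \mathcal{F}_n\right)
  = R_{\mathrm{R}}^{\infty}(\rho)
  \qquad \text{for every } \epsilon \in (0,1)
```

Read this way, the single function playing the role of entropy in the second law is $R_{\mathrm{R}}^{\infty}$, as the abstract describes.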

# Underwater litter monitoring using consumer-grade aerial-aquatic speedy scanner (AASS) and deep learning based super-resolution reconstruction and detection network ( http://arxiv.org/abs/2408.03564v2 )

License: see the linked page

Fan Zhao, Yongying Liu, Jiaqi Wang, Yijia Chen, Dianhan Xi, Xinlei Shao, Shigeru Tabeta, Katsunori Mizuno

Underwater litter is widely spread across aquatic environments such as lakes, rivers, and oceans, significantly impacting natural ecosystems. Current monitoring technologies for detecting underwater litter face limitations in survey efficiency, cost, and environmental conditions, highlighting the need for efficient, consumer-grade technologies for automatic detection. This research introduces the Aerial-Aquatic Speedy Scanner (AASS) combined with Super-Resolution Reconstruction (SRR) and an improved YOLOv8 detection network. AASS enhances data acquisition efficiency over traditional methods, capturing high-quality images that accurately identify underwater waste. SRR improves image resolution by mitigating motion blur and insufficient resolution, thereby enhancing detection tasks. Specifically, the RCAN model achieved the highest mean average precision (mAP) of 78.6% for detection accuracy on reconstructed images among the tested SRR models. With a magnification factor of 4, the SRR test set shows an improved mAP compared to the conventional bicubic set. These results demonstrate the effectiveness of the proposed method in detecting underwater litter.

Translated: 2024-11-08 12:33:46 Published: 2024-10-11

# Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems ( http://arxiv.org/abs/2408.05992v2 )

License: see the linked page

Steve Yuwono, Dorothea Schwung, Andreas Schwung

This paper presents a novel transfer learning approach in state-based potential games (TL-SbPGs) for enhancing distributed self-optimization in manufacturing systems. The approach focuses on the practically relevant industrial setting where sharing and transferring gained knowledge among similarly behaved players improves the self-learning mechanism in large-scale systems. With TL-SbPGs, the gained knowledge can be reused by other players to optimize their policies, thereby improving the learning outcomes of the players and accelerating the learning process. To accomplish this goal, we develop transfer learning concepts and similarity criteria for players, which offer two distinct settings: (a) predefined similarities between players and (b) dynamically inferred similarities between players during training. We formally prove the applicability of the SbPG framework in transfer learning. Additionally, we introduce an efficient method to determine the optimal timing and weighting of the transfer learning procedure during the training phase. Through experiments on a laboratory-scale testbed, we demonstrate that TL-SbPGs significantly boost production efficiency and reduce the power consumption of the production schedules, while also outperforming native SbPGs.

Translated: 2024-11-08 11:38:16 Published: 2024-10-11

# Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words ( http://arxiv.org/abs/2408.08027v2 )

License: see the linked page

Kento Nozawa, Takashi Masuko, Toru Taniguchi

We develop a large language model (LLM) based automatic speech recognition (ASR) system that can be contextualized by providing keywords as prior information in text prompts. We adopt a decoder-only architecture and use our in-house LLM, PLaMo-100B, pre-trained from scratch using datasets dominated by Japanese and English texts, as the decoder. We adopt a pre-trained Whisper encoder as the audio encoder; the audio embeddings it produces are projected to the text embedding space by an adapter layer and concatenated with text embeddings converted from text prompts to form inputs to the decoder. By providing keywords as prior information in the text prompts, we can contextualize our LLM-based ASR system, without modifying the model architecture, to transcribe ambiguous words in the input audio accurately. Experimental results demonstrate that providing keywords to the decoder can significantly improve the recognition performance of rare and ambiguous words.

Translated: 2024-11-08 07:29:14 Published: 2024-10-11

# Oja's plasticity rule overcomes several challenges of training neural networks under biological constraints ( http://arxiv.org/abs/2408.08408v2 )

License: see the linked page

Navid Shervani-Tabar, Marzieh Alireza Mirhoseini, Robert Rosenbaum

There is a large literature on the similarities and differences between biological neural circuits and deep artificial neural networks (DNNs). However, modern training of DNNs relies on several engineering tricks such as data batching, normalization, adaptive optimizers, and precise weight initialization. Despite their critical role in training DNNs, these engineering tricks are often overlooked when drawing parallels between biological and artificial networks, potentially due to a lack of evidence for their direct biological implementation. In this study, we show that Oja's plasticity rule partly overcomes the need for some engineering tricks. Specifically, under difficult but biologically realistic learning scenarios such as online learning, deep architectures, and sub-optimal weight initialization, Oja's rule can substantially improve the performance of pure backpropagation. Our results demonstrate that simple synaptic plasticity rules can overcome challenges to learning that are typically overcome using less biologically plausible approaches when training DNNs.

Translated: 2024-11-08 07:29:14 Published: 2024-10-11
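As background for the abstract above, the textbook form of Oja's rule for a single linear neuron with output y = w·x is the update w ← w + η·y·(x − y·w). The sketch below shows only that classic rule on one repeated input; it is not the paper's training setup (which the abstract describes as combining Oja's rule with backpropagation), and the function name `oja_step`, the learning rate, and the toy data are our assumptions.

```python
import math
from typing import List


def oja_step(w: List[float], x: List[float], lr: float = 0.1) -> List[float]:
    """One Oja update for a single linear neuron: w <- w + lr * y * (x - y * w)."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output y = w . x
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


w = [0.5, 0.1]   # arbitrary, non-normalized starting weights
x = [0.6, 0.8]   # a fixed unit-norm input, presented repeatedly

for _ in range(500):
    w = oja_step(w, x)

norm = math.sqrt(sum(wi * wi for wi in w))
print(w, norm)
```

In this toy setting the weights settle on the input direction with unit norm without any explicit normalization step. That self-normalizing behaviour is the property usually cited for the rule, and it is plausibly why it could stand in for tricks such as normalization or careful initialization.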

# EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics ( http://arxiv.org/abs/2408.08782v2 )

License: see the linked page

Chenwei Wan, Matthieu Labeau, Chloé Clavel

Designing emotionally intelligent conversational systems to provide comfort and advice to people experiencing distress is a compelling area of research. Recently, with advancements in large language models (LLMs), end-to-end dialogue agents without explicit strategy prediction steps have become prevalent. However, implicit strategy planning lacks transparency, and recent studies show that LLMs' inherent preference bias towards certain socio-emotional strategies hinders the delivery of high-quality emotional support. To address this challenge, we propose decoupling strategy prediction from language generation, and introduce a novel dialogue strategy prediction framework, EmoDynamiX, which models the discourse dynamics between users' fine-grained emotions and system strategies using a heterogeneous graph for better performance and transparency. Experimental results on two ESC datasets show that EmoDynamiX outperforms previous state-of-the-art methods by a significant margin (better proficiency and lower preference bias). Our approach also exhibits better transparency by allowing backtracing of decision making.

Translated: 2024-11-08 07:18:07 Published: 2024-10-11

# SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models ( http://arxiv.org/abs/2408.09822v3 )

License: see the linked page

Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Stefanie Speidel

Computer-assisted surgery (CAS) systems are designed to assist surgeons during procedures, thereby reducing complications and enhancing patient care. Training machine learning models for these systems requires a large corpus of annotated datasets, which is challenging to obtain in the surgical domain due to patient privacy concerns and the significant labeling effort required from doctors. Previous methods have explored unpaired image translation using generative models to create realistic surgical images from simulations. However, these approaches have struggled to produce high-quality, diverse surgical images. In this work, we introduce SurgicaL-CD, a consistency-distilled diffusion method to generate realistic surgical images with only a few sampling steps without paired data. We evaluate our approach on three datasets, assessing the generated images in terms of quality and utility as downstream training datasets. Our results demonstrate that our method outperforms GANs and diffusion-based approaches. Our code is available at https://gitlab.com/nct_tso_public/gan2diffusion.

Translated: 2024-11-08 06:55:48 Published: 2024-10-11

# SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS ( http://arxiv.org/abs/2408.10771v2 )

License: see the linked page

Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai.-Doss

While recent zero-shot multispeaker text-to-speech (TTS) models achieve impressive results, they typically rely on extensive transcribed speech datasets from numerous speakers and intricate training pipelines. Meanwhile, self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS. It was also observed that SSL features from different speakers that are linearly close share phonetic information while maintaining individual speaker identity, which enables straightforward and robust voice cloning. In this study, we introduce SSL-TTS, a lightweight and efficient zero-shot TTS framework trained on transcribed speech from a single speaker. SSL-TTS leverages SSL features and retrieval methods for simple and robust zero-shot multi-speaker synthesis. Objective and subjective evaluations show that our approach achieves performance comparable to state-of-the-art models that require significantly larger training datasets. The low training data requirements mean that SSL-TTS is well suited for the development of multi-speaker TTS systems for low-resource domains and languages. We also introduce an interpolation parameter which enables fine control over the output speech by blending voices. Demo samples are available at https://idiap.github.io/ssl-tts

Translated: 2024-11-08 06:33:41 Published: 2024-10-11
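The retrieval idea in the SSL-TTS abstract above (nearby SSL features share phonetic content but keep speaker identity, plus an interpolation parameter that blends voices) can be illustrated with a toy kNN sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the use of cosine similarity, averaging the k neighbours, the linear blend controlled by `lam`, and the names `knn_convert` and `cosine_similarity`. Real SSL features would be high-dimensional frame vectors rather than 2-D lists.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_convert(source: List[Vector], target_bank: List[Vector],
                k: int = 2, lam: float = 1.0) -> List[Vector]:
    """Replace each source frame by a blend with its k nearest target frames.

    lam = 1.0 uses only the retrieved target-speaker features,
    lam = 0.0 keeps the source features unchanged.
    """
    converted = []
    for frame in source:
        # Rank the target speaker's frames by similarity to this source frame.
        ranked = sorted(target_bank,
                        key=lambda t: cosine_similarity(frame, t),
                        reverse=True)
        neighbours = ranked[:k]
        mean = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
        converted.append([(1 - lam) * s + lam * m for s, m in zip(frame, mean)])
    return converted


source_frames = [[1.0, 0.0], [0.0, 1.0]]
target_frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

full = knn_convert(source_frames, target_frames, k=2, lam=1.0)
kept = knn_convert(source_frames, target_frames, k=2, lam=0.0)
print(full, kept)
```

Intermediate values of `lam` give a mixture of the two voices, which is one plausible reading of the "fine control over the output speech by blending voices" mentioned in the abstract.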

# SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models ( http://arxiv.org/abs/2408.12114v3 )

License: see the linked page

Youngjoon Yu, Sangyun Chung, Byung-Kwan Lee, Yong Man Ro

Large-scale Vision-Language Models (LVLMs) have significantly advanced with text-aligned vision inputs. They have made remarkable progress in computer vision tasks by aligning text modality with vision inputs. There are also endeavors to incorporate multi-vision sensors beyond RGB, including thermal, depth, and medical X-ray images. However, we observe that current LVLMs view images taken from multi-vision sensors as if they were in the same RGB domain without considering the physical characteristics of multi-vision sensors. They fail to properly convey the fundamental multi-vision sensor information from the dataset and the corresponding contextual knowledge. Consequently, alignment between the information from the actual physical environment and the text is not achieved correctly, making it difficult to answer complex sensor-related questions that consider the physical environment. In this paper, we aim to establish a multi-vision Sensor Perception And Reasoning benchmarK called SPARK that can reduce the fundamental multi-vision sensor information gap between images and multi-vision sensors. We generated 6,248 vision-language test samples to investigate multi-vision sensory perception and multi-vision sensory reasoning on physical sensor knowledge proficiency across different formats, covering different types of sensor-related questions. We utilized these samples to assess ten leading LVLMs. The results showed that most models displayed deficiencies in multi-vision sensory reasoning to varying extents. Code and data are available at https://github.com/top-yun/SPARK

Translated: 2024-11-08 05:49:00 Published: 2024-10-11

# 拡散モデルがいかにして分解と構成を学ぶか

How Diffusion Models Learn to Factorize and Compose ( http://arxiv.org/abs/2408.13256v2 )

ライセンス: Link先を確認

Qiyao Liang, Ziming Liu, Mitchell Ostrow, Ila Fiete,

(参考訳) 拡散モデルは、トレーニングセットに一緒に現れない可能性のある要素を組み合わせて、フォトリアリスティックな画像を生成することができ、 \textit{compositionally generalize} の能力を示すことができる。それでも、構成性の正確なメカニズムと、それがいかにトレーニングによって獲得されるかは、いまだ解明されていない。認知神経科学的なアプローチに触発されて、拡散モデルが構成可能な特徴の意味的意味的・因果的表現を学習するかどうかを調べるために、高度に縮小された設定を考える。様々な2次元ガウスバンプ画像を生成するために訓練された条件付き拡散確率モデル(DDPM)について広範囲に制御実験を行った。その結果,データに基づく変動の連続的な特徴を符号化するために,モデルが分解されるが完全連続な多様体表現を学習することが判明した。このような表現では、モデルは優れた特徴合成性を示すが、ある特徴の見えない値を補間する能力は限定的である。さらに, 実験結果から, 拡散モデルが構成例が少なく, 構成性が得られることが示され, DDPMの訓練方法がより効率的であることが示唆された。最後に、拡散モデルの多様体形成と物理学のパーコレーション理論を結びつけ、因子化表現学習の突然の開始についての洞察を提供する。これにより, 拡散モデルがデータ中の構成構造をどのように捉えているか, より深く理解することができる。

Diffusion models are capable of generating photo-realistic images that combine elements which likely do not appear together in the training set, demonstrating the ability to \textit{compositionally generalize}. Nonetheless, the precise mechanism of compositionality and how it is acquired through training remains elusive. Inspired by cognitive neuroscientific approaches, we consider a highly reduced setting to examine whether and when diffusion models learn semantically meaningful and factorized representations of composable features. We performed extensive controlled experiments on conditional Denoising Diffusion Probabilistic Models (DDPMs) trained to generate various forms of 2D Gaussian bump images. We found that the models learn factorized but not fully continuous manifold representations for encoding continuous features of variation underlying the data. With such representations, models demonstrate superior feature compositionality but limited ability to interpolate over unseen values of a given feature. Our experimental results further demonstrate that diffusion models can attain compositionality with few compositional examples, suggesting a more efficient way to train DDPMs. Finally, we connect manifold formation in diffusion models to percolation theory in physics, offering insight into the sudden onset of factorized representation learning. Our thorough toy experiments thus contribute a deeper understanding of how diffusion models capture compositional structure in data.

Translated: 2024-11-08 05:26:28 Published: 2024-10-11

# Training-Free Activation Sparsity in Large Language Models

http://arxiv.org/abs/2408.14690v2

License: see the linked page

James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun

Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-training on up to hundreds of billions of tokens. This paper describes TEAL, a simple training-free method that applies magnitude-based activation sparsity to hidden states throughout the entire model. TEAL achieves 40-50% model-wide sparsity with minimal performance degradation across Llama-2, Llama-3, and Mistral families, with sizes varying from 7B to 70B. We improve existing sparse kernels and demonstrate wall-clock decoding speed-ups of up to 1.53$\times$ and 1.8$\times$ at 40% and 50% model-wide sparsity, respectively. TEAL is compatible with weight quantization, enabling further efficiency gains.
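
To make "magnitude-based activation sparsity" concrete, here is a minimal pure-Python sketch. It is not the TEAL implementation: the function names are hypothetical, and the per-vector cutoff (drop the smallest-magnitude fraction of each hidden vector) is an assumption of this sketch, since the abstract does not say how thresholds are chosen.

```python
def sparsify_by_magnitude(hidden, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |value|.

    Assumption of this sketch: the cutoff is computed per vector.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(hidden) * sparsity)  # number of entries to drop
    # Indices of the k smallest-magnitude entries.
    drop = set(sorted(range(len(hidden)), key=lambda i: abs(hidden[i]))[:k])
    return [0.0 if i in drop else x for i, x in enumerate(hidden)]


def sparse_matvec(weight_columns, x):
    """Compute y = W x with W stored column by column.

    Columns whose activation is exactly zero are skipped, which is where
    the compute and memory-movement savings would come from.
    """
    out = [0.0] * len(weight_columns[0])
    for column, xi in zip(weight_columns, x):
        if xi == 0.0:
            continue  # this weight column is never read
        for row, w in enumerate(column):
            out[row] += w * xi
    return out
```

In this toy form, 50% sparsity means half of the weight columns are never touched during the matrix-vector product; real speedups depend on a kernel that exploits this, as the abstract notes.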

Translated: 2024-11-08 04:52:58 Published: 2024-10-11

# Generative Verifiers: Reward Modeling as Next-Token Prediction

http://arxiv.org/abs/2408.15240v2

License: see the linked page

Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the best one is selected. While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. Compared to standard verifiers, such generative verifiers (GenRM) can benefit from several advantages of LLMs: they integrate seamlessly with instruction tuning, enable chain-of-thought reasoning, and can utilize additional test-time compute via majority voting for better verification. We demonstrate that GenRM outperforms discriminative verifiers, DPO verifiers, and LLM-as-a-Judge, resulting in a 16-40% improvement in the number of problems solved with Best-of-N on algorithmic and math reasoning tasks. Furthermore, we find that training GenRM with synthetic verification rationales is sufficient to pick out subtle errors on math problems. Finally, we demonstrate that generative verifiers scale favorably with model size and inference-time compute.
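
The Best-of-N selection with majority voting described above can be sketched in a few lines of Python. This is an illustration only: `sample_verdicts` is a hypothetical stand-in for whatever samples several "Yes"/"No" verdicts from a generative verifier, and the scoring rule (fraction of "Yes" votes) is an assumption of this sketch.

```python
def vote_score(verdicts):
    """Fraction of sampled verifier verdicts that say the solution is correct."""
    return sum(1 for v in verdicts if v == "Yes") / len(verdicts)


def best_of_n(candidates, sample_verdicts):
    """Return the candidate with the highest vote score.

    `sample_verdicts(candidate)` is a hypothetical callable returning a
    non-empty list of "Yes"/"No" strings, one per sampled verification.
    Ties are resolved in favour of the earliest candidate.
    """
    scored = [(vote_score(sample_verdicts(c)), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]
```

Sampling more verdicts per candidate is one way to spend extra test-time compute on verification.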

Translated: 2024-11-08 04:41:58 Published: 2024-10-11

# Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

http://arxiv.org/abs/2408.16180v2

License: see the linked page

Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara

With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances. We also introduce a new multi-pass augmented generative error correction (MPA GER) that integrates multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merges them. To the best of our knowledge, this is the first investigation of the use of LLMs for Japanese GER, which involves second-pass language modeling on the output transcriptions generated by the ASR system (e.g., N-best hypotheses). Our experiments demonstrate that the proposed methods improve ASR quality and generalization on both the SPREDS-U1-ja and CSJ data.

Translated: 2024-11-08 04:19:50 Published: 2024-10-11

# FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning

http://arxiv.org/abs/2408.16944v2

License: see the linked page

Li-Heng Lin, Yuchen Cui, Amber Xie, Tianyu Hua, Dorsa Sadigh

Few-shot imitation learning relies on only a small amount of task-specific demonstrations to efficiently adapt a policy for a given downstream task. Retrieval-based methods come with a promise of retrieving relevant past experiences to augment this target data when learning policies. However, existing data retrieval methods fall under two extremes: they either rely on the existence of exact behaviors with visually similar scenes in the prior data, which is impractical to assume; or they retrieve based on semantic similarity of high-level language descriptions of the task, which might not be that informative about the shared low-level behaviors or motions across tasks, which are often a more important factor for retrieving relevant data for policy learning. In this work, we investigate how we can leverage motion similarity in the vast amount of cross-task data to improve few-shot imitation learning of the target task. Our key insight is that motion-similar data carries rich information about the effects of actions and object interactions that can be leveraged during few-shot adaptation. We propose FlowRetrieval, an approach that leverages optical flow representations both for extracting motions similar to the target task from prior data and for guiding the learning of a policy that can maximally benefit from such data. Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains, achieving on average a 27% higher success rate than the best retrieval-based prior method. In the Pen-in-Cup task with a real Franka Emika robot, FlowRetrieval achieves 3.7x the performance of the baseline imitation learning technique that learns from all prior and target data. Website: https://flow-retrieval.github.io
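
The retrieval step can be pictured as a nearest-neighbour search in a motion embedding space. The sketch below is a generic illustration under stated assumptions, not the FlowRetrieval code: the embeddings are taken as given (in the paper they would come from optical flow), and the Euclidean distance and the `retrieve` helper are hypothetical choices.

```python
import math


def retrieve(prior, target_embeddings, k):
    """Return the names of the k prior samples closest to the target data.

    prior: list of (name, embedding) pairs from the cross-task dataset.
    target_embeddings: embeddings of the few target demonstrations.
    Each prior sample is scored by its distance to the nearest target
    embedding; smaller is better.
    """
    def distance_to_target(item):
        _, embedding = item
        return min(math.dist(embedding, t) for t in target_embeddings)

    ranked = sorted(prior, key=distance_to_target)
    return [name for name, _ in ranked[:k]]
```

The retrieved samples would then be mixed with the target demonstrations when training the policy.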

Translated: 2024-11-08 04:08:49 Published: 2024-10-11

# Safety Layers in Aligned Large Language Models: The Key to LLM Security

http://arxiv.org/abs/2408.17003v2

License: see the linked page

Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li

Aligned LLMs are secure, capable of recognizing and refusing to answer malicious questions. However, the role of internal parameters in maintaining such security is not yet well understood. Furthermore, these models can be vulnerable to security degradation when fine-tuned with non-malicious backdoor or normal data. To address these challenges, our work uncovers the mechanism behind security in aligned LLMs at the parameter level, identifying a small set of contiguous layers in the middle of the model that are crucial for distinguishing malicious queries from normal ones, referred to as "safety layers". We first confirm the existence of these safety layers by analyzing variations in input vectors within the model's internal layers. Additionally, we leverage the over-rejection phenomenon and parameter scaling analysis to precisely locate the safety layers. Building on these findings, we propose a novel fine-tuning approach, Safely Partial-Parameter Fine-Tuning (SPPFT), that fixes the gradient of the safety layers during fine-tuning to address the security degradation. Our experiments demonstrate that the proposed approach can significantly preserve LLM security while maintaining performance and reducing computational resources compared to full fine-tuning.

Translated: 2024-11-08 04:08:49 Published: 2024-10-11

# Safety Layers in Aligned Large Language Models: The Key to LLM Security

http://arxiv.org/abs/2408.17003v3

License: see the linked page

Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li

Translated: 2024-11-08 04:08:49 Published: 2024-10-11

# LLMs hallucinate graphs too: a structural perspective

http://arxiv.org/abs/2409.00159v2

License: see the linked page

Erwan Le Merrer, Gilles Tredan

It is known that LLMs do hallucinate, that is, they return incorrect information as facts. In this paper, we introduce the possibility of studying these hallucinations under a structured form: graphs. Hallucinations in this context are incorrect outputs when prompted for well-known graphs from the literature (e.g., Karate club, Les Misérables, graph atlas). These hallucinated graphs have the advantage of being much richer than the factual accuracy (or lack thereof) of a statement; this paper thus argues that such rich hallucinations can be used to characterize the outputs of LLMs. Our first contribution observes the diversity of topological hallucinations from major modern LLMs. Our second contribution is the proposal of a metric for the amplitude of such hallucinations: the Graph Atlas Distance, that is, the average graph edit distance from several graphs in the graph atlas set. We compare this metric to the Hallucination Leaderboard, a hallucination rank that leverages 10,000 times more prompts to obtain its ranking.
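
Read literally, the Graph Atlas Distance is a plain average of graph edit distances. One way to write that down is given here; the symbols are assumptions introduced for illustration ($S$ for the chosen subset of atlas graphs, $\hat{G}_M$ for the graph that model $M$ returns when prompted for $G$, and $\mathrm{GED}$ for graph edit distance), and the abstract does not fix this notation.

```latex
\mathrm{GAD}(M) \;=\; \frac{1}{|S|} \sum_{G \in S} \mathrm{GED}\bigl(G,\ \hat{G}_M\bigr)
```

Under this reading, a model that reproduces every prompted atlas graph exactly has a distance of zero, and larger values indicate stronger structural hallucination.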

Translated: 2024-11-08 03:57:28 Published: 2024-10-11

# FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking

http://arxiv.org/abs/2409.01148v2

License: see the linked page

Mingyuan Yao, Yukang Huo, Qingbin Tian, Jiayin Zhao, Xiao Liu, Ruifeng Wang, Lin Xue, Haihua Wang

Early detection of abnormal fish behavior caused by disease or hunger can be achieved through fish tracking using deep learning techniques, which holds significant value for industrial aquaculture. However, underwater reflections and properties of the fish themselves, such as high visual similarity, rapid swimming caused by stimuli, and mutual occlusion, make multi-target tracking of fish challenging. To address these challenges, this paper establishes a complex multi-scenario sturgeon tracking dataset and introduces the FMRFT model, a real-time end-to-end fish tracking solution. The model incorporates the Mamba In Mamba (MIM) architecture, which has low video memory consumption and facilitates multi-frame temporal memory and feature extraction, thereby addressing the difficulty of tracking multiple fish across frames. Additionally, the FMRFT model with the Query Time Sequence Intersection (QTSI) module effectively manages occluded objects and reduces redundant tracking frames using the superior feature interaction and prior frame processing capabilities of RT-DETR. This combination significantly enhances the accuracy and stability of fish tracking. Trained and tested on the dataset, the model achieves an IDF1 score of 90.3% and a MOTA accuracy of 94.3%. Experimental results show that the proposed FMRFT model effectively addresses the challenges of high similarity and mutual occlusion in fish populations, enabling accurate tracking in factory farming environments.

Translated: 2024-11-08 03:35:26 Published: 2024-10-11

# Robust Clustering on High-Dimensional Data with Stochastic Quantization

http://arxiv.org/abs/2409.02066v3

License: see the linked page

Anton Kozyriev, Vladimir Norkin

This paper addresses the limitations of conventional vector quantization algorithms, particularly K-Means and its variant K-Means++, and investigates the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning tasks. Traditional clustering algorithms often suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as Mini-Batch K-Means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the Stochastic Quantization algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data, comparing model accuracy across various ratios of labeled to unlabeled data. To address the challenge of high dimensionality, we employ a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of both the Stochastic Quantization algorithm and traditional quantization algorithms. Furthermore, we enhance the algorithm's convergence speed by introducing modifications with an adaptive learning rate.
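
A quantizer that sees one sample at a time, rather than the whole dataset, can be sketched as a stochastic-gradient update on the squared quantization error. The code below is a generic illustration of that idea, not necessarily the exact SQ algorithm of the paper: the squared-Euclidean loss, the `lr0 / t` step-size schedule, and all function names are assumptions of this sketch.

```python
import random


def nearest(centers, x):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - v) ** 2 for c, v in zip(centers[j], x)),
    )


def stochastic_quantization(data, centers, steps, lr0=0.5, seed=0):
    """Move one center per step toward one randomly drawn sample.

    Only a single sample is needed per update, so the full dataset never
    has to be processed at once. The step size lr0 / t decays over time.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in centers]  # work on a copy
    for t in range(1, steps + 1):
        x = rng.choice(data)              # draw one sample
        j = nearest(centers, x)           # winning center
        lr = lr0 / t
        centers[j] = [c + lr * (v - c) for c, v in zip(centers[j], x)]
    return centers
```

With a step size of at most 1, each update is a convex combination of the old center and the sample, so a center never leaves the convex hull of its starting point and the samples assigned to it.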

Translated: 2024-11-07 23:56:04 Published: 2024-10-11

# LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

http://arxiv.org/abs/2409.02969v3

License: see the linked page

Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Yifan Chen, Han Zhao, Qingfu Zhang

Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with thousands or millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order or meta-heuristic methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with thousands or millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.

Translated: 2024-11-07 23:34:03 Published: 2024-10-11

# FairQuant: Certifying and Quantifying Fairness of Deep Neural Networks

http://arxiv.org/abs/2409.03220v2

License: see the linked page

Brian Hyeongseok Kim, Jingbo Wang, Chao Wang

We propose a method for formally certifying and quantifying individual fairness of deep neural networks (DNN). Individual fairness guarantees that any two individuals who are identical except for a legally protected attribute (e.g., gender or race) receive the same treatment. While there are existing techniques that provide such a guarantee, they tend to suffer from lack of scalability or accuracy as the size and input dimension of the DNN increase. Our method overcomes this limitation by applying abstraction to a symbolic interval based analysis of the DNN followed by iterative refinement guided by the fairness property. Furthermore, our method lifts the symbolic interval based analysis from conventional qualitative certification to quantitative certification, by computing the percentage of individuals whose classification outputs are provably fair, instead of merely deciding if the DNN is fair. We have implemented our method and evaluated it on deep neural networks trained on four popular fairness research datasets. The experimental results show that our method is not only more accurate than state-of-the-art techniques but also several orders of magnitude faster.
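
The quantity being certified, the share of individuals treated identically regardless of the protected attribute, can be illustrated by brute-force enumeration on a tiny discrete input space. This sketch only illustrates the definition; it is not the symbolic interval analysis the abstract describes, and the `fair_fraction` helper and the toy classifier in the usage note are hypothetical.

```python
from itertools import product


def fair_fraction(classify, domains, protected):
    """Fraction of individuals whose label ignores the protected attribute.

    classify: function mapping an attribute dict to a label.
    domains: dict from attribute name to its list of possible values.
    protected: name of the protected attribute.
    An "individual" here is one assignment of all non-protected
    attributes; it counts as fair if every protected value yields the
    same label.
    """
    others = [name for name in domains if name != protected]
    fair = total = 0
    for values in product(*(domains[name] for name in others)):
        person = dict(zip(others, values))
        labels = {
            classify({**person, protected: p}) for p in domains[protected]
        }
        fair += len(labels) == 1
        total += 1
    return fair / total
```

For example, a toy rule that approves when `income + gender >= 3`, with `income` in 0..4 and `gender` in {0, 1}, treats only the individuals with `income == 2` differently, so 4 of the 5 individuals are fair. Enumeration like this scales exponentially with the number of attributes, which is the scalability problem that formal methods aim to avoid.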

Translated: 2024-11-07 23:23:02 Published: 2024-10-11

# Entropy-driven entanglement forging

http://arxiv.org/abs/2409.04510v2

License: see the linked page

Axel Pérez-Obiol, Sergi Masot-Llima, Antonio M. Romero, Javier Menéndez, Arnau Rios, Artur García-Sáez, Bruno Juliá-Díaz

Simulating physical systems with variational quantum algorithms is a well-studied approach, but it is challenging to implement in current devices due to demands in qubit number and circuit depth. We show how limited knowledge of the system, namely the entropy of its subsystems, its entanglement structure or certain symmetries, can be used to reduce the cost of these algorithms with entanglement forging. To do so, we simulate a Fermi-Hubbard one-dimensional chain with a parametrized hopping term, as well as atomic nuclei ${}^{28}$Ne and ${}^{60}$Ti with the nuclear shell model. Using an adaptive variational quantum eigensolver we find significant reductions in both the maximum number of qubits (up to one fourth) and the number of two-qubit gates (over an order of magnitude) required in the quantum circuits. Our findings indicate that our method, entropy-driven entanglement forging, can be used to adjust quantum simulations to the limitations of noisy intermediate-scale quantum devices.

Translated: 2024-11-07 23:00:54 Published: 2024-10-11

# On the Relationship between Truth and Political Bias in Language Models

http://arxiv.org/abs/2409.05283v2

License: see the linked page

Suyash Fulay, William Brannon, Shrestha Mohanty, Cassandra Overney, Elinor Poole-Dayan, Deb Roy, Jad Kabbara

Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political bias. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e., those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions about the datasets used to represent truthfulness, potential limitations of aligning models to be both truthful and politically unbiased, and what language models capture about the relationship between truth and politics.

Translated: 2024-11-07 22:38:45 Published: 2024-10-11

# Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

http://arxiv.org/abs/2409.05292v4

License: see the linked page

Nirmalya Thakur

The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. No prior work related to social media mining has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper aims to address this research gap and makes two scientific contributions to this field. First, it presents a multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were performed. This process included classifying each post into (i) one of the sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no anxiety/stress detected. These results are presented as separate attributes in the dataset. Second, this paper presents the results of performing sentiment analysis, hate speech analysis, and anxiety or stress analysis. The shares of the sentiment classes fear, surprise, joy, sadness, anger, disgust, and neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and 50.64%, respectively. In terms of hate speech detection, 95.75% of the posts did not contain hate and the remaining 4.25% of the posts contained hate. Finally, 72.05% of the posts did not indicate any anxiety/stress, and the remaining 27.95% of the posts represented some form of anxiety/stress.

Translated: 2024-11-07 22:38:45 Published: 2024-10-11

# An Ontology-based Approach Towards Traceable Behavior Specifications in Automated Driving

http://arxiv.org/abs/2409.06607v2

License: see the linked page

Nayel Fabian Salem, Marcus Nolte, Veronica Haber, Till Menzel, Hans Steege, Robert Graubohm, Markus Maurer

Vehicles in public traffic that are equipped with Automated Driving Systems are subject to a number of expectations: among other aspects, their behavior should be safe, conform to the rules of the road, and provide mobility to their users. This poses challenges for the developers of such systems, who are responsible for specifying this behavior, for example, in terms of requirements at system design time. As we will discuss in the article, this specification always involves the need for assumptions and trade-offs. As a result, insufficiencies in such a behavior specification can occur that can potentially lead to unsafe system behavior. In order to support the identification of specification insufficiencies, requirements and respective assumptions need to be made explicit. In this article, we propose the Semantic Norm Behavior Analysis as an ontology-based approach to specify the behavior for an Automated Driving System equipped vehicle. We use ontologies to formally represent specified behavior for a targeted operational environment, and to establish traceability between specified behavior and the addressed stakeholder needs. Furthermore, we illustrate the application of the Semantic Norm Behavior Analysis in a German legal context with two example scenarios and evaluate our results. Our evaluation shows that the explicit documentation of assumptions in the behavior specification supports both the identification of specification insufficiencies and their treatment. Therefore, this article provides requirements, terminology and a corresponding methodology to facilitate ontology-based behavior specifications in automated driving.

Translated: 2024-11-07 22:05:05 Published: 2024-10-11

# A tutorial on automatic differentiation with complex numbers

http://arxiv.org/abs/2409.06752v2

License: see the linked page

Nicholas Krämer

Automatic differentiation is everywhere, but there exists only minimal documentation of how it works in complex arithmetic beyond stating "derivatives in $\mathbb{C}^d$" $\cong$ "derivatives in $\mathbb{R}^{2d}$" and, at best, shallow references to Wirtinger calculus. Unfortunately, the equivalence $\mathbb{C}^d \cong \mathbb{R}^{2d}$ becomes insufficient as soon as we need to derive custom gradient rules, e.g., to avoid differentiating "through" expensive linear algebra functions or differential equation simulators. To combat such a lack of documentation, this article surveys forward- and reverse-mode automatic differentiation with complex numbers, covering topics such as Wirtinger derivatives, a modified chain rule, and different gradient conventions while explicitly avoiding holomorphicity and the Cauchy-Riemann equations (which would be far too restrictive). To be precise, we will derive, explain, and implement a complex version of Jacobian-vector and vector-Jacobian products almost entirely with linear algebra without relying on complex analysis or differential geometry. This tutorial is a call to action, for users and developers alike, to take complex values seriously when implementing custom gradient propagation rules; the manuscript explains how.
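
The Wirtinger derivatives mentioned above follow the standard definitions $\partial f/\partial z = \tfrac{1}{2}(\partial f/\partial x - i\,\partial f/\partial y)$ and $\partial f/\partial \bar{z} = \tfrac{1}{2}(\partial f/\partial x + i\,\partial f/\partial y)$ for $z = x + iy$. The sketch below checks them numerically with central finite differences; it is an illustration with a hypothetical helper name, not code from the tutorial and not an automatic differentiation implementation.

```python
def wirtinger(f, z, h=1e-6):
    """Finite-difference Wirtinger derivatives (df/dz, df/dz_bar) at z.

    f maps a complex number to a real or complex number and does not
    need to be holomorphic.
    """
    # Partial derivatives with respect to the real and imaginary parts.
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    d_z = 0.5 * (df_dx - 1j * df_dy)
    d_zbar = 0.5 * (df_dx + 1j * df_dy)
    return d_z, d_zbar
```

For the non-holomorphic function $f(z) = |z|^2 = z\bar{z}$ this gives $\partial f/\partial z = \bar{z}$ and $\partial f/\partial \bar{z} = z$, while for the holomorphic $f(z) = z^2$ the $\bar{z}$ derivative vanishes and $\partial f/\partial z = 2z$. Treating the two derivatives separately is what makes non-holomorphic functions tractable.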

Translated: 2024-11-07 22:05:05 Published: 2024-10-11

# A Bayesian Framework for Active Tactile Object Recognition, Pose Estimation and Shape Transfer Learning

http://arxiv.org/abs/2409.06912v3

License: see the linked page

Haodong Zheng, Andrei Jalba, Raymond H. Cuijpers, Wijnand IJsselsteijn, Sanne Schoenmakers

Because humans can explore and understand the world through active touch, a similar capability is desirable for robots. In this paper, we address the problem of active tactile object recognition, pose estimation and shape transfer learning, where a customized particle filter (PF) and a Gaussian process implicit surface (GPIS) are combined in a unified Bayesian framework. Upon new tactile input, the customized PF updates the joint distribution of the object class and object pose while tracking the novelty of the object. Once a novel object is identified, its shape is reconstructed using the GPIS. By grounding the prior of the GPIS with the maximum-a-posteriori (MAP) estimate from the PF, knowledge about known shapes can be transferred to learn novel shapes. An exploration procedure based on global shape estimation is proposed to guide active data acquisition and to terminate the exploration once sufficient information has been gathered. In simulation experiments, the proposed framework demonstrated its effectiveness and efficiency in estimating object class and pose for known objects and in learning novel shapes. Furthermore, it can reliably recognize previously learned shapes.

Translated: 2024-11-07 22:05:05 Published: 2024-10-11
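The particle-filter step described above (update a joint belief over object class and pose, then read off a MAP hypothesis) can be sketched as follows. This is a toy under stated assumptions, not the authors' customized filter: the one-dimensional pose, the `PREDICTED_HEIGHT` table, and the Gaussian measurement model are hypothetical, and novelty tracking and the GPIS reconstruction are left out.

```python
import math
from collections import defaultdict


def update_weights(particles, likelihood):
    """Bayes update: multiply each weight by the likelihood, then normalize."""
    updated = [(cls, pose, w * likelihood(cls, pose)) for cls, pose, w in particles]
    total = sum(w for _, _, w in updated)
    return [(cls, pose, w / total) for cls, pose, w in updated]


def class_posterior(particles):
    """Marginalize the pose out by summing the weights of each class."""
    posterior = defaultdict(float)
    for cls, _, w in particles:
        posterior[cls] += w
    return dict(posterior)


def map_particle(particles):
    """Return the (class, pose) hypothesis with the largest weight."""
    cls, pose, _ = max(particles, key=lambda p: p[2])
    return cls, pose


# Hypothetical toy model: each class predicts a contact height that shifts with pose.
PREDICTED_HEIGHT = {"cube": 1.0, "sphere": 2.0}


def make_likelihood(measured_height, sigma=0.2):
    def likelihood(cls, pose):
        error = measured_height - (PREDICTED_HEIGHT[cls] + pose)
        return math.exp(-0.5 * (error / sigma) ** 2)
    return likelihood


# Uniform prior over two classes and two candidate poses.
particles = [(cls, pose, 0.25) for cls in ("cube", "sphere") for pose in (0.0, 0.5)]
particles = update_weights(particles, make_likelihood(measured_height=2.0))
posterior = class_posterior(particles)
best = map_particle(particles)
```

After a single touch measured at height 2.0, most of the probability mass moves to the `sphere` hypotheses, and `best` is the particle whose predicted height matches the measurement.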

# xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing ( http://arxiv.org/abs/2409.08687v2 )

License: see the linked page

Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi Zhang, Jianming Hu, Xianyuan Zhan,

Reusing pre-collected data from different domains is an appealing solution for decision-making tasks in which data are scarce in the target domain but relatively abundant in other related domains. Existing cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning, such as learning domain/task-specific discriminators, representations, or policies. This design philosophy often results in heavy model architectures or task/domain-specific modeling and lacks flexibility. This reality makes us wonder: can we directly bridge the domain gaps universally at the data level, instead of relying on complex downstream cross-domain policy transfer models? In this study, we propose the Cross-Domain Trajectory EDiting (xTED) framework that employs a specially designed diffusion model for cross-domain trajectory adaptation. Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data. By utilizing the pre-trained diffusion model as a prior, source domain trajectories can be transformed to match target domain properties while preserving original semantic information. This process implicitly corrects underlying domain gaps, enhancing state realism and dynamics reliability in the source data, and allows flexible incorporation into various downstream policy learning methods. Despite its simplicity, xTED demonstrates superior performance in extensive simulation and real-robot experiments.

Translated: 2024-11-07 21:09:04 Published: 2024-10-11

# Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse ( http://arxiv.org/abs/2409.11242v2 )

License: see the linked page

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj, Hai Leong Chieu, Navonil Majumder, Soujanya Poria,

LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. The LLaMA-3 family, aligned using our method, significantly outperforms open-source LLMs of similar sizes on ASQA (up 14.0), QAMPARI (up 28.9), and ELI5 (up 13.7). We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://anonymous.4open.science/r/trust-align.

Translated: 2024-11-07 20:13:03 Published: 2024-10-11

# Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v1 )

License: see the linked page

Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,

Learned Image Compression (LIC) models have achieved better rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects in the image. To address this issue, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to space-channel attention to establish multiple receptive fields, capturing global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and for further enlarging receptive fields. Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.

Translated: 2024-11-07 03:44:25 Published: 2024-10-11
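The frequency-dependent down-sampling that a discrete wavelet transform provides can be illustrated with one level of a 1D Haar transform. The abstract does not say which wavelet the SCH framework uses, so the Haar filter and the function names here are assumptions made purely for illustration.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * scale for a, b in pairs]  # low-frequency band, half length
    detail = [(a - b) * scale for a, b in pairs]  # high-frequency band, half length
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt by recombining the two half-length bands."""
    scale = 1 / math.sqrt(2)
    signal = []
    for s, d in zip(approx, detail):
        signal.extend([(s + d) * scale, (s - d) * scale])
    return signal


row = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
approx, detail = haar_dwt(row)
restored = haar_idwt(approx, detail)
```

Each band has half the length of the input, so the transform halves the resolution while keeping low and high frequencies separate, and the inverse recovers the original samples up to rounding.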

# Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v2 )

License: see the linked page

Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,

Learned Image Compression (LIC) models have achieved better rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or Mixed CNN-Transformer as basic blocks. However, limited by the shifted window attention, Swin-Transformer-based LIC exhibits a restricted growth of receptive fields, affecting the ability to model large objects for image compression. To address this issue and improve the performance, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend existing attention mechanisms in Transformer codecs to space-channel attention to establish multiple receptive fields, capturing global correlations with large receptive fields while maintaining detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and for further enlarging receptive fields. Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.

Translated: 2024-11-07 03:44:25 Published: 2024-10-11

# Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v3 )

License: see the linked page

Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,

Translated: 2024-11-07 03:44:25 Published: 2024-10-11

# Window-based Channel Attention for Wavelet-enhanced Learned Image Compression ( http://arxiv.org/abs/2409.14090v4 )

License: see the linked page

Heng Xu, Bowen Hai, Yushun Tang, Zhihai He,

Translated: 2024-11-07 03:44:25 Published: 2024-10-11

# A Feature Generator for Few-Shot Learning ( http://arxiv.org/abs/2409.14141v1 )

License: see the linked page

Heethanjan Kanagalingam, Thenukan Pathmanathan, Navaneethan Ketheeswaran, Mokeeshan Vathanakumar, Mohamed Afham, Ranga Rodrigo,

Few-shot learning (FSL) aims to enable models to recognize novel objects or classes with limited labelled data. Feature generators, which synthesize new data points to augment limited datasets, have emerged as a promising solution to this challenge. This paper investigates the effectiveness of feature generators in enhancing the embedding process for FSL tasks. To address the issue of inaccurate embeddings due to the scarcity of images per class, we introduce a feature generator that creates visual features from class-level textual descriptions. By training the generator with a combination of classifier loss, discriminator loss, and distance loss between the generated features and true class embeddings, we ensure the generation of accurate same-class features and enhance the overall feature representation. Our results show a significant improvement in accuracy over baseline methods, with our approach outperforming the baseline model by 10% in 1-shot and around 5% in 5-shot approaches. Additionally, both visual-only and visual + textual generators are tested in this paper.

Translated: 2024-11-07 03:22:12 Published: 2024-10-11
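The three-part training objective mentioned in the abstract can be sketched as a weighted sum. The abstract names the terms but not their exact form, so everything concrete below is an assumption: cross-entropy for the classifier term, a negative log discriminator score for the adversarial term, squared Euclidean distance to the class embedding, and the hypothetical weights `w_cls`, `w_disc`, `w_dist`.

```python
import math


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probabilities[true_index])


def generator_loss(class_probs, true_index, disc_score, generated, prototype,
                   w_cls=1.0, w_disc=1.0, w_dist=1.0):
    """Weighted sum of classifier, discriminator and distance terms (assumed forms)."""
    # Classifier term: the generated feature should be classified as its own class.
    classifier_term = cross_entropy(class_probs, true_index)
    # Discriminator term: disc_score in (0, 1] is the probability of being judged real.
    discriminator_term = -math.log(disc_score)
    # Distance term: squared Euclidean distance to the true class embedding.
    distance_term = sum((g - p) ** 2 for g, p in zip(generated, prototype))
    return w_cls * classifier_term + w_disc * discriminator_term + w_dist * distance_term


loss_near = generator_loss([0.5, 0.5], 0, 0.5, [1.0, 2.0], [1.0, 2.0])
loss_far = generator_loss([0.5, 0.5], 0, 0.5, [1.0, 2.0], [0.0, 0.0])
```

With identical classifier and discriminator outputs, the feature that sits farther from the class embedding receives the larger loss, which is the pull toward same-class features that the abstract describes.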

# A Feature Generator for Few-Shot Learning ( http://arxiv.org/abs/2409.14141v2 )

License: see the linked page

Heethanjan Kanagalingam, Thenukan Pathmanathan, Navaneethan Ketheeswaran, Mokeeshan Vathanakumar, Mohamed Afham, Ranga Rodrigo,

Few-shot learning (FSL) aims to enable models to recognize novel objects or classes with limited labelled data. Feature generators, which synthesize new data points to augment limited datasets, have emerged as a promising solution to this challenge. This paper investigates the effectiveness of feature generators in enhancing the embedding process for FSL tasks. To address the issue of inaccurate embeddings due to the scarcity of images per class, we introduce a feature generator that creates visual features from class-level textual descriptions. By training the generator with a combination of classifier loss, discriminator loss, and distance loss between the generated features and true class embeddings, we ensure the generation of accurate same-class features and enhance the overall feature representation. Our results show a significant improvement in accuracy over baseline methods, with our approach outperforming the baseline model by 10% in 1-shot and around 5% in 5-shot approaches. Additionally, both visual-only and visual + textual generators are tested in this paper. The code is publicly available at https://github.com/heethanjan/Feature-Generator-for-FSL.

Translated: 2024-11-07 03:22:12 Published: 2024-10-11

# Compare the Pair: Rotated vs. Unrotated Surface Codes at Equal Logical Error Rates ( http://arxiv.org/abs/2409.14765v1 )

License: see the linked page

Anthony Ryan O'Rourke, Simon Devitt,

Practical quantum computers will require resource-efficient error-correcting codes. The rotated surface code uses approximately half as many qubits as the unrotated surface code to create a logical qubit with the same error-correcting distance. However, instead of distance, a more useful qubit-saving metric would be based on logical error suppression. In this work we compare the number of qubits used by each code to achieve equal logical error rates under circuit-level noise. We perform Monte Carlo sampling of memory experiment circuits with all valid CNOT orders, using the stabiliser simulator Stim and the uncorrelated minimum-weight perfect-matching decoder PyMatching 2. We clarify the well-below-threshold scaling of logical to physical error rates for high odd and even code distances. We find that the rotated code uses $\sim74\%$ of the number of qubits used by the unrotated code to achieve a logical error rate of $p_L = 10^{-12}$ at the operational physical error rate of $p=10^{-3}$. The ratio remains $\sim75\%$ for physical error rates within a factor of two of $p=10^{-3}$ for all useful logical error rates. Our work clarifies the qubit savings provided by the rotated code and provides numerical justification for its use in future implementations of the surface code.

Translated: 2024-11-06 21:12:18 Published: 2024-10-11
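The "approximately half the number of qubits at equal distance" statement can be checked with the standard qubit counts for the two layouts: a distance-$d$ rotated code has $d^2$ data and $d^2-1$ measurement qubits, and a distance-$d$ unrotated code has $d^2+(d-1)^2$ data and $2d(d-1)$ measurement qubits. The sketch below only tabulates these counts. It does not reproduce the paper's comparison at equal logical error rates, which requires simulation, and the function names are illustrative.

```python
def rotated_qubits(d):
    """Data plus measurement qubits of a distance-d rotated surface code."""
    return d * d + (d * d - 1)


def unrotated_qubits(d):
    """Data plus measurement qubits of a distance-d unrotated surface code."""
    data = d * d + (d - 1) * (d - 1)
    measure = 2 * d * (d - 1)
    return data + measure


# Ratio of total qubit counts at equal code distance.
for d in (3, 5, 11, 25):
    ratio = rotated_qubits(d) / unrotated_qubits(d)
    print(d, rotated_qubits(d), unrotated_qubits(d), round(ratio, 3))
```

At equal distance the ratio falls toward one half as $d$ grows. The abstract's figure of roughly 74 to 75% is larger because it compares the codes at equal logical error rate rather than at equal distance.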

# Compare the Pair: Rotated vs. Unrotated Surface Codes at Equal Logical Error Rates ( http://arxiv.org/abs/2409.14765v2 )

License: see the linked page

Anthony Ryan O'Rourke, Simon Devitt,

Practical quantum computers will require resource-efficient error-correcting codes. The rotated surface code uses approximately half as many qubits as the unrotated surface code to create a logical qubit with the same error-correcting distance. However, instead of distance, a more useful qubit-saving metric would be based on logical error rates. In this work we find the well-below-threshold scaling of logical to physical error rates under circuit-level noise for both codes at high odd and even distances, then compare the number of qubits used by each code to achieve equal logical error rates. We perform Monte Carlo sampling of memory experiment circuits with all valid CNOT orders, using the stabiliser simulator Stim and the uncorrelated minimum-weight perfect-matching decoder PyMatching 2. We find that the rotated code uses $74$ to $75\%$ of the number of qubits used by the unrotated code, depending on the noise model, to achieve a logical error rate of $p_L = 10^{-12}$ at the operational physical error rate of $p=10^{-3}$. The ratio remains $\approx75\%$ for physical error rates within a factor of two of $p=10^{-3}$ for all useful logical error rates. Our work finds the low-$p_L$ scaling of the surface code and clarifies the qubit savings provided by the rotated surface code, providing numerical justification for its use in future implementations of the surface code.

Translated: 2024-11-06 21:12:18 Published: 2024-10-11

# Precise Asymptotics of Bagging Regularized M-estimators ( http://arxiv.org/abs/2409.15252v2 )

License: see the linked page

Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec,

We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge, lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).

Translated: 2024-11-06 20:16:59 Published: 2024-10-11
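Subagging itself is simple to state in code: fit the same regularized estimator on $M$ subsamples of size $k$ drawn without replacement and average the fits. The sketch below uses single-feature ridge regression so that it stays in the standard library. It is a toy that illustrates the procedure only, far from the proportional regime with $p$ growing alongside $n$ that the paper analyzes, and the data-generating values are arbitrary assumptions.

```python
import random


def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for a single feature without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def subagged_ridge(xs, ys, lam, k, M, rng):
    """Average M ridge estimates, each fit on k points drawn without replacement."""
    n = len(xs)
    estimates = []
    for _ in range(M):
        idx = rng.sample(range(n), k)  # one subsample of size k
        estimates.append(ridge_1d([xs[i] for i in idx], [ys[i] for i in idx], lam))
    return sum(estimates) / M


rng = random.Random(0)
true_beta = 2.0
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [true_beta * x + rng.gauss(0, 0.5) for x in xs]

full_fit = ridge_1d(xs, ys, lam=1.0)
ensemble_fit = subagged_ridge(xs, ys, lam=1.0, k=50, M=20, rng=rng)
```

Because the penalty `lam` is fixed while each member sees only `k` points, smaller subsamples shrink each member more strongly, which gives a rough intuition for the implicit regularization effect of $(M, k)$ discussed in the abstract.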

# Analyzing Probabilistic Methods for Evaluating Agent Capabilities ( http://arxiv.org/abs/2409.16125v2 )

License: see the linked page

Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer,

To mitigate risks from AI systems, we need to assess their capabilities accurately. This is especially difficult in cases where capabilities are only rarely displayed. Phuong et al. propose two methods that aim to obtain better estimates of the probability of an AI agent successfully completing a given task. The milestone method decomposes tasks into subtasks, aiming to improve overall success rate estimation, while the expert best-of-N method leverages human guidance as a proxy for the model's independent performance. Our analysis of these methods as Monte Carlo estimators reveals that while both effectively reduce variance compared to naive Monte Carlo sampling, they also introduce bias. Experimental results demonstrate that the milestone method underestimates true solve rates for many real-world tasks due to its constraining assumptions. The expert best-of-N method exhibits even more severe underestimation across all tasks, attributed to an inherently flawed re-weighting factor. To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo estimators.

Translated: 2024-11-06 17:52:35 Published: 2024-10-11
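The contrast between naive Monte Carlo sampling and a milestone-style estimate can be sketched as below. This is a simplified reading of the idea as the abstract states it (split the task into stages and combine per-stage success rates), not the procedure of Phuong et al.; the function names and the counts are hypothetical.

```python
def naive_estimate(successes, trials):
    """Plain Monte Carlo estimate: fraction of end-to-end runs that succeed."""
    return successes / trials


def milestone_estimate(stage_counts):
    """Multiply per-stage success rates; stage_counts holds (successes, attempts)."""
    estimate = 1.0
    for successes, attempts in stage_counts:
        estimate *= successes / attempts
    return estimate


# Hypothetical counts: no end-to-end success in 100 runs, but every stage succeeds sometimes.
end_to_end = naive_estimate(successes=0, trials=100)
staged = milestone_estimate([(20, 100), (10, 100), (5, 100)])
```

When a task is rarely solved, the naive estimate is often exactly zero, while the staged product stays positive. That is the variance reduction the abstract credits to the method; the same product is also where bias can enter if the staging assumptions do not hold for the task.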

# Analyzing Probabilistic Methods for Evaluating Agent Capabilities ( http://arxiv.org/abs/2409.16125v3 )

License: see the linked page

Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer,

Translated: 2024-11-06 17:52:35 Published: 2024-10-11

# Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning ( http://arxiv.org/abs/2409.19245v1 )

License: see the linked page

Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shaoyuan Li, Songcan Chen,

Online continual learning (OCL) requires models to learn from constant, endless streams of data. While significant efforts have been made in this field, most have focused on mitigating catastrophic forgetting to achieve better classification ability, at the cost of a much heavier training workload. They overlook that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that model throughput, defined as the maximum number of training samples that a model can process within a unit of time, is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting. Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput. Model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and an excessively sparse classifier, resulting in a gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features.

Translated: 2024-11-06 00:18:22 Published: 2024-10-11

# Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning ( http://arxiv.org/abs/2409.19245v2 )

License: see the linked page

Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shao-yuan Li, Songcan Chen,

Translated: 2024-11-06 00:18:22 Published: 2024-10-11

# Hedging and Approximate Truthfulness in Traditional Forecasting Competitions ( http://arxiv.org/abs/2409.19477v1 )

License: see the linked page

Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo,

In forecasting competitions, the traditional mechanism scores the predictions of each contestant against the outcome of each event, and the contestant with the highest total score wins. While it is well-known that this traditional mechanism can suffer from incentive issues, it is folklore that contestants will still be roughly truthful as the number of events grows. Yet thus far the literature lacks a formal analysis of this traditional mechanism. This paper gives the first such analysis. We first demonstrate that the "long-run truthfulness" folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice.

Translated: 2024-11-05 22:57:44 Published: 2024-10-11

# 従来型の予測コンペティションにおけるヘッジと近似的真実性

Hedging and Approximate Truthfulness in Traditional Forecasting Competitions ( http://arxiv.org/abs/2409.19477v2 )

ライセンス: Link先を確認

Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo,

翻訳日:2024-11-05 22:57:44 公開日:2024-10-11

# HealthQ: 医療会話におけるLLMチェーンの質問能力の解明

HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations ( http://arxiv.org/abs/2409.19487v1 )

ライセンス: Link先を確認

Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani,

(参考訳) デジタル医療において、大規模言語モデル(LLM)は主に質問応答能力の向上と患者とのやり取りの改善に利用されてきた。しかし、効果的な患者ケアには、関連する質問を投げかけて能動的に情報を収集できるLLMチェーンが必要である。本稿では、LLMヘルスケアチェーンの質問能力を評価するための新しいフレームワークであるHealthQを提案する。検索拡張生成(RAG)、思考の連鎖(CoT)、反省的(reflective)チェーンなど複数のLLMチェーンを実装し、生成された質問の関連性と情報量を評価するLLM判定器を導入した。HealthQの検証には、Recall-Oriented Understudy for Gisting Evaluation (ROUGE) や Named Entity Recognition (NER) に基づく集合比較といった従来の自然言語処理(NLP)指標を用い、公開の医療ノートデータセットであるChatDoctorとMTS-Dialogから2つのカスタムデータセットを構築した。本研究の貢献は3つある。医療会話におけるLLMの質問能力に関する初の包括的研究を行い、新しいデータセット生成パイプラインを開発し、詳細な評価手法を提案する。

In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions. However, effective patient care necessitates LLM chains that can actively gather information by posing relevant questions. This paper presents HealthQ, a novel framework designed to evaluate the questioning capabilities of LLM healthcare chains. We implemented several LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, and introduced an LLM judge to assess the relevance and informativeness of the generated questions. To validate HealthQ, we employed traditional Natural Language Processing (NLP) metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Named Entity Recognition (NER)-based set comparison, and constructed two custom datasets from public medical note datasets, ChatDoctor and MTS-Dialog. Our contributions are threefold: we provide the first comprehensive study on the questioning capabilities of LLMs in healthcare conversations, develop a novel dataset generation pipeline, and propose a detailed evaluation methodology.
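The abstract mentions an NER-based set comparison without giving its formula. One common way to compare two entity sets is set-level precision, recall, and F1, sketched below under that assumption. The function name `set_overlap_scores` is hypothetical, and the entity sets are assumed to have been extracted already by some NER tool.

```python
def set_overlap_scores(predicted: set, reference: set) -> tuple:
    """Return (precision, recall, f1) for two sets of entity strings."""
    overlap = len(predicted & reference)
    # With no shared entities (or an empty set) all three scores are zero.
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, comparing `{"fever", "cough"}` against `{"fever", "headache"}` shares one of two entities on each side, so all three scores are 0.5.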

翻訳日:2024-11-05 22:57:44 公開日:2024-10-11

# HealthQ: 医療会話におけるLLMチェーンの質問能力の解明

HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations ( http://arxiv.org/abs/2409.19487v2 )

ライセンス: Link先を確認

Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani,

翻訳日:2024-11-05 22:57:44 公開日:2024-10-11

# ドメインカバレッジ拡張によるLLMの連合インストラクションチューニング

Federated Instruction Tuning of LLMs with Domain Coverage Augmentation ( http://arxiv.org/abs/2409.20135v3 )

ライセンス: Link先を確認

Zezhou Wang, Yaxin Du, Zhuzhong Qian, Siheng Chen,

(参考訳) Federated Domain-specific Instruction Tuning (FedDIT)は、クライアント間に分散した限られたプライベートデータとサーバ側の公開データを併用して命令拡張を行い、特定ドメインにおけるモデル性能を向上させる。現在のところ、FedDITに影響を与える要因は明らかになっておらず、既存の命令拡張手法は主に集中型の設定を対象としており、分散環境を考慮していない。実験の結果、FedDITのモデル性能を左右するのはデータの不均一性ではなく、クライアント横断のドメインカバレッジであることが判明した。そこで、貪欲なクライアントセンター選択と検索ベースの拡張によってドメインカバレッジを最適化するFedDCAを提案する。クライアント側の計算効率とシステムのスケーラビリティのために、FedDCAの変種であるFedDCA$^*$は、サーバ側の特徴アライメントを備えた異種エンコーダを利用する。4つの異なるドメイン(コード、医療、金融、数学)にわたる広範な実験により、両手法の有効性が裏付けられた。さらに、さまざまな量の公開データを用いて、メモリ抽出攻撃に対するプライバシ保護を調査した。その結果、公開データの量とプライバシ保護能力との間に有意な相関は見られなかった。一方、微調整のラウンド数が増えるにつれて、プライバシ漏洩のリスクは低下または収束する。

Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data together with server-side public data for instruction augmentation, ultimately boosting model performance within specific domains. To date, the factors affecting FedDIT remain unclear, and existing instruction augmentation methods primarily focus on the centralized setting without considering distributed environments. Our experiments reveal that the cross-client domain coverage, rather than data heterogeneity, drives model performance in FedDIT. In response, we propose FedDCA, which optimizes domain coverage through greedy client center selection and retrieval-based augmentation. For client-side computational efficiency and system scalability, FedDCA$^*$, the variant of FedDCA, utilizes heterogeneous encoders with server-side feature alignment. Extensive experiments across four distinct domains (code, medical, financial, and mathematical) substantiate the effectiveness of both methods. Additionally, we investigate privacy preservation against memory extraction attacks utilizing various amounts of public data. Results show that there is no significant correlation between the volume of public data and the privacy-preserving capability. However, as the fine-tuning rounds increase, the risk of privacy leakage reduces or converges.
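The abstract does not spell out the greedy client center selection, so the following is only a generic sketch of greedy coverage maximization and not the FedDCA algorithm itself. The facility-location objective, the distance-based `similarity`, and the name `greedy_select` are assumptions chosen for illustration.

```python
import math


def similarity(a, b):
    """Similarity in (0, 1]: 1 for identical points, decaying with distance."""
    return 1.0 / (1.0 + math.dist(a, b))


def greedy_select(points, candidates, k):
    """Greedily pick k candidate indices that maximise total coverage.

    Coverage of a point is its best similarity to any selected centre
    (a facility-location objective, chosen here purely for illustration).
    """
    selected = []
    best_sim = [0.0] * len(points)  # current coverage of every point
    for _ in range(min(k, len(candidates))):
        best_gain, best_idx = -1.0, None
        for idx, cand in enumerate(candidates):
            if idx in selected:
                continue
            # Marginal gain: how much this candidate improves each point's coverage.
            gain = sum(max(similarity(p, cand) - s, 0.0)
                       for p, s in zip(points, best_sim))
            if gain > best_gain:
                best_gain, best_idx = gain, idx
        selected.append(best_idx)
        chosen = candidates[best_idx]
        best_sim = [max(s, similarity(p, chosen))
                    for p, s in zip(points, best_sim)]
    return selected
```

With two well-separated clusters of points and `k=2`, this sketch picks one centre from each cluster, because a second centre in an already covered cluster adds little marginal coverage.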

翻訳日:2024-11-05 16:08:18 公開日:2024-10-11

# ドメインカバレッジ拡張によるLLMの連合インストラクションチューニング

Federated Instruction Tuning of LLMs with Domain Coverage Augmentation ( http://arxiv.org/abs/2409.20135v4 )

ライセンス: Link先を確認

Zezhou Wang, Yaxin Du, Zhuzhong Qian, Siheng Chen,

翻訳日:2024-11-05 15:58:31 公開日:2024-10-11

# プロンプトを超えて: 大規模言語モデルの動的会話ベンチマーク

Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models ( http://arxiv.org/abs/2409.20222v2 )

ライセンス: Link先を確認

David Castillo-Bolado, Joseph Davidson, Finlay Gray, Marek Rosa,

(参考訳) 本稿では、対話エージェントのための動的ベンチマークシステムを導入する。このシステムは、シミュレートされた単一の長いユーザ$\leftrightarrow$エージェント間インタラクションを通じてエージェントの性能を評価する。インタラクションはユーザとエージェントの会話であり、複数のタスクが導入され、並行して進められる。定期的にコンテキストを切り替えてタスクをインターリーブすることで現実的なテストシナリオを構築し、エージェントの長期記憶(Long-Term Memory)、継続学習、情報統合の能力を評価する。プロプライエタリおよびオープンソースの大規模言語モデル(LLM)での結果から、LLMは一般に単一タスクのインタラクションではうまく機能するが、同じタスクでもインターリーブされると苦戦することがわかった。特に、LTMシステムで補強した短コンテキストのLLMは、より大きなコンテキストを持つLLMと同等以上の性能を示す。このベンチマークは、より自然なインタラクションに応答するLLMには、既存のベンチマークがこれまで捉えられなかった別の課題があることを示唆している。

We introduce a dynamic benchmarking system for conversational agents that evaluates their performance through a single, simulated, and lengthy user$\leftrightarrow$agent interaction. The interaction is a conversation between the user and agent, where multiple tasks are introduced and then undertaken concurrently. We context switch regularly to interleave the tasks, which constructs a realistic testing scenario in which we assess the Long-Term Memory, Continual Learning, and Information Integration capabilities of the agents. Results from both proprietary and open-source Large-Language Models show that LLMs in general perform well on single-task interactions, but they struggle on the same tasks when they are interleaved. Notably, short-context LLMs supplemented with an LTM system perform as well as or better than those with larger contexts. Our benchmark suggests that there are other challenges for LLMs responding to more natural interactions that contemporary benchmarks have heretofore not been able to capture.

翻訳日:2024-11-05 15:58:31 公開日:2024-10-11

# 逆絵画:絵画の過程を再構築する

Inverse Painting: Reconstructing The Painting Process ( http://arxiv.org/abs/2409.20556v2 )

ライセンス: Link先を確認

Bowei Chen, Yifan Wang, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz,

(参考訳) 入力された絵画から、それがどのように描かれた可能性があるかを示すタイムラプス映像を再構成する。これを、最初は空白の「キャンバス」を反復的に更新していく自己回帰的な画像生成問題として定式化する。モデルは、多数の絵画制作動画で訓練することにより、実際のアーティストから学習する。本手法は、テキストと領域の理解を取り入れて一連の絵画の「指示」を定義し、新しい拡散ベースのレンダラーでキャンバスを更新する。訓練に用いたのは限られたアクリル画風の絵画だが、本手法はその範囲を超えて汎化し、幅広い芸術様式やジャンルでもっともらしい結果を示す。

Given an input painting, we reconstruct a time-lapse video of how it may have been painted. We formulate this as an autoregressive image generation problem, in which an initially blank "canvas" is iteratively updated. The model learns from real artists by training on many painting videos. Our approach incorporates text and region understanding to define a set of painting "instructions" and updates the canvas with a novel diffusion-based renderer. The method extrapolates beyond the limited, acrylic style paintings on which it has been trained, showing plausible results for a wide range of artistic styles and genres.

翻訳日:2024-11-05 15:38:59 公開日:2024-10-11

# 人間のセマンティック軌跡のための転移可能な教師なし外れ値検出フレームワーク

Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories ( http://arxiv.org/abs/2410.00054v1 )

ライセンス: Link先を確認

Zheng Zhang, Hossein Amiri, Dazhou Yu, Yuntong Hu, Liang Zhao, Andreas Zufle,

(参考訳) セマンティック軌跡は、移動目的や場所での活動といったテキスト情報で時空間データを豊かにしたものであり、医療、社会保障、都市計画にとって重要な外れ値行動を特定する鍵となる。従来の外れ値検出はヒューリスティックなルールに依存しており、ドメイン知識を必要とし、未知の外れ値を特定する能力が限られる。さらに、空間、時間、テキストの各次元にわたるマルチモーダルデータを同時に考慮できる包括的なアプローチが欠けている。ドメインに依存しないモデルの必要性に対処するため、Transferable Outlier Detection for Human Semantic Trajectories (TOD4Traj) フレームワークを提案する。TOD4Trajはまず、多様なデータの特徴表現を整合させるモダリティ特徴統一モジュールを導入し、マルチモーダル情報の統合を可能にするとともに、異なるデータセット間の転移可能性を高める。さらに、時間方向と集団間の両方で規則的な移動パターンを特定する対照学習モジュールを提案し、個人の一貫性と集団の多数派パターンに基づいて外れ値を同時に検出できるようにする。実験の結果、TOD4Trajは既存モデルを上回る性能を示し、さまざまなデータセットで人間の軌跡の外れ値を検出する際の有効性と適応性が実証された。

Semantic trajectories, which enrich spatial-temporal data with textual information such as trip purposes or location activities, are key for identifying outlier behaviors critical to healthcare, social security, and urban planning. Traditional outlier detection relies on heuristic rules, which require domain knowledge and limit its ability to identify unseen outliers. In addition, a comprehensive approach that can jointly consider multi-modal data across spatial, temporal, and textual dimensions is lacking. Addressing the need for a domain-agnostic model, we propose the Transferable Outlier Detection for Human Semantic Trajectories (TOD4Traj) framework. TOD4Traj first introduces a modality feature unification module to align diverse data feature representations, enabling the integration of multi-modal information and enhancing transferability across different datasets. A contrastive learning module is further proposed for identifying regular mobility patterns both temporally and across populations, allowing for a joint detection of outliers based on individual consistency and group majority patterns. Our experimental results have shown TOD4Traj's superior performance over existing models, demonstrating its effectiveness and adaptability in detecting human trajectory outliers across various datasets.

翻訳日:2024-11-05 15:19:28 公開日:2024-10-11

# 人間のセマンティック軌跡のための転移可能な教師なし外れ値検出フレームワーク

Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories ( http://arxiv.org/abs/2410.00054v2 )

ライセンス: Link先を確認

Zheng Zhang, Hossein Amiri, Dazhou Yu, Yuntong Hu, Liang Zhao, Andreas Zufle,

翻訳日:2024-11-05 15:19:28 公開日:2024-10-11

# Scheherazade: LLMにおけるChain-of-Thought Math ReasoningとChain-of-Problemsの評価

Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems ( http://arxiv.org/abs/2410.00151v1 )

ライセンス: Link先を確認

Stephen Miner, Yoshiki Takashima, Simeng Han, Ferhat Erata, Timos Antonopoulos, Ruzica Piskac, Scott J Shapiro,

(参考訳) ベンチマークは、大規模言語モデル(LLM)の数学的推論能力の進歩を測定するうえで重要である。しかし、GSM8Kのように広く使われている既存のベンチマークは、複数の最先端LLMが94%を超える精度を達成しているため、有用性が低下している。より難しいベンチマークも提案されているが、その作成は手作業で高コストであることが多い。本稿では、数学的推論問題を論理的に連鎖させることで、挑戦的な数学的推論ベンチマークを自動生成する手法Scheherazadeを提案する。連鎖を順方向にたどる推論を要する前向き連鎖(forward chaining)と、逆方向にたどる推論を要する後ろ向き連鎖(backward chaining)という2つの連鎖法を提案する。GSM8KにScheherazadeを適用してGSM8K-Scheherazadeを作成し、3つのフロンティアLLMとOpenAIのo1-previewを評価する。その結果、フロンティアモデルの性能はわずか数問を連鎖させただけで急激に低下する一方、予備評価によれば、o1-previewの性能は後ろ向きに5問まで連鎖させても維持されることが示唆された。加えて、他のすべてのモデルは後ろ向きに連鎖させた問題で性能が悪化するが、o1-previewは後ろ向き連鎖のベンチマークの方が性能が良い。データセットとコードは公開する予定である。

Benchmarks are critical for measuring progress of math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade on GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously at only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.

翻訳日:2024-11-05 14:40:28 公開日:2024-10-11

# 猫は猫だ(犬じゃない!):因果解析と埋め込み最適化によるテキスト・画像エンコーダの情報混合の解明

A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization ( http://arxiv.org/abs/2410.00321v1 )

ライセンス: Link先を確認

Chieh-Yun Chen, Li-Wu Tsao, Chiang Tseng, Hong-Han Shuai,

(参考訳) 本稿では、テキストから画像への(T2I)拡散モデルのテキストエンコーダが持つ因果的な(causal)処理方式の影響を分析する。この方式は情報の偏りや損失を引き起こしうる。これまでの研究は、デノイジング過程を通じてこの問題に対処することに注力してきた。しかし、テキスト埋め込みがT2Iモデルにどのように寄与するか、特に複数のオブジェクトを生成する場合については議論されていない。本稿では、テキスト埋め込みの包括的な分析を示す。すなわち、(i) テキスト埋め込みが生成画像にどのように寄与するか、(ii) なぜ情報が失われ、最初に言及されたオブジェクトに偏るのか、である。これを踏まえ、シンプルだが効果的な、訓練不要のテキスト埋め込みバランス最適化手法を提案し、Stable Diffusionにおける情報バランスを90.05%改善する。さらに、既存手法よりも正確に情報損失を定量化する新しい自動評価指標を提案し、人間の評価との一致率81%を達成する。この指標はオブジェクトの存在と正確さを効果的に測定し、CLIPのテキスト・画像類似度のような現在の分布スコアの限界に対処する。

This paper analyzes the impact of causal manner in the text encoder of text-to-image (T2I) diffusion models, which can lead to information bias and loss. Previous works have focused on addressing the issues through the denoising process. However, there is no research discussing how text embedding contributes to T2I models, especially when generating more than one object. In this paper, we share a comprehensive analysis of text embedding: i) how text embedding contributes to the generated images and ii) why information gets lost and biases towards the first-mentioned object. Accordingly, we propose a simple but effective text embedding balance optimization method, which is training-free, with an improvement of 90.05% on information balance in stable diffusion. Furthermore, we propose a new automatic evaluation metric that quantifies information loss more accurately than existing methods, achieving 81% concordance with human assessments. This metric effectively measures the presence and accuracy of objects, addressing the limitations of current distribution scores like CLIP's text-image similarities.

翻訳日:2024-11-05 06:26:14 公開日:2024-10-11

Chieh-Yun Chen, Chiang Tseng, Li-Wu Tsao, Hong-Han Shuai,

翻訳日:2024-11-05 06:26:14 公開日:2024-10-11

# Webエージェントにおける自己改善のためのマルチモーダル自動検証

Multimodal Auto Validation For Self-Refinement in Web Agents ( http://arxiv.org/abs/2410.00689v1 )

ライセンス: Link先を確認

Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish Jagmohan,

(参考訳) 世界のデジタル化が進むにつれ、複雑で単調なタスクを自動化できるWebエージェントは、ワークフローの合理化に欠かせないものになりつつある。本稿では、マルチモーダルな検証と自己改善(self-refinement)によってWebエージェントの性能を向上させる手法を提案する。最先端のWeb自動化フレームワークAgent-Eを基盤として、Webエージェントの自動検証における各モダリティ(テキスト、ビジョン)と階層構造の効果を包括的に調査する。また、開発した自動バリデータを用いて、Webエージェントがワークフローの失敗を検出し自己修正できるようにする、Web自動化のための自己改善機構を導入する。その結果、SOTAのWebエージェントであるAgent-Eの従来の最高性能から大きく向上し、WebVoyagerベンチマークのサブセットにおけるタスク完了率が76.2%から81.24%に上昇した。本稿のアプローチは、複雑な実世界のシナリオでより信頼性の高いデジタルアシスタントを実現する道を開く。

As our world digitizes, web agents that can automate complex and monotonous tasks are becoming essential in streamlining workflows. This paper introduces an approach to improving web agent performance through multi-modal validation and self-refinement. We present a comprehensive study of different modalities (text, vision) and the effect of hierarchy for the automatic validation of web agents, building upon the state-of-the-art Agent-E web automation framework. We also introduce a self-refinement mechanism for web automation, using the developed auto-validator, that enables web agents to detect and self-correct workflow failures. Our results show significant gains over the prior state-of-the-art performance of Agent-E (a SOTA web agent), boosting task-completion rates from 76.2% to 81.24% on a subset of the WebVoyager benchmark. The approach presented in this paper paves the way for more reliable digital assistants in complex, real-world scenarios.

翻訳日:2024-11-05 04:25:20 公開日:2024-10-11

# Webエージェントにおける自己改善のためのマルチモーダル自動検証

Multimodal Auto Validation For Self-Refinement in Web Agents ( http://arxiv.org/abs/2410.00689v2 )

ライセンス: Link先を確認

Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish Jagmohan,

翻訳日:2024-11-05 04:25:20 公開日:2024-10-11

# 階層的テキスト分類の再検討:推論とメトリクス

Revisiting Hierarchical Text Classification: Inference and Metrics ( http://arxiv.org/abs/2410.01305v1 )

ライセンス: Link先を確認

Roman Plaud, Matthieu Labeau, Antoine Saillenfest, Thomas Bonald,

(参考訳) 階層的テキスト分類(HTC)は、階層として構成された構造化空間の中でテキストにラベルを割り当てるタスクである。最近の研究はHTCを従来のマルチラベル分類問題として扱い、そのように評価している。これに対し我々は、専用に設計された階層的指標に基づいてモデルを評価することを提案し、指標の選択と予測の推論方法がいかに込み入った問題であるかを示す。新しい挑戦的なデータセットを導入し、最近の洗練されたモデルを公平に評価して、理論的に動機づけられた新しい損失を含む、単純だが強力な一連のベースラインと比較する。最後に、これらのベースラインが多くの場合に最新モデルと競合しうることを示す。このことは、HTCの新しい手法を提案する際に評価方法論を慎重に検討することの重要性を浮き彫りにしている。コードの実装とデータセットは https://github.com/RomanPlaud/revisitingHTC で公開されている。

Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem, therefore evaluating it as such. We instead propose to evaluate models based on specifically designed hierarchical metrics and we demonstrate the intricacy of metric choice and prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at https://github.com/RomanPlaud/revisitingHTC.
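To illustrate why a hierarchy-aware metric can differ from a flat multilabel one, here is a minimal sketch of an ancestor-augmented F1, a widely used style of hierarchical metric. The abstract does not say which metrics the paper adopts, so this is not necessarily one of them. The toy `parent` mapping and the function names are hypothetical.

```python
def with_ancestors(labels, parent):
    """Expand a label set with all of its ancestors in the hierarchy."""
    expanded = set()
    for label in labels:
        while label is not None:
            expanded.add(label)
            label = parent.get(label)  # None once the root is passed
    return expanded


def hierarchical_f1(predicted, gold, parent):
    """Set-based F1 computed on ancestor-augmented label sets."""
    pred_set = with_ancestors(predicted, parent)
    gold_set = with_ancestors(gold, parent)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

With `parent = {"physics": "science", "chemistry": "science", "quantum": "physics"}`, predicting `{"chemistry"}` when the gold label is `{"quantum"}` scores 0 under exact-match F1 but receives partial credit here, because both labels share the ancestor `science`.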

翻訳日:2024-11-04 21:59:16 公開日:2024-10-11

# 階層的テキスト分類の再検討:推論とメトリクス

Revisiting Hierarchical Text Classification: Inference and Metrics ( http://arxiv.org/abs/2410.01305v2 )

ライセンス: Link先を確認

Roman Plaud, Matthieu Labeau, Antoine Saillenfest, Thomas Bonald,

翻訳日:2024-11-04 21:59:16 公開日:2024-10-11

# KnobGen: スケッチベース拡散モデルにおけるアートワークの高度化制御

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models ( http://arxiv.org/abs/2410.01595v1 )

ライセンス: Link先を確認

Pouyan Navard, Amin Karimi Monsefi, Mengxi Zhou, Wei-Lun Chao, Alper Yilmaz, Rajiv Ramnath,

(参考訳) 近年の拡散モデルの進歩により、テキストから画像への(T2I)生成は大幅に改善されたが、きめ細かな精度と高レベルな制御のバランスを取ることには依然として苦労することが多い。ControlNetやT2I-Adapterのような手法は、熟練したアーティストのスケッチに従うことには優れているが、過度に硬直的で、初心者ユーザのスケッチに含まれる意図しない欠陥まで再現してしまう傾向がある。一方、スケッチベースの抽象化フレームワークのような粗粒度の手法は、より扱いやすい入力処理を提供するが、詳細で専門的な用途に必要な精密な制御を欠いている。これらの制約に対処するため、スケッチの複雑さやユーザのスキルのさまざまなレベルにシームレスに適応することでスケッチベースの画像生成を民主化する、デュアルパスのフレームワークKnobGenを提案する。KnobGenは、高レベルなセマンティクスにCoarse-Grained Controller (CGC) モジュールを、詳細の洗練にFine-Grained Controller (FGC) モジュールを用いる。これら2つのモジュールの相対的な強さは、ノブ推論機構を通じてユーザ固有のニーズに合わせて調整できる。これらの機構により、KnobGenは初心者のスケッチからも熟練したアーティストが描いたスケッチからも柔軟に画像を生成できる。MultiGen-20Mデータセットと新たに収集したスケッチデータセットで示されるように、画像の自然な外観を保ちながら最終出力に対する制御を維持する。

Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but they often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. These mechanisms ensure that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists. This maintains control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.

翻訳日:2024-11-04 16:44:34 公開日:2024-10-11

# KnobGen: スケッチベース拡散モデルにおけるアートワークの高度化制御

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models ( http://arxiv.org/abs/2410.01595v2 )

ライセンス: Link先を確認

Pouyan Navard, Amin Karimi Monsefi, Mengxi Zhou, Wei-Lun Chao, Alper Yilmaz, Rajiv Ramnath,

翻訳日:2024-11-04 16:44:34 公開日:2024-10-11

# 解釈可能なコントラスト型モンテカルロ木探索手法

Interpretable Contrastive Monte Carlo Tree Search Reasoning ( http://arxiv.org/abs/2410.01707v1 )

ライセンス: Link先を確認

Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen,

(参考訳) 大規模言語モデル(LLM)のための新しいモンテカルロ木探索(MCTS)推論アルゴリズムSC-MCTS*を提案する。これは推論の精度と速度の両方を大幅に改善する。動機は次の3点である。(1) 従来のMCTSによるLLM推論の研究は、その最大の欠点である、CoTに比べて速度が遅いという点を見落としがちであった。(2) 従来の研究は主にMCTSをさまざまなタスクにおけるLLM推論のツールとして用いており、推論の解釈可能性の観点からの構成要素の定量分析やアブレーション研究は限られていた。(3) 報酬モデルはMCTSで最も重要な構成要素であるが、従来の研究がMCTSの報酬モデルを詳細に研究・改良することはまれであった。そこで、MCTSの構成要素について広範なアブレーション研究と定量分析を行い、各構成要素がLLMのMCTS推論性能に与える影響を明らかにした。これに基づき、(i) コントラスト復号の原理に基づく解釈性の高い報酬モデルを設計し、(ii) 投機的復号を用いてノードあたり平均51.9%の速度向上を達成した。さらに、(iii) 従来の研究で用いられていたUCTノード選択戦略とバックプロパゲーションを改良し、性能を大幅に向上させた。Llama-3.1-70BとSC-MCTS*を用いて、Blocksworldの多段階推論データセットでo1-miniを平均17.4%上回った。

We propose SC-MCTS*, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation comes from three observations: (1) previous MCTS LLM reasoning work has often overlooked its biggest drawback, slower speed compared to CoT; (2) previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation studies of its components from a reasoning-interpretability perspective; (3) the reward model is the most crucial component in MCTS, yet previous work has rarely conducted in-depth study or improvement of MCTS's reward models. Thus, we conducted extensive ablation studies and quantitative analysis on components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable reward model based on the principle of contrastive decoding and (ii) achieved an average speed improvement of 51.9% per node using speculative decoding. Additionally, (iii) we improved the UCT node selection strategy and backpropagation used in previous works, resulting in significant performance improvement. We outperformed o1-mini by an average of 17.4% on the Blocksworld multi-step reasoning dataset using Llama-3.1-70B with SC-MCTS*.
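For readers unfamiliar with the UCT node selection mentioned in this abstract, the sketch below shows the textbook UCT rule. It is not the improved variant the authors describe, whose details the abstract does not give. The function names and the `(value_sum, visits)` child representation are assumptions made for illustration.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT: mean value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    mean_value = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + exploration


def select_child(children, parent_visits):
    """Return the index of the child with the highest UCT score.

    `children` is a list of (value_sum, visits) pairs.
    """
    scores = [uct_score(v, n, parent_visits) for v, n in children]
    return scores.index(max(scores))
```

Two children with the same mean value are separated by the exploration term: the one visited less often gets the larger bonus and is selected next.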

翻訳日:2024-11-04 15:53:34 公開日:2024-10-11

# 解釈可能なコントラスト型モンテカルロ木探索手法

Interpretable Contrastive Monte Carlo Tree Search Reasoning ( http://arxiv.org/abs/2410.01707v2 )

ライセンス: Link先を確認

Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen,

(参考訳) 大規模言語モデル(LLM)のための新しいモンテカルロ木探索(MCTS)推論アルゴリズムSC-MCTS*を提案する。これは推論の精度と速度の両方を大幅に改善する。動機は次の3点である。(1) 従来のMCTSによるLLM推論の研究は、その最大の欠点である、CoTに比べて速度が遅いという点を見落としがちであった。(2) 従来の研究は主にMCTSをさまざまなタスクにおけるLLM推論のツールとして用いており、推論の解釈可能性の観点からの構成要素の定量分析やアブレーション研究は限られていた。(3) 報酬モデルはMCTSで最も重要な構成要素であるが、従来の研究がMCTSの報酬モデルを詳細に研究・改良することはまれであった。そこで、MCTSの構成要素について広範なアブレーション研究と定量分析を行い、各構成要素がLLMのMCTS推論性能に与える影響を明らかにした。これに基づき、(i) コントラスト復号の原理に基づく解釈性の高い報酬モデルを設計し、(ii) 投機的復号を用いてノードあたり平均51.9%の速度向上を達成した。さらに、(iii) 従来の研究で用いられていたUCTノード選択戦略とバックプロパゲーションを改良し、性能を大幅に向上させた。Llama-3.1-70BとSC-MCTS*を用いて、Blocksworldの多段階推論データセットでo1-miniを平均17.4%上回った。コードは https://github.com/zitian-gao/SC-MCTS で公開している。

We propose SC-MCTS*, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation comes from three observations: (1) previous MCTS LLM reasoning work has often overlooked its biggest drawback, slower speed compared to CoT; (2) previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation studies of its components from a reasoning-interpretability perspective; (3) the reward model is the most crucial component in MCTS, yet previous work has rarely conducted in-depth study or improvement of MCTS's reward models. Thus, we conducted extensive ablation studies and quantitative analysis on components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable reward model based on the principle of contrastive decoding and (ii) achieved an average speed improvement of 51.9% per node using speculative decoding. Additionally, (iii) we improved the UCT node selection strategy and backpropagation used in previous works, resulting in significant performance improvement. We outperformed o1-mini by an average of 17.4% on the Blocksworld multi-step reasoning dataset using Llama-3.1-70B with SC-MCTS*. Our code is available at https://github.com/zitian-gao/SC-MCTS.

翻訳日:2024-11-04 15:53:34 公開日:2024-10-11

# Ingest-And-Ground: RAGによる継続事前学習LLMの幻覚の解消

Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG ( http://arxiv.org/abs/2410.02825v1 )

ライセンス: Link先を確認

Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert,

(参考訳) 本稿では、LLMとRAGを用いてプライバシー関連プロセスの効率を向上させる可能性を持つ新しい手法を提案する。幻覚を抑えるため、プライバシーに特化した知識ベースでベースLLMを継続的に事前学習し、さらにセマンティックRAG層で拡張する。評価の結果、この手法は、事実情報に基づいて応答をグラウンディングすることで不正確さを減らし、プライバシー関連クエリの処理におけるモデル性能を向上させる(そのままのLLMと比べて指標が最大で2倍)ことが示された。

This paper presents new methods that have the potential to improve privacy process efficiency with LLM and RAG. To reduce hallucination, we continually pre-train the base LLM model with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances the model performance (as much as doubled metrics compared to out-of-box LLM) in handling privacy-related queries, by grounding responses with factual information which reduces inaccuracies.

翻訳日:2024-11-03 05:34:38 公開日:2024-10-11

# Ingest-And-Ground: RAGによる継続事前学習LLMの幻覚の解消

Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG ( http://arxiv.org/abs/2410.02825v2 )

ライセンス: Link先を確認

Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert,

翻訳日:2024-11-03 05:34:38 公開日:2024-10-11

# CAnDOIT: 時系列の観測データと介入データによる因果発見

CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v1 )

ライセンス: Link先を確認

Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,

(参考訳) 原因と効果の研究は科学の多くの分野において最重要であり、知的システムの多くの実践的応用にも重要である。特に、隠れ要因を含む状況における因果関係の同定は、因果モデルを構築するための観察データのみに依存する手法にとって大きな課題である。本稿では,観測時系列データと介入時系列データの両方を用いて因果関係モデルを再構成する因果関係探索手法であるCAnDOITを提案する。因果解析における介入データの使用は、シナリオが複雑であり、観測データだけでは正しい因果構造を明らかにするのに不十分な、ロボット工学のような現実世界の応用にとって不可欠である。この手法の検証は、まずランダムに生成された合成モデル上で行われ、その後、ロボット操作環境における因果構造学習のためのよく知られたベンチマークで行われる。実験により、アプローチは介入からのデータを効果的に処理し、それらを活用して因果解析の精度を高めることができることが示された。 CAnDOITのPython実装も開発され、GitHubで公開されている: https://github.com/lcastri/causalflow。

The study of cause-and-effect is of the utmost importance in many branches of science, but also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This paper proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time-series data. The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. Validation of the method is performed initially on randomly generated synthetic models and subsequently on a well-known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub: https://github.com/lcastri/causalflow.

翻訳日:2024-11-03 05:24:53 公開日:2024-10-11

# CAnDOIT: 時系列の観測データと介入データによる因果発見

CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v2 )

ライセンス: Link先を確認

Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,

翻訳日:2024-11-03 05:24:53 公開日:2024-10-11

# CAnDOIT: 時系列の観測データと介入データによる因果発見

CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series ( http://arxiv.org/abs/2410.02844v3 )

ライセンス: Link先を確認

Luca Castri, Sariah Mghames, Marc Hanheide, Nicola Bellotto,

翻訳日:2024-11-03 05:24:53 公開日:2024-10-11

# マイクロ波フォトニッククラスター状態の効率的なトモグラフィー

Efficient tomography of microwave photonic cluster states ( http://arxiv.org/abs/2410.03345v1 )

ライセンス: Link先を確認

Yoshiki Sunada, Shingo Kono, Jesper Ilves, Takanori Sugiyama, Yasunari Suzuki, Tsuyoshi Okubo, Shuhei Tamate, Yutaka Tabuchi, Yasunobu Nakamura,

(参考訳) 多数の量子ビット間のエンタングルメントは、多くの量子アルゴリズムにとって重要な資源である。このような多体状態は、光領域またはマイクロ波領域において、伝搬する(itinerant)フォトニック量子ビットの列をエンタングルさせることで効率よく生成されてきた。しかし、指数関数的に大きな密度行列を実験的に再構成して、生成された多体状態を完全に特徴づけることは依然として困難である。本稿では、行列積演算子(matrix-product-operator)形式に基づく効率的なトモグラフィ法を開発し、最大35個のマイクロ波フォトニック量子ビットからなるクラスター状態に対して、その $2^{35} \times 2^{35}$ 密度行列を再構成することで実証する。この完全な特徴づけにより、大規模なクラスター状態を生成するときにのみ生じる光子源の性能劣化を検出できる。このトモグラフィ法は、エンタングルした量子ビットのさまざまな物理的実現に一般に適用でき、エンタングルした光子の高忠実度な光源の開発を導く効率的なベンチマーク法となる。

Entanglement among a large number of qubits is a crucial resource for many quantum algorithms. Such many-body states have been efficiently generated by entangling a chain of itinerant photonic qubits in the optical or microwave domain. However, it has remained challenging to fully characterize the generated many-body states by experimentally reconstructing their exponentially large density matrices. Here, we develop an efficient tomography method based on the matrix-product-operator formalism and demonstrate it on a cluster state of up to 35 microwave photonic qubits by reconstructing its $2^{35} \times 2^{35}$ density matrix. The full characterization enables us to detect the performance degradation of our photon source which occurs only when generating a large cluster state. This tomography method is generally applicable to various physical realizations of entangled qubits and provides an efficient benchmarking method for guiding the development of high-fidelity sources of entangled photons.
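A rough count shows why a matrix-product-operator (MPO) description matters at this scale. The sketch below compares the number of entries in a dense density matrix with an upper bound for an MPO. The uniform bond dimension passed as `bond_dim` is a hypothetical value, since the abstract does not state the one used, and the function names are illustrative.

```python
def dense_density_matrix_entries(n_qubits: int) -> int:
    """Number of complex entries in a full 2^n x 2^n density matrix."""
    dim = 2 ** n_qubits
    return dim * dim


def mpo_parameter_count(n_qubits: int, bond_dim: int) -> int:
    """Upper bound on entries of an MPO with uniform bond dimension.

    Each site tensor has two physical indices of size 2 and two bond
    indices of size bond_dim (boundary tensors are smaller, so this
    over-counts slightly).
    """
    return n_qubits * 2 * 2 * bond_dim * bond_dim
```

For 35 qubits the dense matrix has $2^{70}$ entries (about $1.2 \times 10^{21}$), whereas the MPO count grows only linearly with the number of qubits for a fixed bond dimension.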

翻訳日:2024-11-02 22:58:37 公開日:2024-10-11

# マイクロ波フォトニッククラスター状態の効率的なトモグラフィー

Efficient tomography of microwave photonic cluster states ( http://arxiv.org/abs/2410.03345v2 )

ライセンス: Link先を確認

Yoshiki Sunada, Shingo Kono, Jesper Ilves, Takanori Sugiyama, Yasunari Suzuki, Tsuyoshi Okubo, Shuhei Tamate, Yutaka Tabuchi, Yasunobu Nakamura,

翻訳日:2024-11-02 22:58:37 公開日:2024-10-11

# 大規模視覚言語モデルにおけるモダリティ間知識衝突の解明

Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models ( http://arxiv.org/abs/2410.03659v1 )

ライセンス: Link先を確認

Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen,

(参考訳) LVLM(Large Vision-Language Models)は、マルチモーダル入力をキャプチャし、推論する能力を示す。しかし、これらのモデルは、視覚コンポーネントと言語コンポーネントの間で表現される知識の不整合から生じるパラメトリックな知識の衝突を起こしやすい。本稿では,$\textbf{cross-modality parametric knowledge conflict}$の問題を正式に定義し,それらを検出,解釈,緩和するための体系的なアプローチを提案する。視覚的回答とテキスト的回答の間の衝突を識別するパイプラインを導入し,モデルのサイズに関わらず,近年のLVLMにおいてモダリティ間の衝突率が一貫して高いことを示す。さらに、これらの衝突がどのように推論プロセスに干渉するかを考察し、衝突するサンプルをそれ以外のサンプルから識別するための対照的な指標を提案する。これらの知見に基づいて,回答の信頼度に基づき,信頼度の低いモダリティ成分から推定される望ましくないロジットを除去する動的コントラスト復号法を開発した。ロジットを提供しないモデルに対しては、衝突を緩和するための2つのプロンプトベースの戦略を導入する。提案手法は,ViQuAEデータセットとInfoSeekデータセットの両方において,有望な精度向上を実現する。具体的には、LLaVA-34Bを用いた場合、動的コントラスト復号法により平均精度が2.24%向上する。

Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs. However, these models are prone to parametric knowledge conflicts, which arise from inconsistencies of represented knowledge between their vision and language components. In this paper, we formally define the problem of $\textbf{cross-modality parametric knowledge conflict}$ and present a systematic approach to detect, interpret, and mitigate such conflicts. We introduce a pipeline that identifies conflicts between visual and textual answers, showing a persistently high conflict rate across modalities in recent LVLMs regardless of the model size. We further investigate how these conflicts interfere with the inference process and propose a contrastive metric to discern the conflicting samples from the others. Building on these insights, we develop a novel dynamic contrastive decoding method that removes undesirable logits inferred from the less confident modality components based on answer confidence. For models that do not provide logits, we also introduce two prompt-based strategies to mitigate the conflicts. Our methods achieve promising improvements in accuracy on both the ViQuAE and InfoSeek datasets. Specifically, using LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.

翻訳日:2024-11-02 20:58:02 公開日:2024-10-11

# 大規模視覚言語モデルにおけるモダリティ間知識衝突の解明

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models ( http://arxiv.org/abs/2410.03659v2 )

ライセンス: Link先を確認

Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen,

翻訳日:2024-11-02 20:48:16 公開日:2024-10-11

# タブラルデータを用いた診断におけるグラディエントブースティング決定木

Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data ( http://arxiv.org/abs/2410.03705v1 )

ライセンス: Link先を確認

A. Yarkın Yıldız, Asli Kalayci,

(参考訳) 医学的診断は、正確な分類とそれに応じた治療の提供という観点から、医療分野において重要な課題である。正しい診断に基づくほぼ正確な判断は患者の生命そのものに影響し、誤って分類された場合には、極端な場合、大惨事を引き起こす可能性がある。サポートベクタマシン(SVM)やロジスティック回帰などの従来の機械学習(ML)手法や、TabNetやTabTransformerといった最先端の表形式深層学習(DL)手法が提案され、表形式の医療データセットに用いられてきた。さらに、性能の高さ、計算コストの低さ、さまざまなタスクでの最適化の容易さから、近年ではアンサンブル法が使われている。それらは、いくつかの診断タスクにおいて、医療上の意思決定プロセスを成功させるという観点で、強力な代替手段を提供する。本研究では,XGBoost,CatBoost,LightGBMに着目し,表形式データを用いた医学分類タスクにおけるアンサンブル手法,特に勾配ブースティング決定木(GBDT)アルゴリズムの利点について検討した。実験では、GBDT手法が従来のMLやディープニューラルネットワークアーキテクチャよりも優れており、複数のベンチマーク表形式医療診断データセットにおいて平均順位が最も高いことが示されている。さらに、DLモデルに比べて必要な計算能力ははるかに少なく、高い性能と低い複雑さの観点から最適な方法論となる。

Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Near-precise decisions based on a correct diagnosis can affect a patient's life itself, and a misclassification may, in extreme cases, result in a catastrophe. Several traditional machine learning (ML) methods, such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used on tabular medical datasets. Additionally, owing to their superior performance, lower computational costs, and easier optimization across different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms, in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.

翻訳日:2024-11-02 20:38:12 公開日:2024-10-11

# タブラルデータを用いた診断におけるグラディエントブースティング決定木

Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data ( http://arxiv.org/abs/2410.03705v2 )

ライセンス: Link先を確認

A. Yarkın Yıldız, Asli Kalayci,

翻訳日:2024-11-02 20:38:12 公開日:2024-10-11

# 教師なしの人間選好学習

Unsupervised Human Preference Learning ( http://arxiv.org/abs/2410.03731v2 )

ライセンス: Link先を確認

Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Dilek Hakkani Tur,

(参考訳) 大規模言語モデルは、印象的な推論能力を示すが、個々のユーザの好み情報がないため、パーソナライズされたコンテンツの提供に苦慮している。文脈内学習やパラメータ効率のよい微調整といった既存の手法は、個人の所有する小さな個人データセットを考えると、人間の嗜好の複雑さを捉えるには不十分である。本稿では,より大規模で訓練済みのモデルを指導する自然言語規則を生成するために,小パラメータモデルを選好エージェントとして活用し,効率的なパーソナライズを実現する手法を提案する。提案手法では, より大規模な基礎モデルの出力を誘導し, 大規模モデルの広範な知識と能力を活用しながら, 個人の好みに合わせたコンテンツを生成する。重要なのは、このパーソナライゼーションは、大きなモデルを微調整する必要がないことだ。メールや記事のデータセットによる実験結果から,本手法がベースラインのパーソナライズ手法を著しく上回ることを示した。基礎モデルをデータと計算効率のよい方法で個別の好みに適応させることにより、我々のアプローチは高度にパーソナライズされた言語モデルアプリケーションへの道を開く。

Large language models demonstrate impressive reasoning abilities but struggle to provide personalized content due to their lack of individual user preference information. Existing methods, such as in-context learning and parameter-efficient fine-tuning, fall short in capturing the complexity of human preferences, especially given the small, personal datasets individuals possess. In this paper, we propose a novel approach utilizing small parameter models as preference agents to generate natural language rules that guide a larger, pre-trained model, enabling efficient personalization. Our method involves a small, local "steering wheel" model that directs the outputs of a much larger foundation model, producing content tailored to an individual's preferences while leveraging the extensive knowledge and capabilities of the large model. Importantly, this personalization is achieved without the need to fine-tune the large model. Experimental results on email and article datasets demonstrate that our technique significantly outperforms baseline personalization methods. By allowing foundation models to adapt to individual preferences in a data- and compute-efficient manner, our approach paves the way for highly personalized language model applications.

翻訳日:2024-11-02 20:18:28 公開日:2024-10-11

# 教師なしの人間選好学習

Unsupervised Human Preference Learning ( http://arxiv.org/abs/2410.03731v3 )

ライセンス: Link先を確認

Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Dilek Hakkani-Tür,

翻訳日:2024-11-02 20:18:28 公開日:2024-10-11

# 対向ロバスト性のための脳誘発正則化器

A Brain-Inspired Regularizer for Adversarial Robustness ( http://arxiv.org/abs/2410.03952v1 )

ライセンス: Link先を確認

Elie Attias, Cengiz Pehlevan, Dina Obeid,

(参考訳) 畳み込みニューラルネットワーク(CNN)は多くの視覚的タスクに優れるが、人間の目には知覚できないわずかな入力摂動に敏感であり、しばしばタスクの失敗をもたらす。近年の研究では、神経活動の記録を用いて脳に似た表現を促進する正則化器でCNNを訓練すると、モデルのロバスト性が向上することが示されている。しかしながら、神経データを必要とすることが、これらの手法の有用性を厳しく制限している。神経活動の記録を必要とせずに、神経正則化器の計算機能を模倣する正則化器を開発し、これらの技術の使いやすさと有効性を広げることは可能だろうか。本研究では、Li et al. (2019) で導入された神経正則化器を検査し、その基礎となる強みを抽出する。この正則化器は神経表現の類似性を用いており、我々はそれが画素の類似性とも相関することを見出した。この発見に触発され,オリジナルの本質を保ちながら画像画素の類似性を用いて計算される新たな正則化器を導入し,神経活動の記録の必要性を排除した。我々の正則化手法は、1) 各種データセットにおいて、さまざまなブラックボックス攻撃に対するモデルのロバスト性を大きく向上させ、2) 計算コストが低く、元のデータセットにのみ依存することを示す。我々の研究は、生物学的に動機付けられた損失関数が人工ニューラルネットワークの性能向上にどのように役立つかを探る。

Convolutional Neural Networks (CNNs) excel in many visual tasks, but they tend to be sensitive to slight input perturbations that are imperceptible to the human eye, often resulting in task failures. Recent studies indicate that training CNNs with regularizers that promote brain-like representations, using neural recordings, can improve model robustness. However, the requirement to use neural data severely restricts the utility of these methods. Is it possible to develop regularizers that mimic the computational function of neural regularizers without the need for neural recordings, thereby expanding the usability and effectiveness of these techniques? In this work, we inspect a neural regularizer introduced in Li et al. (2019) to extract its underlying strength. The regularizer uses neural representational similarities, which we find also correlate with pixel similarities. Motivated by this finding, we introduce a new regularizer that retains the essence of the original but is computed using image pixel similarities, eliminating the need for neural recordings. We show that our regularization method 1) significantly increases model robustness to a range of black box attacks on various datasets and 2) is computationally inexpensive and relies only on original datasets. Our work explores how biologically motivated loss functions can be used to drive the performance of artificial neural networks.
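
To make the idea of a pixel-similarity regularizer concrete, here is a minimal sketch of one possible penalty term. It assumes the penalty is the mean squared gap between pairwise cosine similarity in pixel space and in feature space. That specific form, and the names `cosine` and `similarity_penalty`, are illustrative assumptions; the abstract does not state the regularizer's exact formula.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_penalty(pixels, features):
    """Mean squared gap between pixel-space and feature-space cosine
    similarity over all image pairs (an assumed form, for illustration)."""
    n = len(pixels)
    gaps = []
    for i in range(n):
        for j in range(i + 1, n):
            gap = cosine(pixels[i], pixels[j]) - cosine(features[i], features[j])
            gaps.append(gap * gap)
    return sum(gaps) / len(gaps)


# Toy batch: three flattened "images" and two candidate feature sets.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # all features identical

print(similarity_penalty(images, images))     # features mirror pixel geometry
print(similarity_penalty(images, collapsed))  # features ignore pixel geometry
```

In training, a term like this would be added to the task loss with a weighting coefficient, so that only the images themselves are needed and no neural recordings.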

翻訳日:2024-11-02 15:10:07 公開日:2024-10-11

# 対向ロバスト性のための脳誘発正則化器

A Brain-Inspired Regularizer for Adversarial Robustness ( http://arxiv.org/abs/2410.03952v2 )

ライセンス: Link先を確認

Elie Attias, Cengiz Pehlevan, Dina Obeid,

翻訳日:2024-11-02 15:10:07 公開日:2024-10-11

# EIP-4844の180日後:ブロブ共有は小規模ロールアップのジレンマを解決するか?

180 Days After EIP-4844: Will Blob Sharing Solve Dilemma for Small Rollups? ( http://arxiv.org/abs/2410.04111v1 )

ライセンス: Link先を確認

Suhyeon Lee,

(参考訳) EIP-4844によるブロブの導入により、Ethereum上のロールアップに対するデータアベイラビリティ(DA)コストが大幅に削減された。しかし、ブロブのサイズが128KBに固定されているため、データスループットの低いロールアップは、ブロブを非効率に使うか、DA提出の頻度を下げるかというジレンマに直面している。複数のロールアップがひとつのブロブを共有するブロブ共有は、この問題の解決策として提案されている。本稿では,EIP-4844の実装から約6ヶ月後に収集した実世界データに基づき,ブロブ共有の有効性について検討する。簡単なブロブ共有形式を用いてコスト変化をシミュレートすることにより、ブロブ共有が小規模ロールアップのコストとDAサービス品質を大幅に改善し、ジレンマを効果的に解消できることを実証する。特に, ロールアップが協力した場合,ブロブ共有によるブロブベース手数料の平滑化効果により, ほとんどのロールアップにおいてUSD建てのコスト削減が90%を超えることが観察された。

The introduction of blobs through EIP-4844 has significantly reduced the Data Availability (DA) costs for rollups on Ethereum. However, due to the fixed size of blobs at 128 KB, rollups with low data throughput face a dilemma: they either use blobs inefficiently or decrease the frequency of DA submissions. Blob sharing, where multiple rollups share a single blob, has been proposed as a solution to this problem. This paper examines the effectiveness of blob sharing based on real-world data collected approximately six months after the implementation of EIP-4844. By simulating cost changes using a simple blob sharing format, we demonstrate that blob sharing can substantially improve the costs and DA service quality for small rollups, effectively resolving their dilemma. Notably, we observed cost reductions in USD exceeding 90% for most of the rollups when they cooperate, attributable to the smoothing effect of the blob base fee achieved through blob sharing.
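
The dilemma described here comes down to simple arithmetic on the fixed 128 KB blob size. The sketch below computes how much of a blob a small rollup actually uses, and how one blob's fee could be divided when several rollups share it. Splitting the fee in proportion to bytes, the rollup names, and the fee unit are all assumptions made for illustration; the abstract does not describe the paper's sharing format or cost model, and this toy does not capture the base-fee smoothing effect it mentions.

```python
BLOB_BYTES = 128 * 1024  # fixed blob size stated in the abstract


def utilization(payload_bytes: int) -> float:
    """Fraction of a single blob that a payload actually fills."""
    return payload_bytes / BLOB_BYTES


def shared_cost(payloads: dict, blob_fee: float) -> dict:
    """Split one blob's fee across rollups in proportion to their bytes.

    Proportional splitting is an assumption of this sketch.
    """
    total = sum(payloads.values())
    if total > BLOB_BYTES:
        raise ValueError("payloads do not fit in one blob")
    return {name: blob_fee * size / total for name, size in payloads.items()}


# Hypothetical rollups posting 16 KB and 48 KB per submission.
payloads = {"rollup_a": 16 * 1024, "rollup_b": 48 * 1024}

print(utilization(payloads["rollup_a"]))       # share of a blob used when posting alone
print(shared_cost(payloads, blob_fee=1.0))     # per-rollup fee when sharing one blob
```

Posting alone, `rollup_a` would pay the whole blob fee while filling only an eighth of the blob; under the assumed proportional split it pays a quarter of the fee for the same data.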

翻訳日:2024-11-02 14:01:04 公開日:2024-10-11

# EIP-4844の180日後:ブロブ共有は小規模ロールアップのジレンマを解決するか?

180 Days After EIP-4844: Will Blob Sharing Solve Dilemma for Small Rollups? ( http://arxiv.org/abs/2410.04111v2 )

ライセンス: Link先を確認

Suhyeon Lee,

(参考訳) EIP-4844によるブロブの導入により、Ethereum上のロールアップに対するデータアベイラビリティ(DA)コストが大幅に削減された。しかし、ブロブのサイズが128KBに固定されているため、データスループットの低いロールアップは、ブロブを非効率に使うか、DA提出の頻度を下げるかというジレンマに直面している。複数のロールアップがひとつのブロブを共有するブロブ共有は、この問題の解決策として提案されている。本稿では,EIP-4844の実装から約6ヶ月後に収集した実世界データに基づき,ブロブ共有の有効性について検討する。簡単なブロブ共有形式を用いてコスト変化をシミュレートすることにより、ブロブ共有が小規模ロールアップのコストとDAサービス品質を大幅に改善し、ジレンマを効果的に解消できることを実証する。特に, ロールアップが協力した場合,ブロブ共有によるブロブベース手数料の平滑化効果により, ほとんどのロールアップにおいてUSD建てのコスト削減が85%を超えることが確認された。

The introduction of blobs through EIP-4844 has significantly reduced the Data Availability (DA) costs for rollups on Ethereum. However, due to the fixed size of blobs at 128 KB, rollups with low data throughput face a dilemma: they either use blobs inefficiently or decrease the frequency of DA submissions. Blob sharing, where multiple rollups share a single blob, has been proposed as a solution to this problem. This paper examines the effectiveness of blob sharing based on real-world data collected approximately six months after the implementation of EIP-4844. By simulating cost changes using a simple blob sharing format, we demonstrate that blob sharing can substantially improve the costs and DA service quality for small rollups, effectively resolving their dilemma. Notably, we observed cost reductions in USD exceeding 85% for most of the rollups when they cooperate, attributable to the smoothing effect of the blob base fee achieved through blob sharing.

翻訳日:2024-11-02 14:01:04 公開日:2024-10-11

# ブラックボックスと外見ガラス:ディープネットワークにおけるマルチレベル対称性,反射面,凸最適化

Black Boxes and Looking Glasses: Multilevel Symmetries, Reflection Planes, and Convex Optimization in Deep Networks ( http://arxiv.org/abs/2410.04279v1 )

ライセンス: Link先を確認

Emi Zeger, Mert Pilanci,

(参考訳) 絶対値アクティベーションと任意の入力次元を持つディープニューラルネットワーク(DNN)のトレーニングは,幾何代数を用いて表現された新しい特徴を持つ等価な凸ラッソ問題として定式化可能であることを示す。この定式化は、ニューラルネットワークの対称性を符号化する幾何学的構造を明らかにする。 DNNの等価なラッソ形式を用いて、我々は、ディープネットワークと浅層ネットワークの根本的な違いを形式的に証明する:ディープネットワークは本質的に、適合する関数において対称構造を好み、深さが増すほどマルチレベル対称性、すなわち対称性の中の対称性が可能になる。さらに、ラッソの特徴は、訓練点に関して反射された超平面への距離を表す。これらの反射超平面は、トレーニングデータによって張られ、最適な重みベクトルに直交する。数値実験は理論を支持し、大規模言語モデルが生成した埋め込みを用いてネットワークを訓練する際に、理論的に予測された特徴が現れることを実証する。

We show that training deep neural networks (DNNs) with absolute value activation and arbitrary input dimension can be formulated as equivalent convex Lasso problems with novel features expressed using geometric algebra. This formulation reveals geometric structures encoding symmetry in neural networks. Using the equivalent Lasso form of DNNs, we formally prove a fundamental distinction between deep and shallow networks: deep networks inherently favor symmetric structures in their fitted functions, with greater depth enabling multilevel symmetries, i.e., symmetries within symmetries. Moreover, Lasso features represent distances to hyperplanes that are reflected across training points. These reflection hyperplanes are spanned by training data and are orthogonal to optimal weight vectors. Numerical experiments support theory and demonstrate theoretically predicted features when training networks using embeddings generated by Large Language Models.

翻訳日:2024-11-02 08:39:47 公開日:2024-10-11

# 画像生成のための領域プリミティブの分離

Disentangling Regional Primitives for Image Generation ( http://arxiv.org/abs/2410.04421v1 )

ライセンス: Link先を確認

Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang,

(参考訳) 本稿では,画像生成のためのニューラルネットワークの内部表現構造を説明する手法を提案する。具体的には、ニューラルネットワークの中間層の特徴からプリミティブな特徴成分を分離し、各特徴成分が特定の画像領域を生成するためにのみ使用されることを保証する。このようにして、画像全体の生成は、様々なプリエンコードされた原始的地域パターンの重ね合わせと見なすことができ、それぞれが特徴成分によって生成される。特徴成分は、ニューラルネットワークによって符号化された異なる画像領域を生成する要求の間のOR関係として表現できる。したがって、Harsanyi相互作用を拡張してそのようなOR相互作用を表現し、特徴成分をアンタングルする。実験では、各特徴成分と特定の画像領域の生成との明確な対応を示す。

This paper presents a method to explain the internal representation structure of a neural network for image generation. Specifically, our method disentangles primitive feature components from the intermediate-layer feature of the neural network, which ensures that each feature component is exclusively used to generate a specific set of image regions. In this way, the generation of the entire image can be considered as the superposition of different pre-encoded primitive regional patterns, each being generated by a feature component. We find that the feature component can be represented as an OR relationship between the demands for generating different image regions, which is encoded by the neural network. Therefore, we extend the Harsanyi interaction to represent such an OR interaction to disentangle the feature component. Experiments show a clear correspondence between each feature component and the generation of specific image regions.

翻訳日:2024-11-02 07:51:01 公開日:2024-10-11

# 画像生成のための領域プリミティブの分離

Disentangling Regional Primitives for Image Generation ( http://arxiv.org/abs/2410.04421v2 )

ライセンス: Link先を確認

Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang,

翻訳日:2024-11-02 07:51:01 公開日:2024-10-11

# ニューラルネットワークにおける冗長計算ブロックの検出と近似

Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v1 )

ライセンス: Link先を確認

Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,

(参考訳) ディープニューラルネットワークはしばしば、異なるモデルとそれぞれの層の両方で、同様の内部表現を学習する。ネットワーク間の類似性は、モデルの縫合やマージといった技術を可能にする一方で、ネットワーク内の類似性は、より効率的なアーキテクチャを設計する新たな機会を提供する。本稿では、様々なニューラルネットワークアーキテクチャにおいて、これらの内部的類似性の出現について検討し、その類似性パターンが使用するデータセットから独立して現れることを示す。冗長ブロックを検出するための単純なメトリックであるBlock Redundancyを導入し、将来のアーキテクチャ最適化手法の基礎を提供する。これに基づいて,より単純な変換を用いて1つ以上の冗長な計算ブロックを特定し,近似する一般的なフレームワークである冗長ブロック近似(RBA)を提案する。 2つの表現間の変換 $\mathcal{T}$ がクローズド形式で効率的に計算できることを示し、ネットワークから冗長ブロックを置き換えるのに十分である。 RBAは、優れたパフォーマンスを維持しながら、モデルパラメータと時間の複雑さを減らす。我々は,事前学習された基礎モデルとデータセットを用いて,視覚領域における分類タスクの検証を行った。

Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed-form, and it is enough to replace the redundant blocks from the network. RBA reduces model parameters and time complexity while maintaining good performance. We validate our method on classification tasks in the vision domain using a variety of pretrained foundational models and datasets.
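
As an illustration of what a closed-form transformation between two representations can look like, the sketch below fits a linear map by ordinary least squares, $T = (X^\top X)^{-1} X^\top Y$, on a two-dimensional toy problem. Treating $\mathcal{T}$ as a linear least-squares map is an assumption of this example; the abstract only says the transformation is simple and has a closed form. The helper names `fit_linear_map` and `apply_map` are hypothetical.

```python
def fit_linear_map(xs, ys):
    """Least-squares 2x2 matrix T such that x @ T approximates y for each pair.

    xs: block inputs, ys: block outputs, both lists of 2-element lists.
    Uses the normal equations T = (X^T X)^{-1} X^T Y (toy 2-D case only).
    """
    # Gram matrix X^T X and cross term X^T Y, both 2x2.
    g = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
    c = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(2)] for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if det == 0:
        raise ValueError("X^T X is singular; inputs do not span the space")
    inv = [[g[1][1] / det, -g[0][1] / det],
           [-g[1][0] / det, g[0][0] / det]]
    return [[sum(inv[i][k] * c[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_map(x, t):
    """Apply the fitted map to one representation vector."""
    return [sum(x[k] * t[k][j] for k in range(2)) for j in range(2)]


# Toy "block": its outputs happen to be an exact linear function of its inputs.
block_inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
block_outputs = [[2.0, 0.0], [1.0, 3.0], [3.0, 3.0]]

t = fit_linear_map(block_inputs, block_outputs)
print(t)
print(apply_map([2.0, 1.0], t))
```

If a block's output is well approximated by such a map, the block can be swapped for a single matrix multiplication, which is where the parameter and time savings would come from.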

翻訳日:2024-11-02 01:07:35 公開日:2024-10-11

# ニューラルネットワークにおける冗長計算ブロックの検出と近似

Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v2 )

ライセンス: Link先を確認

Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,

翻訳日:2024-11-02 01:07:35 公開日:2024-10-11

# ニューラルネットワークにおける冗長計算ブロックの検出と近似

Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v3 )

ライセンス: Link先を確認

Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,

翻訳日:2024-11-02 01:07:35 公開日:2024-10-11

# ニューラルネットワークにおける冗長計算ブロックの検出と近似

Detecting and Approximating Redundant Computational Blocks in Neural Networks ( http://arxiv.org/abs/2410.04941v4 )

ライセンス: Link先を確認

Irene Cannistraci, Emanuele Rodolà, Bastian Rieck,

翻訳日:2024-11-02 01:07:35 公開日:2024-10-11

# Agnostic Smoothed Online Learning

Agnostic Smoothed Online Learning ( http://arxiv.org/abs/2410.05124v1 )

ライセンス: Link先を確認

Moïse Blanchard,

(参考訳) 統計学習における古典的な結果は、通常、2つの極端なデータ生成モデルを考える。すなわち、未知の分布からのi.i.d.インスタンスと、統計的にはるかに困難なことが多い完全に敵対的なインスタンスである。これらのモデルのギャップを埋めるために、最近の研究は平滑化(smoothed)フレームワークを導入した。このフレームワークでは、各イテレーションにおいて、ある固定基底測度$\mu$に対する密度が$\sigma^{-1}$で抑えられた分布から敵がインスタンスを生成する。このフレームワークは、$\sigma$ の値に応じて i.i.d. の場合と敵対的な場合を補間する。古典的なオンライン予測問題において、平滑化オンライン学習の先行結果の多くは、基底測度$\mu$が学習者に既知であるという、おそらく強い仮定に依存しており、これはPAC学習や一致性の文献における標準設定とは対照的である。我々は、基底測度が未知であり、値が任意である一般的なアグノスティック問題を考える。この方向では、Blockらが、well-specifiedの仮定の下で経験的リスク最小化がサブリニアなリグレットを持つことを示した。本稿では,再帰的被覆に基づくアルゴリズムR-Coverを提案する。これは、$\mu$の事前知識なしに、アグノスティックな平滑化オンライン学習に対してサブリニアなリグレットを保証する最初のアルゴリズムである。分類に関しては、R-Cover が VC 次元 $d$ を持つ関数クラスに対して適応的リグレット $\tilde O(\sqrt{dT/\sigma})$ を持つことを証明し、これは対数因子を除いて最適である。回帰に関しては、fat-shattering 次元が多項式的に増加する関数クラスに対して、R-Cover がサブリニアなオブリビアスリグレットを持つことを示す。

Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.

翻訳日:2024-11-02 00:08:45 公開日:2024-10-11

# Agnostic Smoothed Online Learning

Agnostic Smoothed Online Learning ( http://arxiv.org/abs/2410.05124v2 )

ライセンス: Link先を確認

Moïse Blanchard,

翻訳日:2024-11-02 00:08:45 公開日:2024-10-11

# VLM2Vec:大規模マルチモーダル埋め込みタスクのためのビジョンランゲージモデルの訓練

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks ( http://arxiv.org/abs/2410.05160v1 )

ライセンス: Link先を確認

Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen,

(参考訳) 埋め込みモデルは、セマンティックな類似性、情報検索、クラスタリングなど、さまざまな下流タスクを可能にする上で重要である。近年,タスク(例えばMTEB)をまたいで一般化可能なユニバーサルテキスト埋め込みモデルの開発への関心が高まっている。しかし, 汎用マルチモーダル埋め込みモデルの学習の進展は, その重要性にもかかわらず比較的遅かった。本研究では,幅広い下流タスクを扱える普遍的な埋め込み構築の可能性を探究する。我々の貢献は2つある。(1) MMEB(Massive Multimodal Embedding Benchmark)は、4つのメタタスク(分類、視覚的質問応答、マルチモーダル検索、視覚的グラウンディング)と36のデータセット(20のトレーニングデータセットと16の評価データセット)をカバーする。(2) VLM2Vec (Vision-Language Model -> Vector) は、MMEBでの訓練を通じて、最先端の視覚言語モデルを埋め込みモデルに変換するコントラスト学習フレームワークである。 CLIPやBLIPのような以前のモデルとは異なり、VLM2Vecは画像とテキストの任意の組み合わせを処理して、タスク命令に基づいた固定次元ベクトルを生成することができる。我々は,Phi-3.5-V上に一連のVLM2Vecモデルを構築し,MMEBの評価分割で評価する。以上の結果から,VLM2Vecは,MMEBの分布内データセットと分布外データセットの両方において,既存のマルチモーダル埋め込みモデルに対して10%から20%の絶対的な平均改善を達成することが示された。

Embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering. Recently, there has been a surge of interest in developing universal text embedding models that can generalize across tasks (e.g., MTEB). However, progress in learning universal multimodal embedding models has been relatively slow despite their importance. In this work, we aim to explore the potential for building universal embeddings capable of handling a wide range of downstream tasks. Our contributions are twofold: (1) MMEB (Massive Multimodal Embedding Benchmark), which covers 4 meta-tasks (i.e. classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training and 16 evaluation datasets, and (2) VLM2Vec (Vision-Language Model -> Vector), a contrastive training framework that converts any state-of-the-art vision-language model into an embedding model via training on MMEB. Unlike previous models such as CLIP and BLIP, VLM2Vec can process any combination of images and text to generate a fixed-dimensional vector based on task instructions. We build a series of VLM2Vec models on Phi-3.5-V and evaluate them on MMEB's evaluation split. Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models on both in-distribution and out-of-distribution datasets in MMEB.

翻訳日:2024-11-01 23:58:57 公開日:2024-10-11

# VLM2Vec:大規模マルチモーダル埋め込みタスクのためのビジョンランゲージモデルの訓練

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks ( http://arxiv.org/abs/2410.05160v2 )

ライセンス: Link先を確認

Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen,

(参考訳) 埋め込みモデルは、セマンティックな類似性、情報検索、クラスタリングなど、さまざまな下流タスクを可能にする上で重要である。近年,タスク(例えばMTEB)をまたいで一般化可能なユニバーサルテキスト埋め込みモデルの開発への関心が高まっている。しかし, 汎用マルチモーダル埋め込みモデルの学習の進展は, その重要性にもかかわらず比較的遅かった。本研究では,幅広い下流タスクを扱える普遍的な埋め込み構築の可能性を探究する。我々の貢献は2つある。(1) MMEB(Massive Multimodal Embedding Benchmark)は、4つのメタタスク(分類、視覚的質問応答、マルチモーダル検索、視覚的グラウンディング)と36のデータセット(20のトレーニングデータセットと16の評価データセット)をカバーする。(2) VLM2Vec (Vision-Language Model -> Vector) は、MMEBでの訓練を通じて、最先端の視覚言語モデルを埋め込みモデルに変換するコントラスト学習フレームワークである。 CLIPやBLIPのような以前のモデルとは異なり、VLM2Vecは画像とテキストの任意の組み合わせを処理して、タスク命令に基づいた固定次元ベクトルを生成することができる。我々は,Phi-3.5-V上に一連のVLM2Vecモデルを構築し,MMEBの評価分割で評価する。以上の結果から,VLM2Vecは,MMEBの分布内データセットと分布外データセットの両方において,既存のマルチモーダル埋め込みモデルに対して10%から20%の絶対的な平均改善を達成することが示された。

Embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering. Recently, there has been a surge of interest in developing universal text embedding models that can generalize across tasks (e.g., MTEB). However, progress in learning universal multimodal embedding models has been relatively slow despite their importance. In this work, we aim to explore the potential for building universal embeddings capable of handling a wide range of downstream tasks. Our contributions are twofold: (1) MMEB (Massive Multimodal Embedding Benchmark), which covers 4 meta-tasks (i.e. classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training and 16 evaluation datasets, and (2) VLM2Vec (Vision-Language Model -> Vector), a contrastive training framework that converts any state-of-the-art vision-language model into an embedding model via training on MMEB. Unlike previous models such as CLIP and BLIP, VLM2Vec can process any combination of images and text to generate a fixed-dimensional vector based on task instructions. We build a series of VLM2Vec models on Phi-3.5-V and evaluate them on MMEB's evaluation split. Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models on both in-distribution and out-of-distribution datasets in MMEB.
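
For readers unfamiliar with contrastive training of embedding models, the following sketch computes an InfoNCE-style loss with in-batch negatives on toy vectors. This is a common choice for contrastive embedding training and is assumed here for illustration only; the abstract does not state which loss VLM2Vec uses, and the temperature value and function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(queries, targets, temperature=0.05):
    """Average InfoNCE loss over a batch.

    targets[i] is the positive for queries[i]; every other target in the
    batch serves as a negative. The temperature is a hypothetical default.
    """
    q = [normalize(v) for v in queries]
    t = [normalize(v) for v in targets]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, tj) / temperature for tj in t]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)


# Toy embeddings: in practice these would come from the vision-language model.
query_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(query_emb, [[1.0, 0.0], [0.0, 1.0]])    # positives match
shuffled = info_nce(query_emb, [[0.0, 1.0], [1.0, 0.0]])   # positives swapped

print(aligned, shuffled)
```

The loss is small when each query is closest to its own target and large when the pairing is wrong, which is the signal that pushes matching query and target embeddings together during training.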

翻訳日:2024-11-01 23:58:57 公開日:2024-10-11

# マルチモーダル連続学習の最近の進歩:包括的調査

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey ( http://arxiv.org/abs/2410.05352v1 )

ライセンス: Link先を確認

Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King,

(参考訳) 継続学習(CL)は、機械学習モデルが、以前に獲得した知識を忘れることなくその上に積み上げながら、新しいデータから継続的に学習できるようにすることを目的としている。機械学習モデルが小規模なものから大規模な事前訓練済みアーキテクチャへ、また単一モーダルデータからマルチモーダルデータのサポートへと進化するにつれ、近年、マルチモーダル連続学習(MMCL)手法が出現している。 MMCLの主な課題は、単一モーダルのCL手法を単純に積み重ねるだけでは済まないことであり、そのような素朴なアプローチはしばしば不十分な性能しか得られない。本研究は、MMCLに関する最初の包括的な調査である。基本的な背景知識とMMCLの設定に加え、MMCL手法の構造化された分類を提供する。我々は,既存のMMCL手法を,正則化ベース,アーキテクチャベース,リプレイベース,プロンプトベースという4つのカテゴリに分類し,その方法論を説明し,重要なイノベーションを強調する。さらに,この分野でのさらなる研究を促進するため,オープンなMMCLデータセットとベンチマークを要約し,今後の研究・開発に向けた有望な方向性について論じる。関連するMMCL論文やオープンリソースをインデックスするGitHubリポジトリも、https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning で公開している。

Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As machine learning models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary challenge of MMCL is that it goes beyond a simple stacking of unimodal CL methods, as such straightforward approaches often yield unsatisfactory performance. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize existing MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, and discuss several promising future directions for investigation and development. We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.

翻訳日:2024-11-01 19:17:28 公開日:2024-10-11

# マルチモーダル連続学習の最近の進歩:包括的調査

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey ( http://arxiv.org/abs/2410.05352v2 )

ライセンス: Link先を確認

Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King,

翻訳日:2024-11-01 19:07:22 公開日:2024-10-11

# 基底関数を用いない点像分布関数エンジニアリングのための暗黙的に学習されたニューラル位相関数

Implicitly Learned Neural Phase Functions for Basis-Free Point Spread Function Engineering ( http://arxiv.org/abs/2410.05413v1 )

ライセンス: Link先を確認

Aleksey Valouev, Rachel Chan,

(参考訳) 点像分布関数(PSF)エンジニアリングは、計算イメージングにおいて光の焦点を正確に制御するために不可欠であり、ニューラルイメージング、蛍光顕微鏡、バイオフォトニクスなどに応用される。 PSF は位相関数のフーリエ変換の大きさから導かれるため、PSFが与えられたときに位相関数を構成すること(PSFエンジニアリング)は不良設定の逆問題となる。従来のPSFエンジニアリング手法は物理基底関数に依存しており、撮像タスクに必要なPSFの範囲にわたって一般化する能力が制限される。本稿では, 位相関数の品質において既存の画素単位の最適化手法を大きく上回る、暗黙的ニューラル表現を活用した新しい手法を提案する。

Point spread function (PSF) engineering is vital for precisely controlling the focus of light in computational imaging, with applications in neural imaging, fluorescence microscopy, and biophotonics. The PSF is derived from the magnitude of the Fourier transform of a phase function, making the construction of the phase function given the PSF (PSF engineering) an ill-posed inverse problem. Traditional PSF engineering methods rely on physical basis functions, limiting their ability to generalize across the range of PSFs required for imaging tasks. We introduce a novel approach leveraging implicit neural representations that significantly outperforms existing pixel-wise optimization methods in phase function quality.
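
The forward model mentioned in this abstract, and the reason the inverse problem is ill-posed, can be seen in a small one-dimensional example. The sketch below builds a unit-amplitude pupil from a phase profile, takes its discrete Fourier transform, and returns the squared magnitude. Using a 1-D discrete transform and the squared magnitude (intensity) are simplifying assumptions of this example, not details taken from the abstract.

```python
import cmath


def psf_from_phase(phase):
    """Intensity PSF of a 1-D pupil with unit amplitude and the given phase (radians).

    Computes |DFT(exp(i * phase))|^2 with a direct O(n^2) transform.
    """
    n = len(phase)
    pupil = [cmath.exp(1j * p) for p in phase]
    field = [
        sum(pupil[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]
    return [abs(f) ** 2 for f in field]


flat = psf_from_phase([0.0] * 8)     # flat phase: all energy in the zero-frequency bin
offset = psf_from_phase([0.5] * 8)   # same phase plus a constant offset

print([round(v, 6) for v in flat])
print(max(abs(a - b) for a, b in zip(flat, offset)))  # difference between the two PSFs
```

Two different phase functions (here, differing by a constant offset) produce the same PSF, so the PSF alone cannot determine the phase uniquely. That non-uniqueness is one concrete face of the ill-posedness the abstract refers to.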

翻訳日:2024-11-01 18:57:16 公開日:2024-10-11

# 基底関数を用いない点像分布関数エンジニアリングのための暗黙的に学習されたニューラル位相関数

Implicitly Learned Neural Phase Functions for Basis-Free Point Spread Function Engineering ( http://arxiv.org/abs/2410.05413v2 )

ライセンス: Link先を確認

Aleksey Valouev,

翻訳日:2024-11-01 18:57:16 公開日:2024-10-11

# ニューラルアーキテクチャ探索を用いたマルチスペクトル衛星画像からのアクティブ火災検知のための分類器の設計

Designing a Classifier for Active Fire Detection from Multispectral Satellite Imagery Using Neural Architecture Search ( http://arxiv.org/abs/2410.05425v1 )

ライセンス: Link先を確認

Amber Cassimon, Phil Reiter, Siegfried Mercelis, Kevin Mets,

(参考訳) 本稿では、強化学習に基づくニューラルアーキテクチャ探索(NAS)エージェントを用いて、マルチスペクトル衛星画像上でアクティブな火災検知を行う小型ニューラルネットワークを設計する。具体的には、単一のマルチスペクトル画素が火災の一部であるかどうかを判断できるニューラルネットワークを、電力予算の限られた低地球軌道(LEO)ナノサテライトの制約内で設計し、センサデータのオンボード処理を容易にすることを目的とする。強化学習を利用するには報酬関数が必要である。我々は、この報酬関数を、INT8精度への量子化後に特定のアーキテクチャが達成するF1スコアを、純粋にアーキテクチャ上の特徴から予測する回帰モデルの形で提供する。このモデルは、ニューラルネットワークアーキテクチャのランダムなサンプルを収集し、これらのアーキテクチャをトレーニングし、それらの分類性能統計を収集して訓練される。 F1スコア以外にも、設計されるモデルのサイズを制限し、ナノサテライトプラットフォームが課すリソース制約に収まるように、トレーニング可能なパラメータの総数を報酬関数に含めている。最後に、最良のニューラルネットワークをGoogle Coral Micro Dev Boardにデプロイし、推論レイテンシと消費電力を評価した。このニューラルネットワークは1,716個のトレーニング可能なパラメータで構成され、1回の推論に平均984μsを要し、推論時に約800mWを消費する。これらの結果から,我々の強化学習に基づくNASアプローチは,これまで取り組まれていない新たな問題にもうまく適用できることが示された。

This paper showcases the use of a reinforcement learning-based Neural Architecture Search (NAS) agent to design a small neural network to perform active fire detection on multispectral satellite imagery. Specifically, we aim to design a neural network that can determine if a single multispectral pixel is a part of a fire, and do so within the constraints of a Low Earth Orbit (LEO) nanosatellite with a limited power budget, to facilitate on-board processing of sensor data. In order to use reinforcement learning, a reward function is needed. We supply this reward function in the form of a regression model that predicts the F1 score obtained by a particular architecture, following quantization to INT8 precision, from purely architectural features. This model is trained by collecting a random sample of neural network architectures, training these architectures, and collecting their classification performance statistics. Besides the F1 score, we also include the total number of trainable parameters in our reward function to limit the size of the designed model and ensure it fits within the resource constraints imposed by nanosatellite platforms. Finally, we deployed the best neural network to the Google Coral Micro Dev Board and evaluated its inference latency and power consumption. This neural network consists of 1,716 trainable parameters, takes on average 984 µs per inference, and consumes around 800 mW to perform inference. These results show that our reinforcement learning-based NAS approach can be successfully applied to novel problems not tackled before.
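
The two deployment figures reported in the abstract can be combined into an approximate energy cost per inference. The short calculation below assumes the power draw is roughly constant at the reported value for the duration of one inference, and that each inference classifies one pixel; both are simplifying assumptions of this sketch.

```python
latency_s = 984e-6  # average latency per inference, as reported in the abstract
power_w = 0.800     # approximate power draw during inference, as reported

# Energy = power x time, assuming constant power over the inference.
energy_j = latency_s * power_w

# Upper bound on throughput if inferences run back to back with no overhead.
inferences_per_second = 1.0 / latency_s

print(f"energy per inference: {energy_j * 1e3:.3f} mJ")
print(f"back-to-back throughput: {inferences_per_second:.0f} inferences/s")
```

Under these assumptions each inference costs a little under one millijoule, which is the kind of figure a nanosatellite power budget would be checked against.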

翻訳日:2024-11-01 18:47:31 公開日:2024-10-11

Amber Cassimon, Phil Reiter, Siegfried Mercelis, Kevin Mets,

翻訳日:2024-11-01 18:47:31 公開日:2024-10-11

# PH-Dropout:ビュー合成のための実用的な認識論的不確実性定量化

PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis ( http://arxiv.org/abs/2410.05468v1 )

ライセンス: Link先を確認

Chuanhao Sun, Thanos Triantafyllou, Anthos Makris, Maja Drmač, Kai Xu, Luo Mai, Mahesh K. Marina,

(参考訳) Neural Radiance Fields (NeRF) と Gaussian Splatting (GS) を用いたビュー合成は、実世界のシナリオのレンダリングにおいて顕著な忠実さを示した。しかし、ビュー合成における正確かつ効率的な認識論的不確実性定量化(UQ)の実用的な手法は不足している。NeRF向けの既存のアプローチは、大きな計算オーバーヘッド(例:「トレーニング時間の10倍の増加」や「10倍の繰り返しトレーニング」)を導入するか、特定の不確実性条件やモデルに限定される。特に、GSモデルには包括的な認識論的UQのための体系的なアプローチが存在しない。この機能は、ニューラルビュー合成の堅牢性とスケーラビリティを改善し、アクティブなモデル更新、エラー推定、不確実性に基づくスケーラブルなアンサンブルモデリングを可能にするために重要である。本稿では、関数近似の観点からNeRFとGSに基づく手法を再検討し、3次元表現学習における重要な違いと関連を明らかにする。これらの知見に基づき、事前学習済みのNeRFおよびGSモデルに直接適用できる、リアルタイムかつ正確な初の認識論的不確実性推定手法であるPH-Dropout(Post hoc Dropout)を導入する。広範な評価により、理論的知見が検証され、PH-Dropoutの有効性が示された。

View synthesis using Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) has demonstrated impressive fidelity in rendering real-world scenarios. However, practical methods for accurate and efficient epistemic Uncertainty Quantification (UQ) in view synthesis are lacking. Existing approaches for NeRF either introduce significant computational overhead (e.g., "10x increase in training time" or "10x repeated training") or are limited to specific uncertainty conditions or models. Notably, GS models lack any systematic approach for comprehensive epistemic UQ. This capability is crucial for improving the robustness and scalability of neural view synthesis, enabling active model updates, error estimation, and scalable ensemble modeling based on uncertainty. In this paper, we revisit NeRF and GS-based methods from a function approximation perspective, identifying key differences and connections in 3D representation learning. Building on these insights, we introduce PH-Dropout (Post hoc Dropout), the first real-time and accurate method for epistemic uncertainty estimation that operates directly on pre-trained NeRF and GS models. Extensive evaluations validate our theoretical findings and demonstrate the effectiveness of PH-Dropout.
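The abstract does not spell out how PH-Dropout is computed, so the following is only a generic illustration of the underlying idea: apply dropout to an already trained model at inference time, repeat the forward pass, and read the spread of the outputs as an uncertainty signal. The tiny network, its weights `W1` and `W2`, and the function names are all made up for this sketch and are not the authors' procedure.

```python
import random
import statistics

# Hypothetical "pre-trained" weights of a one-hidden-layer network.
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 3 hidden units, 2 inputs
W2 = [0.7, -0.5, 0.9]                         # hidden -> scalar output


def forward(x, drop_rate, rng):
    """One forward pass with dropout applied to the hidden units."""
    out = 0.0
    for w_row, w_out in zip(W1, W2):
        h = max(0.0, sum(w * xi for w, xi in zip(w_row, x)))  # ReLU
        if rng.random() >= drop_rate:  # unit is kept
            out += w_out * h / (1.0 - drop_rate)  # inverted-dropout scaling
    return out


def dropout_uncertainty(x, drop_rate=0.2, n_samples=100, seed=0):
    """Mean and population standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, drop_rate, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)


mean, std = dropout_uncertainty([1.0, 2.0])
print(mean, std)
```

No retraining is involved here, which is the property the abstract emphasizes: the model weights stay fixed and only inference is repeated.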

翻訳日:2024-11-01 18:28:00 公開日:2024-10-11

# PH-Dropout:ビュー合成のための実用的な認識論的不確実性定量化

PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis ( http://arxiv.org/abs/2410.05468v2 )

ライセンス: Link先を確認

Chuanhao Sun, Thanos Triantafyllou, Anthos Makris, Maja Drmač, Kai Xu, Luo Mai, Mahesh K. Marina,

翻訳日:2024-11-01 18:28:00 公開日:2024-10-11

# T2V-Turbo-v2:データ・リワード・条件付き誘導設計によるビデオ生成後モデルの強化

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design ( http://arxiv.org/abs/2410.05677v1 )

ライセンス: Link先を確認

Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang,

(参考訳) 本稿では,事前学習したT2Vモデルから高機能な一貫性モデルを蒸留することにより,後学習段階における拡散型テキスト・ツー・ビデオ(T2V)モデルの改善に焦点をあてる。提案手法であるT2V-Turbo-v2は, 高品質なトレーニングデータ, 報酬モデルフィードバック, 条件付きガイダンスなど, 各種監視信号の整合蒸留プロセスへの統合により, 大幅な高度化を実現する。包括的アブレーション研究を通じて、特定の学習目標に対するデータセットの調整の重要性と、視覚的品質とテキスト・ビデオのアライメントを向上させるための多様な報酬モデルからの学習の有効性を強調した。さらに,教師のODEソルバを増強する効果的なエネルギー関数の設計に焦点を当てた,条件付き指導戦略の広大な設計空間を強調した。トレーニングデータセットからモーションガイダンスを抽出し、ODEソルバに組み込むことで、VBenchとT2V-CompBenchのモーション関連指標の改善により、生成されたビデオのモーション品質を改善する効果を示す。実証的に、我々のT2V-Turbo-v2は、Gen-3やKlingといったプロプライエタリシステムを上回る85.13のスコアで、VBenchに新たな最先端結果を確立する。

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2, introduces a significant advancement by integrating various supervision signals, including high-quality training data, reward model feedback, and conditional guidance, into the consistency distillation process. Through comprehensive ablation studies, we highlight the crucial importance of tailoring datasets to specific learning objectives and the effectiveness of learning from diverse reward models for enhancing both the visual quality and text-video alignment. Additionally, we highlight the vast design space of conditional guidance strategies, which centers on designing an effective energy function to augment the teacher ODE solver. We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver, showcasing its effectiveness in improving the motion quality of the generated videos with the improved motion-related metrics from VBench and T2V-CompBench. Empirically, our T2V-Turbo-v2 establishes a new state-of-the-art result on VBench, with a Total score of 85.13, surpassing proprietary systems such as Gen-3 and Kling.

翻訳日:2024-11-01 17:09:37 公開日:2024-10-11

Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang,

翻訳日:2024-11-01 17:09:37 公開日:2024-10-11

# CASA:高能率インクリメンタル物体検出のための視覚言語モデルにおけるクラス非依存的共有属性

CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection ( http://arxiv.org/abs/2410.05804v1 )

ライセンス: Link先を確認

Mingyi Guo, Yuyang Liu, Zongying Lin, Peixi Peng, Yonghong Tian,

(参考訳) インクリメンタルオブジェクト検出(IOD)は、シーケンシャルデータにおけるバックグラウンドカテゴリが以前学習されたクラスや将来のクラスを含む場合、バックグラウンドシフトによって問題となる。 CLIPのようなビジョン言語基盤モデルにインスパイアされたこれらのモデルは、事前トレーニング中に広範な画像とテキストのペアデータから共有属性をキャプチャする。本稿では,視覚言語基礎モデルの属性をインクリメンタルオブジェクト検出に活用する手法を提案する。本手法は,クラス非依存の共有属性ベース(CASA)を構築し,インクリメンタルクラス間の共通意味情報をキャプチャする。具体的には、大規模言語モデルを用いて、候補となるテキスト属性を生成し、現在のトレーニングデータに基づいて最も関連性の高い属性を選択し、それらの意味を属性割り当て行列に記録する。その後のタスクでは、保持された属性を凍結し、残りの候補を引き続き選択し、属性割り当て行列を更新する。さらに, OWL-ViTをベースラインとして, 事前学習した基礎モデルのパラメータを保存する。 IODのスケーラビリティと適応性を大幅に向上させるため,パラメータ効率の微調整によりパラメータ記憶に0.7%しか加えていない。 COCOデータセット上での2相および多相の大規模実験により,提案手法の最先端性能が実証された。

Incremental object detection (IOD) is challenged by background shift, where background categories in sequential data may include previously learned or future classes. Inspired by the vision-language foundation models such as CLIP, these models capture shared attributes from extensive image-text paired data during pre-training. We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection. Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes. Specifically, we utilize large language models to generate candidate textual attributes and select the most relevant ones based on current training data, recording their significance in an attribute assignment matrix. For subsequent tasks, we freeze the retained attributes and continue selecting from the remaining candidates while updating the attribute assignment matrix accordingly. Furthermore, we employ OWL-ViT as our baseline, preserving the original parameters of the pre-trained foundation model. Our method adds only 0.7% to parameter storage through parameter-efficient fine-tuning to significantly enhance the scalability and adaptability of IOD. Extensive two-phase and multi-phase experiments on the COCO dataset demonstrate the state-of-the-art performance of our proposed method.

翻訳日:2024-11-01 12:39:56 公開日:2024-10-11

Mingyi Guo, Yuyang Liu, Zongying Lin, Peixi Peng, Yonghong Tian,

翻訳日:2024-11-01 12:39:56 公開日:2024-10-11

# Aria: オープンなマルチモーダルなNative Mixture-of-Expertsモデル

Aria: An Open Multimodal Native Mixture-of-Experts Model ( http://arxiv.org/abs/2410.05993v1 )

ライセンス: Link先を確認

Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li,

(参考訳) 情報は多様である。マルチモーダルネイティブAIモデルは、現実世界の情報を統合し、包括的な理解を提供するために不可欠である。プロプライエタリなマルチモーダルネイティブモデルが存在するが、オープン性の欠如は、適応だけでなく、採用の障害となる。このギャップを埋めるために、オープンなマルチモーダルネイティブモデルであるAriaを紹介します。 Ariaは3.9Bと3.5Bのアクティベートパラメータをそれぞれ視覚トークンとテキストトークンに混合したエキスパートモデルである。 Pixtral-12BとLlama3.2-11Bを上回り、様々なマルチモーダルタスクにおける最高のプロプライエタリモデルと競合する。言語理解,マルチモーダル理解,長いコンテキストウィンドウ,命令フォローなどにおいて,Ariaを4段階のパイプラインに従ってゼロからトレーニングする。私たちは、Ariaの実際のアプリケーションへの導入と適応を容易にするコードベースとともに、モデルの重みをオープンソースにしています。

Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles for adoptions, let alone adaptations. To fill this gap, we introduce Aria, an open multimodal native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. Aria is a mixture-of-expert model with 3.9B and 3.5B activated parameters per visual token and text token, respectively. It outperforms Pixtral-12B and Llama3.2-11B, and is competitive against the best proprietary models on various multimodal tasks. We pre-train Aria from scratch following a 4-stage pipeline, which progressively equips the model with strong capabilities in language understanding, multimodal understanding, long context window, and instruction following. We open-source the model weights along with a codebase that facilitates easy adoptions and adaptations of Aria in real-world applications.

翻訳日:2024-11-01 11:50:19 公開日:2024-10-11

# Aria: オープンなマルチモーダルなNative Mixture-of-Expertsモデル

Aria: An Open Multimodal Native Mixture-of-Experts Model ( http://arxiv.org/abs/2410.05993v2 )

ライセンス: Link先を確認

Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li,

翻訳日:2024-11-01 11:50:19 公開日:2024-10-11

# 未知リンク関数を持つ一般化スパース付加モデル

Generalized Sparse Additive Model with Unknown Link Function ( http://arxiv.org/abs/2410.06012v1 )

ライセンス: Link先を確認

Peipei Yuan, Xinge You, Hong Chen, Xuelin Zhang, Qinmu Peng,

(参考訳) 一般化加法モデル(GAM)は高次元データ解析にうまく適用されてきた。しかし、既存のほとんどの手法は、リンク関数、成分関数、変数間の相互作用を同時に推定することができない。この問題を軽減するために、未知リンク関数を持つ一般化スパース付加モデル(GSAMUL)と呼ぶ新しいスパース付加モデルを提案する。このモデルでは、成分関数をB-スプライン基底で、未知のリンク関数を多層パーセプトロン(MLP)ネットワークで推定する。さらに、変数選択には$\ell_{2,1}$ノルム正則化項を用いる。提案するGSAMULは、変数選択と隠れた相互作用の両方を実現できる。この推定を二段階最適化問題として定式化し、データをトレーニングセットと検証セットに分割する。理論面では、近似手順の収束に関する保証を与える。応用面では、合成および実世界のデータセットでの実験的評価により、提案手法の有効性が一貫して確認された。

Generalized additive models (GAM) have been successfully applied to high dimensional data analysis. However, most existing methods cannot simultaneously estimate the link function, the component functions and the variable interaction. To alleviate this problem, we propose a new sparse additive model, named generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated by B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. Furthermore, $\ell_{2,1}$-norm regularizer is used for variable selection. The proposed GSAMUL can realize both variable selection and hidden interaction. We integrate this estimation into a bilevel optimization problem, where the data is split into training set and validation set. In theory, we provide the guarantees about the convergence of the approximate procedure. In applications, experimental evaluations on both synthetic and real world data sets consistently validate the effectiveness of the proposed approach.
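To make the role of the $\ell_{2,1}$-norm regularizer concrete, the sketch below computes it for a coefficient matrix. It assumes, for illustration only, that each row holds the B-spline coefficients of one input variable; the abstract does not state this layout. The norm sums the Euclidean norms of the rows, so it pushes whole rows to zero, which is what makes it usable for variable selection.

```python
import math


def l21_norm(coeffs):
    """Sum over rows (groups) of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(c * c for c in row)) for row in coeffs)


def selected_variables(coeffs, tol=1e-8):
    """Indices of variables whose coefficient group is not (numerically) zero."""
    return [i for i, row in enumerate(coeffs)
            if math.sqrt(sum(c * c for c in row)) > tol]


# Three variables with two basis coefficients each (made-up numbers).
coeffs = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(coeffs))            # 6.0
print(selected_variables(coeffs))  # [0, 2]
```

The second variable contributes nothing to the penalty and is dropped by `selected_variables`, mirroring how a zeroed group removes that variable from the model.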

翻訳日:2024-11-01 11:40:34 公開日:2024-10-11

# 未知リンク関数を持つ一般化スパース付加モデル

Generalized Sparse Additive Model with Unknown Link Function ( http://arxiv.org/abs/2410.06012v2 )

ライセンス: Link先を確認

Peipei Yuan, Xinge You, Hong Chen, Xuelin Zhang, Qinmu Peng,

翻訳日:2024-11-01 11:40:34 公開日:2024-10-11

# ブロック誘導型署名敵対的生成ネットワーク(BISGAN): GANを用いた署名スプーフィングとその評価

Block Induced Signature Generative Adversarial Network (BISGAN): Signature Spoofing Using GANs and Their Evaluation ( http://arxiv.org/abs/2410.06041v1 )

ライセンス: Link先を確認

Haadia Amjad, Kilian Goeller, Steffen Seitz, Carsten Knoll, Naseer Bajwa, Muhammad Imran Malik, Ronald Tetzlaff,

(参考訳) ディープラーニングはバイオメトリックスにおいて、効率的な識別と検証システムを開発するために積極的に利用されている。手書き署名は認証目的のための生体データの一般的なサブセットである。 GAN(Generative Adversarial Network)は、オリジナルおよびフォージされたシグネチャから学習し、フォージされたシグネチャを生成する。ほとんどのGAN技術は、識別器である強力なシグネチャ検証器を生成するが、ジェネレータモデルによって生成される偽造品の品質をより重視する必要がある。この研究は、署名検証システムのベンチマークを達成するために、偽造サンプルを生成するジェネレータを作成することに重点を置いている。 Inceptionモデルのようなブロックを注入したCycleGANを生成元とし,SigCNNモデルの変種を基本判別器として使用する。私たちは、署名スプーフィングで80%から100%の成功をもたらす新しいテクニックでモデルをトレーニングします。さらに、生成された偽造物の良さを測るカスタム評価手法を作成する。本研究は,バイオメトリックデータ生成と評価の理解を深めるため,データ品質を汚すジェネレータ指向のGANアーキテクチャを提唱する。

Deep learning is actively being used in biometrics to develop efficient identification and verification systems. Handwritten signatures are a common subset of biometric data for authentication purposes. Generative adversarial networks (GANs) learn from original and forged signatures to generate forged signatures. While most GAN techniques create a strong signature verifier, which is the discriminator, there is a need to focus more on the quality of forgeries generated by the generator model. This work focuses on creating a generator that produces forged samples that achieve a benchmark in spoofing signature verification systems. We use CycleGANs infused with Inception model-like blocks with attention heads as the generator and a variation of the SigCNN model as the base Discriminator. We train our model with a new technique that results in 80% to 100% success in signature spoofing. Additionally, we create a custom evaluation technique to act as a goodness measure of the generated forgeries. Our work advocates generator-focused GAN architectures for spoofing data quality that aid in a better understanding of biometric data generation and evaluation.

翻訳日:2024-11-01 11:30:40 公開日:2024-10-11

Haadia Amjad, Kilian Goeller, Steffen Seitz, Carsten Knoll, Naseer Bajwa, Ronald Tetzlaff, Muhammad Imran Malik,

翻訳日:2024-11-01 11:30:40 公開日:2024-10-11

# Auto-Evolve: 自己推論フレームワークによる大規模言語モデルのパフォーマンス向上

Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework ( http://arxiv.org/abs/2410.06328v1 )

ライセンス: Link先を確認

Krishna Aswani, Huilin Lu, Pranav Patankar, Priya Dhalwani, Iris Tan, Jayant Ganeshmohan, Simon Lacasse,

(参考訳) Chain-of-Thought(CoT)やSelf-Discoverといったプロンプトエンジニアリング戦略の最近の進歩は、大規模言語モデル(LLM)の推論能力を改善する大きな可能性を示している。しかし、これらの最先端(SOTA)のプロンプト戦略は、人間の問題解決へのアプローチをシミュレートすることを目的とした、「think step by step」や「break down this problem」のような静的なシード推論モジュールの単一または固定セットに依存している。この制約は、多様な問題に効果的に取り組む際のモデルの柔軟性を制限する。本稿では、LLMが動的な推論モジュールと下流の行動計画を自己生成できるようにする新しいフレームワークであるAuto-Evolveを紹介し、現在のSOTA手法を大幅に上回る改善を示す。我々は、難易度の高いBigBench-Hard(BBH)データセット上で、Claude 2.0、Claude 3 Sonnet、Mistral Large、GPT 4を用いてAuto-Evolveを評価し、SOTAのプロンプト戦略を一貫して上回ることを確認した。Auto-EvolveはCoTを最大10.4%、これら4つのモデルの平均で7%上回っている。私たちのフレームワークには2つのイノベーションがある。a) Auto-Evolveは、人間の推論パラダイムと整合しながらタスクごとに推論モジュールを動的に生成し、事前定義されたテンプレートの必要性を排除する。b) LLMへの指示ガイダンスを段階的に洗練する反復改良コンポーネントを導入し、1ステップで行う場合と比べて性能を平均2.8%向上させる。

Recent advancements in prompt engineering strategies, such as Chain-of-Thought (CoT) and Self-Discover, have demonstrated significant potential in improving the reasoning abilities of Large Language Models (LLMs). However, these state-of-the-art (SOTA) prompting strategies rely on single or fixed set of static seed reasoning modules like "think step by step" or "break down this problem" intended to simulate human approach to problem-solving. This constraint limits the flexibility of models in tackling diverse problems effectively. In this paper, we introduce Auto-Evolve, a novel framework that enables LLMs to self-create dynamic reasoning modules and downstream action plan, resulting in significant improvements over current SOTA methods. We evaluate Auto-Evolve on the challenging BigBench-Hard (BBH) dataset with Claude 2.0, Claude 3 Sonnet, Mistral Large, and GPT 4, where it consistently outperforms the SOTA prompt strategies. Auto-Evolve outperforms CoT by up to 10.4% and on an average by 7% across these four models. Our framework introduces two innovations: a) Auto-Evolve dynamically generates reasoning modules for each task while aligning with human reasoning paradigm, thus eliminating the need for predefined templates. b) We introduce an iterative refinement component, that incrementally refines instruction guidance for LLMs and helps boost performance by average 2.8% compared to doing it in a single step.

翻訳日:2024-11-01 06:29:16 公開日:2024-10-11

Krishna Aswani, Huilin Lu, Pranav Patankar, Priya Dhalwani, Iris Tan, Jayant Ganeshmohan, Simon Lacasse,

(参考訳) Chain-of-Thought(CoT)やSelf-Discoverといったプロンプトエンジニアリング戦略の最近の進歩は、大規模言語モデル(LLM)の推論能力を改善する大きな可能性を示している。しかし、これらの最先端(SOTA)のプロンプト戦略は、人間の問題解決へのアプローチをシミュレートすることを目的とした、「think step by step」や「break down this problem」のような静的なシード推論モジュールの単一または固定セットに依存している。この制約は、多様な問題に効果的に取り組む際のモデルの柔軟性を制限する。本稿では、LLMが動的な推論モジュールと下流の行動計画を自己生成できるようにする新しいフレームワークであるAuto-Evolveを紹介し、現在のSOTA手法を大幅に上回る改善を示す。我々は、難易度の高いBigBench-Hard(BBH)データセット上で、Claude 2.0、Claude 3 Sonnet、Mistral Large、GPT 4を用いてAuto-Evolveを評価し、SOTAのプロンプト戦略を一貫して上回ることを確認した。Auto-EvolveはCoTを最大10.4%、これら4つのモデルの平均で7%上回っている。私たちのフレームワークには2つのイノベーションがある。a) Auto-Evolveは、人間の推論パラダイムと整合しながらタスクごとに推論モジュールを動的に生成し、事前定義されたテンプレートの必要性を排除する。b) LLMへの指示ガイダンスを段階的に洗練する反復改良コンポーネントを導入し、1ステップで行う場合と比べて性能を平均2.8%向上させる。

Recent advancements in prompt engineering strategies, such as Chain-of-Thought (CoT) and Self-Discover, have demonstrated significant potential in improving the reasoning abilities of Large Language Models (LLMs). However, these state-of-the-art (SOTA) prompting strategies rely on single or fixed set of static seed reasoning modules like "think step by step" or "break down this problem" intended to simulate human approach to problem-solving. This constraint limits the flexibility of models in tackling diverse problems effectively. In this paper, we introduce Auto-Evolve, a novel framework that enables LLMs to self-create dynamic reasoning modules and downstream action plan, resulting in significant improvements over current SOTA methods. We evaluate Auto-Evolve on the challenging BigBench-Hard (BBH) dataset with Claude 2.0, Claude 3 Sonnet, Mistral Large, and GPT 4, where it consistently outperforms the SOTA prompt strategies. Auto-Evolve outperforms CoT by up to 10.4% and on an average by 7% across these four models. Our framework introduces two innovations: a) Auto-Evolve dynamically generates reasoning modules for each task while aligning with human reasoning paradigm, thus eliminating the need for predefined templates. b) We introduce an iterative refinement component, that incrementally refines instruction guidance for LLMs and helps boost performance by average 2.8% compared to doing it in a single step.

翻訳日:2024-11-01 06:29:16 公開日:2024-10-11

# TopoTune : 一般化された組合せ複雑ニューラルネットワークのためのフレームワーク

TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks ( http://arxiv.org/abs/2410.06530v1 )

ライセンス: Link先を確認

Mathilde Papillon, Guillermo Bernárdez, Claudio Battiloro, Nina Miolane,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ領域の対称性を保存する方法で、リレーショナルデータセット、処理ノード、エッジ機能からの学習に優れています。しかし、多くの複雑なシステム、例えば生物学やソーシャルネットワークは、より自然に高階位相空間で表されるマルチウェイの複雑な相互作用を生み出している。トポロジカルディープラーニング(TDL)の新たな分野は、これらの高次構造を適応し活用することを目指している。比較的一般的な TDL モデルである Combinatorial Complex Neural Networks (CCNN) は、GNN よりも表現力が高く、パフォーマンスも優れていることが示されている。しかし、グラフ深層学習のエコシステムとは違って、TDLは新しいアーキテクチャを簡単に定義するための原則的で標準化されたフレームワークがなく、アクセシビリティと適用性を制限する。この問題に対処するために,汎用CCNN(Generalized CCNNs,GCCNs)を導入する。これは,任意の(グラフ)ニューラルネットワークをTDLモデルに体系的に変換するために使用できる,新しい単純かつ強力なTDLモデルのファミリーである。 GCCN が CCNN を一般化・サブスメートすることを証明する一方で,多様な GCCN のクラスに対する広範な実験により,これらのアーキテクチャは CCNN との整合性や性能を保ちながら,モデルの複雑さを少なく抑えることが示されている。 TDLを加速し、民主化するために、私たちは、実践者が前例のない柔軟性と容易さでGCCNを定義し、構築し、訓練できる軽量ソフトウェアであるTopoTuneを紹介します。

Graph Neural Networks (GNNs) excel in learning from relational datasets, processing node and edge features in a way that preserves the symmetries of the graph domain. However, many complex systems--such as biological or social networks--involve multiway complex interactions that are more naturally represented by higher-order topological spaces. The emerging field of Topological Deep Learning (TDL) aims to accommodate and leverage these higher-order structures. Combinatorial Complex Neural Networks (CCNNs), fairly general TDL models, have been shown to be more expressive and better performing than GNNs. However, differently from the graph deep learning ecosystem, TDL lacks a principled and standardized framework for easily defining new architectures, restricting its accessibility and applicability. To address this issue, we introduce Generalized CCNNs (GCCNs), a novel simple yet powerful family of TDL models that can be used to systematically transform any (graph) neural network into its TDL counterpart. We prove that GCCNs generalize and subsume CCNNs, while extensive experiments on a diverse class of GCCNs show that these architectures consistently match or outperform CCNNs, often with less model complexity. In an effort to accelerate and democratize TDL, we introduce TopoTune, a lightweight software that allows practitioners to define, build, and train GCCNs with unprecedented flexibility and ease.

翻訳日:2024-11-01 05:09:09 公開日:2024-10-11

# TopoTune : 一般化された組合せ複雑ニューラルネットワークのためのフレームワーク

TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks ( http://arxiv.org/abs/2410.06530v2 )

ライセンス: Link先を確認

Mathilde Papillon, Guillermo Bernárdez, Claudio Battiloro, Nina Miolane,

翻訳日:2024-11-01 05:09:09 公開日:2024-10-11

# Chip-Tuning: 言語モデルが言う前に分類する

Chip-Tuning: Classify Before Language Models Say ( http://arxiv.org/abs/2410.06541v1 )

ライセンス: Link先を確認

Fangwei Zhu, Dian Li, Jiajun Huang, Gang Liu, Hui Wang, Zhifang Sui,

(参考訳) 大規模言語モデル(LLM)の性能の急激な発展は、モデルサイズがエスカレーションされ、モデルトレーニングと推論のコストが増大する。従来の研究では、LLMの特定の層が冗長性を示し、これらの層を取り除くことで、モデルの性能がわずかに損なわれることが判明した。本稿では, LLMの層冗長性を説明するために, 探索手法を採用し, 探索型分類器を用いて言語モデルを効果的に解析できることを実証する。分類問題に特化した簡易かつ効果的な構造化プルーニングフレームワークであるチップチューニングを提案する。チップチューニングは、LLMの異なる層にチップという名前の小さなプロブリング分類器を取り付け、バックボーンモデルが凍結されたチップを訓練する。分類用チップを選択した後、付加層に後続するすべての層は、限界性能損失で除去できる。各種LLMおよびデータセットによる実験結果から,チップチューニングは従来の最先端のベースラインよりも精度とプルーニング比の両方で有意に優れ,プルーニング比が最大50%に達することが示された。また、チップチューニングはマルチモーダルモデルに適用でき、モデル微調整と組み合わせることで、優れた互換性が証明できる。

The rapid development in the performance of large language models (LLMs) is accompanied by the escalation of model size, leading to the increasing cost of model training and inference. Previous research has discovered that certain layers in LLMs exhibit redundancy, and removing these layers brings only marginal loss in model performance. In this paper, we adopt the probing technique to explain the layer redundancy in LLMs and demonstrate that language models can be effectively pruned with probing classifiers. We propose chip-tuning, a simple and effective structured pruning framework specialized for classification problems. Chip-tuning attaches tiny probing classifiers named chips to different layers of LLMs, and trains chips with the backbone model frozen. After selecting a chip for classification, all layers subsequent to the attached layer could be removed with marginal performance loss. Experimental results on various LLMs and datasets demonstrate that chip-tuning significantly outperforms previous state-of-the-art baselines in both accuracy and pruning ratio, achieving a pruning ratio of up to 50%. We also find that chip-tuning could be applied on multimodal models, and could be combined with model finetuning, proving its excellent compatibility.
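The pruning step described above can be illustrated with a small selection routine: given the validation accuracy of the chip attached to each layer, pick a layer and drop everything after it. The abstract does not state how the chip is chosen, so the rule used here (earliest layer within a tolerance of the best accuracy) and the names `select_chip` and `tolerance` are assumptions for illustration.

```python
def select_chip(layer_accuracies, tolerance=0.01):
    """Return (layer_index, pruning_ratio).

    Picks the earliest layer whose chip accuracy is within `tolerance`
    of the best chip. Layers after the selected one are counted as removed.
    The selection rule is hypothetical, not taken from the paper.
    """
    best = max(layer_accuracies)
    n_layers = len(layer_accuracies)
    for idx, acc in enumerate(layer_accuracies):
        if acc >= best - tolerance:
            kept = idx + 1
            return idx, (n_layers - kept) / n_layers


# Made-up per-layer probe accuracies for an 8-layer model.
accuracies = [0.50, 0.62, 0.71, 0.805, 0.81, 0.81, 0.80, 0.81]
print(select_chip(accuracies))  # (3, 0.5)
```

In this made-up case the chip on the fourth layer is nearly as accurate as the best one, so the last four of eight layers can be dropped, a pruning ratio of 50%.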

翻訳日:2024-11-01 05:09:09 公開日:2024-10-11

# Chip-Tuning: 言語モデルが言う前に分類する

Chip-Tuning: Classify Before Language Models Say ( http://arxiv.org/abs/2410.06541v2 )

ライセンス: Link先を確認

Fangwei Zhu, Dian Li, Jiajun Huang, Gang Liu, Hui Wang, Zhifang Sui,

翻訳日:2024-11-01 05:09:09 公開日:2024-10-11

# マルチラウンド優先最適化を用いた細部・高精度ビデオキャプションのためのマルチモーダルLLMの強化

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization ( http://arxiv.org/abs/2410.06682v1 )

ライセンス: Link先を確認

Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang,

(参考訳) ビデオには豊富な情報が含まれており、自然言語で詳細かつ正確な記述を生成することは、ビデオ理解の重要な側面である。本稿では、指向性優先最適化(DPO)によってビデオ(ペアの音声付き)キャプションを強化するよう設計された、低ランク適応(LoRA)を備えた高度な音声・視覚大規模言語モデル(LLM)であるvideo-SALMONN 2を提案する。ビデオ記述の完全性と精度を評価する新しい指標を提案し、これらをDPOで最適化する。さらにトレーニングを改善するため、新しいマルチラウンドDPO(mrDPO)アプローチを導入する。これは、DPO参照モデルを定期的に更新し、各トレーニングラウンド(1,000ステップ)後にパラメータ更新の代理としてLoRAモジュールをマージして再初期化し、正解のビデオキャプションからのガイダンスを取り入れてプロセスを安定化させるものである。mrDPOによる非キャプション能力の破滅的忘却の可能性に対処するため、mrDPOで学習したモデルが生成したキャプションを教師ラベルとして用いてDPO前のLLMを微調整する再生チューニング(rebirth tuning)を提案する。実験の結果、mrDPOはvideo-SALMONN 2のキャプション精度を大幅に向上させ、グローバルおよびローカルのエラー率をそれぞれ40%と20%削減し、反復率を35%低下させることが示された。わずか70億パラメータの最終的なvideo-SALMONN 2モデルは、ビデオキャプションタスクにおいてGPT-4oやGemini-1.5-Proといった主要なモデルを上回り、同時に、広く使われているビデオ質問応答ベンチマークでは同規模のモデルの中で最先端と競合する性能を維持している。採択され次第、コード、モデルチェックポイント、トレーニングおよびテストデータを公開する。デモは https://video-salmonn-2.github.io で公開されている。

Videos contain a wealth of information, and generating detailed and accurate descriptions in natural language is a key aspect of video understanding. In this paper, we present video-SALMONN 2, an advanced audio-visual large language model (LLM) with low-rank adaptation (LoRA) designed for enhanced video (with paired audio) captioning through directed preference optimization (DPO). We propose new metrics to evaluate the completeness and accuracy of video descriptions, which are optimized using DPO. To further improve training, we introduce a novel multi-round DPO (mrDPO) approach, which involves periodically updating the DPO reference model, merging and re-initializing the LoRA module as a proxy for parameter updates after each training round (1,000 steps), and incorporating guidance from ground-truth video captions to stabilize the process. To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels. Experiments show that mrDPO significantly enhances video-SALMONN 2's captioning accuracy, reducing global and local error rates by 40% and 20%, respectively, while decreasing the repetition rate by 35%. The final video-SALMONN 2 model, with just 7 billion parameters, surpasses leading models such as GPT-4o and Gemini-1.5-Pro in video captioning tasks, while maintaining competitive performance to the state-of-the-art on widely used video question-answering benchmark among models of similar size. Upon acceptance, we will release the code, model checkpoints, and training and test data. Demos are available at https://video-salmonn-2.github.io.
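For readers unfamiliar with DPO, the sketch below writes out the widely used DPO loss for a single preference pair, as introduced in the original DPO work. Whether video-SALMONN 2 uses exactly this form is not stated in the abstract, so treat this as background rather than the authors' implementation. The argument names and the value of `beta` are illustrative.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * margin)), where margin is the difference
    between the policy/reference log-ratio of the chosen caption and that
    of the rejected caption.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    z = beta * margin
    # Numerically stable softplus(-z) = -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# If the policy equals the reference, the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
```

In a multi-round scheme such as the one the abstract describes, the reference log-probabilities would be recomputed from the updated reference model at the start of each round, while the loss itself keeps this shape.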

翻訳日:2024-11-01 04:19:50 公開日:2024-10-11

Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang,

翻訳日:2024-11-01 04:19:50 公開日:2024-10-11

# パラメトリックPDEのためのニューラルソルバーの学習と物理インフォームド法

Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods ( http://arxiv.org/abs/2410.06820v1 )

ライセンス: Link先を確認

Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, Patrick Gallinari,

(参考訳) 物理インフォームド深層学習は、大きな解空間を探索し、多くの反復を必要とし、不安定な訓練につながるような偏微分方程式(PDE)を解く複雑さのために、最適化の課題に直面していることが多い。これらの課題は、特に損失関数の微分項によって生じる最適化問題の条件が悪くなることから生じる。これらの問題に対処するために、データに基づいて訓練された物理インフォームド反復アルゴリズムを用いてPDEの解法を学ぶことを提案する。提案手法は,各PDEインスタンスに自動的に適応し,最適化プロセスの大幅な高速化と安定化を実現し,物理認識モデルの高速収束を可能にする勾配降下アルゴリズムの条件付けを学習する。さらに,従来の物理インフォームド手法は1つのPDEインスタンスを解くが,本手法はパラメトリックPDEに対処する。具体的には, 物理損失勾配をPDEパラメータと統合し, 係数, 初期条件, 境界条件を含むPDEパラメータの分布を解く。提案手法の有効性を,複数のデータセット上での実験実験により実証し,トレーニングとテスト時間最適化性能を比較した。

Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem, caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach addresses parametric PDEs. Specifically, our method integrates the physical loss gradient with the PDE parameters to solve over a distribution of PDE parameters, including coefficients, initial conditions, or boundary conditions. We demonstrate the effectiveness of our method through empirical experiments on multiple datasets, comparing training and test-time optimization performance.
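The effect of ill-conditioning on gradient descent, and why conditioning the update helps, can be seen on a two-variable quadratic. This is a toy illustration only: the paper learns the conditioning with a neural network, whereas the diagonal preconditioner below is hand-picked from the known curvatures. All names and numbers are assumptions made for the sketch.

```python
def gradient_descent(curvatures, x0, step, precond, n_steps):
    """Minimize f(x) = 0.5 * sum(a_i * x_i**2).

    Update rule: x <- x - step * P * grad, with P a diagonal preconditioner.
    """
    x = list(x0)
    for _ in range(n_steps):
        grad = [a * xi for a, xi in zip(curvatures, x)]
        x = [xi - step * p * g for xi, p, g in zip(x, precond, grad)]
    return x


curvatures = [1.0, 100.0]  # condition number 100
x0 = [1.0, 1.0]

# Plain gradient descent: the step size is limited by the largest curvature.
plain = gradient_descent(curvatures, x0, step=0.01,
                         precond=[1.0, 1.0], n_steps=50)

# Hand-picked preconditioner (inverse curvature): a single unit step suffices.
scaled = gradient_descent(curvatures, x0, step=1.0,
                          precond=[1.0 / a for a in curvatures], n_steps=1)

print(plain, scaled)
```

After 50 plain steps the low-curvature coordinate has only shrunk by a factor of about 0.99**50, while the preconditioned run reaches the minimum in one step. A learned solver aims for a similar speed-up without knowing the curvatures in advance.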

翻訳日:2024-11-01 03:21:00 公開日:2024-10-11

# パラメトリックPDEのためのニューラルソルバーの学習と物理インフォームド法

Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods ( http://arxiv.org/abs/2410.06820v2 )

ライセンス: Link先を確認

Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, Patrick Gallinari,

翻訳日:2024-11-01 03:21:00 公開日:2024-10-11

# 誰のウェブブラウザがどれほどユニークか? 米国ユーザーのブラウザフィンガープリンティングにおける人口統計の役割

How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users ( http://arxiv.org/abs/2410.06954v1 )

ライセンス: Link先を確認

Alex Berke, Badih Ghazi, Enrico Bacis, Pritish Kamath, Ravi Kumar, Robin Lassonde, Pasin Manurangsi, Umar Syed,

(参考訳) ブラウザのフィンガープリントは、ユーザーのデバイスから属性を収集してユニークな「指紋」を作成することで、クッキーなしでもウェブ上のユーザーを特定し、追跡するために使用することができる。この技術と結果として生じるプライバシーリスクは10年以上にわたって研究されてきた。しかし、先行研究ではデータが公開されていないため、さらなる研究は限られている。さらに、先行研究のデータにはユーザーの人口統計が欠如していた。ここでは、さらなる研究を可能にするための第一種データセットを提供する。これには、ユーザの人口統計と調査回答によるブラウザ属性が含まれ、8,400人の米国研究参加者からインフォームドコンセントで収集された。このデータセットを用いて、人口集団間で指紋認証のリスクがどのように異なるかを示す。例えば、低所得のユーザはリスクが高く、ユーザーの年齢が上がるにつれて、どちらも指紋認証や実際の指紋認証のリスクに気を遣う傾向にある。さらに, 性別, 年齢, 所得水準, 人種などのユーザ人口層を, 指紋認証によく使用されるブラウザ属性から推定し, このリスクに最も寄与するブラウザ属性を特定する。また,オープンな研究のためにブラウザデータを共有する可能性について,12,461人の参加者から回答を得て,今後のデータ収集にどのような影響があるかを調べる実験を行った。女性参加者は、私たちが収集したブラウザデータを表示するように、ブラウザデータをシェアする傾向が著しく低かった。全体として、指紋認証のリスクを評価し、ユーザプライバシを改善することを目的として、現在進行中の作業において、ユーザ人口統計学が重要な役割を担っていることを示す。私たちが提供しているデータセットとデータ収集ツールは、この研究で対処されていない研究の質問をさらに研究するために使用できます。

Browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and resulting privacy risks have been studied for over a decade. Yet further research is limited because prior studies used data not publicly available. Additionally, data in prior studies lacked user demographics. Here we provide a first-of-its-kind dataset to enable further research. It includes browser attributes with users' demographics and survey responses, collected with informed consent from 8,400 US study participants. We use this dataset to demonstrate how fingerprinting risks differ across demographic groups. For example, we find lower income users are more at risk, and find that as users' age increases, they are both more likely to be concerned about fingerprinting and at real risk of fingerprinting. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes most contribute to this risk. Our data collection process also conducted an experiment to study what impacts users' likelihood to share browser data for open research, in order to inform future data collection efforts, with responses from 12,461 total participants. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect. Overall, we show the important role of user demographics in the ongoing work that intends to assess fingerprinting risks and improve user privacy, with findings to inform future privacy enhancing browser developments. The dataset and data collection tool we provide can be used to further study research questions not addressed in this work.
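Fingerprinting risk is commonly summarized by how many users have a fingerprint shared with nobody else, and by the entropy of the fingerprint distribution. The sketch below computes both for a toy sample. The attribute tuples are invented for illustration and are not drawn from the dataset described above; the abstract does not say which metrics the authors report.

```python
from collections import Counter
import math


def fingerprint_stats(fingerprints):
    """Return (fraction of users with a unique fingerprint, entropy in bits)."""
    counts = Counter(fingerprints)
    n = len(fingerprints)
    unique_fraction = sum(1 for c in counts.values() if c == 1) / n
    entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return unique_fraction, entropy_bits


# Each fingerprint is a tuple of browser attributes (made-up values).
sample = [
    ("Chrome", "en-US", "1920x1080"),
    ("Chrome", "en-US", "1920x1080"),
    ("Firefox", "en-US", "1366x768"),
    ("Safari", "es-US", "2560x1440"),
]
print(fingerprint_stats(sample))  # (0.5, 1.5)
```

Running the same function separately on each demographic group would be one simple way to compare uniqueness across groups.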

翻訳日:2024-10-31 23:27:23 公開日:2024-10-11

Alex Berke, Enrico Bacis, Badih Ghazi, Pritish Kamath, Ravi Kumar, Robin Lassonde, Pasin Manurangsi, Umar Syed,

翻訳日:2024-10-31 23:27:23 公開日:2024-10-11

# ELMO:アップサンプリングによるリアルタイムLiDARモーションキャプチャ

ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling ( http://arxiv.org/abs/2410.06963v1 )

ライセンス: Link先を確認

Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee,

(参考訳) 本稿では,単一LiDARセンサ用に設計されたリアルタイムアップサンプリングモーションキャプチャフレームワークELMOを紹介する。 ELMOは、条件付き自己回帰変換器ベースのアップサンプリングモーションジェネレータとしてモデル化され、20fpsのLiDARポイントクラウドシーケンスから60fpsのモーションキャプチャを実現する。 ELMOの鍵となる特徴は、自己注意機構と、運動と点雲のための慎重に設計された埋め込みモジュールとの結合であり、運動の質を著しく高めていることである。高精度なモーションキャプチャを実現するため,単一フレーム点雲からユーザの骨格オフセットを予測可能な、一度だけ実行するスケルトンキャリブレーションモデルを開発した。さらに,LiDARシミュレータを用いた新しいデータ拡張手法を導入し,グローバルなルート追跡を強化し,環境理解を向上させる。提案手法の有効性を示すため,ELMOと画像ベースと点クラウドベースのモーションキャプチャにおける最先端の手法を比較した。設計原則を検証するために、さらにアブレーション研究を行います。 ELMOの高速推論時間はリアルタイムアプリケーションに適しており、ライブストリーミングとインタラクティブなゲームシナリオを備えたデモビデオで例示されています。さらに,さまざまな動作を行う20人の被験者からなる高品質なLiDARモキャップ同期データセットも提供し,今後の研究に有用な資料となる。データセットと評価コードは https://movin3d.github.io/ELMO_SIGASIA2024/ で公開されている。

This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/.

翻訳日:2024-10-31 23:17:38 公開日:2024-10-11

# ELMO:アップサンプリングによるリアルタイムLiDARモーションキャプチャ

ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling ( http://arxiv.org/abs/2410.06963v2 )

ライセンス: Link先を確認

Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee,

翻訳日:2024-10-31 23:17:38 公開日:2024-10-11

# Bridge the Points:グラフベースのFew-shotセグメンテーション

Bridge the Points: Graph-based Few-shot Segment Anything Semantically ( http://arxiv.org/abs/2410.06964v1 )

ライセンス: Link先を確認

Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei,

(参考訳) 近年の大規模事前訓練技術の進歩により、視覚基盤モデルの能力が大幅に向上し、特に、点と箱のプロンプトに基づいて正確なマスクを生成できるSegment Anything Model(SAM)が注目されている。近年の研究では、SAMをFew-shot Semantic Segmentation (FSS)に拡張し、SAMベースの自動セマンティックセグメンテーションのためのプロンプト生成に焦点を当てている。しかし、これらの手法は適切なプロンプトの選択に苦慮し、異なるシナリオに対して特定のハイパーパラメータ設定が必要であり、SAMの過剰使用によるワンショット推論時間が長くなるため、効率が低下し、自動化能力が制限される。これらの問題に対処するため,グラフ解析に基づく簡易かつ効果的な手法を提案する。特に、Positive-Negative Alignmentモジュールは、マスクを生成するためのポイントプロンプトを動的に選択し、背景コンテキストを負の参照として活用する。その後のPoint-Mask Clusteringモジュールは、ポイント上のマスクカバレッジに基づいて、マスクと選択されたポイントの粒度を有向グラフとして整列する。これらの点は、有向グラフを弱連結成分に効率的に分解し、異なる自然クラスターを構成することによって集約される。最後に、グラフベースの粒度アライメントの恩恵を受けたポジティブゲーティングとオーバーシュートゲーティングが、高信頼マスクを集約し、最終的な予測のために偽陽性マスクをフィルタリングし、追加のハイパーパラメータと冗長マスクの生成を減らす。標準FSS、ワンショット部分セグメンテーション、クロスドメインFSSデータセットの広範な実験分析は、提案手法の有効性と効率を検証し、COCO-20iでは58.7%、LVIS-92iでは35.2%のmIoUで最先端のジェネラリストモデルを上回った。コードはhttps://andyzaq.github.io/GF-SAM/で公開されている。

The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. A subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by efficiently decomposing the directed graph into its weakly connected components, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregates high-confidence masks and filters out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validates the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with an mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available at https://andyzaq.github.io/GF-SAM/.
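
The graph step in the abstract above, grouping points by the weakly connected components of a directed graph, can be sketched as follows. This is a generic illustration rather than the authors' implementation: the function name `weakly_connected_components` and the toy edge list are hypothetical, and the components are found with a plain union-find that ignores edge direction.

```python
def weakly_connected_components(num_nodes, edges):
    """Group nodes 0..num_nodes-1 into weakly connected components.

    Edge direction is ignored, which is what "weakly connected" means.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical toy graph: 0 -> 1 <- 2 and 3 -> 4.
clusters = weakly_connected_components(5, [(0, 1), (2, 1), (3, 4)])
print(clusters)
```

Union-find runs in near-linear time in the number of edges, which is consistent with the abstract's emphasis on efficiency, although how the paper builds the edges from mask coverage is not shown here.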

翻訳日:2024-10-31 23:17:38 公開日:2024-10-11

# Bridge the Points:グラフベースのFew-shotセグメンテーション

Bridge the Points: Graph-based Few-shot Segment Anything Semantically ( http://arxiv.org/abs/2410.06964v2 )

ライセンス: Link先を確認

Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei,

翻訳日:2024-10-31 23:17:38 公開日:2024-10-11

# Rectified Diffusion:Rectified Flowに直線性は必要ない

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow ( http://arxiv.org/abs/2410.07303v1 )

ライセンス: Link先を確認

Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, Hongsheng Li,

(参考訳) 拡散モデルは、視覚生成を大幅に改善したが、生成ODEを解くという計算集約的な性質のため、生成速度の遅さによって妨げられている。広く認識されている解であるRectified Flowは、ODEパスを直線化することで生成速度を向上させる。主な構成要素は以下のとおりである。 1) フローマッチングの拡散形式を用いる。 2)$\boldsymbol v$-predictionを採用し、 3) 整流(リフロー)を行う。本稿では,整流の成功は主に,事前学習した拡散モデルを用いて一致したノイズとサンプルのペアを取得し,そのペアで再学習することにあると主張する。これに基づけば、構成要素 1)と 2)は不要である。さらに, 直線性は整流に不可欠な訓練目標ではなく, フローマッチングモデルにおける特定の事例であることも強調する。より重要な訓練目標は一階近似ODEパスを達成することであり、このパスはDDPMやSub-VPのようなモデルでは本質的に湾曲している。この知見に基づいて、フローマッチングモデルに制限されるのではなく、より広い範囲の拡散モデルを含むように、整流の設計空間と応用範囲を一般化するRectified Diffusionを提案する。Stable Diffusion v1-5とStable Diffusion XLについて検証した。本手法は,Rectified Flowベースの先行研究(例えばInstaFlow)のトレーニング手順を大幅に単純化するだけでなく,より低いトレーニングコストで優れたパフォーマンスを実現する。私たちのコードはhttps://github.com/G-U-N/Rectified-Diffusionで公開されています。

Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs. Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path. Its key components include: 1) using the diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. Based on this, components 1) and 2) are unnecessary. Furthermore, we highlight that straightness is not an essential training target for rectification; rather, it is a specific case of flow-matching models. The more critical training target is to achieve a first-order approximate ODE path, which is inherently curved for models like DDPM and Sub-VP. Building on this insight, we propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models, rather than being restricted to flow-matching models. We validate our method on Stable Diffusion v1-5 and Stable Diffusion XL. Our method not only greatly simplifies the training procedure of rectified flow-based previous works (e.g., InstaFlow) but also achieves superior performance with even lower training cost. Our code is available at https://github.com/G-U-N/Rectified-Diffusion.

翻訳日:2024-10-31 21:06:44 公開日:2024-10-11

# Rectified Diffusion:Rectified Flowに直線性は必要ない

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow ( http://arxiv.org/abs/2410.07303v2 )

ライセンス: Link先を確認

Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, Hongsheng Li,

翻訳日:2024-10-31 21:06:44 公開日:2024-10-11

# DA-Code:大規模言語モデルのためのエージェントデータサイエンスコード生成ベンチマーク

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ( http://arxiv.org/abs/2410.07331v1 )

ライセンス: Link先を確認

Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu,

(参考訳) 本稿では,エージェントベースのデータサイエンスタスク上でのLLMの評価に特化して設計されたコード生成ベンチマークであるDA-Codeを紹介する。このベンチマークには3つの中核的な要素がある。まず、DA-Code内のタスクは本質的に困難で、従来のコード生成タスクとは一線を画し、グラウンディングと計画において高度なコーディングスキルが要求されます。次に、DA-Codeの例は、すべて実データと多種多様なデータに基づいており、幅広い複雑なデータラングリングと分析タスクをカバーしている。第三に、これらの課題を解決するためには、モデルは複雑なデータサイエンスプログラミング言語を使用し、複雑なデータ処理を実行し、答えを導出する必要がある。私たちは、実世界のデータ分析シナリオと整合し、スケーラブルな、制御可能で実行可能な環境にベンチマークをセットアップしました。アノテーターは評価スイートを慎重に設計し、評価の精度と堅牢性を確保する。我々はDA-Agentベースラインを開発する。実験によると、ベースラインは他の既存のフレームワークよりも優れているが、現在の最高のLLMを使用しても、わずか30.5%の精度しか得られず、改善の余地は十分にある。ベンチマークは[https://da-code-bench.github.io](https://da-code-bench.github.io)で公開しています。

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. We release our benchmark at [https://da-code-bench.github.io](https://da-code-bench.github.io).

翻訳日:2024-10-31 20:56:57 公開日:2024-10-11

# DA-Code:大規模言語モデルのためのエージェントデータサイエンスコード生成ベンチマーク

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ( http://arxiv.org/abs/2410.07331v2 )

ライセンス: Link先を確認

Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu,

(参考訳) 本稿では,エージェントベースのデータサイエンスタスク上でのLLMの評価に特化して設計されたコード生成ベンチマークであるDA-Codeを紹介する。このベンチマークには3つの中核的な要素がある。まず、DA-Code内のタスクは本質的に困難で、従来のコード生成タスクとは一線を画し、グラウンディングと計画において高度なコーディングスキルが要求されます。次に、DA-Codeの例は、すべて実データと多種多様なデータに基づいており、幅広い複雑なデータラングリングと分析タスクをカバーしている。第三に、これらの課題を解決するためには、モデルは複雑なデータサイエンスプログラミング言語を使用し、複雑なデータ処理を実行し、答えを導出する必要がある。私たちは、実世界のデータ分析シナリオと整合し、スケーラブルな、制御可能で実行可能な環境にベンチマークをセットアップしました。アノテーターは評価スイートを慎重に設計し、評価の精度と堅牢性を確保する。我々はDA-Agentベースラインを開発する。実験によると、ベースラインは他の既存のフレームワークよりも優れているが、現在の最高のLLMを使用しても、わずか30.5%の精度しか得られず、改善の余地は十分にある。ベンチマークは https://da-code-bench.github.io で公開しています。

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. We release our benchmark at https://da-code-bench.github.io.

翻訳日:2024-10-31 20:56:57 公開日:2024-10-11

# レッドウッド混交林におけるNeRF加速生態モニタリング

NeRF-Accelerated Ecological Monitoring in Mixed-Evergreen Redwood Forest ( http://arxiv.org/abs/2410.07418v1 )

ライセンス: Link先を確認

Adam Korycki, Cory Yeaton, Gregory S. Gilbert, Colleen Josephson, Steve McGuire,

(参考訳) 森林マッピングは、森林環境の動態を理解するために必要な重要な観測データを提供する。特に、胸高直径(DBH)は、森林バイオマスと二酸化炭素(CO$_2$)の隔離を推定するために用いられる指標である。森林マッピングのマニュアル手法は労働集約的かつ時間を要するものであり、大規模な地図作成のボトルネックとなっている。自動マッピングは、通常点雲の形で得られる高密度な森林再構成の取得に依存している。地上レーザースキャン(TLS)と移動レーザースキャン(MLS)は高価なLiDARセンシングを用いて点雲を生成し、木径の推定に成功している。ニューラルレイディアンスフィールド(NeRF)は、入力ビューのスパースセットでニューラルネットワークをトレーニングすることで、フォトリアリスティックで視覚に基づく再構築を可能にする創発的技術である。本稿では,混交常緑レッドウッド林における幹径推定を目的として,MLSとNeRFによる森林再構成の比較を行った。さらに,凸包モデリングを用いた改良DBH推定法を提案する。このアプローチを用いて1.68cmのRMSEを達成し、標準シリンダーモデリング手法を一貫して上回った。コードコントリビューションとフォレストデータセットはhttps://github.com/harelab-ucsc/RedwoodNeRFで無償公開しています。

Forest mapping provides critical observational data needed to understand the dynamics of forest environments. Notably, tree diameter at breast height (DBH) is a metric used to estimate forest biomass and carbon dioxide (CO$_2$) sequestration. Manual methods of forest mapping are labor intensive and time consuming, a bottleneck for large-scale mapping efforts. Automated mapping relies on acquiring dense forest reconstructions, typically in the form of point clouds. Terrestrial laser scanning (TLS) and mobile laser scanning (MLS) generate point clouds using expensive LiDAR sensing, and have been used successfully to estimate tree diameter. Neural radiance fields (NeRFs) are an emergent technology enabling photorealistic, vision-based reconstruction by training a neural network on a sparse set of input views. In this paper, we present a comparison of MLS and NeRF forest reconstructions for the purpose of trunk diameter estimation in a mixed-evergreen Redwood forest. In addition, we propose an improved DBH-estimation method using convex-hull modeling. Using this approach, we achieved 1.68 cm RMSE, which consistently outperformed standard cylinder modeling approaches. Our code contributions and forest datasets are freely available at https://github.com/harelab-ucsc/RedwoodNeRF.
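
To make the idea of convex-hull-based diameter estimation concrete, the sketch below fits a convex hull to a 2D slice of trunk points and converts the hull perimeter into an equivalent circle diameter. The abstract does not say how the authors derive DBH from the hull, so the perimeter-over-pi rule is an assumption, and the function names and the synthetic slice are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_diameter(points):
    """Diameter of the circle whose circumference equals the hull perimeter (assumed rule)."""
    hull = convex_hull(points)
    perimeter = sum(
        math.dist(hull[i], hull[(i + 1) % len(hull)]) for i in range(len(hull))
    )
    return perimeter / math.pi


# Hypothetical slice at breast height: points on a 0.25 m radius circle plus interior noise.
slice_points = [
    (0.25 * math.cos(math.radians(d)), 0.25 * math.sin(math.radians(d)))
    for d in range(360)
] + [(0.0, 0.0), (0.1, -0.05)]
dbh_m = hull_diameter(slice_points)
print(round(dbh_m, 3))
```

Unlike a least-squares cylinder or circle fit, a hull follows the outer boundary of a non-circular trunk cross-section, which is one plausible reason such a model could behave differently on irregular trunks.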

翻訳日:2024-10-31 20:37:14 公開日:2024-10-11

# レッドウッド混交林におけるNeRF加速生態モニタリング

NeRF-Accelerated Ecological Monitoring in Mixed-Evergreen Redwood Forest ( http://arxiv.org/abs/2410.07418v2 )

ライセンス: Link先を確認

Adam Korycki, Cory Yeaton, Gregory S. Gilbert, Colleen Josephson, Steve McGuire,

(参考訳) 森林マッピングは、森林環境の動態を理解するために必要な重要な観測データを提供する。特に、胸高直径(DBH)は、森林のバイオマスと二酸化炭素の隔離を推定するために用いられる指標である。森林マッピングのマニュアル手法は労働集約的かつ時間を要するものであり、大規模な地図作成のボトルネックとなっている。自動マッピングは、通常点雲の形で得られる高密度な森林再構成の取得に依存している。地上レーザースキャン(TLS)と移動レーザースキャン(MLS)は高価なLiDARセンシングを用いて点雲を生成し、木径の推定に成功している。ニューラルレイディアンスフィールド(NeRF)は、入力ビューのスパースセットでニューラルネットワークをトレーニングすることで、フォトリアリスティックで視覚に基づく再構築を可能にする創発的技術である。本稿では,混交常緑レッドウッド林における幹径推定を目的として,MLSとNeRFによる森林再構成の比較を行った。さらに,凸包モデリングを用いた改良DBH推定法を提案する。このアプローチを用いて1.68cmのRMSEを達成し、標準シリンダーモデリング手法を一貫して上回った。コードコントリビューションとフォレストデータセットはhttps://github.com/harelab-ucsc/RedwoodNeRFで無償公開しています。

Forest mapping provides critical observational data needed to understand the dynamics of forest environments. Notably, tree diameter at breast height (DBH) is a metric used to estimate forest biomass and carbon dioxide sequestration. Manual methods of forest mapping are labor intensive and time consuming, a bottleneck for large-scale mapping efforts. Automated mapping relies on acquiring dense forest reconstructions, typically in the form of point clouds. Terrestrial laser scanning (TLS) and mobile laser scanning (MLS) generate point clouds using expensive LiDAR sensing, and have been used successfully to estimate tree diameter. Neural radiance fields (NeRFs) are an emergent technology enabling photorealistic, vision-based reconstruction by training a neural network on a sparse set of input views. In this paper, we present a comparison of MLS and NeRF forest reconstructions for the purpose of trunk diameter estimation in a mixed-evergreen Redwood forest. In addition, we propose an improved DBH-estimation method using convex-hull modeling. Using this approach, we achieved 1.68 cm RMSE, which consistently outperformed standard cylinder modeling approaches. Our code contributions and forest datasets are freely available at https://github.com/harelab-ucsc/RedwoodNeRF.

翻訳日:2024-10-31 20:37:14 公開日:2024-10-11

# 暗黙的ネットワークの族に対する一般化境界

A Generalization Bound for a Family of Implicit Networks ( http://arxiv.org/abs/2410.07427v1 )

ライセンス: Link先を確認

Samy Wu Fung, Benjamin Berkels,

(参考訳) 暗黙的ネットワーク(Implicit Network)は、パラメータ化された演算子の不動点によって出力が定義されるニューラルネットワークのクラスである。自然言語処理や画像処理をはじめとする多くのアプリケーションで成功を収めてきた。経験的な成功は数多く得られているが、その一般化に関する理論的な研究はまだ十分に進んでいない。本研究では、パラメータ化された縮小的な不動点演算子によって定義される暗黙的ネットワークの大規模な族を考える。これらのアーキテクチャのラデマッハ複雑性に対する被覆数の議論に基づいて、このクラスに対する一般化境界を示す。

Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications, including natural language processing and image processing. While they have found abundant empirical success, theoretical work on their generalization remains limited. In this work, we consider a large family of implicit networks defined by parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.
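
The notion of an output "defined by the fixed point of a parameterized operator" can be illustrated with a scalar toy example. The operator below, `tanh(weight * z + x)` with `|weight| < 1`, is a hypothetical choice made only because it is a contraction in `z`; it is not the family studied in the paper, and the names `fixed_point` and `toy_operator` are illustrative.

```python
import math


def fixed_point(operator, x, z0=0.0, tol=1e-10, max_iter=1000):
    """Iterate z <- operator(z, x) until successive iterates differ by less than tol."""
    z = z0
    for _ in range(max_iter):
        z_next = operator(z, x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def toy_operator(z, x, weight=0.5):
    # |d/dz tanh(weight * z + x)| <= |weight| < 1, so this map is a contraction in z.
    return math.tanh(weight * z + x)


# The "network output" for input x = 0.3 is the point z* with z* = toy_operator(z*, 0.3).
z_star = fixed_point(toy_operator, x=0.3)
residual = abs(toy_operator(z_star, 0.3) - z_star)
print(z_star, residual)
```

Contractivity is what guarantees that the iteration converges to a unique fixed point from any starting value (Banach fixed-point theorem), so the output is well defined regardless of `z0`.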

翻訳日:2024-10-31 20:37:14 公開日:2024-10-11

# 暗黙的ネットワークの族に対する一般化境界

A Generalization Bound for a Family of Implicit Networks ( http://arxiv.org/abs/2410.07427v2 )

ライセンス: Link先を確認

Samy Wu Fung, Benjamin Berkels,

翻訳日:2024-10-31 20:37:14 公開日:2024-10-11

# 実世界の物体検出のための自己監督型学習:サーベイ

Self-Supervised Learning for Real-World Object Detection: a Survey ( http://arxiv.org/abs/2410.07442v1 )

ライセンス: Link先を確認

Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle,

(参考訳) 自己監視学習(SSL)はコンピュータビジョンにおいて有望なアプローチとして登場し、大規模なラベルなしデータセットから意味のある表現をネットワークで学習することを可能にする。 SSLメソッドは、インスタンス識別とMasked Image Modeling(MIM)の2つの主要なカテゴリに分類される。インスタンスの識別はSSLの基本であるが、元々は分類のために設計されており、特に小さなオブジェクトに対して、オブジェクト検出にはあまり効果がない可能性がある。本研究では,実世界のオブジェクト検出に適したSSL手法に着目し,複雑な環境下での小さなオブジェクトの検出に重点を置いている。従来の調査と異なり、オブジェクトレベルのインスタンス識別とMIMメソッドを含むSSL戦略を詳細に比較し、CNNとViTベースのアーキテクチャの両方を用いた小さなオブジェクト検出の有効性を評価する。具体的には、我々のベンチマークは、広範に使用されているCOCOデータセットと、赤外線リモートセンシング画像における車両検出に焦点を当てた特殊な現実世界データセットに基づいて実施されている。また、カスタムドメイン固有のデータセットに対する事前トレーニングの影響を評価し、未処理のデータを扱うのにSSL戦略がいかに適しているかを強調します。分析の結果,インスタンス識別手法はCNNベースのエンコーダとよく似ており,MIM法はViTベースのアーキテクチャやカスタムデータセットの事前学習に適していることがわかった。この調査は、バックボーンアーキテクチャ、オブジェクトサイズ、カスタム事前トレーニング要件などの要因を考慮して、最適なSSL戦略を選択するための実用的なガイドを提供する。最終的に、適切なSSL事前トレーニング戦略を選択することは、適切なエンコーダとともに、現実世界のオブジェクト検出におけるパフォーマンスを著しく向上させることを示す。

Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN and ViT-based architectures. Specifically, our benchmark is performed on the widely-used COCO dataset, as well as on a specialized real-world dataset focused on vehicle detection in infrared remote sensing imagery. We also assess the impact of pre-training on custom domain-specific datasets, highlighting how certain SSL strategies are better suited for handling uncurated data. Our findings highlight that instance discrimination methods perform well with CNN-based encoders, while MIM methods are better suited for ViT-based architectures and custom dataset pre-training. This survey provides a practical guide for selecting optimal SSL strategies, taking into account factors such as backbone architecture, object size, and custom pre-training requirements. Ultimately, we show that choosing an appropriate SSL pre-training strategy, along with a suitable encoder, significantly enhances performance in real-world object detection, particularly for small object detection in frugal settings.

翻訳日:2024-10-31 17:06:37 公開日:2024-10-11

# 実世界の物体検出のための自己監督型学習:サーベイ

Self-Supervised Learning for Real-World Object Detection: a Survey ( http://arxiv.org/abs/2410.07442v2 )

ライセンス: Link先を確認

Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle,

翻訳日:2024-10-31 17:06:37 公開日:2024-10-11

# SEAL:二レベルデータ選択による安全性向上LLMファインチューニング

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection ( http://arxiv.org/abs/2410.07471v1 )

ライセンス: Link先を確認

Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen,

(参考訳) ダウンストリームパフォーマンスを高めるためにタスク固有のデータで微調整することは、LLM(Large Language Models)を活用する上で重要なステップである。しかし、以前の研究では、少数の敵対的サンプルや良性のデータでモデルを微調整するだけでも、モデルに予め備わったアライメントと安全性の能力が大きく損なわれうることが示されている。本研究では,LLMファインチューニングにおける安全性向上のための新しいフレームワークであるSEALを提案する。 SEALは、二段階最適化に基づいてデータランカーを学習し、安全で高品質な微調整データを上位に、安全でないものや低品質なものを下位にランク付けする。 SEALで訓練されたモデルは複数のベースラインよりも優れた品質を示し、ランダム選択と比較した勝率はLlama-3-8b-InstructモデルとMerlinite-7bモデルでそれぞれ8.5%と9.7%向上した。私たちのコードはGitHub https://github.com/hanshen95/SEAL で利用可能です。

Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on bilevel optimization to up-rank the safe and high-quality fine-tuning data and down-rank the unsafe or low-quality ones. Models trained with SEAL demonstrate superior quality over multiple baselines, with win rate increases of 8.5% and 9.7% over random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub at https://github.com/hanshen95/SEAL.

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# SEAL:二レベルデータ選択による安全性向上LLMファインチューニング

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection ( http://arxiv.org/abs/2410.07471v2 )

ライセンス: Link先を確認

Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen,

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# 機械的解釈の統一と検証--グループ運用を事例として

Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations ( http://arxiv.org/abs/2410.07476v1 )

ライセンス: Link先を確認

Wilson Wu, Louis Jaburi, Jacob Drori, Jason Gross,

(参考訳) 機械論的解釈可能性に関する最近の研究は、有限群の二項演算で訓練されたニューラルネットワークによって実行される計算のリバースエンジニアリングに焦点が当てられている。我々は、このタスクで訓練された一層ニューラルネットワークの内部を調査し、未同定構造を明らかにし、過去の作品の説明を統一するモデルについてより完全な記述を生成する。特に、これらのモデルは各入力引数の同値である。我々は,モデル理解の定量的評価であるモデル性能のコンパクトな証明に翻訳することで,この課題を訓練した少数のネットワークに適用できることを確認した。特に、この説明は、ブルート力の30%の時間で走るモデルの精度を保証し、トレーニングしたモデルの45%に対して >=95% の精度を与える。従来の研究からの説明だけでは,非自明な非空洞的精度境界が得られなかった。

A recent line of work in mechanistic interpretability has focused on reverse-engineering the computation performed by neural networks trained on the binary operation of finite groups. We investigate the internals of one-hidden-layer neural networks trained on this task, revealing previously unidentified structure and producing a more complete description of such models that unifies the explanations of previous works. Notably, these models approximate equivariance in each input argument. We verify that our explanation applies to a large fraction of networks trained on this task by translating it into a compact proof of model performance, a quantitative evaluation of model understanding. In particular, our explanation yields a guarantee of model accuracy that runs in 30% the time of brute force and gives a >=95% accuracy bound for 45% of the models we trained. We were unable to obtain nontrivial non-vacuous accuracy bounds using only explanations from previous works.

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# 機械的解釈の統一と検証--グループ運用を事例として

Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations ( http://arxiv.org/abs/2410.07476v2 )

ライセンス: Link先を確認

Wilson Wu, Louis Jaburi, Jacob Drori, Jason Gross,

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# WALL-E: ルール学習による世界アライメントによる世界モデルベースLLMエージェントの改善

WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents ( http://arxiv.org/abs/2410.07484v1 )

ライセンス: Link先を確認

Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang,

(参考訳) 大規模言語モデル(LLM)はモデルベースエージェントの強力な世界モデルとして直接機能するのか? LLMの以前の知識と特定の環境のダイナミクスのギャップは存在するが、LLMをその展開環境と整合させることでギャップを橋渡しすることができ、LLMのルール学習によって「世界整合性」を効果的に達成できることが本研究で明らかとなった。 LLMの豊富な事前知識を考えると、LLM予測と指定された環境力学を整合させるのに十分なルールはいくつかしかない。そこで本研究では,エージェント探索軌道と世界モデル予測との比較に基づいて,これらの規則を LLM を通して学習するニューロシンボリックアプローチを提案する。結果として得られる世界モデルは、LLMと学習ルールから構成される。我々のLLMエージェントWALL-Eはモデル予測制御(MPC)に基づいて構築されている。精密世界モデルに基づくルックアヘッド動作の最適化により、MPCは探索と学習効率を大幅に改善する。既存のLLMエージェントと比較して、WALL-Eの推論は、LPM入力に含まれる冗長なバッファ付き軌道ではなく、いくつかの主規則のみを必要とする。 MinecraftとALFWorldのオープンワールドチャレンジでは、WALL-Eは既存の方法よりも成功率が高く、時間計画のコストが低く、推論に使用されるトークンの数も少ない。 Minecraftでは、WALL-Eは成功率を15-30%上回り、リプランラウンドのコストは8-20で、トークンの60-80%に過ぎなかった。 ALFWorldでは、成功率は6回の反復で95%という新記録に達した。

Can large language models (LLMs) directly serve as powerful world models for model-based agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment and such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning only requires a few principal rules rather than verbose buffered trajectories being included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs on replanning time and the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. In ALFWorld, its success rate surges to a new record high of 95% only after 6 iterations.

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# WALL-E: ルール学習による世界アライメントによる世界モデルベースLLMエージェントの改善

WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents ( http://arxiv.org/abs/2410.07484v2 )

ライセンス: Link先を確認

Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang,

翻訳日:2024-10-31 16:56:23 公開日:2024-10-11

# 単一核スピンquditの量子性証明

Certifying the quantumness of a single nuclear spin qudit through its uniform precession ( http://arxiv.org/abs/2410.07641v1 )

ライセンス: Link先を確認

Arjen Vaartjes, Martin Nurizzo, Lin Htoo Zaw, Benjamin Wilhelm, Xi Yu, Danielle Holmes, Daniel Schwienbacher, Anders Kringhøj, Mark R. van Blankenstein, Alexander M. Jakob, Fay E. Hudson, Kohei M. Itoh, Riley J. Murray, Robin Blume-Kohout, Namit Anand, Andrew S. Dzurak, David N. Jamieson, Valerio Scarani, Andrea Morello,

(参考訳) スピン歳差運動は、古典的な対応物を正確に模倣する量子系の力学の教科書的な例である。ここでは、核スピンのエキゾチックな状態の量子性を、その一様な歳差運動を通じて証明することで、この見解に挑戦する。この結果の鍵となるのは、期待値ではなく、歳差運動するスピンの$x$-射影の正値性を測定すること、そして半古典的なスピンコヒーレント状態に制限されないスピン > 1/2 のquditを用いることである。実験は、シリコンナノエレクトロニクスデバイスに注入された単一のスピン7/2の$^{123}$Sb核上で行われ、高忠実な準備、制御、射影的な単発読み出しが可能である。シュレーディンガーの猫状態やその他の特製の核状態を用いて、古典的境界を標準偏差の19倍で破り、いかなる古典的確率分布もこのスピン歳差運動の統計を説明できないことを証明するとともに、単一の原子スケールのquditにおいて高い忠実度で量子資源状態を作成する能力を示した。

Spin precession is a textbook example of dynamics of a quantum system that exactly mimics its classical counterpart. Here we challenge this view by certifying the quantumness of exotic states of a nuclear spin through its uniform precession. The key to this result is measuring the positivity, instead of the expectation value, of the $x$-projection of the precessing spin, and using a spin > 1/2 qudit, which is not restricted to semi-classical spin coherent states. The experiment is performed on a single spin-7/2 $^{123}$Sb nucleus, implanted in a silicon nanoelectronic device, amenable to high-fidelity preparation, control, and projective single-shot readout. Using Schrödinger cat states and other bespoke states of the nucleus, we violate the classical bound by 19 standard deviations, proving that no classical probability distribution can explain the statistics of this spin precession, and highlighting our ability to prepare quantum resource states with high fidelity in a single atomic-scale qudit.

翻訳日:2024-10-31 15:46:26 公開日:2024-10-11

# 一様歳差運動による核スピンquditの量子性の証明

Certifying the quantumness of a nuclear spin qudit through its uniform precession ( http://arxiv.org/abs/2410.07641v2 )

ライセンス: Link先を確認

翻訳日:2024-10-31 15:46:26 公開日:2024-10-11

# シャドウ検出のためのテスト時間強度整合性適応

Test-Time Intensity Consistency Adaptation for Shadow Detection ( http://arxiv.org/abs/2410.07695v1 )

ライセンス: Link先を確認

Leyi Zhu, Weihuang Liu, Xinyi Chen, Zimeng Li, Xuhang Chen, Zhen Wang, Chi-Man Pun,

(参考訳) 影検出はコンピュータビジョンにおける正確なシーン理解には不可欠であるが、照明、物体形状、シーンコンテキストのバリエーションによって引き起こされる多様な影の出現に課題がある。ディープラーニングモデルは、トレーニングデータセットのサイズと多様性が制限されているため、現実のイメージに一般化するのに苦労することが多い。これを解決するために,テスト時間適応時に光強度情報を活用する新しいフレームワークであるTICAを導入し,影検出精度を向上する。 TICAは、影領域全体での光強度の固有の矛盾を利用して、より一貫した予測に向けてモデルを導く。基本的なエンコーダ・デコーダモデルは、最初はシャドー検出のためのラベル付きデータセットでトレーニングされる。そして、テストフェーズにおいて、2つの拡張入力画像バージョン間で一貫した強度予測を行うことにより、各テストサンプルに対してネットワークを調整する。この整合性トレーニングは、特に前景と背景の交差点領域の両方を対象として、堅牢な適応のために画像内の影領域を正確に識別する。 ISTDおよびSBUシャドウ検出データセットの大規模な評価により、TICAが既存の最先端手法よりも優れており、バランスの取れた誤差率(BER)において優れた結果が得られることが明らかにされた。

Shadow detection is crucial for accurate scene understanding in computer vision, yet it is challenged by the diverse appearances of shadows caused by variations in illumination, object geometry, and scene context. Deep learning models often struggle to generalize to real-world images due to the limited size and diversity of training datasets. To address this, we introduce TICA, a novel framework that leverages light-intensity information during test-time adaptation to enhance shadow detection accuracy. TICA exploits the inherent inconsistencies in light intensity across shadow regions to guide the model toward a more consistent prediction. A basic encoder-decoder model is initially trained on a labeled dataset for shadow detection. Then, during the testing phase, the network is adjusted for each test sample by enforcing consistent intensity predictions between two augmented input image versions. This consistency training specifically targets both foreground and background intersection regions to identify shadow regions within images accurately for robust adaptation. Extensive evaluations on the ISTD and SBU shadow detection datasets show that TICA outperforms existing state-of-the-art methods, achieving superior results in balanced error rate (BER).
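
For readers unfamiliar with the metric named above, the balanced error rate used in shadow detection is commonly defined as BER = 100 × (1 − ½ (TP/(TP+FN) + TN/(TN+FP))), so lower is better. The sketch below computes it for flat binary masks; the function name `balanced_error_rate` and the toy masks are hypothetical, and this is not code from the paper.

```python
def balanced_error_rate(pred, truth):
    """BER in percent for binary masks given as flat sequences of 0/1 (1 = shadow)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the ground truth")
    shadow_recall = tp / (tp + fn)
    non_shadow_recall = tn / (tn + fp)
    # Average the per-class error so the larger class cannot dominate the score.
    return 100.0 * (1.0 - 0.5 * (shadow_recall + non_shadow_recall))


# Hypothetical 8-pixel example: 4 shadow pixels, 4 non-shadow pixels.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 1, 1]
ber = balanced_error_rate(pred, truth)
print(ber)
```

Because shadow pixels are usually a minority of the image, averaging the two per-class recalls gives a fairer picture than plain pixel accuracy.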

翻訳日:2024-10-31 15:25:43 公開日:2024-10-11
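The abstract above reports its results in balanced error rate (BER). As a reminder of how that metric is commonly computed in shadow detection work, here is a minimal Python sketch. The function name `balanced_error_rate` and the flat list-of-0/1 mask format are assumptions made for illustration; they are not taken from the paper.

```python
def balanced_error_rate(predicted, ground_truth):
    """Return BER in percent for two equally long sequences of 0/1 labels.

    1 marks a shadow pixel, 0 a non-shadow pixel (illustrative convention).
    BER = 100 * (1 - 0.5 * (TP / (TP + FN) + TN / (TN + FP)))
    """
    pairs = list(zip(predicted, ground_truth))
    tp = sum(1 for p, g in pairs if p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("ground truth must contain both shadow and non-shadow pixels")
    shadow_recall = tp / (tp + fn)
    non_shadow_recall = tn / (tn + fp)
    return 100.0 * (1.0 - 0.5 * (shadow_recall + non_shadow_recall))
```

Because the two classes are averaged with equal weight, a lower BER means better detection even when shadow pixels are a small fraction of the image.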

# シャドウ検出のためのテスト時間強度整合性適応

Test-Time Intensity Consistency Adaptation for Shadow Detection ( http://arxiv.org/abs/2410.07695v2 )

ライセンス: Link先を確認

Leyi Zhu, Weihuang Liu, Xinyi Chen, Zimeng Li, Xuhang Chen, Zhen Wang, Chi-Man Pun,

翻訳日:2024-10-31 15:25:43 公開日:2024-10-11

# ソフトと剛体ロボットと模倣学習を組み合わせたコンタクトリッチタスクの習得

Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning ( http://arxiv.org/abs/2410.07787v1 )

ライセンス: Link先を確認

Mariano Ramírez Montero, Ebrahim Shahabi, Giovanni Franzese, Jens Kober, Barbara Mazzolai, Cosimo Della Santina,

(参考訳) ソフトロボットは、安全で堅牢で適応可能な環境との相互作用を確立する能力によって、ロボットシステムの使用に革命をもたらす可能性があるが、その正確な制御は依然として困難である。対照的に、従来の剛性ロボットは高い精度と再現性を提供するが、ソフトロボットの柔軟性は欠如している。我々はこれらの特徴をハイブリッドロボットプラットフォームに組み込むことで、全体的な能力を大幅に向上させることができると論じている。この研究は、剛性マニピュレータと完全に発達したソフトアームを統合する、新しいハイブリッドロボットプラットフォームを提示する。このシステムは、自律的な模倣学習を通じて柔軟で一般化可能なタスクを実行するために必要な知性を備えている。物理的なソフトネスと機械学習により、当社のプラットフォームは高度に一般化可能なスキルを達成できる一方、剛体コンポーネントは精度と再現性を保証する。

Soft robots have the potential to revolutionize the use of robotic systems with their capability of establishing safe, robust, and adaptable interactions with their environment, but their precise control remains challenging. In contrast, traditional rigid robots offer high accuracy and repeatability but lack the flexibility of soft robots. We argue that combining these characteristics in a hybrid robotic platform can significantly enhance overall capabilities. This work presents a novel hybrid robotic platform that integrates a rigid manipulator with a fully developed soft arm. This system is equipped with the intelligence necessary to autonomously perform flexible and generalizable tasks through imitation learning. The physical softness and machine learning enable our platform to achieve highly generalizable skills, while the rigid components ensure precision and repeatability.

翻訳日:2024-10-31 14:56:00 公開日:2024-10-11

# ソフトと剛体ロボットと模倣学習を組み合わせたコンタクトリッチタスクの習得

Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning ( http://arxiv.org/abs/2410.07787v2 )

ライセンス: Link先を確認

Mariano Ramírez Montero, Ebrahim Shahabi, Giovanni Franzese, Jens Kober, Barbara Mazzolai, Cosimo Della Santina,

翻訳日:2024-10-31 14:56:00 公開日:2024-10-11

# BA-Net:ディープニューラルネットワークにおけるブリッジ注意

BA-Net: Bridge Attention in Deep Neural Networks ( http://arxiv.org/abs/2410.07860v1 )

ライセンス: Link先を確認

Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song,

(参考訳) 注意機構、特にチャネルアテンションは、多くのコンピュータビジョンタスクに大きな影響を与えている。その効果にもかかわらず、既存の多くのメソッドは、主に個々の畳み込み層に適用される複雑な注意モジュールを通してパフォーマンスを最適化することに焦点を当てており、しばしば複数の層にまたがる相乗的相互作用を見落としている。このギャップに対応するために、異なる畳み込み層間のより効率的な統合と情報フローを促進するために設計された新しいアプローチであるブリッジアテンションを導入する。本研究は,情報冗長性を低減し,全体の情報交換を最適化する適応選択演算子を導入することにより,元のブリッジアテンションモデル(BAv1)を拡張した。 BAv2はImageNet分類タスクにおいて、それぞれResNet50とResNet101をバックボーンネットワークとして使用する場合、80.49%と81.75%のTop-1アキュラシーを得る。これらの結果は、それぞれ1.61%、0.77%のリトレーニングベースラインを上回っている。さらに、BAv2は、従来のSENet101のような既存のチャンネルアテンション技術よりも0.52%向上し、BAv2を高度な畳み込みネットワークやビジョントランスフォーマーに統合することで、幅広いコンピュータビジョンタスクのパフォーマンスが大幅に向上し、その幅広い適用性を裏付けている。

Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap, we introduce bridge attention, a novel approach designed to facilitate more effective integration and information flow between different convolutional layers. Our work extends the original bridge attention model (BAv1) by introducing an adaptive selection operator, which reduces information redundancy and optimizes the overall information exchange. This enhancement results in the development of BAv2, which achieves substantial performance improvements in the ImageNet classification task, obtaining Top-1 accuracies of 80.49% and 81.75% when using ResNet50 and ResNet101 as backbone networks, respectively. These results surpass the retrained baselines by 1.61% and 0.77%, respectively. Furthermore, BAv2 outperforms other existing channel attention techniques, such as the classical SENet101, exceeding its retrained performance by 0.52%. Additionally, integrating BAv2 into advanced convolutional networks and vision transformers has led to significant gains in performance across a wide range of computer vision tasks, underscoring its broad applicability.

翻訳日:2024-10-31 14:25:50 公開日:2024-10-11

# BA-Net:ディープニューラルネットワークにおけるブリッジ注意

BA-Net: Bridge Attention in Deep Neural Networks ( http://arxiv.org/abs/2410.07860v2 )

ライセンス: Link先を確認

Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song,

翻訳日:2024-10-31 14:25:50 公開日:2024-10-11

# 関数表現統一フレームワーク

The Function-Representation Unification Framework ( http://arxiv.org/abs/2410.07928v1 )

ライセンス: Link先を確認

Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon,

(参考訳) 認知アーキテクチャは、人工的な認知を開発する研究の最前線です。しかし、分離されたメモリとプログラムモデルから問題にアプローチする。この計算モデルには、知識検索ヒューリスティックという根本的な問題がある。本稿では,メモリとプログラムが結合した新しい計算モデルである関数表現を用いて,この問題を解決することを提案する。本稿では、これらの関数表現の実装と利用に関するフレームワーク全体を提案し、数学的定義と証明を通してそれらの可能性を探る。また、複数のFunction-Representationを編成するさまざまな方法について話し、これらのFunction-Representationsが実装できる関数の種類を探る。最後に、提案の限界についても検討する。

Cognitive Architectures are at the forefront of our research into developing artificial cognition. However, they approach the problem from a separated memory and program model of computation. This model of computation poses a fundamental problem: the knowledge retrieval heuristic. In this paper we propose to solve this problem by using a new model of computation, one where the memory and the program are united: the Function-Representation. We propose a whole framework about how to implement and use these Function-Representations, and we explore their potential through mathematical definitions and proofs. We also discuss different ways to organise multiple Function-Representations, and explore the kind of functions that these Function-Representations can implement. Finally, we also explore the limitations of our proposal.

翻訳日:2024-10-31 13:53:52 公開日:2024-10-11

# 計算の関数表現モデル

The Function-Representation Model of Computation ( http://arxiv.org/abs/2410.07928v2 )

ライセンス: Link先を確認

Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon,

(参考訳) 認知アーキテクチャは、人工的な認知を開発する研究の最前線です。しかし、分離されたメモリとプログラムモデルから問題にアプローチする。この計算モデルには、知識検索ヒューリスティックという根本的な問題がある。本稿では,メモリとプログラムが結合した新しい計算モデルである関数表現を用いて,この問題を解決することを提案する。本稿では,これらの関数表現の実装と利用に基づく新しい計算モデルを提案し,その可能性について数学的定義と証明を用いて検討する。また、複数のFunction-Representationを編成するさまざまな方法について話し、これらのFunction-Representationsが実装できる関数の種類を探る。最後に、提案の限界についても検討する。

Cognitive Architectures are at the forefront of our research into developing artificial cognition. However, they approach the problem from a separated memory and program model of computation. This model of computation poses a fundamental problem: the knowledge retrieval heuristic. In this paper we propose to solve this problem by using a new model of computation, one where the memory and the program are united: the Function-Representation. We propose a novel model of computation based on implementing and using these Function-Representations, and we explore its potential through mathematical definitions and proofs. We also discuss different ways to organise multiple Function-Representations, and explore the kind of functions that these Function-Representations can implement. Finally, we also explore the limitations of our proposal.

翻訳日:2024-10-31 13:53:52 公開日:2024-10-11

# Omni-MATH:大規模言語モデルのためのユニバーサルオリンピックレベルの数学ベンチマーク

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models ( http://arxiv.org/abs/2410.07985v1 )

ライセンス: Link先を確認

Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、数学的推論能力に大きなブレークスルーをもたらした。しかし、GSM8KやMATHのような既存のベンチマークは高い精度で解決されており(例えば、OpenAI o1はMATHデータセットで94.8%を達成している)、これらのモデルに真に挑戦するには不十分であることを示している。このギャップを埋めるために、オリンピアードレベルでのLLMの数学的推論を評価するために設計された、包括的で挑戦的なベンチマークを提案する。既存のオリンピアード関連のベンチマークとは異なり、我々のデータセットは数学のみに重点を置いており、厳密な人間のアノテーションを伴う4428の競技レベルの問題からなる膨大なコレクションを含んでいる。これらの問題は33以上のサブドメインに厳密に分類され、10以上の異なる難易度レベルにまたがっており、オリンピアードレベルの数学的推論におけるモデル性能の総合的な評価を可能にしている。さらに,このベンチマークに基づいて詳細な分析を行った。実験の結果,最も先進的なモデルであるOpenAI o1-miniとOpenAI o1-previewでさえ,精度はそれぞれ60.54%と52.55%にとどまり,難度の高いオリンピアードレベルの問題に苦戦しており,オリンピアードレベルの数学的推論に重大な課題が残ることが浮き彫りになった。

Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on the MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises a vast collection of 4428 competition-level problems with rigorous human annotation. These problems are meticulously categorized into over 33 sub-domains and span more than 10 distinct difficulty levels, enabling a holistic assessment of model performance in Olympiad-level mathematical reasoning. Furthermore, we conducted an in-depth analysis based on this benchmark. Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, with 60.54% and 52.55% accuracy, respectively, highlighting significant challenges in Olympiad-level mathematical reasoning.

翻訳日:2024-10-31 06:15:07 公開日:2024-10-11

# ロボットマニピュレーションのための相乗的・一般化・効率的なデュアルシステムを目指して

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation ( http://arxiv.org/abs/2410.08001v1 )

ライセンス: Link先を確認

Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao,

(参考訳) 多様で動的な環境で動作する多目的ロボットシステムへの需要が高まるにつれ、大規模なクロスエンボディメントのデータコーパスを活用して幅広い適応性と高レベルの推論を可能にするジェネラリストポリシーの重要性が強調されている。しかし、ジェネラリストは非効率な推論とコストの高い訓練に苦しむ。一方、スペシャリストポリシーは特定のドメインデータ向けに調整されており、タスクレベルの精度と効率に優れる。しかし、幅広いアプリケーションに対する一般化能力は欠如している。これらの観察から着想を得て、ジェネラリストポリシーとスペシャリストポリシーの双方の長所を補い合う相乗的なデュアルシステムであるRoboDualを紹介する。視覚言語アクション(VLA)に基づくジェネラリストの高レベルなタスク理解と離散化されたアクション出力を条件として、多段階のアクションロールアウトを行う拡散トランスフォーマーベースのスペシャリストを考案した。 OpenVLAと比較すると、RoboDualはわずか2000万(20M)の訓練可能パラメータを持つスペシャリストポリシーを導入することで、実世界の設定で26.7%、CALVINで12%の改善を達成した。デモデータの5%のみでも強力な性能を維持し、実世界のデプロイにおいて3.8倍高い制御周波数を実現する。コードは公開予定である。プロジェクトページは https://opendrivelab.com/RoboDual/ でホストされている。

The increasing demand for versatile robotic systems to operate in diverse and dynamic environments has emphasized the importance of a generalist policy, which leverages a large cross-embodiment data corpus to facilitate broad adaptability and high-level reasoning. However, the generalist struggles with inefficient inference and costly training. The specialist policy, instead, is curated for specific domain data and excels at task-level precision with efficiency. Yet, it lacks the generalization capacity for a wide range of applications. Inspired by these observations, we introduce RoboDual, a synergistic dual-system that supplements the merits of both generalist and specialist policy. A diffusion transformer-based specialist is devised for multi-step action rollouts, exquisitely conditioned on the high-level task understanding and discretized action output of a vision-language-action (VLA) based generalist. Compared to OpenVLA, RoboDual achieves a 26.7% improvement in real-world settings and a 12% gain on CALVIN by introducing a specialist policy with merely 20M trainable parameters. It maintains strong performance with only 5% of the demonstration data, and enables a 3.8 times higher control frequency in real-world deployment. Code will be made publicly available. Our project page is hosted at: https://opendrivelab.com/RoboDual/

翻訳日:2024-10-31 06:05:02 公開日:2024-10-11

Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao,

翻訳日:2024-10-31 06:05:02 公開日:2024-10-11

# 高速フィードフォワード3次元ガウススプラッティング圧縮

Fast Feedforward 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2410.08017v1 )

ライセンス: Link先を確認

Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai,

(参考訳) 3D Gaussian Splatting (3DGS)は、新しいビュー合成のためのリアルタイムかつ高忠実なレンダリングを推し進めているが、そのストレージ要件は広く採用される上での課題となっている。様々な圧縮技術が提案されているが、従来手法には共通の制約がある。すなわち、既存の3DGSを圧縮するにはシーンごとの最適化が必要であり、圧縮が遅くなる。この問題を解決するために,1回のフィードフォワードパスで3DGS表現を高速に圧縮できる最適化不要のモデルであるFCGS(Fast Compression of 3D Gaussian Splatting)を導入し,圧縮時間を数分から数秒に短縮する。圧縮効率を向上させるために,サイズと忠実度のバランスをとるよう,ガウス属性を異なるエントロピー制約経路に割り当てるマルチパスエントロピーモジュールを提案する。また,非構造的なガウスブロブ間の冗長性を取り除くために,ガウス間およびガウス内のコンテキストモデルの両方を慎重に設計する。全体として、FCGSは忠実度を維持しながら20倍以上の圧縮比を達成し、シーンごとの最適化に基づくSOTA手法のほとんどを上回る。コードは https://github.com/YihangChen-ee/FCGS で利用可能である。

With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for its widespread adoption. Although various compression techniques have been proposed, prior art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths to balance size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods. Our code is available at: https://github.com/YihangChen-ee/FCGS.

翻訳日:2024-10-31 05:55:14 公開日:2024-10-11

# 高速フィードフォワード3次元ガウススプラッティング圧縮

Fast Feedforward 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2410.08017v2 )

ライセンス: Link先を確認

Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai,

翻訳日:2024-10-31 05:55:13 公開日:2024-10-11

# 内部解釈可能性のための回路探索の計算複雑性

The Computational Complexity of Circuit Discovery for Inner Interpretability ( http://arxiv.org/abs/2410.08025v1 )

ライセンス: Link先を確認

Federico Adolfi, Martina G. Vilas, Todd Wareham,

(参考訳) 機械学習、認知・脳科学、社会におけるニューラルネットワークの応用として提案されているものの多くは、回路発見による内部解釈可能性の実現可能性にかかっている。これは、実行可能なアルゴリズムの選択肢を経験的および理論的に探索することを要求する。ヒューリスティックスの設計とテストは進歩しているが、それらが解こうとする問題の複雑性に関する理解が欠けている現状では、そのスケーラビリティと忠実性に懸念がある。これに対処するため,古典的およびパラメータ化された計算複雑性理論を用いて回路発見を研究する。(1)記述,説明,予測,制御のためのアフォーダンスの観点から,回路発見クエリを考察するための概念的な足場を示す。(2)機械論的説明を捉える包括的なクエリ集合を形式化し,その分析のための形式的フレームワークを提案する。(3)このフレームワークを用いて,多層パーセプトロン(例えばトランスフォーマーの一部)上の,実用上重要な多くのクエリの変種と緩和の複雑性を確定する。我々の結果は、困難な複雑性の様相を明らかにする。多くのクエリは計算困難(NP-hard, $\Sigma^p_2$-hard)であり、モデルや回路の特徴(例えば深さ)を制約しても固定パラメータ困難(W[1]-hard)のままであり、加法的、乗法的、確率的な近似スキームの下でも近似不可能である。この状況に対処するため、これらの困難な問題の一部(NP-完全対$\Sigma^p_2$-完全)を、よりよく理解されたヒューリスティックスで扱うための変換が存在することを証明し、また有用なアフォーダンスを保持するより控えめなクエリについて、多項式時間可解性(PTIME)または固定パラメータ可解性(FPT)を証明する。このフレームワークにより、解釈可能性クエリの範囲と限界を理解し、実行可能な選択肢を探り、既存および将来のアーキテクチャ間でそのリソース要求を比較することが可能になる。

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries that capture mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons (part of, e.g., transformers). Our findings reveal a challenging complexity landscape. Many queries are intractable (NP-hard, $\Sigma^p_2$-hard), remain fixed-parameter intractable (W[1]-hard) when constraining model/circuit features (e.g., depth), and are inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems (NP- vs. $\Sigma^p_2$-complete) with better-understood heuristics, and prove the tractability (PTIME) or fixed-parameter tractability (FPT) of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands among existing and future architectures.

翻訳日:2024-10-31 05:55:13 公開日:2024-10-11

# 内部解釈可能性のための回路探索の計算複雑性

The Computational Complexity of Circuit Discovery for Inner Interpretability ( http://arxiv.org/abs/2410.08025v2 )

ライセンス: Link先を確認

Federico Adolfi, Martina G. Vilas, Todd Wareham,

翻訳日:2024-10-31 05:55:13 公開日:2024-10-11

# SPA: 効果的な身体表現を可能にする3次元空間認識

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ( http://arxiv.org/abs/2410.08208v1 )

ライセンス: Link先を確認

Haoyi Zhu, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He,

(参考訳) 本稿では,身体性AI(embodied AI)における3次元空間認識の重要性を強調する新しい表現学習フレームワークであるSPAを紹介する。提案手法は,多視点画像上での微分可能なニューラルレンダリングを利用して,バニラのVision Transformer (ViT) に本質的な空間理解を付与する。本稿では,8つのシミュレータにまたがる268のタスクを,単一タスクおよび言語条件付きマルチタスクのシナリオにおいて多様なポリシーでカバーする,これまでで最も包括的な身体性表現学習の評価を示す。 SPAは、身体性AI、ビジョン中心のタスク、マルチモーダルアプリケーション向けに特化して設計されたものを含む10以上の最先端の表現手法を、より少ない訓練データで一貫して上回る。さらに,実世界での一連の実験を行い,実用的なシナリオにおける有効性を確認する。これらの結果は,身体性表現学習における3次元空間認識の重要な役割を浮き彫りにしている。我々の最も強力なモデルは訓練に6000 GPU時間以上を要する。身体性表現学習の将来の研究を促進するため、すべてのコードとモデルの重みをオープンソース化することを約束する。プロジェクトページ: https://haoyizhu.github.io/spa/

In this paper, we introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. Our approach leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. The results are compelling: SPA consistently outperforms more than 10 state-of-the-art representation methods, including those specifically designed for embodied AI, vision-centric tasks, and multi-modal applications, while using less training data. Furthermore, we conduct a series of real-world experiments to confirm its effectiveness in practical scenarios. These results highlight the critical role of 3D spatial awareness for embodied representation learning. Our strongest model takes more than 6000 GPU hours to train and we are committed to open-sourcing all code and model weights to foster future research in embodied representation learning. Project Page: https://haoyizhu.github.io/spa/.

翻訳日:2024-10-31 04:46:03 公開日:2024-10-11

# SPA: 効果的な身体表現を可能にする3次元空間認識

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ( http://arxiv.org/abs/2410.08208v2 )

ライセンス: Link先を確認

Haoyi Zhu, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He,

翻訳日:2024-10-31 04:46:03 公開日:2024-10-11

# オルタナティブビルドからのバイナリの比較におけるバイナリ等価性のレベル

Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds ( http://arxiv.org/abs/2410.08427v1 )

ライセンス: Link先を確認

Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan,

(参考訳) ソフトウェアサプライチェーンのセキュリティ上の課題に応えて、いくつかの組織が、広く使われているオープンソースプロジェクトを独立にビルドし、その結果のバイナリをリリースするインフラストラクチャを構築している。ビルドプラットフォームの多様性は、侵害されたビルド環境の検出を容易にするため、セキュリティを強化しうる。さらに、ビルドプラットフォームのセキュリティ態勢を改善し、ビルド中に来歴(provenance)情報を収集することで、結果のアーティファクトをより高い信頼のもとで利用できる。このようなサービスは現在、Google、Oracle、RedHatから提供されている。同じソースからビルドされた複数のバイナリが利用可能になったことで、新たな課題と機会が生まれ、「ビルドAはビルドBの完全性を裏付けるか?」や「ビルドAは侵害されたビルドBを明らかにできるか?」といった疑問が生じる。こうした疑問に答えるには、バイナリ間の等価性の概念が必要である。我々は、ビット単位の等価性に基づく自明なアプローチには実用上重大な欠点があり、代替の概念を選ぶことに価値があることを示す。我々は、クローン検出のタイプに着想を得た等価性のレベルを導入することで、これを概念化する。いくつかの実験を通して、これらの新しいレベルの価値を示す。異なるプロバイダによって同じソースから独立にビルドされたJavaバイナリからなるデータセットを構築し、合計14,156対のバイナリを得た。次に、それらのjarファイル内のコンパイル済みクラスファイルを比較した結果、3,750対のjar(26.49%)で少なくとも1つのクラスファイルが異なり、その結果jarファイルとその暗号学的ハッシュも異なることがわかった。しかし、新しい等価性レベルに基づけば、これらの多くが実質的に等価であることを示せる。さらに、等価であるべき、あるいは等価であってはならないバイナリの対からなるオラクルを提供する半合成データセット上で、いくつかの等価関係の候補を評価する。

In response to challenges in software supply chain security, several organisations have created infrastructures to independently build commodity open source projects and release the resulting binaries. Build platform variability can strengthen security as it facilitates the detection of compromised build environments. Furthermore, by improving the security posture of the build platform and collecting provenance information during the build, the resulting artifacts can be used with greater trust. Such offerings are now available from Google, Oracle and RedHat. The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: 'Does build A confirm the integrity of build B?' or 'Can build A reveal a compromised build B?'. To answer such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types. We demonstrate the value of these new levels through several experiments. We construct a dataset consisting of Java binaries built from the same sources independently by different providers, resulting in 14,156 pairs of binaries in total. We then compare the compiled class files in those jar files and find that for 3,750 pairs of jars (26.49%) there is at least one such file that is different, also forcing the jar files and their cryptographic hashes to be different. However, based on the new equivalence levels, we can still establish that many of them are practically equivalent. We evaluate several candidate equivalence relations on a semi-synthetic dataset that provides oracles consisting of pairs of binaries that either should be, or must not be equivalent.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
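To make the contrast between bitwise equality and a weaker notion of equivalence concrete, here is a minimal Python sketch that compares two jar files first byte for byte and then by the content of their class files. The three labels returned by `compare_jars` and the helper `make_jar` are hypothetical names chosen for this illustration; they are not the equivalence levels defined in the paper, which the abstract does not spell out.

```python
import hashlib
import io
import zipfile


def class_digests(jar_bytes):
    """Map each .class entry name to the SHA-256 of its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {
            name: hashlib.sha256(jar.read(name)).hexdigest()
            for name in jar.namelist()
            if name.endswith(".class")
        }


def compare_jars(jar_a, jar_b):
    """Classify a pair of jars (illustrative labels, not the paper's levels)."""
    if jar_a == jar_b:
        return "bitwise-equal"
    if class_digests(jar_a) == class_digests(jar_b):
        return "class-files-equal"
    return "different"


def make_jar(entries, timestamp):
    """Build an in-memory jar; only used to demonstrate the comparison."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, data in entries.items():
            jar.writestr(zipfile.ZipInfo(name, date_time=timestamp), data)
    return buffer.getvalue()
```

Two jars built from identical class bytes but with different entry timestamps differ bitwise (and therefore in their cryptographic hash), yet `compare_jars` reports them as `class-files-equal`. A real comparison would also have to deal with differences inside the class files themselves, which this sketch does not attempt.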

# 有限温度崩壊における量子場相互作用のためのマスター方程式の厳密解

Exact solution of the master equation for interacting quantized fields at finite temperature decay ( http://arxiv.org/abs/2410.08428v1 )

ライセンス: Link先を確認

L. Hernández-Sánchez, I. A. Bocanegra-Garay, I. Ramos-Prieto, F. Soto-Eguibar, H. M. Moya-Cessa,

(参考訳) 有限温度崩壊における2つの量子化場の相互作用を含む量子系のマルコフ力学を解析する。超作用素技術を利用し、2つの非ユニタリ変換を適用することにより、リンドブラッド・マスター方程式を実効的な非エルミート的ハミルトニアンを持つフォン・ノイマン様方程式に再構成する。さらに、このハミルトニアンを対角化するために追加の非ユニタリ変換が使用され、リンドブラッドマスター方程式の正確な解を導出することができる。この方法は、完全な量子状態における任意の初期状態の進化を計算するための枠組みを提供する。特定の例として、最初に空洞内で相互作用する2つの識別不可能な光子の光子一致率を示す。

We analyze the Markovian dynamics of a quantum system involving the interaction of two quantized fields at finite temperature decay. Utilizing superoperator techniques and applying two non-unitary transformations, we reformulate the Lindblad master equation into a von Neumann-like equation with an effective non-Hermitian Hamiltonian. Furthermore, an additional non-unitary transformation is employed to diagonalize this Hamiltonian, enabling us to derive an exact solution to the Lindblad master equation. This method provides a framework to calculate the evolution of any initial state in a fully quantum regime. As a specific example, we present the photon coincidence rates for two indistinguishable photons initially interacting within a cavity.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
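For orientation, the following LaTeX block writes out a commonly used form of the Lindblad master equation for two coupled bosonic modes decaying into finite-temperature reservoirs, together with the standard regrouping of the anticommutator terms into a non-Hermitian effective Hamiltonian. The specific Hamiltonian $H$, the symbols $\gamma_c$ and $\bar{n}_c$, and the choice $\hbar = 1$ are assumptions made here for illustration; the paper's exact model and its non-unitary transformations may differ.

```latex
% Assumed model: two modes a, b with linear coupling g (hbar = 1)
H = \omega_a\, a^\dagger a + \omega_b\, b^\dagger b + g\,(a^\dagger b + a\, b^\dagger)

% Lindblad master equation with decay rates \gamma_c and thermal occupations \bar{n}_c
\frac{d\rho}{dt} = -i[H,\rho]
  + \sum_{c \in \{a,b\}} \gamma_c(\bar{n}_c + 1)
      \left( c\,\rho\,c^\dagger - \tfrac{1}{2}\{c^\dagger c,\rho\} \right)
  + \sum_{c \in \{a,b\}} \gamma_c\,\bar{n}_c
      \left( c^\dagger \rho\, c - \tfrac{1}{2}\{c\,c^\dagger,\rho\} \right)

% Grouping the anticommutators into a non-Hermitian effective Hamiltonian
H_{\mathrm{eff}} = H - \frac{i}{2} \sum_{c \in \{a,b\}}
      \gamma_c \left[ (\bar{n}_c + 1)\, c^\dagger c + \bar{n}_c\, c\, c^\dagger \right]

\frac{d\rho}{dt} = -i\left( H_{\mathrm{eff}}\,\rho - \rho\, H_{\mathrm{eff}}^\dagger \right)
  + \sum_{c \in \{a,b\}} \gamma_c \left[ (\bar{n}_c + 1)\, c\,\rho\,c^\dagger
      + \bar{n}_c\, c^\dagger \rho\, c \right]
```

The remaining "jump" terms in the last line are what prevent this from being a von Neumann-like equation; according to the abstract, the authors remove them with two non-unitary transformations in the superoperator formalism.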

# 10個の大規模言語モデルに対する Retrieval Augmented Generation と、その医学的適性評価における一般化可能性

Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness ( http://arxiv.org/abs/2410.08431v1 )

ライセンス: Link先を確認

Yu He Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Chang-Fu Kuo, Shao-Chun Wu, Vesela P. Kovacheva, Daniel Shu Wei Ting,

(参考訳) 大規模言語モデル(LLM)は医学的応用の可能性を示すが、専門的な臨床知識が欠如していることが多い。 Retrieval Augmented Generation (RAG)は、ドメイン固有の情報によるカスタマイズを可能にし、医療に適している。本研究は,手術適応の判定と術前指導におけるRAGモデルの精度,一貫性,安全性について検討した。 35の局所的および23の国際的術前ガイドラインを用いてLLM-RAGモデルを開発し、人為的な反応に対して試験を行った。合計3,682件の回答が得られた。臨床文書はLlamaindexを用いて処理され, GPT3.5, GPT4, Claude-3を含む10個のLCMが評価された。術前指導の7つの側面に焦点をあてて14の臨床シナリオを解析した。正しい回答を判断するために確立されたガイドラインと専門家の判断が用いられ、人為的な回答が比較として役立った。 LLM-RAGモデルでは、20秒以内に反応が生成され、臨床医 (10分) よりも有意に速かった。 GPT4 LLM-RAGモデルが最も精度が高く(96.4%対86.6%、p=0.016)、幻覚は無く、臨床医に匹敵する正しい指示が得られた。結果は地域と国際両方のガイドラインで一致していた。本研究は, LLM-RAGモデルの有効性を実証し, その効率性, 拡張性, 信頼性を明らかにした。

Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using Llamaindex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), with no hallucinations and producing correct instructions comparable to clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
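The retrieval step that Retrieval Augmented Generation adds in front of a language model can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: it ranks passages by bag-of-words cosine similarity and pastes the best ones into a prompt. The function names, the prompt layout, and the placeholder passages are all hypothetical; the study itself processed clinical documents with Llamaindex, whose API is not reproduced here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenisation into term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        passages,
        key=lambda passage: cosine(query_vector, bag_of_words(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages, top_k=1):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In a real system the similarity function would typically be an embedding model and the prompt would be sent to an LLM; both are left out so the sketch stays self-contained.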

# MYCROFT: 効果的かつ効率的な外部データ拡張を目指して

MYCROFT: Towards Effective and Efficient External Data Augmentation ( http://arxiv.org/abs/2410.08432v1 )

ライセンス: Link先を確認

Zain Sarwar, Van Tran, Arjun Nitin Bhagoji, Nick Feamster, Ben Y. Zhao, Supriyo Chakraborty,

(参考訳) 機械学習(ML)モデルは、よく機能するために大量のデータを必要とすることが多い。利用可能なデータが限られている場合、モデルトレーナーは外部ソースからより多くのデータを取得する必要がある。多くの場合、有用なデータは、所有権やプライバシーの懸念から、自分のデータを共有することをためらうプライベートなエンティティによって保持される。これにより、モデルトレーナーは、モデルパフォーマンスを改善するために必要なデータを取得するのが難しく、コストがかかる。この課題に対処するために、制約付きデータ共有予算で作業しながら、モデルトレーナーが異なるデータソースの相対的有用性を評価することのできる、データ効率のよい方法であるMycroftを提案する。特徴空間距離と勾配マッチングを活用することで、Mycroftは各所有者の小さいが情報に富むデータサブセットを特定し、モデルトレーナーは最小限のデータ露出でパフォーマンスを最大化することができる。 2つの領域における4つのタスクにまたがる実験結果から、Mycroftはすべてのデータが共有される全情報ベースラインのパフォーマンスに急速に収束することが示された。さらに、Mycroftはノイズに対して堅牢であり、ユーティリティによってデータ所有者を効果的にランク付けすることができる。 Mycroftは、高性能MLモデルの民主化トレーニングの道を開くことができる。

Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to proprietary and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to improve model performance. To address this challenge, we propose Mycroft, a data-efficient method that enables model trainers to evaluate the relative utility of different data sources while working with a constrained data-sharing budget. By leveraging feature space distances and gradient matching, Mycroft identifies small but informative data subsets from each owner, allowing model trainers to maximize performance with minimal data exposure. Experimental results across four tasks in two domains show that Mycroft converges rapidly to the performance of the full-information baseline, where all data is shared. Moreover, Mycroft is robust to noise and can effectively rank data owners by utility. Mycroft can pave the way for democratized training of high-performance ML models.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11
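The idea of picking a small, informative subset by feature-space distance can be illustrated with a short Python sketch. It covers only the distance part, under the assumption that the trainer holds feature vectors for examples its model handles poorly; the gradient-matching component mentioned in the abstract is omitted, and the function name `select_subset` and its arguments are hypothetical rather than taken from the paper.

```python
import math


def select_subset(owner_features, hard_features, budget):
    """Return indices of the `budget` owner samples closest to any hard example.

    owner_features: feature vectors held by a data owner
    hard_features:  feature vectors of examples the trainer's model gets wrong
    budget:         maximum number of samples the owner is willing to share
    """

    def nearest_hard_distance(vector):
        return min(math.dist(vector, hard) for hard in hard_features)

    ranked = sorted(
        range(len(owner_features)),
        key=lambda index: nearest_hard_distance(owner_features[index]),
    )
    return ranked[:budget]
```

A trainer could run such a selection per owner and compare how much each shared subset helps, which is one simple way to rank owners by utility under a fixed sharing budget.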

# SoK: ソフトウェアの区画化

SoK: Software Compartmentalization ( http://arxiv.org/abs/2410.08434v1 )

ライセンス: Link先を確認

Hugo Lefeuvre, Nathan Dautenhahn, David Chisnall, Pierre Olivier,

(参考訳) 大きなシステムを限られた特権を持つ小さなコンポーネントに分解することは、エクスプロイトの影響を最小限に抑える効果的な手段として長年認識されてきた。歴史的なルーツ、実証された利点、そして学術界と産業界における多くの研究努力にもかかわらず、ソフトウェアの区画化(compartmentalization)は依然として主流の実践になっていない。本稿では,その理由と,この現状をどう改善できるかを考察する。既存の手法が用語や分析手法の不整合に悩まされていることに着目し, 区画化手法を体系的に分析, 比較, 方向付けするための統一モデルを提案する。このモデルを用いて、211の研究成果をレビューし、61の主流の区画化システムを分析し、両者を突き合わせて研究と実運用の双方の限界を理解する。とりわけ、主流の取り組みは手作業の方法、カスタム抽象化、レガシーなメカニズムに大きく依存しており、最近の研究とは大きくかけ離れていることが明らかとなった。最後に次の提言を行う。区画化は全体論的に解決されるべきである。区画化ポリシーの定義を単純化するための進展が必要である。confused deputy(混乱した代理)やハードウェアの制限を踏まえて脅威モデルをよりよく検証する必要がある。そして、我々が指摘する研究と主流のニーズとの間のギャップを埋める必要がある。本稿は, 区画化の歴史的および現在の展望を描くだけでなく, その進化と採用を促進する枠組みも提示する。

Decomposing large systems into smaller components with limited privileges has long been recognized as an effective means to minimize the impact of exploits. Despite historical roots, demonstrated benefits, and a plethora of research efforts in academia and industry, the compartmentalization of software is still not a mainstream practice. This paper investigates why, and how this status quo can be improved. Noting that existing approaches are fraught with inconsistencies in terminology and analytical methods, we propose a unified model for the systematic analysis, comparison, and directing of compartmentalization approaches. We use this model to review 211 research efforts and analyze 61 mainstream compartmentalized systems, confronting them to understand the limitations of both research and production works. Among others, our findings reveal that mainstream efforts largely rely on manual methods, custom abstractions, and legacy mechanisms, poles apart from recent research. We conclude with recommendations: compartmentalization should be solved holistically; progress is needed towards simplifying the definition of compartmentalization policies; towards better challenging our threat models in the light of confused deputies and hardware limitations; as well as towards bridging the gaps we pinpoint between research and mainstream needs. This paper not only maps the historical and current landscape of compartmentalization, but also sets forth a framework to foster their evolution and adoption.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11

# 微粒な対話型テクスチャ誘導によるシンボリック音楽生成

Symbolic Music Generation with Fine-grained Interactive Textural Guidance ( http://arxiv.org/abs/2410.08435v1 )

ライセンス: Link先を確認

Tingyu Zhu, Haoyu Liu, Zhimin Jiang, Zeyu Zheng,

(参考訳) シンボリック・ミュージック・ジェネレーションの問題は、限られたデータ・アベイラビリティーと音符ピッチの高精度化の必要性が組み合わさって、独特な課題を呈している。これらの課題を克服するために,学習分布の誤りを補正するために,拡散モデル内に細粒度テクスチュラルガイダンス(FTG)を導入する。 FTGを取り入れることで、拡散モデルは音楽生成の精度を向上し、プログレッシブな音楽生成、即興、インタラクティブな音楽生成といった高度なタスクに適している。シンボリック音楽生成における課題とFTGアプローチの効果について理論的特徴を導出する。ユーザ入力による対話型音楽生成のための数値実験とデモページを提供し,提案手法の有効性を実証する。

The problem of symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To overcome these difficulties, we introduce Fine-grained Textural Guidance (FTG) within diffusion models to correct errors in the learned distributions. By incorporating FTG, the diffusion models improve the accuracy of music generation, which makes them well-suited for advanced tasks such as progressive music generation, improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effect of the FTG approach. We provide numerical experiments and a demo page for interactive music generation with user input to showcase the effectiveness of our approach.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11

# 大規模言語モデルを用いた多段階自然言語推論における推論構造の役割を探る

Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models ( http://arxiv.org/abs/2410.08436v1 )

ライセンス: Link先を確認

Zi'ou Zheng, Christopher Malon, Martin Renqiang Min, Xiaodan Zhu,

(参考訳) 複雑な多段階推論タスクを行う場合、構造化された中間証明ステップを導出する大規模言語モデル(LLM)の能力は、モデルが本当に望ましい推論を実行していることを保証し、モデルの説明可能性を向上させるために重要である。本稿では,現在最先端のジェネラリストLLMが,\textit{in-context learning} により少数の例に含まれる構造を活用して,より優れた証明構造を構築できるかどうかに焦点を絞って研究する。本研究は特に,構造認識型実演と構造認識型プルーニングに焦点を当て,どちらも性能向上に役立つことを示す。結果の理解に役立つ詳細な分析も提供する。

When performing complex multi-step reasoning tasks, the ability of Large Language Models (LLMs) to derive structured intermediate proof steps is important for ensuring that the models truly perform the desired reasoning and for improving models' explainability. This paper is centred around a focused study: whether the current state-of-the-art generalist LLMs can leverage the structures in a few examples to better construct the proof structures with \textit{in-context learning}. Our study specifically focuses on structure-aware demonstration and structure-aware pruning. We demonstrate that they both help improve performance. A detailed analysis is provided to help understand the results.

翻訳日:2024-10-31 03:26:42 公開日:2024-10-11

# $\forall$uto$\exists$$\lor\!\land$L: 真理維持と推論タスクのためのLLMの自律評価

$\forall$uto$\exists$$\lor\!\land$L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks ( http://arxiv.org/abs/2410.08437v1 )

ライセンス: Link先を確認

Rushang Karia, Daniel Bramblett, Daksh Dobhal, Siddharth Srivastava,

(参考訳) 本稿では,$\forall$uto$\exists$$\lor\!\land$Lを提案する。これは,翻訳における真理維持や論理的推論など,正しさの概念が明確な形式的タスクにおいて,大規模言語モデル(LLM)の評価をスケールさせるための新しいベンチマークである。$\forall$uto$\exists$$\lor\!\land$Lは,人手によるラベル付けなしにLLMの客観的評価をスケールさせるために必要な,いくつかの重要な利点を提供する最初のベンチマークパラダイムである。 (a) 難易度の異なるタスクを自動生成することで,高度化していくLLMを評価できること, (b) 正解(ground truth)を自動生成することで,高コストで時間のかかる人手のアノテーションへの依存をなくすこと, (c) 自動生成されたランダム化データセットを用いることで,多くの現代的なベンチマークで使われる静的データセットに後続のLLMが過適合する余地を抑えること。実証分析によると,$\forall$uto$\exists$$\lor\!\land$LにおけるLLMの性能は,翻訳や推論タスクに焦点を当てた他の多様なベンチマークでの性能をよく示す指標となる。そのため,手作業で整備したデータセットの入手や更新が難しい環境において,有用な自律的評価パラダイムとなる。

This paper presents $\forall$uto$\exists$$\lor\!\land$L, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. $\forall$uto$\exists$$\lor\!\land$L is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) ability to evaluate LLMs of increasing sophistication by auto-generating tasks at different levels of difficulty; (b) auto-generation of ground truth that eliminates dependence on expensive and time-consuming human annotation; (c) the use of automatically generated, randomized datasets that mitigate the ability of successive LLMs to overfit to static datasets used in many contemporary benchmarks. Empirical analysis shows that an LLM's performance on $\forall$uto$\exists$$\lor\!\land$L is highly indicative of its performance on a diverse array of other benchmarks focusing on translation and reasoning tasks, making it a valuable autonomous evaluation paradigm in settings where hand-curated datasets can be hard to obtain and/or update.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# 非マルコフ細胞集団動態の制御のための強化学習

Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics ( http://arxiv.org/abs/2410.08439v1 )

ライセンス: Link先を確認

Josiah C. Kratz, Jacob Adamczyk,

(参考訳) 細菌からがん細胞に至るまで、多くの生物や細胞は、変動する環境に適応する顕著な能力を示す。さらに、細胞は過去の環境の記憶を利用して、以前記録されたストレスを生き残ることができる。制御の観点からは、この適応性は細胞集団を絶滅へと駆り立てる上で重要な課題となり、臨床上大きな意味を持つオープンな問題となっている。本研究では,表現型可塑性を示す細胞集団における薬物投与に焦点を当てた。抵抗状態と感受性状態の間を切り替える特定の力学モデルについては、正確な解が知られている。しかし、基礎となるシステムパラメータが不明で、複雑なメモリベースのシステムでは、最適解を得るのは難解である。この課題に対処するために、新しい非マルコフ力学の下で進化する細胞集団を制御するための情報量削減戦略の同定に強化学習(RL)を適用した。モデルのない深部RLは、長距離時間力学の存在下でも正確な解を回復し、細胞集団を制御することができる。

Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage memory of past environments to better survive previously-encountered stressors. From a control perspective, this adaptability poses significant challenges in driving cell populations toward extinction, and is thus an open question with great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies to control cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# 口コミ型社会学習における相互作用するカルマンフィルタの遅い収束

Slow Convergence of Interacting Kalman Filters in Word-of-Mouth Social Learning ( http://arxiv.org/abs/2410.08447v1 )

ライセンス: Link先を確認

Vikram Krishnamurthy, Cristian Rojas,

(参考訳) 逐次的に動作する$m$個のカルマンフィルタエージェントからなる,口コミ型の社会学習について検討する。第1のカルマンフィルタは生の観測を受け取り,それ以降の各カルマンフィルタは,直前のカルマンフィルタの条件付き平均のノイズを含む測定値を受け取る。事前分布は$m$番目のカルマンフィルタによって更新される。$m=2$で,観測値がガウス確率変数のノイズを含む測定値である場合,共分散は$k$個の観測に対して,標準カルマンフィルタの$O(k^{-1})$ではなく$k^{-1/3}$でゼロに近づく。本稿では,$m$エージェントの場合,共分散が$k^{-1/(2^m-1)}$でゼロに減少すること,つまりエージェント数に対して学習が指数関数的に遅くなることを証明する。また,各時刻で事前分布を人為的に再重み付けすることで,学習率を最適な$k^{-1}$にできることを示す。これは,口コミ型の社会学習において,事前分布を人為的に再重み付けすれば最適な学習率が得られることを意味する。

We consider word-of-mouth social learning involving $m$ Kalman filter agents that operate sequentially. The first Kalman filter receives the raw observations, while each subsequent Kalman filter receives a noisy measurement of the conditional mean of the previous Kalman filter. The prior is updated by the $m$-th Kalman filter. When $m=2$, and the observations are noisy measurements of a Gaussian random variable, the covariance goes to zero as $k^{-1/3}$ for $k$ observations, instead of $O(k^{-1})$ in the standard Kalman filter. In this paper we prove that for $m$ agents, the covariance decreases to zero as $k^{-1/(2^m-1)}$, i.e., the learning slows down exponentially with the number of agents. We also show that by artificially weighting the prior at each time, the learning rate can be made optimal as $k^{-1}$. The implication is that in word-of-mouth social learning, artificially re-weighting the prior can yield the optimal learning rate.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11
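As a rough illustration of the $O(k^{-1})$ baseline mentioned in the abstract above, the following sketch runs a standard scalar Kalman filter on noisy measurements of a constant Gaussian random variable and tracks its posterior variance. The function names `kalman_variance` and `closed_form`, and the values chosen for `p0` and `r`, are illustrative assumptions and are not taken from the paper. The interacting multi-agent setting studied in the paper is not modeled here.

```python
def kalman_variance(p0: float, r: float, k: int) -> float:
    """Posterior variance after k measurements y_i = x + v_i, v_i ~ N(0, r),
    of a constant state x with prior variance p0 (scalar Kalman filter)."""
    p = p0
    for _ in range(k):
        gain = p / (p + r)      # Kalman gain for a scalar static state
        p = (1.0 - gain) * p    # covariance update
    return p


def closed_form(p0: float, r: float, k: int) -> float:
    """Same quantity in closed form: 1 / (1/p0 + k/r), which behaves like r/k."""
    return 1.0 / (1.0 / p0 + k / r)


if __name__ == "__main__":
    # Assumed values: unit prior variance and unit measurement noise.
    for k in (10, 100, 1000):
        print(k, kalman_variance(1.0, 1.0, k), closed_form(1.0, 1.0, k))
```

The recursion and the closed form agree, and the variance shrinks roughly tenfold for each tenfold increase in `k`, which is the single-filter rate that the paper compares against.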

# 相関雑音を伴う確率勾配アルゴリズムの有限サンプルと大偏差解析

Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise ( http://arxiv.org/abs/2410.08449v1 )

ライセンス: Link先を確認

George Yin, Vikram Krishnamurthy,

(参考訳) ステップサイズを減少させる確率勾配アルゴリズムの有限標本残差を解析する。相関雑音を仮定し,解析の体系的アプローチとして摂動リアプノフ関数を用いる。最後に、大規模偏差理論を用いて、繰り返しの逃避時間を分析する。

We analyze the finite sample regret of a decreasing step size stochastic gradient algorithm. We assume correlated noise and use a perturbed Lyapunov function as a systematic approach for the analysis. Finally we analyze the escape time of the iterates using large deviations theory.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11
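To make the setting of the abstract above concrete, here is a minimal sketch of a stochastic gradient iteration with a decreasing step size and correlated gradient noise. Everything specific in it is an assumption made for illustration: the quadratic objective, the AR(1) noise model with coefficient `rho`, the step size $1/k$, and the function name `sgd_correlated_noise`. None of these come from the paper, whose analysis uses a perturbed Lyapunov function rather than simulation.

```python
import random


def sgd_correlated_noise(theta_star: float = 2.0, rho: float = 0.8,
                         steps: int = 20000, seed: int = 0) -> float:
    """Minimize 0.5 * (theta - theta_star)**2 from noisy gradients.

    The gradient noise follows an AR(1) process, so consecutive noise
    samples are correlated rather than independent.
    """
    rng = random.Random(seed)
    theta, noise = 0.0, 0.0
    for k in range(1, steps + 1):
        noise = rho * noise + rng.gauss(0.0, 1.0)  # correlated (AR(1)) noise
        grad = (theta - theta_star) + noise        # noisy gradient estimate
        theta -= (1.0 / k) * grad                  # decreasing step size 1/k
    return theta


if __name__ == "__main__":
    print(sgd_correlated_noise())
```

With this particular step size the iterate equals `theta_star` minus the running average of the noise, so the correlation slows the averaging but does not prevent convergence in this toy case.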

# Kolmogorov-Arnold の証明はニューラルネットワーク学習を解明しうる

The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning ( http://arxiv.org/abs/2410.08451v1 )

ライセンス: Link先を確認

Michael H. Freedman,

(参考訳) コルモゴロフとアーノルドはヒルベルトの第13問題に(連続関数の文脈で)答えることで、現代のニューラルネットワーク(NN)理論の基礎を築いた。その証明は多変数関数の表現を2つのステップに分割する。最初の(非線形な)層間写像は、データ多様体の単一の隠れ層への普遍的な埋め込みを与え、その像は、後続のダイナミクスを定義して第2の層間写像を求められるようなパターンを持つ。私はこのパターンを、ほとんど至るところで定義される層間写像のヤコビアンの「小行列式の集中(minor concentration)」と解釈する。小行列式の集中は、ヤコビアンの高次の外冪に対するスパース性に相当する。本稿では、このようなスパース性が今日のディープNNにおいて順次高次の概念が出現する舞台となりうることについて概念的な議論を行い、この仮説を検証するための2種類の実験を提案する。

Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of Neural Networks (NNs). Their proof divides the representation of a multivariate function into two steps: The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer whose image is patterned in such a way that a subsequent dynamic can then be defined to solve for the second inter-layer map. I interpret this pattern as "minor concentration" of the almost everywhere defined Jacobians of the interlayer map. Minor concentration amounts to sparsity for higher exterior powers of the Jacobians. We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs and suggest two classes of experiments to test this hypothesis.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11
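For reference, the classical Kolmogorov-Arnold representation that the abstract above builds on can be written as follows. This is the standard textbook form of the theorem, not notation taken from the paper; the symbols $\Phi_q$ and $\phi_{q,p}$ are the conventional names for the outer and inner continuous univariate functions, and $f$ is any continuous function on $[0,1]^n$.

$$
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
$$

Read loosely as a network, the inner sums act as a first inter-layer map into a hidden layer of width $2n+1$, and the outer functions $\Phi_q$ act as the second map, which matches the two-step division described in the abstract.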

# 高エントロピー合金設計のためのコルモゴロフ・アルノルドニューラルネットワーク

Kolmogorov-Arnold Neural Networks for High-Entropy Alloys Design ( http://arxiv.org/abs/2410.08452v1 )

ライセンス: Link先を確認

Yagnik Bandyopadhyay, Harshil Avlani, Houlong L. Zhuang,

(参考訳) 深層学習に基づく機械学習技術は、高エントロピー合金(HEA)の設計に広く応用されており、多くの貴重な洞察をもたらしている。 Kolmogorov-Arnold Networks (KAN)は最近開発されたアーキテクチャであり、精度と入力特徴の解釈性の両方を改善することを目的としている。本研究では,HEA設計のための3つの異なるデータセットを調べ,分類モデルと回帰モデルの両方に対するKANの適用を実証する。第1の例では、混合エンタルピーや価電子濃度などの様々な特性に基づいて、高エントロピー炭化物セラミックスの単相形成の確率を予測するためにKAN分類モデルを用いる。第2の例では、化学組成と、焼鈍時間、冷間圧延率、均質化温度を含むプロセス条件に基づいて、HEAの降伏強度と極限引張強度を予測するためにKAN回帰モデルを用いる。第3の例では、ある組成がHEAであるか非HEAであるかをKAN分類モデルで判定し、続いて同定されたHEAの体積弾性率をKAN回帰モデルで予測することで、体積弾性率の高いHEAの同定を目指す。3つの例すべてにおいて、KANは、分類ではF1スコア、回帰では平均二乗誤差(MSE)と決定係数($R^2$)といった指標で、多層パーセプトロン(MLP)の性能を上回るか同等であり、分類と回帰の両タスクにおけるKANの有効性を示している。本研究は、複雑な材料のより正確な予測とより良い解釈性につながる先進的な機械学習技術を探求し、最終的に望ましい特性を持つHEAの発見と最適化を加速するための有望な研究の方向性を示す。

A wide range of deep learning-based machine learning techniques are extensively applied to the design of high-entropy alloys (HEAs), yielding numerous valuable insights. Kolmogorov-Arnold Networks (KAN) is a recently developed architecture that aims to improve both the accuracy and interpretability of input features. In this work, we explore three different datasets for HEA design and demonstrate the application of KAN for both classification and regression models. In the first example, we use a KAN classification model to predict the probability of single-phase formation in high-entropy carbide ceramics based on various properties such as mixing enthalpy and valence electron concentration. In the second example, we employ a KAN regression model to predict the yield strength and ultimate tensile strength of HEAs based on their chemical composition and process conditions including annealing time, cold rolling percentage, and homogenization temperature. The third example involves a KAN classification model to determine whether a certain composition is an HEA or non-HEA, followed by a KAN regressor model to predict the bulk modulus of the identified HEA, aiming to identify HEAs with high bulk modulus. In all three examples, KAN either outperforms or matches the multilayer perceptron (MLP) on metrics such as F1 score for classification and mean squared error (MSE) and coefficient of determination ($R^2$) for regression, demonstrating the efficacy of KAN in handling both classification and regression tasks. This work provides a promising direction for future research into advanced machine learning techniques that yield more accurate predictions and better interpretability for complex materials, ultimately accelerating the discovery and optimization of HEAs with desirable properties.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# AdvDiffuser: 誘導拡散による対人安全批判運転シナリオの生成

AdvDiffuser: Generating Adversarial Safety-Critical Driving Scenarios via Guided Diffusion ( http://arxiv.org/abs/2410.08453v1 )

ライセンス: Link先を確認

Yuting Xie, Xianda Guo, Cong Wang, Kunhua Liu, Long Chen,

(参考訳) 安全クリティカルなシナリオは、自然運転環境では頻繁に発生するが、自律運転システムの訓練とテストにおいて重要な役割を担っている。一般的なアプローチは、自然環境に敵対的な調整を導入することで、シミュレーションにおいて安全クリティカルなシナリオを自動的に生成することである。これらの調整は、しばしば特定のテストシステムに適合し、異なるシステム間での転送可能性を無視している。本稿では,誘導拡散による安全クリティカルな運転シナリオを生成するための逆フレームワークであるAdvDiffuserを提案する。 AdvDiffuserは、拡散モデルを導入して、背景車両の可視的集合行動と、敵のシナリオを効果的に扱うための軽量ガイドモデルを取り込むことにより、転送可能性を促進する。 nuScenesデータセットの実験結果によると、オフラインの運転ログに基づいてトレーニングされたAdvDiffuserは、最小限のウォームアップエピソードデータを持つ様々なテストシステムに適用でき、現実性、多様性、対向性能の点で他の既存手法よりも優れている。

Safety-critical scenarios are infrequent in natural driving environments but hold significant importance for the training and testing of autonomous driving systems. The prevailing approach involves generating safety-critical scenarios automatically in simulation by introducing adversarial adjustments to natural environments. These adjustments are often tailored to specific tested systems, thereby disregarding their transferability across different systems. In this paper, we propose AdvDiffuser, an adversarial framework for generating safety-critical driving scenarios through guided diffusion. By incorporating a diffusion model to capture plausible collective behaviors of background vehicles and a lightweight guide model to effectively handle adversarial scenarios, AdvDiffuser facilitates transferability. Experimental results on the nuScenes dataset demonstrate that AdvDiffuser, trained on offline driving logs, can be applied to various tested systems with minimal warm-up episode data and outperform other existing methods in terms of realism, diversity, and adversarial performance.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# なぜ事前学習が下流の分類作業にとって有益か?

Why pre-training is beneficial for downstream classification tasks? ( http://arxiv.org/abs/2410.08455v1 )

ライセンス: Link先を確認

Xin Jiang, Xu Cheng, Zechao Li,

(参考訳) 事前学習は、精度を高め、収束を早めることによって下流タスクに顕著な利点を示したが、これらの利点の正確な理由は未だに不明である。そこで本研究では,深層ニューラルネットワーク(DNN)の学習行動に新たな光を当てる新たなゲーム理論的視点から,下流タスクに対する事前学習の効果を定量的かつ明示的に説明することを提案する。具体的には、事前学習されたモデルによって符号化された知識を抽出し、定量化し、さらに微調整過程における知識の変化を追跡する。興味深いことに、下流タスクの推測のために、訓練済みのモデルの知識が少量しか保存されていないことが判明した。しかし、そのような保存された知識は、スクラッチから学ぶためのモデルトレーニングにとって非常に難しい。したがって、この学習と有用な知識の助けを借りて、事前トレーニングで微調整されたモデルは、スクラッチからトレーニングしたモデルよりもパフォーマンスが良くなります。さらに、事前学習により、より直接的かつ迅速にダウンストリームタスクの目標知識を学習し、より高速な微調整モデルの収束を導出できることがわかった。

Pre-training has exhibited notable benefits to downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits still remain unclear. To this end, we propose to quantitatively and explicitly explain the effects of pre-training on the downstream task from a novel game-theoretic view, which also sheds new light on the learning behavior of deep neural networks (DNNs). Specifically, we extract and quantify the knowledge encoded by the pre-trained model, and further track the changes of such knowledge during the fine-tuning process. Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks. However, such preserved knowledge is very challenging for a model trained from scratch to learn. Thus, with the help of this exclusively learned and useful knowledge, the model fine-tuned from pre-training usually achieves better performance than the model trained from scratch. Besides, we discover that pre-training can guide the fine-tuned model to learn target knowledge for the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# ドメイン一般化された人物再識別のための統合された深部セマンティック拡張フレームワーク

A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification ( http://arxiv.org/abs/2410.08456v1 )

ライセンス: Link先を確認

Eugene P. W. Ang, Shan Lin, Alex C. Kot,

(参考訳) 監視された人物再識別法(Person ReID)は,1台のカメラネットワーク内でのトレーニングおよびテストにおいて優れた性能を発揮する。しかし、通常、異なるカメラシステムに適用した場合、かなりの性能劣化に悩まされる。近年,対象ドメインからのラベル付きデータを必要とせず,優れた性能を実現するために,多数のドメイン適応型人物ReID手法が提案されている。しかし、これらのアプローチはトレーニングプロセス中にターゲットドメインのラベル付けされていないデータを必要とするため、現実の多くのシナリオでは実用的ではない。本研究は、より実践的なドメイン一般化人再識別(DG-ReID)問題に焦点を当てる。 1つ以上のソースドメインが与えられたら、目に見えないターゲットドメインに適用可能な一般化されたモデルを学ぶことを目指している。 DG-ReIDにおける有望な研究方向の1つは、暗黙的な深い意味的特徴拡張の利用であり、我々の以前の手法であるDomain Embedding Expansion (DEX)は、DG-ReIDの強力な結果をもたらす一例である。しかし,本研究では,DeXと類似の暗黙的意味的特徴拡張手法が,提案した損失関数の制限により,飽和が早すぎる傾向にあるため,大規模な評価ベンチマークにおいて,その潜在能力を最大限に発揮できないことを示す。この分析を生かして、我々の新しいフレームワークであるUnified Deep Semantic Expansionを提案する。このフレームワークは、暗黙的かつ明示的なセマンティックな特徴拡張技術を単一のフレームワークに統合し、この初期の過剰適合を緩和し、すべてのDG-ReIDベンチマークで新しい最先端(SOTA)を実現する。さらに,提案手法をより一般的な画像検索タスクに適用し,これらのベンチマークのすべてにおいて,現在のSOTAをはるかに上回っている。

Supervised Person Re-identification (Person ReID) methods have achieved excellent performance when training and testing within one camera network. However, they usually suffer from considerable performance degradation when applied to different camera systems. In recent years, many Domain Adaptation Person ReID methods have been proposed, achieving impressive performance without requiring labeled data from the target domain. However, these approaches still need the unlabeled data of the target domain during the training process, making them impractical in many real-world scenarios. Our work focuses on the more practical Domain Generalized Person Re-identification (DG-ReID) problem. Given one or more source domains, it aims to learn a generalized model that can be applied to unseen target domains. One promising research direction in DG-ReID is the use of implicit deep semantic feature expansion, and our previous method, Domain Embedding Expansion (DEX), is one such example that achieves powerful results in DG-ReID. However, in this work we show that DEX and other similar implicit deep semantic feature expansion methods, due to limitations in their proposed loss function, fail to reach their full potential on large evaluation benchmarks as they have a tendency to saturate too early. Leveraging on this analysis, we propose Unified Deep Semantic Expansion, our novel framework that unifies implicit and explicit semantic feature expansion techniques in a single framework to mitigate this early over-fitting and achieve a new state-of-the-art (SOTA) in all DG-ReID benchmarks. Further, we apply our method on more general image retrieval tasks, also surpassing the current SOTA in all of these benchmarks by wide margins.

翻訳日:2024-10-31 03:16:22 公開日:2024-10-11

# Unity is Power: リソース制限クライアントにおける構造化プルーニングを伴う大規模モデルの半非同期協調学習

Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients ( http://arxiv.org/abs/2410.08457v1 )

ライセンス: Link先を確認

Yan Li, Mingyi Li, Xiao Zhang, Guangwei Xu, Feng Chen, Yuan Yuan, Yifei Zou, Mengying Zhao, Jianbo Lu, Dongxiao Yu,

(参考訳) 本研究では,分散したデータセット上で大規模モデルを協調的に学習するために,大量の異種かつ非力な計算資源の潜在能力を引き出す方法を研究する。資源適応型の協調学習において効率と精度の両方を高めるため,\textit{unstructured pruning},\textit{varying submodel architectures},\textit{knowledge loss},\textit{straggler}の課題を同時に考慮する最初の一歩を踏み出す。本稿では,これらの課題に対処するため,データ分布を考慮した構造化プルーニングとブロック間知識伝達機構を備えた新しい半非同期協調学習フレームワーク${Co\text{-}S}^2{P}$を提案する。さらに,${Co\text{-}S}^2{P}$が$O(1/\sqrt{N^*EQ})$の漸近的に最適な収束率を達成できることを理論的に証明する。最後に,16台の異種Jetsonデバイスを連携させて最大1.1億(0.11 billion)パラメータの大規模モデルを学習できる実世界のハードウェアテストベッド上で,広範な実験を行う。実験結果によると,$Co\text{-}S^2P$は最先端手法と比較して精度を最大8.8\%,リソース使用率を最大1.2$\times$向上させると同時に,すべてのリソース制限デバイスにおいてメモリ消費を約22\%,学習時間を約24\%削減した。

In this work, we study how to unlock the potential of massive heterogeneous weak computing power to collaboratively train large-scale models on dispersed datasets. In order to improve both efficiency and accuracy in resource-adaptive collaborative learning, we take the first step to consider the \textit{unstructured pruning}, \textit{varying submodel architectures}, \textit{knowledge loss}, and \textit{straggler} challenges simultaneously. We propose a novel semi-asynchronous collaborative training framework, namely ${Co\text{-}S}^2{P}$, with data distribution-aware structured pruning and a cross-block knowledge transfer mechanism to address the above concerns. Furthermore, we provide a theoretical proof that ${Co\text{-}S}^2{P}$ can achieve an asymptotically optimal convergence rate of $O(1/\sqrt{N^*EQ})$. Finally, we conduct extensive experiments on a real-world hardware testbed, in which 16 heterogeneous Jetson devices can be united to train large-scale models with parameters up to 0.11 billion. The experimental results demonstrate that $Co\text{-}S^2P$ improves accuracy by up to 8.8\% and resource utilization by up to 1.2$\times$ compared to state-of-the-art methods, while reducing memory consumption by approximately 22\% and training time by about 24\% on all resource-limited devices.

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# Reward蒸留とPreference Learningの同時学習:両方を実行できる言語モデルを得る

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both ( http://arxiv.org/abs/2410.08458v1 )

ライセンス: Link先を確認

Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy,

(参考訳) 人選好のリワードモデリングは、使用可能な生成型大言語モデル(LLM)を構築するための基盤の1つである。従来のRLHFベースのアライメント手法は、別の報酬モデルから期待される報酬を明示的に最大化するが、より最近のDPO(Direct Preference Optimization)のような教師付きアライメント手法はこのフェーズを回避し、モデルのドリフトや報酬オーバーフィッティングなどの問題を回避する。単純さから人気があるが、DPOや同様の直接アライメント手法はいまだに退化ポリシーを導いており、ブラッドリー・テリーによる選好の定式化に大きく依存して、候補出力のペア間の報酬差をモデル化している。この定式化は、例えば2つの候補出力の人間のスコアが低いというような、非決定的またはノイズの多い選好ラベルによって挑戦される。本稿では,DRDO(Direct Reward Distillation and Policy-Optimization)について紹介する。 DRDOは、新規な嗜好確率の定式化から人間の嗜好を学習しながら、託宣によって割り当てられた報酬を直接模倣する。 Ultrafeedback と TL;DR データセットによる実験結果から,DRDO を用いて訓練したポリシーは,期待される報酬の点において,DPO や e-DPO といった従来の手法を上回り,ノイズの多い選好信号やアウト・オブ・ディストリビューション(OOD)の設定に対して,より堅牢であることが示された。

Reward modeling of human preferences is one of the cornerstones of building usable generative large language models (LLMs). While traditional RLHF-based alignment methods explicitly maximize the expected rewards from a separate reward model, more recent supervised alignment methods like Direct Preference Optimization (DPO) circumvent this phase to avoid problems including model drift and reward overfitting. Although popular due to their simplicity, DPO and similar direct alignment methods can still lead to degenerate policies, and rely heavily on the Bradley-Terry-based preference formulation to model reward differences between pairs of candidate outputs. This formulation is challenged by non-deterministic or noisy preference labels, for example, when human scoring of two candidate outputs is of low confidence. In this paper, we introduce DRDO (Direct Reward Distillation and policy-Optimization), a supervised knowledge distillation-based preference alignment method that simultaneously models rewards and preferences to avoid such degeneracy. DRDO directly mimics rewards assigned by an oracle while learning human preferences from a novel preference likelihood formulation. Our experimental results on the Ultrafeedback and TL;DR datasets demonstrate that policies trained using DRDO surpass previous methods such as DPO and e-DPO in terms of expected rewards and are more robust, on average, to noisy preference signals as well as out-of-distribution (OOD) settings.

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11
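The Bradley-Terry formulation that the abstract above refers to can be sketched in a few lines. The code below shows the standard Bradley-Terry preference probability and the widely used per-pair DPO loss built on it. It does not implement DRDO, whose loss is not given in the abstract. The function names, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def bradley_terry_prob(reward_chosen: float, reward_rejected: float) -> float:
    """P(chosen is preferred) = sigmoid(reward_chosen - reward_rejected)."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each implicit reward is beta times the log-probability ratio between
    the policy and a frozen reference model; the loss is the negative
    log of the Bradley-Terry probability of the labeled preference.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(bradley_terry_prob(beta * margin, 0.0))


if __name__ == "__main__":
    # Two candidates with nearly equal rewards: the preference is close to a coin flip.
    print(bradley_terry_prob(1.00, 0.98))
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

When the two rewards are close, the Bradley-Terry probability sits near 0.5, so a hard "chosen versus rejected" label carries little information. That is the low-confidence regime the abstract identifies as a weakness of this formulation.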

# Omni-Domain Generalized Person Re-identificationのための多元的深層特徴アンサンブル学習

Diverse Deep Feature Ensemble Learning for Omni-Domain Generalized Person Re-identification ( http://arxiv.org/abs/2410.08460v1 )

ライセンス: Link先を確認

Eugene P. W. Ang, Shan Lin, Alex C. Kot,

(参考訳) 人物再識別(Person ReID)は、単一ドメインの教師ありPerson ReIDの性能が飽和する水準まで進歩している。しかし、これらの手法は、異なるデータセット間で学習とテストを行うと性能が大幅に低下し、それがドメイン一般化技術の開発を動機付けている。一方で本研究は、単一データセットのベンチマークにおいて、ドメイン一般化手法が単一ドメインの教師あり手法を大きく下回ることを明らかにした。理想的なPerson ReID手法は、関与するドメインの数に関係なく有効であるべきであり、テストドメインのデータが学習に利用できる場合には、最先端(SOTA)の完全教師あり手法と同等の性能を発揮すべきである。我々はこのパラダイムをOmni-Domain Generalization Person ReID(ODG-ReID)と呼ぶ。本稿では,自己アンサンブルによって深層特徴の多様性を生み出すことでODG-ReIDを実現する方法を提案する。提案手法であるDiverse Deep Feature Ensemble Learning (D2FEL)は,複数の多様なビューを生成する独自のインスタンス正規化パターンを用い,これらのビューをコンパクトな符号に再結合する。我々の知る限り、本研究はPerson ReIDにおけるオムニドメイン一般化を考慮した数少ない研究の1つであり、Person ReIDへの特徴アンサンブルの適用に関する研究を前進させる。 D2FELは、主要なドメイン一般化ベンチマークと単一ドメイン教師ありベンチマークにおいて、SOTA性能を大幅に改善、あるいはそれに匹敵する結果を示す。

Person Re-identification (Person ReID) has progressed to a level where single-domain supervised Person ReID performance has saturated. However, such methods experience a significant drop in performance when trained and tested across different datasets, motivating the development of domain generalization techniques. However, our research reveals that domain generalization methods significantly underperform single-domain supervised methods on single dataset benchmarks. An ideal Person ReID method should be effective regardless of the number of domains involved, and when test domain data is available for training it should perform as well as state-of-the-art (SOTA) fully supervised methods. This is a paradigm that we call Omni-Domain Generalization Person ReID (ODG-ReID). We propose a way to achieve ODG-ReID by creating deep feature diversity with self-ensembles. Our method, Diverse Deep Feature Ensemble Learning (D2FEL), deploys unique instance normalization patterns that generate multiple diverse views and recombines these views into a compact encoding. To the best of our knowledge, our work is one of few to consider omni-domain generalization in Person ReID, and we advance the study of applying feature ensembles in Person ReID. D2FEL significantly improves and matches the SOTA performance for major domain generalization and single-domain supervised benchmarks.

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# プライバシを前方に進める - 合成データ生成によるスマート車内の情報漏洩の軽減

Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation ( http://arxiv.org/abs/2410.08462v1 )

ライセンス: Link先を確認

Krish Parikh,

(参考訳) スマートカーは大量のデータを生成し、そのほとんどが機密性があり、プライバシー侵害のリスクがある。攻撃者がこれらのデータセット内の匿名メタデータをプロファイリングドライバに活用する傾向にあるため、イノベーションや進行中の研究を妨げることなく、情報漏洩を緩和するソリューションを見つけることが重要である。合成データは、これらのプライバシー問題に対処するための有望なツールとして登場し、現実のデータ関係の複製を可能にすると同時に、機密情報を開示するリスクを最小限にする。本稿では,これらの課題に対処するための合成データの利用について検討する。まず、14の車載センサーを包括的に分類し、潜在的な攻撃を特定し、その脆弱性を分類することから始めます。次に、PVS(Passive Vehicular Sensor)データセットを使用して、100万以上のデータポイントを含むTabular Variational Autoencoder(TVAE)モデルで合成データを生成する。最後に、これらを3つのコアメトリクス – 忠実度、ユーティリティ、プライバシ – に対して評価する。その結果, 運転者のプロファイリングを防止しつつ, 本来の意図でテストした場合, 90.1%の統計的類似度と78%の分類精度を達成できた。コードはhttps://github.com/krish-parikh/Synthetic-Data-Generationにある。

Smart vehicles produce large amounts of data, much of which is sensitive and at risk of privacy breaches. As attackers increasingly exploit anonymised metadata within these datasets to profile drivers, it's important to find solutions that mitigate this information leakage without hindering innovation and ongoing research. Synthetic data has emerged as a promising tool to address these privacy concerns, as it allows for the replication of real-world data relationships while minimising the risk of revealing sensitive information. In this paper, we examine the use of synthetic data to tackle these challenges. We start by proposing a comprehensive taxonomy of 14 in-vehicle sensors, identifying potential attacks and categorising their vulnerability. We then focus on the most vulnerable signals, using the Passive Vehicular Sensor (PVS) dataset to generate synthetic data with a Tabular Variational Autoencoder (TVAE) model, which included over 1 million data points. Finally, we evaluate this against 3 core metrics: fidelity, utility, and privacy. Our results show that we achieved 90.1% statistical similarity and 78% classification accuracy when tested on its original intent while also preventing the profiling of the driver. The code can be found at https://github.com/krish-parikh/Synthetic-Data-Generation

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# ARCap:拡張現実フィードバックによるロボット学習のための高品質な人間デモ収集

ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback ( http://arxiv.org/abs/2410.08464v1 )

ライセンス: Link先を確認

Sirui Chen, Chen Wang, Kaden Nguyen, Li Fei-Fei, C. Karen Liu,

(参考訳) 人間の実演による模倣学習の進歩は,ロボットの操り方を教える上で有望な成果を上げている。トレーニングデータセットをさらにスケールアップするために、最近の研究は、物理的なロボットハードウェアを必要とせずにポータブルなデータ収集デバイスを使い始めた。しかし、データ収集中にオンボットフィードバックがないため、データ品質はユーザーの専門知識に大きく依存しており、多くのデバイスは特定のロボットの体格に限定されている。本稿では,拡張現実(AR)と触覚警告を通じて視覚的フィードバックを提供する携帯型データ収集システムARCapを提案する。広範にわたるユーザスタディを通じて,ARCapは,ロボットキネマティクスにマッチし,シーンとの衝突を避けるロボット実行可能なデータ収集を可能にする。 ARCapから収集されたデータにより、ロボットは散らかった環境での操作や長い水平交叉操作といった困難なタスクを実行できる。 ARCapは完全にオープンソースで、キャリブレーションが簡単で、すべてのコンポーネントは既製の製品から作られている。詳細と結果は、私たちのWebサイトにある。

Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products. More details and results can be found on our website: https://stanford-tml.github.io/ARCap

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# Omni-Domain Generalized Person Re-Identificationのための配向分岐経路

Aligned Divergent Pathways for Omni-Domain Generalized Person Re-Identification ( http://arxiv.org/abs/2410.08466v1 )

ライセンス: Link先を確認

Eugene P. W. Ang, Shan Lin, Alex C. Kot,

(参考訳) 人物再識別(Person ReID)は、完全教師ありのPerson ReIDとドメイン一般化Person ReIDの両方で著しく進歩している。しかし、一方のタスクのために開発された手法は、他方にはうまく転用できない。理想的なPerson ReID手法は、学習やテストに関与するドメインの数に関係なく有効であるべきである。さらに、対象ドメインの学習データが与えられた場合には、少なくとも最先端(SOTA)の完全教師ありPerson ReID手法と同等の性能を発揮すべきである。我々はこのパラダイムをOmni-Domain Generalization Person ReID(ODG-ReID)と呼び、互換性のあるバックボーンアーキテクチャを複数の多様な経路に拡張することでこれを実現する方法を提案する。提案手法であるAligned Divergent Pathways (ADP) は,まず元のバックボーンの末尾部分を複製することで,ベースアーキテクチャをマルチブランチ構造に変換する。次に,オムニドメインの方向に対して頑健な一般化特徴の学習を促すモジュールDynamic Max-Deviance Adaptive Instance Normalization (DyMAIN) を設計し,ADPの各ブランチに適用する。提案するPhased Mixture-of-Cosines (PMoC) は,ブランチ間で安定した学習率スケジュールと変動の大きい学習率スケジュールを組み合わせて調整し,学習をさらに多様化する。最後に,提案するDimensional Consistency Metric Loss (DCML) を用いて,ブランチ間の特徴空間を再整列する。 ADPは、マルチソースドメイン一般化と同一ドメイン内の教師ありReIDにおいて、最先端(SOTA)の結果を上回る。さらに,本手法は幅広い単一ソースドメイン一般化ベンチマークでも改善を示し,Person ReIDタスクにおけるオムニドメイン一般化を達成する。

Person Re-identification (Person ReID) has advanced significantly in fully supervised and domain generalized Person ReID. However, methods developed for one task domain transfer poorly to the other. An ideal Person ReID method should be effective regardless of the number of domains involved in training or testing. Furthermore, given training data from the target domain, it should perform at least as well as state-of-the-art (SOTA) fully supervised Person ReID methods. We call this paradigm Omni-Domain Generalization Person ReID, referred to as ODG-ReID, and propose a way to achieve this by expanding compatible backbone architectures into multiple diverse pathways. Our method, Aligned Divergent Pathways (ADP), first converts a base architecture into a multi-branch structure by copying the tail of the original backbone. We design our module Dynamic Max-Deviance Adaptive Instance Normalization (DyMAIN) that encourages learning of generalized features that are robust to omni-domain directions and apply DyMAIN to the branches of ADP. Our proposed Phased Mixture-of-Cosines (PMoC) coordinates a mix of stable and turbulent learning rate schedules among branches for further diversified learning. Finally, we realign the feature space between branches with our proposed Dimensional Consistency Metric Loss (DCML). ADP outperforms the state-of-the-art (SOTA) results for multi-source domain generalization and supervised ReID within the same domain. Furthermore, our method demonstrates improvement on a wide range of single-source domain generalization benchmarks, achieving Omni-Domain Generalization over Person ReID tasks.

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# 可溶性広帯域相互作用を持つ格子フェルミオン

Lattice fermions with solvable wide range interactions ( http://arxiv.org/abs/2410.08467v1 )

ライセンス: Link先を確認

Ryu Sasaki,

(参考訳) 広範囲相互作用を持つ厳密に解ける(スピンレス)格子フェルミオンを、数年前にOdakeと筆者によって報告された、厳密に解ける定常かつ可逆的なマルコフ連鎖 $\mathcal{K}^R$ に基づいて明示的に構成する。定常分布 $\pi$ に対する $\mathcal{K}^R$ の可逆性は、正の古典的ハミルトニアン $\mathcal{H}^R$ につながる。 $\mathcal{H}^R$ の厳密可解性は、筆者が最近提唱した原理に基づき、スピンレス格子フェルミオン $c_x$, $c_x^\dagger$, $\mathcal{H}^R_f=\sum_{x,y\in\mathcal{X}}c_x^\dagger\mathcal{H}^R(x,y) c_y$ の厳密可解性を保証する。可逆マルコフ連鎖 $\mathcal{K}^R$ はアスキースキームの離散直交多項式の直交測度の畳み込みによって構成される。広範囲相互作用を持つフェルミオン系のいくつかの明示的な例を示す。

Exactly solvable (spinless) lattice fermions with wide range interactions are constructed explicitly based on exactly solvable stationary and reversible Markov chains $\mathcal{K}^R$ reported a few years earlier by Odake and myself. The reversibility of $\mathcal{K}^R$ with the stationary distribution $\pi$ leads to a positive classical Hamiltonian $\mathcal{H}^R$. The exact solvability of $\mathcal{H}^R$ warrants that of the spinless lattice fermion system $c_x$, $c_x^\dagger$ with Hamiltonian $\mathcal{H}^R_f=\sum_{x,y\in\mathcal{X}}c_x^\dagger\mathcal{H}^R(x,y) c_y$, based on the principle advocated recently by myself. The reversible Markov chains $\mathcal{K}^R$ are constructed by convolutions of the orthogonality measures of the discrete orthogonal polynomials of the Askey scheme. Several explicit examples of fermion systems with wide range interactions are presented.
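
The step from a reversible Markov chain to a positive classical Hamiltonian can be illustrated numerically. The sketch below is not the construction from the paper: it assumes a toy birth-death chain (the names `birth_death_chain`, `stationary_distribution`, and `classical_hamiltonian` are hypothetical), a column-stochastic convention in which `K[x][y]` is the probability of moving from `y` to `x`, and the standard symmetrization $\mathcal{H}(x,y)=\delta_{xy}-\pi(x)^{-1/2}K(x,y)\pi(y)^{1/2}$, which may differ from the paper's exact definition.

```python
import math


def birth_death_chain(n, p_up=0.3, p_down=0.2):
    """Toy column-stochastic matrix: K[x][y] is the probability of y -> x."""
    K = [[0.0] * n for _ in range(n)]
    for y in range(n):
        up = p_up if y + 1 < n else 0.0      # no upward move at the top edge
        down = p_down if y - 1 >= 0 else 0.0  # no downward move at the bottom edge
        if y + 1 < n:
            K[y + 1][y] = up
        if y - 1 >= 0:
            K[y - 1][y] = down
        K[y][y] = 1.0 - up - down            # stay put with the remaining mass
    return K


def stationary_distribution(n, p_up=0.3, p_down=0.2):
    """Detailed balance for this chain gives pi(x+1) / pi(x) = p_up / p_down."""
    weights = [(p_up / p_down) ** x for x in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def classical_hamiltonian(K, pi):
    """H(x, y) = delta_xy - pi(x)^(-1/2) * K(x, y) * pi(y)^(1/2)."""
    n = len(pi)
    return [
        [(1.0 if x == y else 0.0) - K[x][y] * math.sqrt(pi[y] / pi[x])
         for y in range(n)]
        for x in range(n)
    ]
```

Under these assumptions, reversibility makes `classical_hamiltonian(K, pi)` a real symmetric matrix, and the vector with entries $\sqrt{\pi(x)}$ is a zero mode. Because the eigenvalues of a reversible stochastic matrix are real and lie in $[-1, 1]$, the spectrum of this matrix lies in $[0, 2]$. A quadratic fermion Hamiltonian built from such a matrix is then diagonalized by the same single-particle eigenvectors.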

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# DAT:人間エンゲージメント推定のためのModality-Group Fusionを用いた対話認識変換器

DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation ( http://arxiv.org/abs/2410.08470v1 )

ライセンス: Link先を確認

Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong,

(参考訳) エンゲージメント推定は、人間の社会的行動を理解する上で重要な役割を担い、感情コンピューティングや人間とコンピュータの相互作用といった分野における研究の関心を惹きつける。本稿では,対話における人間のエンゲージメントを推定するために,音声・視覚入力のみに依存し,言語に依存しないモダリティ・グループ・フュージョン(MGF)を用いた対話対応トランスフォーマフレームワーク(DAT)を提案する。具体的には、音声・視覚コンテンツ全体を推測する前に、各人ごとのモーダル内での音響特徴と視覚的特徴を独立に融合するモーダル群融合戦略を用いる。この戦略はモデルの性能と堅牢性を大幅に向上させる。さらに、対象者のエンゲージメントレベルをより正確に推定するために、紹介された対話意識変換器は、参加者の行動と会話相手からの手がかりの両方を考慮する。提案手法は,MultiMediate'24が実施したマルチドメインエンゲージメント推定チャレンジで厳密に検証され,ベースラインモデルに対するエンゲージメントレベル回帰精度の顕著な改善が示された。提案手法は,NoXiベーステストセットの平均CCCスコア0.76,NoXiベース,NoXi-Add,MPIIGIテストセットの平均CCC0.64を達成する。

Engagement estimation plays a crucial role in understanding human social behaviors, attracting increasing research interests in fields such as affective computing and human-computer interaction. In this paper, we propose a Dialogue-Aware Transformer framework (DAT) with Modality-Group Fusion (MGF), which relies solely on audio-visual input and is language-independent, for estimating human engagement in conversations. Specifically, our method employs a modality-group fusion strategy that independently fuses audio and visual features within each modality for each person before inferring the entire audio-visual content. This strategy significantly enhances the model's performance and robustness. Additionally, to better estimate the target participant's engagement levels, the introduced Dialogue-Aware Transformer considers both the participant's behavior and cues from their conversational partners. Our method was rigorously tested in the Multi-Domain Engagement Estimation Challenge held by MultiMediate'24, demonstrating notable improvements in engagement-level regression precision over the baseline model. Notably, our approach achieves a CCC score of 0.76 on the NoXi Base test set and an average CCC of 0.64 across the NoXi Base, NoXi-Add, and MPIIGI test sets.
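
The results above are reported as CCC (concordance correlation coefficient) scores. For readers unfamiliar with the metric, here is a minimal sketch using the standard definition with population (1/n) moments. The function name is hypothetical, and the challenge's official scoring script may differ in details such as how sessions are aggregated.

```python
def concordance_correlation(pred, target):
    """CCC = 2 * cov / (var_pred + var_target + (mean_pred - mean_target)^2)."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0.0:
        # Both sequences are constant and equal: the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom
```

Unlike Pearson correlation, CCC also penalizes a shift or scale mismatch between predictions and labels, so a perfectly correlated but offset prediction scores below 1.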

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# Deep Graph Convolutional Networksのより深い洞察:安定性と一般化

Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization ( http://arxiv.org/abs/2410.08473v1 )

ライセンス: Link先を確認

Guangrui Yang, Ming Li, Han Feng, Xiaosheng Zhuang,

(参考訳) グラフ畳み込みネットワーク(GCN)は、グラフ学習タスクの強力なモデルとして登場し、様々な領域で有望なパフォーマンスを示している。彼らの経験的成功は明らかだが、理論的な観点から本質的な能力を理解する必要性が高まっている。既存の理論的研究は主に単層GCNの解析に重点を置いているが、深いGCNの安定性と一般化に関する包括的な理論的探索は依然として限られている。本稿では,このギャップを深いGCNの安定性と一般化特性を掘り下げることで橋渡しする。本理論により,深いGCNの安定性と一般化は,グラフフィルタ演算子の絶対固有値の最大値やネットワークの深さなど,特定の要因の影響を受けていることが明らかとなった。我々の理論的研究は、深いGCNの安定性と一般化特性のより深い理解に寄与し、より信頼性が高く良好なモデルを開発するための道を開く可能性がある。

Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks, exhibiting promising performance in various domains. While their empirical success is evident, there is a growing need to understand their essential ability from a theoretical perspective. Existing theoretical research has primarily focused on the analysis of single-layer GCNs, while a comprehensive theoretical exploration of the stability and generalization of deep GCNs remains limited. In this paper, we bridge this gap by delving into the stability and generalization properties of deep GCNs, aiming to provide valuable insights by characterizing rigorously the associated upper bounds. Our theoretical results reveal that the stability and generalization of deep GCNs are influenced by certain key factors, such as the maximum absolute eigenvalue of the graph filter operators and the depth of the network. Our theoretical studies contribute to a deeper understanding of the stability and generalization properties of deep GCNs, potentially paving the way for developing more reliable and well-performing models.
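
One of the quantities named above, the maximum absolute eigenvalue of a graph filter operator, is easy to compute for small graphs. The sketch below assumes the common renormalized filter $D^{-1/2}(A+I)D^{-1/2}$, which is one possible choice and not necessarily the one analyzed in the paper, and estimates the eigenvalue by power iteration. It does not reproduce the paper's bounds; the function names are hypothetical.

```python
import math


def normalized_filter(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def max_abs_eigenvalue(matrix, iters=500):
    """Power iteration; assumes a symmetric matrix with one dominant |eigenvalue|."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        estimate = norm  # ||M v|| for unit v converges to the largest |eigenvalue|
    return estimate
```

For the renormalized filter the largest absolute eigenvalue is 1 on any connected graph. This is one reason that filter is a convenient reference point when discussing how a bound depends on the filter and on network depth.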

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# GIVE:知識グラフにインスパイアされた正当性外挿による構造化推論

GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation ( http://arxiv.org/abs/2410.08475v1 )

ライセンス: Link先を確認

Jiashu He, Mingyu Derek Ma, Jinxuan Fan, Dan Roth, Wei Wang, Alejandro Ribeiro,

(参考訳) 既存の検索に基づく大規模言語モデル(LLM)の推論手法は、ドメイン知識と明示的な推論チェーンを提供するために、非パラメトリックな知識ソースの密度と品質に大きく依存している。しかし、包括的な知識ソースは高価であり、科学分野やニッチな分野では構築できない場合もある。この課題に対処するために,Graph Inspired Veracity Extrapolation(GIVE)を導入する。これは,パラメトリックメモリと非パラメトリックメモリを統合し,非常にスパースな知識グラフ上で知識検索と忠実な推論プロセスの両方を強化する新しい推論フレームワークである。外部の構造化知識を活用してLLMを刺激し,関連する概念間の相互関係をモデル化させることにより,正解(ゴールドアンサー)の検索ではなく,専門家の問題解決に近い,より論理的で段階的な推論を可能にする。具体的には、このフレームワークはLLMに対して、クエリを重要な概念と属性に分解し、関連するエンティティからなるエンティティグループを構築し、これらのエンティティグループにまたがるノードペア間の潜在的な関係を探索することによって、拡張された推論チェーンを構築するよう促す。提案手法は, 事実に基づくリンクと外挿されたリンクの両方を組み込んで, 包括的な理解と応答生成を可能にする。バイオメディカルおよびコモンセンスQAにおける推論集約的なベンチマークでの大規模な実験により,提案手法の有効性が示された。具体的には、GIVE により GPT3.5-turbo は追加のトレーニングコストなしで GPT4 のような先進モデルを上回ることができ、外部リソースの限られた特殊なタスクに対処するうえで、構造化情報と LLM の内部推論能力を統合することの有効性が強調される。

Existing retrieval-based reasoning approaches for large language models (LLMs) heavily rely on the density and quality of the non-parametric knowledge source to provide domain knowledge and explicit reasoning chains. However, inclusive knowledge sources are expensive and sometimes infeasible to build for scientific or corner domains. To tackle these challenges, we introduce Graph Inspired Veracity Extrapolation (GIVE), a novel reasoning framework that integrates the parametric and non-parametric memories to enhance both knowledge retrieval and faithful reasoning processes on very sparse knowledge graphs. By leveraging the external structured knowledge to inspire the LLM to model the interconnections among relevant concepts, our method facilitates a more logical and step-wise reasoning approach akin to experts' problem-solving, rather than gold answer retrieval. Specifically, the framework prompts LLMs to decompose the query into crucial concepts and attributes, construct entity groups with relevant entities, and build an augmented reasoning chain by probing potential relationships among node pairs across these entity groups. Our method incorporates both factual and extrapolated linkages to enable comprehensive understanding and response generation. Extensive experiments on reasoning-intensive benchmarks in biomedical and commonsense QA demonstrate the effectiveness of our proposed method. Specifically, GIVE enables GPT3.5-turbo to outperform advanced models like GPT4 without any additional training cost, thereby underscoring the efficacy of integrating structured information and the internal reasoning ability of LLMs for tackling specialized tasks with limited external resources.

翻訳日:2024-10-31 03:06:36 公開日:2024-10-11

# 動的語彙による生成

Generation with Dynamic Vocabulary ( http://arxiv.org/abs/2410.08481v1 )

ライセンス: Link先を確認

Yanting Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Xiaoling Wang,

(参考訳) 言語モデルのための新しい動的語彙を導入する。これは生成中に任意のテキストスパンを含むことができる。これらのテキストスパンは、従来の静的語彙におけるトークンと同様に、生成の基本的な構成要素として機能する。複数トークンをアトミックに生成できる能力により、生成品質と効率の両方が向上することを示す(標準言語モデルと比較すると、MAUVEメトリックは25%向上し、レイテンシは20%低下する)。動的語彙はプラグアンドプレイ方式で導入できるため、様々なダウンストリームアプリケーションにとって魅力的である。例えば、動的語彙は訓練不要で異なるドメインに適用できることを示す。また、質問応答タスクにおいて信頼性の高い引用を生成するのにも役立つ(回答精度を損なうことなく、引用結果を大幅に向上させる)。

We introduce a new dynamic vocabulary for language models. It can involve arbitrary text spans during generation. These text spans act as basic generation bricks, akin to tokens in traditional static vocabularies. We show that the ability to generate multiple tokens atomically improves both generation quality and efficiency (compared to the standard language model, the MAUVE metric increases by 25% and latency decreases by 20%). The dynamic vocabulary can be deployed in a plug-and-play way and is thus attractive for various downstream applications. For example, we demonstrate that the dynamic vocabulary can be applied to different domains in a training-free manner. It also helps to generate reliable citations in question answering tasks (substantially enhancing citation results without compromising answer accuracy).

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# GFVCを超えて - 適応的なビジュアルトークンを備えたプログレッシブな顔ビデオ圧縮フレームワーク

Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens ( http://arxiv.org/abs/2410.08485v1 )

ライセンス: Link先を確認

Bolin Chen, Shanzhi Yin, Zihan Zhang, Jie Chen, Ru-Ling Liao, Lingyu Zhu, Shiqi Wang, Yan Ye,

(参考訳) 近年、深層生成モデルにより、将来性のある速度歪み性能と多種多様なアプリケーション機能に向けて、顔映像符号化の進歩が著しく進んでいる。従来のハイブリッドビデオ符号化のパラダイムを超えて、GFVC(Generative Face Video Compression)は、深層生成モデルの強力な能力と初期のモデルベース符号化(MBC)の哲学を頼りに、視覚的顔信号のコンパクトな表現と現実的な再構築を容易にし、超低ビットレートの顔ビデオ通信を実現する。しかし、これらのGFVCアルゴリズムは、不安定な再構成品質と限られたビットレート範囲に直面することがある。これらの問題に対処するために, 適応型視覚トークンを用いた新しいプログレッシブ・フェイス・ビデオ圧縮フレームワークであるPFVCを提案し, 再構成ロバスト性と帯域幅インテリジェンスとの異例なトレードオフを実現する。特に、提案したPFVCのエンコーダは、高次元の顔信号をプログレッシブな方法で適応的な視覚トークンに投影し、デコーダは、これらの適応的な視覚トークンを運動推定や信号合成のために、異なる粒度レベルで再構築することができる。実験により,提案したPFVCフレームワークは,最新のVersatile Video Coding(VVC)コーデックや最先端GFVCアルゴリズムと比較して,符号化の柔軟性と速度歪み性能を向上できることを示した。プロジェクトのページはhttps://github.com/Berlin0610/PFVCで見ることができる。

Recently, deep generative models have greatly advanced the progress of face video coding towards promising rate-distortion performance and diverse application functionalities. Beyond traditional hybrid video coding paradigms, Generative Face Video Compression (GFVC) relying on the strong capabilities of deep generative models and the philosophy of early Model-Based Coding (MBC) can facilitate the compact representation and realistic reconstruction of visual face signal, thus achieving ultra-low bitrate face video communication. However, these GFVC algorithms are sometimes faced with unstable reconstruction quality and limited bitrate ranges. To address these problems, this paper proposes a novel Progressive Face Video Compression framework, namely PFVC, that utilizes adaptive visual tokens to realize exceptional trade-offs between reconstruction robustness and bandwidth intelligence. In particular, the encoder of the proposed PFVC projects the high-dimensional face signal into adaptive visual tokens in a progressive manner, whilst the decoder can further reconstruct these adaptive visual tokens for motion estimation and signal synthesis with different granularity levels. Experimental results demonstrate that the proposed PFVC framework can achieve better coding flexibility and superior rate-distortion performance in comparison with the latest Versatile Video Coding (VVC) codec and the state-of-the-art GFVC algorithms. The project page can be found at https://github.com/Berlin0610/PFVC.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# 量子信頼実行環境のための量子オペレーティングシステム

Quantum Operating System Support for Quantum Trusted Execution Environments ( http://arxiv.org/abs/2410.08486v1 )

ライセンス: Link先を確認

Theodoros Trochatos, Jakub Szefer,

(参考訳) クラウドベースの量子コンピューティングへの依存が高まり、量子コンピューティングの機密性と完全性を保証することが最重要である。 Quantum Trusted Execution Environments (QTEE) は、リモートクラウドベースの量子コンピュータに送信されたユーザの量子回路を保護するために提案されている。しかし、QTEEの配備にはQTEEのハードウェアと操作をサポートする量子オペレーティングシステム(QOS)が必要である。この作業では、クラウドプラットフォーム上でのセキュアな量子タスク実行に必要な、QOSをサポートするための最初のアーキテクチャを導入している。

With the growing reliance on cloud-based quantum computing, ensuring the confidentiality and integrity of quantum computations is paramount. Quantum Trusted Execution Environments (QTEEs) have been proposed to protect users' quantum circuits when they are submitted to remote cloud-based quantum computers. However, deployment of QTEEs necessitates a Quantum Operating System (QOS) that can support QTEE hardware and operation. This work introduces the first architecture for a QOS to support and enable essential steps required for secure quantum task execution on cloud platforms.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# コントラストフリー血管造影のためのCAS-GAN

CAS-GAN for Contrast-free Angiography Synthesis ( http://arxiv.org/abs/2410.08490v1 )

ライセンス: Link先を確認

De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Hao Li, Tian-Yu Xiang, Zeng-Guang Hou,

(参考訳) ヨード造影剤は、多くのインターベンション手技で広く利用されているが、患者にかなりの健康リスクをもたらす。本稿で提案するCAS-GANは「仮想造影剤」として機能する新規なGANフレームワークであり, 分離(disentanglement)表現学習と血管セマンティックガイダンスによってX線血管造影像を合成し, インターベンション手技におけるヨード造影剤への依存を低減する。具体的には,医学的な事前知識を活用して,X線血管造影像を背景成分と血管成分に分離する。次に、専用の予測器がこれらの成分間の相互関係の写像を学習する。さらに、生成画像の視覚的忠実度を高めるために、血管セマンティック誘導ジェネレータとそれに対応する損失関数を導入する。XCADデータセットでの実験の結果,CAS-GANはFID 5.94,MMD 0.017という最先端の性能を達成した。これらの有望な結果はCAS-GANの臨床応用の可能性を強調している。

Iodinated contrast agents are widely utilized in numerous interventional procedures, yet they pose substantial health risks to patients. This paper presents CAS-GAN, a novel GAN framework that serves as a "virtual contrast agent" to synthesize X-ray angiographies via disentanglement representation learning and vessel semantic guidance, thereby reducing the reliance on iodinated agents during interventional procedures. Specifically, our approach disentangles X-ray angiographies into background and vessel components, leveraging medical prior knowledge. A specialized predictor then learns to map the interrelationships between these components. Additionally, a vessel semantic-guided generator and a corresponding loss function are introduced to enhance the visual fidelity of generated images. Experimental results on the XCAD dataset demonstrate the state-of-the-art performance of our CAS-GAN, achieving an FID of 5.94 and an MMD of 0.017. These promising results highlight CAS-GAN's potential for clinical applications.
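
The MMD (maximum mean discrepancy) figure quoted above measures how far the distribution of generated images is from that of real ones. The abstract does not state the kernel, bandwidth, or feature space used, so the sketch below assumes an RBF kernel on plain feature vectors and the biased (V-statistic) estimator of squared MMD. The names `rbf_kernel` and `mmd_squared` are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

With this estimator, identical samples give zero and well-separated samples give a clearly positive value. Absolute numbers depend strongly on the kernel bandwidth and the features, so they are only comparable within a single evaluation protocol.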

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# 自動走行におけるエッジケース検出のシステムレビュー:方法,課題,今後の方向性

A Systematic Review of Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions ( http://arxiv.org/abs/2410.08491v1 )

ライセンス: Link先を確認

Saeed Rahmani, Sabine Rieder, Erwin de Gelder, Marcel Sonntag, Jorge Lorente Mallada, Sytze Kalisvaart, Vahid Hashemi, Simeon C. Calvert,

(参考訳) 自動車両(AV)の急速な開発は、安全と効率を高めることで輸送に革命をもたらすことを約束している。しかしながら、様々な現実世界の状況における信頼性を確保することは、特にエッジケースとして知られる稀で予期せぬ状況のため、重要な課題である。エッジケースの検出には多くのアプローチが存在するが、これらのテクニックを体系的にレビューする包括的な調査が欠如している。本稿では,エッジケースの検出と評価手法の体系的分類を実践的かつ階層的に検討することにより,このギャップを埋める。本分類は, AVモジュールによる検出手法の分類, 知覚関連, 軌跡関連エッジケースの分類, および基礎となる方法論と理論に基づく2つのレベルで構成されている。我々は、この分類を「知識駆動」アプローチと呼ばれる新しいクラスを導入することで拡張する。さらに,エッジケース検出手法の評価手法と,エッジケースの同定手法について検討した。我々の知る限りでは、すべてのAVサブシステムにおけるエッジケース検出手法を包括的にカバーし、知識駆動エッジケースについて議論し、検出方法の評価手法を探求する最初の調査である。この構造化・多面解析は、AVのターゲットとなる研究とモジュラーテストを促進することを目的としている。さらに、様々なアプローチの長所と短所を特定し、課題と今後の方向性について議論することにより、効率的なエッジケース検出を通じて自動運転(AD)システムの安全性と信頼性を高めるために、AV開発者、研究者、政策立案者を支援することを目的とする。

The rapid development of automated vehicles (AVs) promises to revolutionize transportation by enhancing safety and efficiency. However, ensuring their reliability in diverse real-world conditions remains a significant challenge, particularly due to rare and unexpected situations known as edge cases. Although numerous approaches exist for detecting edge cases, there is a notable lack of a comprehensive survey that systematically reviews these techniques. This paper fills this gap by presenting a practical, hierarchical review and systematic classification of edge case detection and assessment methodologies. Our classification is structured on two levels: first, categorizing detection approaches according to AV modules, including perception-related and trajectory-related edge cases; and second, based on underlying methodologies and theories guiding these techniques. We extend this taxonomy by introducing a new class called "knowledge-driven" approaches, which is largely overlooked in the literature. Additionally, we review the techniques and metrics for the evaluation of edge case detection methods and identified edge cases. To our knowledge, this is the first survey to comprehensively cover edge case detection methods across all AV subsystems, discuss knowledge-driven edge cases, and explore evaluation techniques for detection methods. This structured and multi-faceted analysis aims to facilitate targeted research and modular testing of AVs. Moreover, by identifying the strengths and weaknesses of various approaches and discussing the challenges and future directions, this survey intends to assist AV developers, researchers, and policymakers in enhancing the safety and reliability of automated driving (AD) systems through effective edge case detection.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# ミニマックス問題に対するシャーパリスク境界に向けて

Towards Sharper Risk Bounds for Minimax Problems ( http://arxiv.org/abs/2410.08497v1 )

ライセンス: Link先を確認

Bowei Zhu, Shaojie Li, Yong Liu,

(参考訳) ミニマックス問題は、敵の訓練、堅牢な最適化、強化学習などの機械学習で成功している。理論解析において、一般化誤差と最適化誤差によって構成される現在の最適過大なリスク境界は、強凸強対流(SC-SC)設定において1/nレートを示す。既存の研究は主に最適化誤差の特定のアルゴリズムによるミニマックス問題に焦点を合わせており、より優れた過大なリスク境界を制限する一般化性能についてはほとんど研究されていない。本稿では,一様局所収束を用いた一次関数の勾配によって測定される一般化境界について検討する。我々は,非凸強対流(NC-SC)確率最小値問題に対して,よりシャープな高確率一般化誤差を求める。さらに、外層に対して、ポリアック・ロジャシエヴィチ条件の下で次元に依存しない結果を与える。一般化誤差に基づいて、経験的サドル点(ESP)、勾配勾配上昇(GDA)、確率勾配上昇(SGDA)などの一般的なアルゴリズムを分析した。我々は、より合理的な仮定を伴って、過剰な原始的リスク境界を導出するが、これは私たちの知識の最も良いところは、ミニマックス問題における結果よりもn倍高速である。

Minimax problems have achieved success in machine learning, for example in adversarial training, robust optimization, and reinforcement learning. For theoretical analysis, current optimal excess risk bounds, which are composed of generalization error and optimization error, present 1/n rates in strongly-convex-strongly-concave (SC-SC) settings. Existing studies mainly focus on the optimization error of specific algorithms for minimax problems, with only a few studies on generalization performance, which limits progress toward better excess risk bounds. In this paper, we study generalization bounds measured by the gradients of primal functions using uniform localized convergence. We obtain a sharper high-probability generalization error bound for nonconvex-strongly-concave (NC-SC) stochastic minimax problems. Furthermore, we provide dimension-independent results under the Polyak-Lojasiewicz condition for the outer layer. Based on our generalization error bound, we analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA). We derive better excess primal risk bounds with further reasonable assumptions, which, to the best of our knowledge, are n times faster than existing results for minimax problems.
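
Gradient descent ascent (GDA), one of the algorithms analyzed above, can be sketched in a few lines. The example below is only an illustration of the update rule on a hand-picked SC-SC objective $f(x,y)=\tfrac12 x^2+xy-\tfrac12 y^2$, whose saddle point is $(0,0)$. The objective, step size, and function name are assumptions made for the sketch, and it says nothing about the paper's risk bounds.

```python
def gradient_descent_ascent(grad_x, grad_y, x0, y0, lr=0.1, steps=200):
    """Simultaneous GDA: descend in x and ascend in y with the same step size."""
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)  # both gradients at the current point
        x, y = x - lr * gx, y + lr * gy
    return x, y


def toy_grad_x(x, y):
    """Partial derivative of f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 w.r.t. x."""
    return x + y


def toy_grad_y(x, y):
    """Partial derivative of the same f w.r.t. y."""
    return x - y
```

For this objective each GDA step multiplies the distance to the saddle point by $\sqrt{(1-\eta)^2+\eta^2}$, which is below 1 for any step size $0<\eta<1$, so `gradient_descent_ascent(toy_grad_x, toy_grad_y, 1.0, -1.0)` contracts toward the origin.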

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# 計算イメージングにおける隠れ特性について

On a Hidden Property in Computational Imaging ( http://arxiv.org/abs/2410.08498v1 )

ライセンス: Link先を確認

Yinan Feng, Yinpeng Chen, Yueh Lee, Youzuo Lin,

(参考訳) FWI(Full Waveform Inversion)、CT(Computerd Tomography)、EM(Electromagnetic)インバージョン(EM)インバージョンなど、様々な科学的・医学的応用において、計算画像は重要な役割を担っている。これらの手法は、両モードが複雑な数学的方程式によって制御される測定データ(例えば、FWIの地震波形データ)から物理特性(例えば、FWIの音響速度マップ)を再構成することで、逆問題に対処する。本稿では, 異なる支配方程式にもかかわらず, 3つの逆問題 (FWI, CT, EM) が潜在空間内に隠れた性質を共有することを実証的に示す。具体的には、FWI を例として、モーダル性(速度マップと地震波形データ)が、潜時空間において同じ一方向波動方程式のセットに従うが、線形に相関する異なる初期条件を持つことを示す。このことは、潜在埋め込み空間への射影の後、2つのモジュラリティが、その初期条件を通して連結された同じ方程式の異なる解に対応することを示唆している。実験により,この隠蔽特性は3つの画像問題すべてに一貫性があることが確認された。

Computational imaging plays a vital role in various scientific and medical applications, such as Full Waveform Inversion (FWI), Computed Tomography (CT), and Electromagnetic (EM) inversion. These methods address inverse problems by reconstructing physical properties (e.g., the acoustic velocity map in FWI) from measurement data (e.g., seismic waveform data in FWI), where both modalities are governed by complex mathematical equations. In this paper, we empirically demonstrate that despite their differing governing equations, three inverse problems (FWI, CT, and EM inversion) share a hidden property within their latent spaces. Specifically, using FWI as an example, we show that both modalities (the velocity map and seismic waveform data) follow the same set of one-way wave equations in the latent space, yet have distinct initial conditions that are linearly correlated. This suggests that after projection into the latent embedding space, the two modalities correspond to different solutions of the same equation, connected through their initial conditions. Our experiments confirm that this hidden property is consistent across all three imaging problems, providing a novel perspective for understanding these computational imaging tasks.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# ログレベルの提案のための大規模言語モデルの学習とベンチマーク

Studying and Benchmarking Large Language Models For Log Level Suggestion ( http://arxiv.org/abs/2410.08499v1 )

ライセンス: Link先を確認

Yi Wen Heng, Zeyang Ma, Zhenhao Li, Dong Jae Kim, Tse-Hsun Chen,

(参考訳) 大規模言語モデル(LLM)は、ソフトウェア工学を含む様々な分野における研究の焦点となり、その能力はますます活用されている。近年の研究では,LLMをソフトウェア開発ツールやフレームワークに統合し,テキストおよびコード関連タスクのパフォーマンス向上の可能性を明らかにしている。ログレベルはロギングステートメントの重要な部分であり、開発者はこれによってシステム実行中に記録される情報を制御できる。ログメッセージは自然言語とコードのような変数が混在することが多いため、LLMの言語翻訳能力は、ロギングステートメントに適した冗長度を決定するために応用できる可能性がある。本稿では,ログレベルの提案における12個のオープンソースLLMの性能に対して,モデルの特性と学習パラダイムが及ぼす影響を詳細に実証分析する。オープンソースモデルを選択したのは、機密情報を効果的に保護しデータセキュリティを維持しながら、社内コードを利用できるためである。我々は,Zero-shot,Few-shot,ファインチューニングなど,複数のプロンプト戦略を様々なLLMで検証し,正確なログレベル提案のための最も効果的な組み合わせを特定する。本研究は、9つの大規模なJavaシステムで実施された実験によって裏付けられている。その結果,より小規模なLLMでも適切な指示と適切な手法があれば効果的に動作するが,ログレベルを提案する能力にはまだかなりの改善の余地があることがわかった。

Large Language Models (LLMs) have become a focal point of research across various domains, including software engineering, where their capabilities are increasingly leveraged. Recent studies have explored the integration of LLMs into software development tools and frameworks, revealing their potential to enhance performance in text and code-related tasks. The log level is a key part of a logging statement that allows software developers to control the information recorded during system runtime. Given that log messages often mix natural language with code-like variables, LLMs' language translation abilities could be applied to determine the suitable verbosity level for logging statements. In this paper, we undertake a detailed empirical analysis to investigate the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion. We opted for open-source models because they enable us to utilize in-house code while effectively protecting sensitive information and maintaining data security. We examine several prompting strategies, including zero-shot, few-shot, and fine-tuning techniques, across different LLMs to identify the most effective combinations for accurate log level suggestions. Our research is supported by experiments conducted on 9 large-scale Java systems. The results indicate that although smaller LLMs can perform effectively with appropriate instruction and suitable techniques, there is still considerable potential for improvement in their ability to suggest log levels.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# セマンティック-トポ-メトリ表現誘導LDM推論による空中視覚・言語ナビゲーション

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning ( http://arxiv.org/abs/2410.08500v1 )

ライセンス: Link先を確認

Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao,

(参考訳) Aerial VLN(Aerial Vision-and-Language Navigation)は、無人航空機(Unmanned Aerial Vehicles、UAV)が自然言語の指示や視覚的手がかりを通じて屋外の環境を航行できるようにする新しいタスクである。屋外の空中シーンにおける複雑な空間的関係のため、依然として困難である。本稿では,大規模言語モデル(LLM)をアクション予測のエージェントとして導入する,Aerial VLNタスクのためのエンドツーエンドのゼロショットフレームワークを提案する。具体的には,LLMの空間的推論能力を高めるために,新しいSemantic-Topo-Metric Representation (STMR) を開発した。これは、指示に関連するランドマークのセマンティックマスクを抽出し、周囲のランドマークの位置情報を含むトップダウンマップに投影することで達成される。さらに、この地図は距離メトリクスを持つ行列表現に変換され、LLMへのテキストプロンプトとして与えられ、指示に従った動作予測に用いられる。実環境およびシミュレーション環境で行った実験により,提案手法の有効性と堅牢性が示され,AerialVLN-Sデータセット上でのOracle Success Rate(OSR)において、15.9%と12.5%の改善(絶対値)を達成した。

Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues. It remains challenging due to the complex spatial relationships in outdoor aerial scenes. In this paper, we propose an end-to-end zero-shot framework for aerial VLN tasks, where the large language model (LLM) is introduced as our agent for action prediction. Specifically, we develop a novel Semantic-Topo-Metric Representation (STMR) to enhance the spatial reasoning ability of LLMs. This is achieved by extracting and projecting instruction-related semantic masks of landmarks into a top-down map that contains the location information of surrounding landmarks. Further, this map is transformed into a matrix representation with distance metrics, which serves as the text prompt to the LLM for action prediction according to the instruction. Experiments conducted in real and simulation environments demonstrate the effectiveness and robustness of our method, which achieves 15.9% and 12.5% improvements (absolute) in Oracle Success Rate (OSR) on the AerialVLN-S dataset.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# 対人訓練はロバスト性を改善する:構造化データに基づく特徴学習過程の理論解析

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data ( http://arxiv.org/abs/2410.08503v1 )

ライセンス: Link先を確認

Binghui Li, Yuanzhi Li,

(参考訳) 敵のトレーニングは、敵の摂動に対して堅牢であるようにディープニューラルネットワークをトレーニングするための広く応用されたアプローチである。しかし、実際は敵の訓練が経験的な成功をおさめているが、なぜ敵の例が存在するのか、また敵の訓練方法がモデル堅牢性をどのように改善するかはいまだ不明である。本稿では, 特徴学習理論の観点から, 対角的例と対角的学習アルゴリズムの理論的理解を提供する。具体的には, 頑健な特徴, 摂動に抵抗するがスパースに抵抗する特徴, 摂動に敏感な非破壊的特徴, という2つのタイプの特徴から構成される。我々は、構造化データを学ぶために、2層スムーズなReLU畳み込みニューラルネットワークを訓練する。まず、ネットワーク学習者は、標準的な訓練(経験的リスクよりも緩やかな降下)を用いることで、頑健な特徴よりも非破壊的特徴を学習し、それによって、負の非破壊的特徴方向に整合した摂動によって生じる逆の例が導かれることを証明した。そこで, 勾配に基づく逆数学習アルゴリズムについて検討し, モデル更新において, 逆数を求めるために勾配を上昇させ, 経験的リスクよりも勾配を降下させ, モデル更新を行う。本手法は,頑健な特徴学習を効果的に強化し,非ロバストな特徴学習を抑え,ネットワークの堅牢性を向上させることができることを示す。最後に,MNIST, CIFAR10, SVHNなどの実画像データセットを用いた実験により, 理論的知見を実証的に検証した。

Adversarial training is a widely-applied approach to training deep neural networks to be robust against adversarial perturbation. However, although adversarial training has achieved empirical success in practice, it still remains unclear why adversarial examples exist and how adversarial training methods improve model robustness. In this paper, we provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory. Specifically, we focus on a multiple classification setting, where the structured data can be composed of two types of features: the robust features, which are resistant to perturbation but sparse, and the non-robust features, which are susceptible to perturbation but dense. We train a two-layer smoothed ReLU convolutional neural network to learn our structured data. First, we prove that by using standard training (gradient descent over the empirical risk), the network learner primarily learns the non-robust feature rather than the robust feature, which thereby leads to the adversarial examples that are generated by perturbations aligned with negative non-robust feature directions. Then, we consider the gradient-based adversarial training algorithm, which runs gradient ascent to find adversarial examples and runs gradient descent over the empirical risk at adversarial examples to update models. We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning to improve the network robustness. Finally, we also empirically validate our theoretical findings with experiments on real-image datasets, including MNIST, CIFAR10 and SVHN.
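
The gradient-based adversarial training loop described above (gradient ascent on the input to find an adversarial example, then gradient descent on the weights at that example) can be sketched as follows. This is a deliberately small stand-in: it uses a linear logistic model instead of the paper's two-layer smoothed ReLU CNN, signed-gradient (PGD-style) ascent inside an $\ell_\infty$ ball as the attack, and a hand-made dataset. All function names and hyperparameters are assumptions for illustration.

```python
import math


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_and_grads(w, x, y):
    """Logistic loss log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}.

    Returns (loss, gradient w.r.t. w, gradient w.r.t. x).
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable evaluation of the loss and of s = sigmoid(-margin).
    if margin >= 0:
        e = math.exp(-margin)
        loss, s = math.log1p(e), e / (1.0 + e)
    else:
        e = math.exp(margin)
        loss, s = -margin + math.log1p(e), 1.0 / (1.0 + e)
    grad_w = [-s * y * xi for xi in x]
    grad_x = [-s * y * wi for wi in w]
    return loss, grad_w, grad_x


def pgd_attack(w, x, y, eps, step, iters):
    """Signed gradient ascent on the input, projected onto the l_inf ball around x."""
    x_adv = list(x)
    for _ in range(iters):
        _, _, grad_x = loss_and_grads(w, x_adv, y)
        x_adv = [
            min(max(xa + step * sign(g), xi - eps), xi + eps)
            for xa, g, xi in zip(x_adv, grad_x, x)
        ]
    return x_adv


def adversarial_train(data, eps=0.1, lr=0.5, epochs=50):
    """Full-batch gradient descent on the empirical risk at adversarial examples."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            x_adv = pgd_attack(w, x, y, eps, step=eps / 2.0, iters=3)
            _, grad_w, _ = loss_and_grads(w, x_adv, y)
            grad = [g + gw / len(data) for g, gw in zip(grad, grad_w)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

In a toy dataset where the first coordinate has a large margin and the second is small and inconsistent, this loop tends to put most of the weight on the first coordinate. That loosely mirrors the robust versus non-robust feature picture in the abstract, although the sketch proves nothing about it.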

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# 弱視下腹腔鏡下画像分割に対するベイズ的アプローチ

A Bayesian Approach to Weakly-supervised Laparoscopic Image Segmentation ( http://arxiv.org/abs/2410.08509v1 )

ライセンス: Link先を確認

Zhou Zheng, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kensaku Mori,

(参考訳) 本稿では,スパースアノテーションを用いた弱教師あり腹腔鏡画像セグメンテーションについて検討する。モデルのセグメンテーションの精度と解釈可能性の両方を高めるために,包括的なベイズ的枠組みに基づく新しいベイズ深層学習手法を導入し,ロバストかつ理論的に検証された手法とする。提案手法は,観察画像とそれに対応する弱いアノテーションを用いて直接訓練する従来の手法とは異なる。その代わりに,取得したデータが与えられたもとでの画像とラベルの同時分布を推定する。これにより,画像とその高品質な擬似ラベルのサンプリングが容易になり,一般化可能なセグメンテーションモデルのトレーニングが可能になる。モデルの各コンポーネントは確率的定式化によって表現され,一貫性があり解釈可能な構造を提供する。この確率的性質は,スパースアノテーションからの正確で実用的な学習に役立ち,不確実性を定量化する能力をモデルに与える。2つの公開腹腔鏡データセットによる広範な評価の結果,提案手法は既存の手法を一貫して上回った。さらに,本手法をスクリブル教師あり心臓多構造セグメンテーションに適用したところ,従来法と比較して競争力のある性能を示した。コードは https://github.com/MoriLabNU/Bayesian_WSS で公開されている。

In this paper, we study weakly-supervised laparoscopic image segmentation with sparse annotations. We introduce a novel Bayesian deep learning approach designed to enhance both the accuracy and interpretability of the model's segmentation, founded upon a comprehensive Bayesian framework, ensuring a robust and theoretically validated method. Our approach diverges from conventional methods that directly train using observed images and their corresponding weak annotations. Instead, we estimate the joint distribution of both images and labels given the acquired data. This facilitates the sampling of images and their high-quality pseudo-labels, enabling the training of a generalizable segmentation model. Each component of our model is expressed through probabilistic formulations, providing a coherent and interpretable structure. This probabilistic nature benefits accurate and practical learning from sparse annotations and equips our model with the ability to quantify uncertainty. Extensive evaluations with two public laparoscopic datasets demonstrated the efficacy of our method, which consistently outperformed existing methods. Furthermore, our method was adapted for scribble-supervised cardiac multi-structure segmentation, presenting competitive performance compared to previous methods. The code is available at https://github.com/MoriLabNU/Bayesian_WSS.

翻訳日:2024-10-30 23:34:54 公開日:2024-10-11

# コヒーレンス変動からみた量子速度限界

Quantum Speed Limit in Terms of Coherence Variations ( http://arxiv.org/abs/2410.08514v1 )

ライセンス: Link先を確認

Zi-yi Mai, Chang-shui Yu,

(参考訳) コヒーレンス(Coherence)は、量子情報処理における最も基本的な量子資源である。物理的システムがいかに早くコヒーレンスやデコヒーレンスを得るかは重要な要素である。本稿では,動的プロセスによる量子コヒーレンスの変化に基づいて,達成可能な量子速度制限を提案する。これは、2次元の量子状態に対して、それに対応するダイナミクスが必ず見つけ、測地線に沿ってあるコヒーレンスな変化で他の状態へと進化することを示している。応用として、縮退力学と散逸力学のコヒーレンス量子速度限界について検討する。劣化力学は我々のコヒーレンス量子速度限界を飽和させ、同じ人口を持つ状態のデコヒーレンスは他のものよりも速くなることが示されている。しかし、散逸力学は反対の振る舞いを持つ。さらに、上述のダイナミクスに対する境界のより強い厳密さを比較によって示す。

Coherence is the most fundamental quantum resource in quantum information processing. How fast a physical system gets coherence or decoherence is a critical ingredient. We present an attainable quantum speed limit based on the variation of quantum coherence subject to a dynamical process. It indicates that for a 2-dimensional quantum state, one can always find corresponding dynamics driving it to evolve along the geodesic to another state with certain coherence variation. As applications, we study the coherence quantum speed limits of the dephasing and dissipative dynamics. It is shown that the dephasing dynamics can saturate our coherence quantum speed limit, and the decoherence of the state with identical populations will be faster than others. However, the dissipative dynamics have the opposite behavior. In addition, we illustrate a stronger tightness of our bound for the mentioned dynamics by comparison.

翻訳日:2024-10-30 23:24:45 公開日:2024-10-11

# WasmWalker: WebAssemblyプログラム分析を改善するパスベースのコード表現

WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis ( http://arxiv.org/abs/2410.08517v1 )

ライセンス: Link先を確認

Mohammad Robati Shirzad, Patrick Lam,

(参考訳) WebAssembly(Wasm)は、Webブラウザでほぼネイティブなコードの実行を可能にする低レベルのバイナリ言語である。 Wasmは、ゲーム、オーディオ、ビデオ処理、クラウドコンピューティングなどのアプリケーションで有用であることが証明され、Web開発におけるJavaScriptのハイパフォーマンスで低オーバーヘッドな代替手段を提供する。すべての主要なブラウザがWebAssemblyを迅速かつ広く採用していることにより、この新しいテクノロジをサポートする分析ツールが誕生した。ディープラーニングプログラム分析モデルは、AST(Abstract Syntax Tree)対応のコード表現に含まれるプログラム構造情報から大きな恩恵を受けることができる。このようなコード表現を得るために、Ubuntu 18.04リポジトリのソースパッケージからコンパイルされたWebAssemblyバイナリファイルの大規模なデータセットのWebAssembly TextフォーマットでASTパスを実証分析した。収集したパスを精査した結果、これらのバイナリファイルに3,352のユニークなパスしか現れていないことがわかった。この知見により、WebAssemblyバイナリ用の2つの新しいコード表現を提案する。これらの新しい表現は、固定サイズのコード埋め込みを生成するだけでなく、シーケンス・ツー・シーケンス・モデルに追加情報を提供するのに役立つ。最終的に、我々のアプローチは、プログラム分析モデルがWasmバイナリから新しい性質を明らかにするのに役立つ。 2つのアプリケーションで新しいコード表現を評価しました。 (i)メソッド名予測及び方法 (ii)正確な戻り型を復元する。本研究は,従来の手法よりも新しい手法が優れていることを示すものである。具体的には,従来の最先端技術であるSnowWhiteと比較して,メソッド名予測におけるTop-1(Top-5)の精度が5.36%(11.31%)向上し,精度が8.02%(7.92%)向上した。

WebAssembly, or Wasm, is a low-level binary language that enables execution of near-native-performance code in web browsers. Wasm has proven to be useful in applications including gaming, audio and video processing, and cloud computing, providing a high-performance, low-overhead alternative to JavaScript in web development. The fast and widespread adoption of WebAssembly by all major browsers has created an opportunity for analysis tools that support this new technology. Deep learning program analysis models can greatly benefit from the program structure information included in Abstract Syntax Tree (AST)-aware code representations. To obtain such code representations, we performed an empirical analysis on the AST paths in the WebAssembly Text format of a large dataset of WebAssembly binary files compiled from source packages in the Ubuntu 18.04 repositories. After refining the collected paths, we discovered that only 3,352 unique paths appeared across all of these binary files. With this insight, we propose two novel code representations for WebAssembly binaries. These novel representations serve not only to generate fixed-size code embeddings but also to supply additional information to sequence-to-sequence models. Ultimately, our approach helps program analysis models uncover new properties from Wasm binaries, expanding our understanding of their potential. We evaluated our new code representation on two applications: (i) method name prediction and (ii) recovering precise return types. Our results demonstrate the superiority of our novel technique over previous methods. More specifically, our new method resulted in 5.36% (11.31%) improvement in Top-1 (Top-5) accuracy in method name prediction and 8.02% (7.92%) improvement in recovering precise return types, compared to the previous state-of-the-art technique, SnowWhite.
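
To make the notion of an AST path concrete, here is a minimal sketch that parses a WebAssembly Text (WAT) S-expression and collects root-to-leaf paths of node names. It is not the authors' pipeline: the tokenizer, the rule that a node's first atom is its name, and the path definition are simplifying assumptions, and the "refining" step mentioned in the abstract is not modelled.

```python
def tokenize(src):
    """Split a WAT S-expression into parentheses and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn the token list into nested Python lists; atoms stay as strings."""
    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1
    tree, _ = read(0)
    return tree

def ast_paths(node, prefix=()):
    """Yield tuples of node names from the root down to each innermost expression.

    Assumes every list node starts with a string naming it (e.g. "func", "i32.add").
    """
    here = prefix + (node[0],)
    children = [c for c in node[1:] if isinstance(c, list)]
    if not children:
        yield here
    for child in children:
        yield from ast_paths(child, here)

# Hypothetical example module, written by hand for this sketch.
EXAMPLE_WAT = ("(module (func $add (param i32) (param i32) (result i32) "
               "(i32.add (local.get 0) (local.get 1))))")
```

Calling `set(ast_paths(parse(tokenize(EXAMPLE_WAT))))` collapses repeated paths, which is the kind of deduplication that makes a small, closed path vocabulary plausible.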

翻訳日:2024-10-30 23:24:45 公開日:2024-10-11

# ハイブリッドTransformerモデルと意味的フィルタリング手法による法的エンティティ認識の改善

Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach ( http://arxiv.org/abs/2410.08521v1 )

ライセンス: Link先を確認

Duraimurugan Rajamanickam,

(参考訳) 法的エンティティ認識(LER)は、契約分析、コンプライアンス監視、訴訟支援などの法的ワークフローを自動化する上で重要である。ルールベースのシステムや古典的な機械学習モデルを含む既存のアプローチは、特にあいまいさやネストされたエンティティ構造を扱う際に、法的文書やドメイン特異性の複雑さに悩まされている。本稿では,意味的類似性に基づくフィルタリング機構を導入することで,法律テキスト処理用に微調整されたトランスフォーマモデルである Legal-BERT の精度と精度を向上させる新しいハイブリッドモデルを提案する。 15,000の注釈付き法律文書のデータセット上で、F1スコア93.4%を達成し、従来の手法よりも精度とリコールが大幅に向上したことを示す。

Legal Entity Recognition (LER) is critical in automating legal workflows such as contract analysis, compliance monitoring, and litigation support. Existing approaches, including rule-based systems and classical machine learning models, struggle with the complexity of legal documents and domain specificity, particularly in handling ambiguities and nested entity structures. This paper proposes a novel hybrid model that enhances the accuracy and precision of Legal-BERT, a transformer model fine-tuned for legal text processing, by introducing a semantic similarity-based filtering mechanism. We evaluate the model on a dataset of 15,000 annotated legal documents, achieving an F1 score of 93.4%, demonstrating significant improvements in precision and recall over previous methods.
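
The abstract does not describe how the semantic similarity-based filter works, so the following is only one plausible sketch: candidate entities from a tagger are kept when their embedding is close, by cosine similarity, to a prototype vector for the predicted label. The prototype dictionary, the toy vectors, and the `threshold` value are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_filter(candidates, prototypes, threshold=0.8):
    """Keep (text, label, vector) candidates that lie close to their label's prototype."""
    kept = []
    for text, label, vec in candidates:
        if label in prototypes and cosine(vec, prototypes[label]) >= threshold:
            kept.append((text, label))
    return kept
```

In practice the vectors would come from a sentence or span encoder; here they are left as plain lists so the filtering logic stays visible.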

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# グラフ畳み込みニューラルネットワークによるリンクレベル自転車交通量推定におけるデータスパース性の影響評価

Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach ( http://arxiv.org/abs/2410.08522v1 )

ライセンス: Link先を確認

Mohit Gupta, Debjit Bhowmick, Meead Saberi, Shirui Pan, Ben Beck,

(参考訳) 自転車の正確な体積推定は、将来の自転車インフラへの投資に関する情報決定に不可欠である。従来のリンクレベルの容積推定モデルは、交通のモーター化に有効であるが、スパースデータと自転車の移動パターンの複雑な性質のために、自転車のコンテキストに適用した場合、重大な課題に直面している。我々の知る限り、リンクレベルの自転車の体積をモデル化するために、グラフ畳み込みネットワーク(GCN)アーキテクチャを利用する最初の研究を示す。オーストラリア,メルボルン市全体での年間平均自転車数(AADB)を,Strava Metro の自転車数データを用いて推定した。 GCNモデルの有効性を評価するため、線形回帰、サポートベクトルマシン、ランダムフォレストといった従来の機械学習モデルと比較した。以上の結果から,GCNモデルはAADB数予測において従来のモデルよりも優れた性能を示し,自転車交通データに固有の空間依存性を捉える能力を示した。さらに、GCNアーキテクチャの性能に様々なレベルのデータ空間がどう影響するかについても検討する。 GCNアーキテクチャは、最大80%のスパーシリティレベルで良好に機能するが、その制限は、データのスパーシリティがさらに増加するにつれて明らかになり、自転車の体積推定における極端なデータスパーシリティの処理に関するさらなる研究の必要性を強調している。本研究は, 自転車インフラの整備と持続可能な交通の促進をめざして, 都市計画者にとって貴重な知見を提供する。

Accurate bicycling volume estimation is crucial for making informed decisions about future investments in bicycling infrastructure. Traditional link-level volume estimation models are effective for motorised traffic but face significant challenges when applied to the bicycling context because of sparse data and the intricate nature of bicycling mobility patterns. To the best of our knowledge, we present the first study to utilize a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes. We estimate the Annual Average Daily Bicycle (AADB) counts across the City of Melbourne, Australia using Strava Metro bicycling count data. To evaluate the effectiveness of the GCN model, we benchmark it against traditional machine learning models, such as linear regression, support vector machines, and random forest. Our results show that the GCN model performs better than these traditional models in predicting AADB counts, demonstrating its ability to capture the spatial dependencies inherent in bicycle traffic data. We further investigate how varying levels of data sparsity affect the performance of the GCN architecture. The GCN architecture performs well up to an 80% sparsity level, but its limitations become apparent as the data sparsity increases further, emphasizing the need for further research on handling extreme data sparsity in bicycling volume estimation. Our findings offer valuable insights for city planners aiming to improve bicycling infrastructure and promote sustainable transportation.
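
For readers unfamiliar with graph convolutions, the sketch below implements a single propagation step in plain Python, using the widely used symmetric normalization $\mathrm{ReLU}(D^{-1/2}(A+I)D^{-1/2}XW)$. Treating road links as graph nodes, and using this particular normalization, are assumptions; the abstract does not describe the authors' exact architecture.

```python
import math

def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]
```

Each node's output mixes its own features with those of its neighbours, which is how counts observed on a few links can inform estimates on adjacent unobserved links.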

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# IGNN-Solver: 暗黙的グラフニューラルネットワークのためのグラフニューラルソルバ

IGNN-Solver: A Graph Neural Solver for Implicit Graph Neural Networks ( http://arxiv.org/abs/2410.08524v1 )

ライセンス: Link先を確認

Junchao Lin, Zenan Ling, Zhanbo Feng, Feng Zhou, Jingwen Xu, Robert C Qiu,

(参考訳) 単一層で強い表現力を示すインプリシットグラフニューラルネットワーク(IGNN)は,近年,過度なスムーシング問題を効果的に軽減しつつ,基礎となるグラフの長距離依存性(LRD)を捕捉する際,顕著な性能を示した。しかし、IGNNは計算コストのかかる固定点反復に依存するため、大幅なスピードとスケーラビリティの制限が生じ、大規模グラフへの応用が妨げられる。 IGNNの高速な固定点解法を実現するために,小さなGNNによってパラメータ化された一般化Anderson Acceleration法を利用する新しいグラフニューラルソルバIGNN-Solverを提案し,グラフ依存の時間的プロセスとして反復更新を学習する。大規模な実験では、IGNN-Solverは推論を著しく加速し、精度を犠牲にすることなく$1.5\times$から$8\times$のスピードアップを達成する。さらに、グラフの規模が大きくなるにつれて、この利点はますます顕著になり、現実世界のアプリケーションへの大規模な展開に役立つ。

Implicit graph neural networks (IGNNs), which exhibit strong expressive power with a single layer, have recently demonstrated remarkable performance in capturing long-range dependencies (LRD) in underlying graphs while effectively mitigating the over-smoothing problem. However, IGNNs rely on computationally expensive fixed-point iterations, which lead to significant speed and scalability limitations, hindering their application to large-scale graphs. To achieve fast fixed-point solving for IGNNs, we propose a novel graph neural solver, IGNN-Solver, which leverages the generalized Anderson Acceleration method, parameterized by a small GNN, and learns iterative updates as a graph-dependent temporal process. Extensive experiments demonstrate that the IGNN-Solver significantly accelerates inference, achieving a $1.5\times$ to $8\times$ speedup without sacrificing accuracy. Moreover, this advantage becomes increasingly pronounced as the graph scale grows, facilitating its large-scale deployment in real-world applications.
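
The speedup comes from replacing plain fixed-point iteration with Anderson acceleration. The sketch below shows classical Anderson acceleration with a history window of one on a generic map `g`; it is not IGNN-Solver itself, which, per the abstract, parameterizes the method with a small GNN. The function name and the linear test map in the usage note are assumptions for illustration.

```python
def solve_fixed_point(g, x0, accelerate=True, tol=1e-8, max_iter=1000):
    """Find x with g(x) = x; returns (solution, number of evaluations of g).

    With accelerate=True, uses Anderson acceleration with window size 1;
    otherwise falls back to plain iteration x <- g(x).
    """
    x, x_prev, g_prev = list(x0), None, None
    for k in range(1, max_iter + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]            # residual g(x) - x
        if sum(v * v for v in f) ** 0.5 < tol:
            return gx, k
        if accelerate and x_prev is not None:
            f_prev = [a - b for a, b in zip(g_prev, x_prev)]
            df = [a - b for a, b in zip(f, f_prev)]
            denom = sum(v * v for v in df)
            # Least-squares mixing coefficient between the last two iterates.
            gamma = sum(a * b for a, b in zip(f, df)) / denom if denom > 0 else 0.0
            x_next = [a - gamma * (a - b) for a, b in zip(gx, g_prev)]
        else:
            x_next = gx
        x_prev, g_prev, x = x, gx, x_next
    return x, max_iter
```

For a contractive linear map such as `g = lambda x: [0.9 * x[0] + 1.0, 0.9 * x[1] + 2.0]`, the accelerated solver needs far fewer evaluations of `g` than plain iteration, which is the effect the paper exploits at much larger scale.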

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# 「I Am the One and Only, Your Cyber BFF」: GenAIの影響を理解するには擬人化AIの影響を理解する必要がある

"I Am the One and Only, Your Cyber BFF": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI ( http://arxiv.org/abs/2410.08526v1 )

ライセンス: Link先を確認

Myra Cheng, Alicia DeVrio, Lisa Egede, Su Lin Blodgett, Alexandra Olteanu,

(参考訳) 最先端の生成AI(GenAI)システムの多くは、擬人的な振る舞い、すなわち人間らしいと受け取られる出力を生成する傾向を強めている。このため、こうした擬人化AIシステムがもたらしうる悪影響への懸念を表明する研究者が増えているが、AIの開発、展開、利用における擬人化は、依然として大きく見過ごされ、研究が不足し、十分に定義されていない。本稿(Perspective)では、擬人化AIの社会的影響を明らかにすることなしに生成AIの社会的影響を十分に明らかにすることはできないと論じ、行動への呼びかけを概説する。

Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that are perceived to be human-like. While this has led to scholars increasingly raising concerns about possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this perspective, we argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI, and outline a call to action.

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# LLMにおける下流性能予測のためのスケーリング法則

Scaling Laws for Predicting Downstream Performance in LLMs ( http://arxiv.org/abs/2410.08527v1 )

ライセンス: Link先を確認

Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji,

(参考訳) 学習前の大規模言語モデル(LLM)の下流性能の正確な推定は,開発プロセスの指導に不可欠である。スケーリング法則解析は、ターゲットLLMの性能を予測するために、かなり小さなサンプリング言語モデル(LM)の統計を利用する。下流のパフォーマンス予測にとって重要な課題は、タスク固有の計算しきい値を超えて発生するLLMの創発的能力にある。本研究では,性能評価のための計算効率の高い指標として,事前学習損失に着目した。我々の2段階のアプローチは、まず一連のサンプリングモデルを用いて計算資源(例えばFLOP)を事前学習損失にマッピングする関数を推定し、続いてクリティカルな「創発的なフェーズ」の後、トレーニング前の損失を下流タスクのパフォーマンスにマッピングする。予備実験では、7Bパラメータと13BパラメータのLCMの性能を3BまでのサンプリングLMを用いて正確に予測し、それぞれ5%と10%の誤差マージンを達成し、FLOPs-to-Performanceアプローチを著しく上回った。これは、事前トレーニング中に複数のソースからのデータセットを統合する必要性に対処する、パフォーマンス予測の基本的なアプローチであるFLP-Mを動機付けている。 FLP-Mは、データソース間のFLOPに基づいて、ドメイン固有の事前トレーニング損失を予測するために、電力法解析関数を拡張し、複数のドメイン固有の損失と下流のパフォーマンスの間の非線形関係をモデル化するために、2層ニューラルネットワークを使用する。 FLP-Mは、特定の比率とより小さなサンプリング用LMを訓練した3B LLMを利用することで、多くのベンチマークで10%の誤差マージンで3Bおよび7B LLMの性能を効果的に予測できる。

Precise estimation of downstream performance in large language models (LLMs) prior to training is essential for guiding their development process. Scaling laws analysis utilizes the statistics of a series of significantly smaller sampling language models (LMs) to predict the performance of the target LLM. For downstream performance prediction, the critical challenge lies in the emergent abilities in LLMs that occur beyond task-specific computational thresholds. In this work, we focus on the pre-training loss as a more computation-efficient metric for performance estimation. Our two-stage approach consists of first estimating a function that maps computational resources (e.g., FLOPs) to the pre-training Loss using a series of sampling models, followed by mapping the pre-training loss to downstream task Performance after the critical "emergent phase". In preliminary experiments, this FLP solution accurately predicts the performance of LLMs with 7B and 13B parameters using a series of sampling LMs up to 3B, achieving error margins of 5% and 10%, respectively, and significantly outperforming the FLOPs-to-Performance approach. This motivates FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training, specifically blending general corpora with code data to accurately represent the common necessity. FLP-M extends the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources, and employs a two-layer neural network to model the non-linear relationship between multiple domain-specific loss and downstream performance. By utilizing a 3B LLM trained on a specific ratio and a series of smaller sampling LMs, FLP-M can effectively forecast the performance of 3B and 7B LLMs across various data mixtures for most benchmarks within 10% error margins.
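
The two-stage FLP idea (FLOPs to Loss, then Loss to Performance) can be sketched with two small regressions. The functional forms here are assumptions: a pure power law $L = a\,C^{-b}$ for the first stage, with no irreducible-loss term, and a straight line for the second stage. The abstract does not give the paper's exact fitting functions, and no real measurements are used.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fit_flops_to_loss(flops, losses):
    """Stage 1: fit L = a * C**(-b) by linear regression in log-log space."""
    slope, intercept = fit_line([math.log(c) for c in flops],
                                [math.log(l) for l in losses])
    return math.exp(intercept), -slope

def predict_performance(target_flops, a, b, perf_slope, perf_intercept):
    """Stage 2: map the predicted loss to downstream performance (assumed linear)."""
    predicted_loss = a * target_flops ** (-b)
    return perf_slope * predicted_loss + perf_intercept
```

The second-stage line would be fitted with `fit_line(losses, performances)` using only checkpoints past the emergent phase, since below that threshold downstream scores carry little signal.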

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# VOVTrack: オープン語彙オブジェクト追跡のためのビデオの可能性を探る

VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking ( http://arxiv.org/abs/2410.08529v1 )

ライセンス: Link先を確認

Zekun Qian, Ruize Han, Junhui Hou, Linqi Song, Wei Feng,

(参考訳) Open-vocabulary Multi-Object Tracking (OVMOT)は、ビデオにおける多様なオブジェクトカテゴリの検出と追跡に関わる重要な新しい課題であり、既知のカテゴリ(ベースクラス)と未知のカテゴリ(ノベルクラス)の両方を含んでいる。この問題は、OVD(Open-vocabulary Object Detection)とMOT(Multi-object Tracking)の複雑さを併せ持つ。 OVMOT の既存のアプローチは、OVD と MOT の方法論を別個のモジュールとして統合することが多く、主に画像中心の視点からこの問題を扱っている。本稿では,この課題に対処するため,ビデオオブジェクト追跡の観点からMOTに関連するオブジェクト状態とビデオ中心のトレーニングを統合する新しい手法VOVTrackを提案する。まず、追跡中の物体の追跡関連状態を考慮し、時間変化する物体のより正確な位置決めと分類(検出)のための新しいプロンプト誘導型の注意機構を提案する。次に,自己教師型のオブジェクト類似性学習手法を定式化し,時間的なオブジェクト対応付け(追跡)を容易にすることで,アノテーションのない生のビデオデータをトレーニングに活用する。実験の結果、VOVTrackは既存の手法よりも優れており、オープン語彙追跡タスクの最先端ソリューションとして確立されている。

Open-vocabulary multi-object tracking (OVMOT) represents a critical new challenge involving the detection and tracking of diverse object categories in videos, encompassing both seen categories (base classes) and unseen categories (novel classes). This issue amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT). Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens. In this paper, we propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint. First, we consider the tracking-related state of the objects during tracking and propose a new prompt-guided attention mechanism for more accurate localization and classification (detection) of the time-varying objects. Subsequently, we leverage raw video data without annotations for training by formulating a self-supervised object similarity learning technique to facilitate temporal object association (tracking). Experimental results underscore that VOVTrack outperforms existing methods, establishing itself as a state-of-the-art solution for open-vocabulary tracking task.

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# Ego3DT:エゴ中心のビデオですべての3Dオブジェクトを追跡する

Ego3DT: Tracking Every 3D Object in Ego-centric Videos ( http://arxiv.org/abs/2410.08530v1 )

ライセンス: Link先を確認

Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang,

(参考訳) インテリジェンスへの関心の高まりは、現代の研究にエゴ中心の視点をもたらした。この領域における重要な課題の1つは、エゴ中心のビデオにおける物体の正確な位置決めと追跡である。本稿では,エゴ中心映像からの物体の3次元再構成と追跡のための新しいゼロショット手法を提案する。 Ego3DTは,エゴ環境内のオブジェクトの検出とセグメンテーション情報を最初に識別し,抽出する新しいフレームワークである。 Ego3DTは、隣接するビデオフレームからの情報を利用して、事前に訓練された3Dシーン再構成モデルを用いて、エゴビューの3Dシーンを動的に構築する。さらに,エゴ中心ビデオにおける物体の3次元追跡軌道を安定的に作成するための動的階層化機構を革新した。さらに,本手法の有効性は, HOTAの1.04x - 2.90xの2つの新たにコンパイルされたデータセットに対して, 多様なエゴ中心のシナリオにおいて, 提案手法の堅牢性と精度を示す広範な実験によって裏付けられている。

The growing interest in embodied intelligence has brought ego-centric perspectives to contemporary research. One significant challenge within this realm is the accurate localization and tracking of objects in ego-centric videos, primarily due to the substantial variability in viewing angles. Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects from the ego-centric video. We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment. Utilizing information from adjacent video frames, Ego3DT dynamically constructs a 3D scene of the ego view using a pre-trained 3D scene reconstruction model. Additionally, we have innovated a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos. Moreover, the efficacy of our approach is corroborated by extensive experiments on two newly compiled datasets, with 1.04x - 2.90x in HOTA, showcasing the robustness and accuracy of our method in diverse ego-centric scenarios.

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# 拡散モデルは画像生成に視覚的事前情報を必要とする

Diffusion Models Need Visual Priors for Image Generation ( http://arxiv.org/abs/2410.08531v1 )

ライセンス: Link先を確認

Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou,

(参考訳) 従来のクラス誘導拡散モデルは一般的に正しいセマンティックな内容の画像を生成できるが、テクスチャの詳細に苦しむことが多い。この制限は、粗い条件情報のみを提供するクラス事前の使用に起因している。この問題に対処するために,Diffusion on Diffusion (DoD)を提案する。Diffusion on Diffusionは,以前に生成したサンプルから視覚的先行情報を抽出し,拡散サンプリングの初期段階から視覚的先行情報を活用する拡散モデルについて,豊富なガイダンスを提供する。具体的には,各段階における条件付きサンプルから冗長な詳細情報を排除し,ガイダンスのセマンティック情報のみを保持する圧縮再構成手法を用いた潜伏埋め込みモジュールを提案する。我々は、人気のあるImageNet-$256 \times 256$データセット上でDoDを評価し、7$\times$トレーニングコストをSiTやDiTと比較して削減し、FID-50Kスコアよりもパフォーマンスが向上した。私たちの最大のモデルであるDoD-XLは、FID-50Kスコアが1.83で、100万のトレーニングステップしか達成していません。

Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and limited conditional information. To address this issue, we propose Diffusion on Diffusion (DoD), an innovative multi-stage generation framework that first extracts visual priors from previously generated samples, then provides rich guidance for the diffusion model leveraging visual priors from the early stages of diffusion sampling. Specifically, we introduce a latent embedding module that employs a compression-reconstruction approach to discard redundant detail information from the conditional samples in each stage, retaining only the semantic information for guidance. We evaluate DoD on the popular ImageNet-$256 \times 256$ dataset, reducing 7$\times$ training cost compared to SiT and DiT with even better performance in terms of the FID-50K score. Our largest model DoD-XL achieves an FID-50K score of 1.83 with only 1 million training steps, which surpasses other state-of-the-art methods without bells and whistles during inference.

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# 複数の情報源からの観測データを用いたロバストオフライン政策学習

Robust Offline Policy Learning with Observational Data from Multiple Sources ( http://arxiv.org/abs/2410.08537v1 )

ライセンス: Link先を確認

Aldo Gael Carranza, Susan Athey,

(参考訳) 複数の異種データソースからの観測帯域フィードバックデータを用いて、多様なターゲット設定を安定的に一般化するパーソナライズされた決定ポリシーを学習する。そこで本研究では,ソース分布の一般混合条件下での一様に低い後悔度を確保するために,最小限の後悔度最適化手法を提案する。我々は,この目的に合わせたポリシー学習アルゴリズムを開発し,2つの頑健なオフラインポリシー評価手法と,最小限の最適化のための非回帰学習アルゴリズムを組み合わせた。我々の後悔分析は、この手法が全ソースにわたる全データの適度な消滅率まで、最小限の最悪の混合を達成していることを示している。分析,拡張,実験結果は,複数のデータソースから堅牢な意思決定ポリシーを学習する上で,このアプローチの利点を示すものである。

We consider the problem of using observational bandit feedback data from multiple heterogeneous data sources to learn a personalized decision policy that robustly generalizes across diverse target settings. To achieve this, we propose a minimax regret optimization objective to ensure uniformly low regret under general mixtures of the source distributions. We develop a policy learning algorithm tailored to this objective, combining doubly robust offline policy evaluation techniques and no-regret learning algorithms for minimax optimization. Our regret analysis shows that this approach achieves the minimal worst-case mixture regret up to a moderated vanishing rate of the total data across all sources. Our analysis, extensions, and experimental results demonstrate the benefits of this approach for learning robust decision policies from multiple data sources.

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# Kaleidoscope: 不均一なマルチエージェント強化学習のための学習可能なマスク

Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2410.08540v1 )

ライセンス: Link先を確認

Xinran Li, Ling Pan, Jun Zhang,

(参考訳) マルチエージェント強化学習(MARL)では、パラメータ共有がサンプル効率を高めるために一般的に用いられる。しかし、完全なパラメータ共有という一般的なアプローチは、しばしばエージェント間で均質なポリシーをもたらし、ポリシーの多様性から得られるパフォーマンス上の利点を制限する可能性がある。この限界に対処するために、我々は高いサンプル効率を維持しながらポリシーの不均一性を育む新しい適応型部分パラメータ共有スキームである Kaleidoscope を導入する。具体的には、Kaleidoscopeは、1組の共通パラメータと、エージェントごとに異なる複数組の学習可能なマスクを保持し、このマスクがパラメータの共有を規定する。パラメータ共有の効率を犠牲にすることなく、これらのマスク間の相違を促すことで、ポリシーネットワーク間の多様性を促進する。この設計により、Kaleidoscopeは高いサンプル効率と広いポリシー表現能力とを動的に両立させ、様々な環境において完全なパラメータ共有とパラメータ非共有のギャップを効果的に埋めることができる。さらに、アクター・クリティックアルゴリズムにおけるクリティックのアンサンブルへとKaleidoscopeを拡張する。これは価値推定の改善に役立ちうる。マルチエージェント粒子環境、マルチエージェントMuJoCo、StarCraftマルチエージェントチャレンジv2を含む広範な環境での実験的評価により、既存のパラメータ共有アプローチと比較してKaleidoscopeが優れた性能を示すことを確認し、MARLの性能向上の可能性を示す。コードは https://github.com/LXXXXR/Kaleidoscope で公開されている。

In multi-agent reinforcement learning (MARL), parameter sharing is commonly employed to enhance sample efficiency. However, the popular approach of full parameter sharing often leads to homogeneous policies among agents, potentially limiting the performance benefits that could be derived from policy diversity. To address this critical limitation, we introduce Kaleidoscope, a novel adaptive partial parameter sharing scheme that fosters policy heterogeneity while still maintaining high sample efficiency. Specifically, Kaleidoscope maintains one set of common parameters alongside multiple sets of distinct, learnable masks for different agents, dictating the sharing of parameters. It promotes diversity among policy networks by encouraging discrepancy among these masks, without sacrificing the efficiencies of parameter sharing. This design allows Kaleidoscope to dynamically balance high sample efficiency with a broad policy representational capacity, effectively bridging the gap between full parameter sharing and non-parameter sharing across various environments. We further extend Kaleidoscope to critic ensembles in the context of actor-critic algorithms, which could help improve value estimations. Our empirical evaluations across extensive environments, including multi-agent particle environment, multi-agent MuJoCo and StarCraft multi-agent challenge v2, demonstrate the superior performance of Kaleidoscope compared with existing parameter sharing approaches, showcasing its potential for performance enhancement in MARL. The code is publicly available at https://github.com/LXXXXR/Kaleidoscope.
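
The core mechanism, one shared parameter set plus a learnable mask per agent, can be pictured with the sketch below. The sigmoid-plus-threshold masking rule and the mean absolute difference used as a diversity score are assumptions chosen for brevity; the abstract does not specify how masks are parameterized or trained, and no gradient update is shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def agent_weights(shared, mask_logits, threshold=0.5):
    """Apply one agent's mask to the shared parameters; masked-out weights become 0."""
    return [w if sigmoid(m) >= threshold else 0.0
            for w, m in zip(shared, mask_logits)]

def mask_discrepancy(logits_a, logits_b):
    """Mean absolute difference between two agents' soft masks (larger = more diverse)."""
    return (sum(abs(sigmoid(a) - sigmoid(b)) for a, b in zip(logits_a, logits_b))
            / len(logits_a))
```

A training loop in this spirit would add a term rewarding larger `mask_discrepancy` across agent pairs to the usual reinforcement learning loss, so that agents keep sharing most weights while still diverging where it helps.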

翻訳日:2024-10-30 23:24:44 公開日:2024-10-11

# AIにおける人間性: 大規模言語モデルの個性を検出する

Humanity in AI: Detecting the Personality of Large Language Models ( http://arxiv.org/abs/2410.08545v1 )

ライセンス: Link先を確認

Baohua Zhan, Yongyi Huang, Wenyao Cui, Huaping Zhang, Jianyun Shang,

(参考訳) アンケートは,Large Language Models (LLMs) の個性を検出する一般的な方法である。しかしながら、その信頼性は幻覚(LLMが不正確または無関係に反応する)と、提示されたオプションの順序に対する応答の感度の2つの主要な問題によってしばしば損なわれる。これらの課題に対処するために,テキストマイニングとアンケート手法を組み合わせることを提案する。テキストマイニングは、オプションの順序に影響されることなく、LSMの反応から心理的特徴を抽出することができる。さらに, 本手法は特定の回答に依存しないため, 幻覚の影響を低減させる。両手法のスコアの正規化とルート平均二乗誤差の計算により,本手法の有効性を検証した。 LLMの性格特性の起源をさらに解明するために, BERT や GPT などの事前学習言語モデル (PLM) と ChatGPT のような会話モデル (ChatLLM) の両方で実験を行った。その結果,LLMには特定の個性があることが明らかとなった。例えば,ChatGPTとChatGLMは「良心」の性格特性を示す。さらに, LLMの個人性は, 事前学習したデータから導かれることがわかった。 ChatLLMを訓練するために使用される命令データは、個性を含むデータの生成を高め、その隠された個性を公開することができる。結果と人間の平均的性格スコアを比較し,ChatLLMsにおけるPLMにおけるFLAN-T5とChatGPTの性格は,それぞれ0.34と0.22のスコア差で人間と類似していることがわかった。

Questionnaires are a common method for detecting the personality of Large Language Models (LLMs). However, their reliability is often compromised by two main issues: hallucinations (where LLMs produce inaccurate or irrelevant responses) and the sensitivity of responses to the order of the presented options. To address these issues, we propose combining text mining with questionnaires method. Text mining can extract psychological features from the LLMs' responses without being affected by the order of options. Furthermore, because this method does not rely on specific answers, it reduces the influence of hallucinations. By normalizing the scores from both methods and calculating the root mean square error, our experiment results confirm the effectiveness of this approach. To further investigate the origins of personality traits in LLMs, we conduct experiments on both pre-trained language models (PLMs), such as BERT and GPT, as well as conversational models (ChatLLMs), such as ChatGPT. The results show that LLMs do contain certain personalities, for example, ChatGPT and ChatGLM exhibit the personality traits of 'Conscientiousness'. Additionally, we find that the personalities of LLMs are derived from their pre-trained data. The instruction data used to train ChatLLMs can enhance the generation of data containing personalities and expose their hidden personality. We compare the results with the human average personality score, and we find that the personality of FLAN-T5 in PLMs and ChatGPT in ChatLLMs is more similar to that of a human, with score differences of 0.34 and 0.22, respectively.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 反射多重エントロピーとそのホログラフィック双対

Reflected multi-entropy and its holographic dual ( http://arxiv.org/abs/2410.08546v1 )

ライセンス: Link先を確認

Ma-Ke Yuan, Mingyi Li, Yang Zhou,

(参考訳) 反射多エントロピーと呼ばれる正準浄化による多エントロピーの混合状態一般化を導入する。我々はこの測度のホログラフィック双対を提案する。三部式の場合、大きめの$c$制限でツイスト作用素の6点関数を用いて場理論計算を行う。零温度と有限温度の両方において、場の理論的な結果はホログラフィックの結果と正確に一致し、この新しい測度に対するホログラフィック予想を支持した。

We introduce a mixed-state generalization of the multi-entropy through the canonical purification, which we called reflected multi-entropy. We propose the holographic dual of this measure. For the tripartite case, a field-theoretical calculation is performed using a six-point function of twist operators at large $c$ limit. At both zero and finite temperature, the field-theoretical results match the holographic results exactly, supporting our holographic conjecture of this new measure.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 量子状態群アクション

Quantum State Group Actions ( http://arxiv.org/abs/2410.08547v1 )

ライセンス: Link先を確認

Saachi Mutreja, Mark Zhandry,

(参考訳) 暗号グループのアクションは、量子後暗号の主要な候補であり、量子暗号プロトコルの開発にも使われてきた。本研究では、量子状態の集合に作用する群からなる量子状態群作用について検討する。 1) ある設定では、統計的(クエリ境界付きでさえも)セキュリティは、量子後古典群アクションと類似して不可能である。 2) 量子状態群の動作を構築し, 暗号学者によって提案された多くの計算問題がそれを保持することを証明した。構成によっては、我々の証明は無条件であり、LWEに依存しているか、量子ランダムオラクルモデルに依存している。我々の分析は古典的な集団行動に直接当てはまらないが、暗号学者による量子後仮定に明らかな欠陥はないという、少なくとも正当性チェックを与えると論じている。 3 我々の量子状態群アクションは、グループアクションに基づくものと非折り畳みハッシュに基づくものという、2つの既存の量子マネースキームを統一することができる。また、古典的および量子的鍵分布を統一する方法についても説明する。

Cryptographic group actions are a leading contender for post-quantum cryptography, and have also been used in the development of quantum cryptographic protocols. In this work, we explore quantum state group actions, which consist of a group acting on a set of quantum states. We show the following results: 1. In certain settings, statistical (even query bounded) security is impossible, analogously to post-quantum classical group actions. 2. We construct quantum state group actions and prove that many computational problems that have been proposed by cryptographers hold for it. Depending on the construction, our proofs are either unconditional, rely on LWE, or rely on the quantum random oracle model. While our analysis does not directly apply to classical group actions, we argue it gives at least a sanity check that there are no obvious flaws in the post-quantum assumptions made by cryptographers. 3. Our quantum state group action allows for unifying two existing quantum money schemes: those based on group actions, and those based on non-collapsing hashes. We also explain how they can unify classical and quantum key distribution.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# イノベーションとプライバシのバランスをとる - 自然言語処理アプリケーションにおけるデータセキュリティ戦略

Balancing Innovation and Privacy: Data Security Strategies in Natural Language Processing Applications ( http://arxiv.org/abs/2410.08553v1 )

ライセンス: Link先を確認

Shaobo Liu, Guiran Liu, Binrong Zhu, Yuanshuai Luo, Linxiao Wu, Rui Wang,

(参考訳) 本研究は,チャットボット,感情分析,機械翻訳などの共通アプリケーションにおけるユーザデータの保護を目的とした,差分プライバシーに基づく新しいアルゴリズムを導入することにより,自然言語処理(NLP)におけるプライバシ保護に対処する。 NLP技術の普及により、ユーザデータのセキュリティとプライバシ保護が重要な問題となり、緊急に解決する必要がある。本稿では,ユーザの機密情報の漏洩を効果的に防止する新しいプライバシ保護アルゴリズムを提案する。差分プライバシー機構を導入することにより、ランダムノイズを付加しながらデータ解析結果の精度と信頼性を確保することができる。この方法は,データ漏洩によるリスクを軽減するだけでなく,ユーザのプライバシ保護を図りながら,効率的なデータ処理を実現する。データ匿名化や同型暗号化といった従来のプライバシ手法と比較して,本手法はデータ解析の精度を維持しつつ,計算効率とスケーラビリティの面で大きな利点をもたらす。提案アルゴリズムの有効性は、精度(0.89)、精度(0.85)、リコール(0.88)などの性能指標で示され、プライバシーとユーティリティのバランスをとる他の方法よりも優れている。プライバシー保護規制がますます厳しくなっているため、企業や開発者は、プライバシーリスクに対処するための効果的な措置を講じなければならない。我々の研究は、NLP分野におけるプライバシ保護技術の応用に重要な参考を提供し、技術革新とユーザプライバシのバランスを達成する必要性を強調している。将来的には、テクノロジの継続的な進歩により、プライバシ保護はデータ駆動アプリケーションの中核的な要素となり、業界全体の健全な開発を促進するでしょう。

This research addresses privacy protection in Natural Language Processing (NLP) by introducing a novel algorithm based on differential privacy, aimed at safeguarding user data in common applications such as chatbots, sentiment analysis, and machine translation. With the widespread application of NLP technology, the security and privacy protection of user data have become important issues that need to be solved urgently. This paper proposes a new privacy protection algorithm designed to effectively prevent the leakage of user sensitive information. By introducing a differential privacy mechanism, our model ensures the accuracy and reliability of data analysis results while adding random noise. This method not only reduces the risk caused by data leakage but also achieves effective processing of data while protecting user privacy. Compared to traditional privacy methods like data anonymization and homomorphic encryption, our approach offers significant advantages in terms of computational efficiency and scalability while maintaining high accuracy in data analysis. The proposed algorithm's efficacy is demonstrated through performance metrics such as accuracy (0.89), precision (0.85), and recall (0.88), outperforming other methods in balancing privacy and utility. As privacy protection regulations become increasingly stringent, enterprises and developers must take effective measures to deal with privacy risks. Our research provides an important reference for the application of privacy protection technology in the field of NLP, emphasizing the need to achieve a balance between technological innovation and user privacy. In the future, with the continuous advancement of technology, privacy protection will become a core element of data-driven applications and promote the healthy development of the entire industry.
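
The abstract says the method adds random noise through a differential privacy mechanism but does not name the mechanism. As a hedged illustration only, the sketch below uses the textbook Laplace mechanism on a counting query; the function names, the choice of query, and the `epsilon` value in the usage note are assumptions, and the reported accuracy figures are not reproduced here.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Epsilon-DP count: a counting query has sensitivity 1, so the noise scale is 1 / epsilon."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(labels, lambda s: s == "positive", epsilon=0.5)` returns a noisy number of positive sentiment labels; smaller `epsilon` means more noise and stronger privacy, which is the privacy and utility trade-off the abstract refers to.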

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 高齢者のための安全でプライバシを重視したアクセシブルな電子決済アプリケーションの設計

Design of Secure, Privacy-focused, and Accessible E-Payment Applications for Older Adults ( http://arxiv.org/abs/2410.08555v1 )

ライセンス: Link先を確認

Sanchari Das,

(参考訳) 電子決済は、今日のデジタル経済においてトランザクションの利便性に不可欠であり、高齢者にとってますます重要になっていることから、セキュリティ、プライバシ、ユーザビリティの向上が求められている。そこで本稿では,65歳以上の高齢者400名を対象にアンケート調査を行い,多要素認証(MFA)やQRコードによる受取人追加などの機能を備えた電子決済モバイルアプリケーションの高忠実度プロトタイプを評価した。この結果に基づき, この層の具体的なニーズを満たすよう調整したアプリケーションのベータ版を開発した。特に、参加者の約91%は、専門家が推奨するMFAよりも、従来の知識ベースの単一モード認証を好んだ。最後に,高齢者のセキュリティ,プライバシ,ユーザビリティ要件に対処する包摂的な電子決済ソリューションの開発に向けた提言を示す。

E-payments are essential for transactional convenience in today's digital economy and are becoming increasingly important for older adults, emphasizing the need for enhanced security, privacy, and usability. To address this, we conducted a survey-based study with 400 older adults aged 65 and above to evaluate a high-fidelity prototype of an e-payment mobile application, which included features such as multi-factor authentication (MFA) and QR code-based recipient addition. Based on our findings, we developed a tailored beta version of the application to meet the specific needs of this demographic. Notably, approximately 91% of participants preferred traditional knowledge-based and single-mode authentication compared to expert-recommended MFA. We concluded by providing recommendations aimed at developing inclusive e-payment solutions that address the security, privacy, and usability requirements of older adults.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# MUSO: 過剰パラメータ化領域における厳密なマシン・アンラーニングの実現

MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes ( http://arxiv.org/abs/2410.08557v1 )

ライセンス: Link先を確認

Ruikai Yang, Mingzhen He, Zhengbao He, Youmei Qiu, Xiaolin Huang,

(参考訳) マシン・アンラーニング(MU)とは、十分に訓練されたモデルを、特定のデータで訓練されたことがないかのように振る舞わせることである。ニューラルネットワークが主流である今日の過剰パラメータ化モデルでは、手動でデータを再ラベル付けし、訓練済みモデルを微調整するのが一般的なアプローチである。この方法は出力空間においてMUモデルを近似できるが、厳密なMU、すなわちパラメータ空間でのMUを達成できるかどうかという疑問は残る。本稿では、ランダム特徴の手法を用いて解析的枠組みを構築することにより、この問いに答える。確率的勾配降下法によるモデル最適化を前提として、過剰パラメータ化線形モデルが特定のデータの再ラベル付けによって厳密なMUを達成できることを理論的に示す。また、本研究を実世界の非線形ネットワークに拡張し、アンラーニングと再ラベル付けのタスクを統一する交互最適化アルゴリズムを提案する。数値実験により確認されたこのアルゴリズムの有効性は、現在の最先端手法と比較して様々なシナリオでのアンラーニングにおいて優れた性能を示し、特に類似の再ラベル付けに基づくMUアプローチを上回る。

Machine unlearning (MU) aims to make a well-trained model behave as if it had never been trained on specific data. In today's over-parameterized models, dominated by neural networks, a common approach is to manually relabel data and fine-tune the well-trained model. This can approximate the MU model in the output space, but the question remains whether it can achieve exact MU, i.e., in the parameter space. We answer this question by employing random feature techniques to construct an analytical framework. Under the premise of model optimization via stochastic gradient descent, we theoretically demonstrate that over-parameterized linear models can achieve exact MU through relabeling specific data. We also extend this work to real-world nonlinear networks and propose an alternating optimization algorithm that unifies the tasks of unlearning and relabeling. The algorithm's effectiveness, confirmed through numerical experiments, highlights its superior performance in unlearning across various scenarios compared to current state-of-the-art methods, particularly excelling over similar relabeling-based MU approaches.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 結合埋め込み予測アーキテクチャを用いた12誘導心電図の汎用表現の学習

Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive architecture ( http://arxiv.org/abs/2410.08559v1 )

ライセンス: Link先を確認

Sehun Kim,

(参考訳) 本稿では,ECG-JEPA (ECG Joint Embedding Predictive Architecture) と呼ばれる12誘導心電図(ECG)解析のための自己教師付き学習手法を提案する。ECG-JEPAは、ECGデータのセマンティック表現を学ぶためにマスキング戦略を採用している。既存の方法とは異なり、ECG-JEPAは生データを再構築するのではなく、隠れ表現のレベルで予測を行う。このアプローチはECG領域にいくつかの利点をもたらす:(1)標準的なECGによく見られるノイズのような不要な詳細を生成しないこと、(2)生信号間のナイーブなL2損失の限界に対処すること。もうひとつの重要な貢献は、12誘導ECGデータ向けに設計された特別なマスク付きアテンションであるCross-Pattern Attention (CroPA)の導入である。CroPAは、モデルがパッチ間の関係を効果的に捉えることを可能にする。さらに、ECG-JEPAは非常にスケーラブルで、大規模なデータセットでの効率的なトレーニングを可能にする。私たちのコードは https://github.com/sehunfromdaegu/ECG_JEPA で公開されています。

We propose a self-supervised learning method for 12-lead Electrocardiogram (ECG) analysis, named ECG Joint Embedding Predictive Architecture (ECG-JEPA). ECG-JEPA employs a masking strategy to learn semantic representations of ECG data. Unlike existing methods, ECG-JEPA predicts at the hidden representation level rather than reconstructing raw data. This approach offers several advantages in the ECG domain: (1) it avoids producing unnecessary details, such as noise, which is common in standard ECG; and (2) it addresses the limitations of naïve L2 loss between raw signals. Another key contribution is the introduction of a special masked attention tailored for 12-lead ECG data, Cross-Pattern Attention (CroPA). CroPA enables the model to effectively capture inter-patch relationships. Additionally, ECG-JEPA is highly scalable, allowing efficient training on large datasets. Our code is openly available at https://github.com/sehunfromdaegu/ECG_JEPA.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 民事事件の訴訟原因に関する類似フレーズ

Similar Phrases for Cause of Actions of Civil Cases ( http://arxiv.org/abs/2410.08564v1 )

ライセンス: Link先を確認

Ho-Chien Huang, Chao-Lin Liu,

(参考訳) 台湾の司法制度では、関連する判決を特定するうえで訴訟原因(Cause of Actions, COA)が不可欠である。しかし、標準化されたCOAラベルが存在しないため、基本的な手法で事件を絞り込むことは難しい。本研究は、埋め込みとクラスタリングの手法を利用し、引用された法律条文に基づいてCOA間の類似性を分析することでこの問題に対処する。Dice係数やピアソンの相関係数など、様々な類似度尺度を実装している。アンサンブルモデルが順位付けを統合し、ソーシャルネットワーク分析が関連するCOAのクラスタを特定する。このアプローチは、COA間の目立たない関係を明らかにすることで法的分析を強化し、民法以外の法学研究にも応用できる可能性がある。

In the Taiwanese judicial system, Cause of Actions (COAs) are essential for identifying relevant legal judgments. However, the lack of standardized COA labeling creates challenges in filtering cases using basic methods. This research addresses this issue by leveraging embedding and clustering techniques to analyze the similarity between COAs based on cited legal articles. The study implements various similarity measures, including Dice coefficient and Pearson's correlation coefficient. An ensemble model combines rankings, and social network analysis identifies clusters of related COAs. This approach enhances legal analysis by revealing inconspicuous connections between COAs, offering potential applications in legal research beyond civil law.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11
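
The abstract above names the Dice coefficient and Pearson's correlation coefficient as similarity measures over cited legal articles. As a rough sketch of how two COAs might be compared with those measures, assuming each COA is represented either as a set of cited article identifiers or as a vector of citation counts (the paper's actual representation is not given here), one could write the following. The article numbers and counts are made-up illustrative values, not data from the study.

```python
from math import sqrt


def dice_coefficient(a: set, b: set) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def pearson(x: list, y: list) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical COAs: sets of cited article identifiers (illustrative only).
coa_damages = {"184", "185"}
coa_tort = {"184", "179"}
set_similarity = dice_coefficient(coa_damages, coa_tort)  # one shared article

# Hypothetical citation-count vectors over the same ordered article list.
counts_a = [12, 3, 0, 7]
counts_b = [10, 4, 1, 6]
count_similarity = pearson(counts_a, counts_b)
```

The Dice coefficient only looks at which articles are shared, while Pearson's correlation also reflects how often each article is cited. This is one plausible reason to combine several measures in an ensemble, as the abstract describes.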

# Baichuan-Omni技術報告

Baichuan-Omni Technical Report ( http://arxiv.org/abs/2410.08565v1 )

ライセンス: Link先を確認

Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang, Bowen Ding, Wei Song, Zhenglin Cheng, Yuqi Huo, Song Chen, Xu Li, Da Pan, Shusen Zhang, Xin Wu, Zheng Liang, Jun Liu, Tao Zhang, Keer Lu, Yaqi Zhao, Yanjun Shen, Fan Yang, Kaicheng Yu, Tao Lin, Jianhua Xu, Zenan Zhou, Weipeng Chen,

(参考訳) GPT-4oの際立ったマルチモーダル機能とインタラクティブな体験は、実用アプリケーションにおけるその重要な役割を浮き彫りにしているが、高性能なオープンソースの対応モデルは存在しない。本稿では,画像,ビデオ,音声,テキストのモダリティを同時に処理・解析でき,高度なマルチモーダル対話体験と強力な性能を提供する,初のオープンソース7B Multimodal Large Language Model (MLLM) であるBaichuan-Omniを紹介する。7Bモデルから始まり、マルチモーダルアライメントと、音声、画像、ビデオ、テキストの各モダリティにまたがるマルチタスク微調整の2段階を経る、効果的なマルチモーダルトレーニングスキーマを提案する。このアプローチにより、言語モデルは視覚および音声データを効果的に扱う能力を獲得する。様々なオムニモーダルおよびマルチモーダルのベンチマークで強力な性能を示しており、この貢献がマルチモーダル理解とリアルタイムインタラクションの発展においてオープンソースコミュニティの競争力あるベースラインとなることを目指している。

The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-Omni, the first open-source 7B Multimodal Large Language Model (MLLM) adept at concurrently processing and analyzing modalities of image, video, audio, and text, while delivering an advanced multimodal interactive experience and strong performance. We propose an effective multimodal training schema that starts with a 7B model and proceeds through two stages: multimodal alignment and multitask fine-tuning across the audio, image, video, and text modalities. This approach equips the language model with the ability to handle visual and audio data effectively. Demonstrating strong performance across various omni-modal and multimodal benchmarks, we aim for this contribution to serve as a competitive baseline for the open-source community in advancing multimodal understanding and real-time interaction.

翻訳日:2024-10-30 23:14:57 公開日:2024-10-11

# 透明物体および反射物体のための拡散ベースの深度インペインティング

Diffusion-Based Depth Inpainting for Transparent and Reflective Objects ( http://arxiv.org/abs/2410.08567v1 )

ライセンス: Link先を確認

Tianyu Sun, Dingchang Hu, Yixiang Dai, Guijin Wang,

(参考訳) 日常生活でよく見られる透明な物体や反射性の物体は、その独特の視覚的・光学的特性から、3Dイメージング技術に大きな課題をもたらす。この種の物体に対して、RGB-Dカメラは正確な空間情報を伴う真の深度値を取得できない。この問題に対処するために,透明物体および反射物体向けに設計された拡散ベースの深度インペインティングフレームワークであるDITRを提案する。このネットワークは、Region ProposalステージとDepth Inpaintingステージの2つのステージで構成されている。DITRは光学的および幾何学的な深度欠損を動的に解析し、それらを自動的にインペインティングする。さらに、包括的な実験結果から、DITRは透明物体および反射物体の深度インペインティングタスクにおいて極めて有効であり、堅牢な適応性を持つことが示された。

Transparent and reflective objects, which are common in our everyday lives, present a significant challenge to 3D imaging techniques due to their unique visual and optical properties. Faced with these types of objects, RGB-D cameras fail to capture true depth values with accurate spatial information. To address this issue, we propose DITR, a diffusion-based Depth Inpainting framework specifically designed for Transparent and Reflective objects. This network consists of two stages: a Region Proposal stage and a Depth Inpainting stage. DITR dynamically analyzes optical and geometric depth loss and inpaints the affected regions automatically. Furthermore, comprehensive experimental results demonstrate that DITR is highly effective in depth inpainting tasks for transparent and reflective objects, with robust adaptability.