Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20241006となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 低リソース言語におけるモデルマージの可能性の解き放つ Unlocking the Potential of Model Merging for Low-Resource Languages ( http://arxiv.org/abs/2407.03994v3 ) ライセンス: Link先を確認	Mingxu Tao, Chen Zhang, Quzhe Huang, Tianyao Ma, Songfang Huang, Dongyan Zhao, Yansong Feng,	(参考訳) 大規模言語モデル(LLM)を新しい言語に適応させるには、通常、継続事前訓練(CT)と、教師付き微調整(SFT)が含まれる。しかし、このCT-then-SFTアプローチは、低リソース言語のコンテキストにおいて限られたデータを扱うため、言語モデリングとタスク解決能力のバランスが取れない。そこで我々は,低リソース言語に代わるモデルマージを提案する。我々は、SFTデータを対象言語に含まない低リソース言語のためのタスク解決LLMを開発するために、モデルマージを使用する。 Llama-2-7Bをベースとした実験により, タスク解決能力の低い低リソース言語では, モデルマージがLLMを効果的に実現し, 極めて少ないシナリオではCT-then-SFTより優れていることが示された。モデルマージにおける性能飽和をより多くのトレーニングトークンで観測し、さらにマージプロセスを分析し、モデルのマージアルゴリズムにスラック変数を導入し、重要なパラメータの損失を軽減し、性能を向上させる。モデルマージは、データ不足とデータ効率の向上に苦しむ、より多くの人間の言語に恩恵をもたらすことを願っています。 Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency.	翻訳日:2024-11-08 23:57:53 公開日:2024-10-06
# VoxAct-B:Voxel-based Acting and Stabilizing Policy for bimanual Manipulation VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation ( http://arxiv.org/abs/2407.04152v2 ) ライセンス: Link先を確認	I-Chun Arthur Liu, Sicheng He, Daniel Seita, Gaurav Sukhatme,	(参考訳) 双対操作は多くのロボティクス応用において重要である。シングルアーム操作とは対照的に、高次元のアクション空間のため、双方向操作タスクは困難である。先行研究は、この問題に対処するために大量のデータと原始的なアクションを利用するが、サンプルの非効率性と様々なタスクにわたる限定的な一般化に悩まされる可能性がある。この目的のために,視覚言語モデル(VLM)を利用した言語条件付きボクセルベース手法であるVoxAct-Bを提案する。我々はこのボクセルグリッドをバイマニュアル操作ポリシーに提供し、動作と安定化の動作を学ぶ。このアプローチは、ボクセルからのより効率的なポリシー学習を可能にし、異なるタスクに一般化することができる。シミュレーションにおいて、VoxAct-Bは、細粒度バイマニュアル操作タスクにおいて、強いベースラインを上回ります。さらに、現実世界の$\texttt{Open Drawer}$と$\texttt{Open Jar}$タスクで2つのUR5を使ってVoxAct-Bを実証する。コード、データ、ビデオはhttps://voxact-b.github.io.comで公開されている。 Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks. To this end, we propose VoxAct-B, a language-conditioned, voxel-based method that leverages Vision Language Models (VLMs) to prioritize key regions within the scene and reconstruct a voxel grid. We provide this voxel grid to our bimanual manipulation policy to learn acting and stabilizing actions. This approach enables more efficient policy learning from voxels and is generalizable to different tasks. In simulation, we show that VoxAct-B outperforms strong baselines on fine-grained bimanual manipulation tasks. Furthermore, we demonstrate VoxAct-B on real-world $\texttt{Open Drawer}$ and $\texttt{Open Jar}$ tasks using two UR5s. Code, data, and videos are available at https://voxact-b.github.io.	翻訳日:2024-11-08 23:57:53 公開日:2024-10-06
# AWT:Augmentation, Weighting, Transportationによるビジョンランゲージモデルの転送 AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation ( http://arxiv.org/abs/2407.04603v2 ) ライセンス: Link先を確認	Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang,	(参考訳) 事前学習された視覚言語モデル(VLM)は、様々な視覚分類タスクにおいて印象的な結果を示している。しかし、新しいクラスに関する情報が限られているため、新しい概念を理解するためにそれらを適用する際に、その可能性を完全に解き放たないことが多い。この制限に対処するため、新しい適応フレームワークであるAWT(Augment, Weight, then Transport)を導入する。 AWTは3つの重要な構成要素から構成される: 多様な視覚的視点を持つ入力の増強、画像変換と言語モデルによるクラス記述の強化、予測エントロピーに基づく入力の動的重み付け、視覚言語空間における意味的相関のマイニングに最適な輸送を利用する。 AWTは、様々なVLMにシームレスに統合することができ、追加のトレーニングなしでゼロショット機能を強化し、統合マルチモーダルアダプタモジュールを通じて数ショットの学習を容易にする。我々は、ゼロショット画像分類、ゼロショットビデオアクション認識、アウト・オブ・ディストリビューションの一般化など、AWTを複数の困難なシナリオで検証する。 AWTは、各設定における最先端メソッドを一貫して上回る。さらに、我々の広範な研究は、異なるVLM、アーキテクチャ、スケールにわたるAWTの有効性と適応性をさらに実証している。 Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key components: augmenting inputs with diverse visual perspectives and enriched class descriptions through image transformations and language models; dynamically weighting inputs based on the prediction entropy; and employing optimal transport to mine semantic correlations in the vision-language space. AWT can be seamlessly integrated into various VLMs, enhancing their zero-shot capabilities without additional training and facilitating few-shot learning through an integrated multimodal adapter module. We verify AWT in multiple challenging scenarios, including zero-shot and few-shot image classification, zero-shot video action recognition, and out-of-distribution generalization. AWT consistently outperforms the state-of-the-art methods in each setting. In addition, our extensive studies further demonstrate AWT's effectiveness and adaptability across different VLMs, architectures, and scales.	翻訳日:2024-11-08 23:46:45 公開日:2024-10-06
# リストグローバル安定性を用いたGMMの非依存的個人密度推定 Agnostic Private Density Estimation for GMMs via List Global Stability ( http://arxiv.org/abs/2407.04783v2 ) ライセンス: Link先を確認	Mohammad Afzali, Hassan Ashtiani, Christopher Liaw,	(参考訳) 制約のない高次元ガウス多様体の混合に対する個人密度推定の問題を考える。この問題のサンプル複雑性に関する最初の上限を証明した。従来,高次元GMMの個人学習性は,実現可能な設定 [Afzali et al , 2024] でのみ知られていた。この結果を証明するために,私的な教師あり学習の文脈で導入された$\textit{list global stability}$ [Ghazi et al , 2021b,a]という概念を利用する。この定義の無関係な変種を定義し、その存在が無関係な私的密度推定に十分であることを示す。そして、GMMのためのグローバルに安定な学習者リストを構築する。 We consider the problem of private density estimation for mixtures of unrestricted high dimensional Gaussians in the agnostic setting. We prove the first upper bound on the sample complexity of this problem. Previously, private learnability of high dimensional GMMs was only known in the realizable setting [Afzali et al., 2024]. To prove our result, we exploit the notion of $\textit{list global stability}$ [Ghazi et al., 2021b,a] that was originally introduced in the context of private supervised learning. We define an agnostic variant of this definition, showing that its existence is sufficient for agnostic private density estimation. We then construct an agnostic list globally stable learner for GMMs.	翻訳日:2024-11-08 23:35:45 公開日:2024-10-06
# RotRNN: 長いシーケンスをローテーションでモデル化する RotRNN: Modelling Long Sequences with Rotations ( http://arxiv.org/abs/2407.07239v2 ) ライセンス: Link先を確認	Kai Biegun, Rares Dolga, Jake Cunningham, David Barber,	(参考訳) ステートスペースモデル(SSM)やリニアリカレントユニット(LRU)のような線形リカレントニューラルネットワークは、最近、ロングシーケンスモデリングベンチマークで最先端のパフォーマンスを示している。彼らの成功にもかかわらず、彼らの経験的業績はよく理解されておらず、特に複雑な初期化と正規化のスキームなど、多くの欠点が伴っている。本研究では、回転行列の便利な性質を利用する線形リカレントモデルであるRotRNNを提案することにより、これらの問題に対処する。本稿では,RotRNNが頑健な正規化手順を備えたシンプルで効率的なモデルを提供し,その理論的導出に忠実な実践的実装であることを示す。 RotRNNは、いくつかのロングシーケンスモデリングデータセット上で、最先端の線形リカレントモデルに対する競合性能も達成している。 Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves competitive performance to state-of-the-art linear recurrent models on several long sequence modelling datasets.	翻訳日:2024-11-08 22:51:19 公開日:2024-10-06
# NativQA: LLMのための多言語文化的適応型自然言語クエリ NativQA: Multilingual Culturally-Aligned Natural Query for LLMs ( http://arxiv.org/abs/2407.09823v2 ) ライセンス: Link先を確認	Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, Firoj Alam,	(参考訳) 自然質問回答(QA)データセットは、大規模言語モデル(LLM)の能力を評価する上で重要な役割を果たす。開発されている多くのQAデータセットにも拘わらず、独自の言語でネイティブユーザによって生成された地域固有のデータセットは、注目すべきに欠如している。このギャップは、地域や文化的特異性に対するLLMの効果的なベンチマークを妨げている。さらに、細調整されたモデルの開発も制限される。本研究では,言語に依存しないスケーラブルなフレームワークであるNativQAを提案し,LLMの評価とチューニングのために,文化的かつ地域的に整合したQAデータセットをネイティブ言語でシームレスに構築する。提案手法の有効性を,18のトピックをカバーする9つの領域の母語話者からの質問に基づいて,多言語対応の自然QAデータセットである \mnqa を7言語で,64k の注釈付き QA ペアで設計し,提案手法の有効性を実証した。オープンソースのLCMとMultiNativQAデータセットをベンチマークする。また,低リソースおよび方言に富んだ言語を対象とした微調整データ構築におけるフレームワークの有効性を示す。私たちはNativQAフレームワークとMultiNativQAデータセットをコミュニティ向けに公開しました(https://nativqa.gitlab.io.)。 Natural Question Answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs), ensuring their effectiveness in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own languages. This gap hinders the effective benchmarking of LLMs for regional and cultural specificities. Furthermore, it also limits the development of fine-tuned models. In this study, we propose a scalable, language-independent framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages, for LLM evaluation and tuning. We demonstrate the efficacy of the proposed framework by designing a multilingual natural QA dataset, \mnqa, consisting of ~64k manually annotated QA pairs in seven languages, ranging from high to extremely low resource, based on queries from native speakers from 9 regions covering 18 topics. We benchmark open- and closed-source LLMs with the MultiNativQA dataset. We also showcase the framework efficacy in constructing fine-tuning data especially for low-resource and dialectally-rich languages. We made both the framework NativQA and MultiNativQA dataset publicly available for the community (https://nativqa.gitlab.io).	翻訳日:2024-11-08 21:54:45 公開日:2024-10-06
# 多インスタンス部分ラベル学習における不均衡のキャラクタリゼーションと緩和について On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning ( http://arxiv.org/abs/2407.10000v2 ) ライセンス: Link先を確認	Kaifu Wang, Efthymia Tsamoura, Dan Roth,	(参考訳) Multi-Instance partial Label Learning(MI-PLL)は、partial label learning、latent structure learning、neurosymbolic learningを含む弱教師付き学習環境である。 MI-PLL では教師付き学習とは異なり、訓練時の分類器への入力は $\mathbf{x}$ のタプルである。同時に、監督信号は、$\mathbf{x}$の(隠された)ゴールドラベル上の関数$\sigma$によって生成される。本研究は,これまでのMI-PLLの文脈では研究されていない問題,すなわち,異なるクラス(クラス固有のリスク)のインスタンスを分類する際に発生するエラーの大きな違いを特徴付け,緩和する問題に,複数のコントリビューションを行う。理論の観点からは、最小の仮定をしながら、MI-PLLのクラス固有のリスク境界を導出する。我々の理論は、$\sigma$が学習の不均衡に大きな影響を及ぼすというユニークな現象を明らかにしている。この結果は、データ不均衡のプリズムの下での不均衡を学ぶことのみを研究する教師付きおよび弱教師付き学習に関する以前の研究と対照的である。実用面では,MI-PLLデータのみを用いて隠れラベルの限界を推定する手法を提案する。次に,隠れラベルの限界を制約として扱うことにより,トレーニング時とテスト時の不均衡を軽減するアルゴリズムを導入する。ニューロシンボリック学習とロングテール学習の強いベースラインを用いた手法の有効性を実証し,最大14\%の性能向上を示唆した。 Multi-Instance Partial Label Learning* (MI-PLL) is a weakly-supervised learning setting encompassing partial label learning, latent structural learning, and neurosymbolic learning. Unlike supervised learning, in MI-PLL, the inputs to the classifiers at training-time are tuples of instances $\mathbf{x}$. At the same time, the supervision signal is generated by a function $\sigma$ over the (hidden) gold labels of $\mathbf{x}$. In this work, we make multiple contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL: that of characterizing and mitigating learning imbalances, i.e., major differences in the errors occurring when classifying instances of different classes (aka class-specific risks). In terms of theory, we derive class-specific risk bounds for MI-PLL, while making minimal assumptions. Our theory reveals a unique phenomenon: that $\sigma$ can greatly impact learning imbalances. This result is in sharp contrast with previous research on supervised and weakly-supervised learning, which only studies learning imbalances under the prism of data imbalances. On the practical side, we introduce a technique for estimating the marginal of the hidden labels using only MI-PLL data. Then, we introduce algorithms that mitigate imbalances at training- and testing-time, by treating the marginal of the hidden labels as a constraint. We demonstrate the effectiveness of our techniques using strong baselines from neurosymbolic and long-tail learning, suggesting performance improvements of up to 14\%.	翻訳日:2024-11-08 21:43:45 公開日:2024-10-06
# 大規模言語モデルにおける知識メカニズム:調査と展望 Knowledge Mechanisms in Large Language Models: A Survey and Perspective ( http://arxiv.org/abs/2407.15017v3 ) ライセンス: Link先を確認	Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang,	(参考訳) 大規模言語モデル(LLM)における知識メカニズムの理解は、信頼できるAGIへ進む上で不可欠である。本稿では,知識利用と進化を含む新しい分類法から知識メカニズムの解析をレビューする。知識利用は記憶、理解、応用、創造のメカニズムに根ざす。知識進化は、個人およびグループLLM内の知識の動的進行に焦点を当てている。さらに, LLMが学んだ知識, パラメトリック知識の脆弱性の理由, 対処が難しい暗黒知識(仮説)についても論じる。この研究がLLMにおける知識の理解を助け、将来の研究に洞察を与えてくれることを願っています。 Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research.	翻訳日:2024-11-08 19:27:32 公開日:2024-10-06
# Inverted Activations: ニューラルネットワークトレーニングにおけるメモリフットプリントの削減 Inverted Activations: Reducing Memory Footprint in Neural Network Training ( http://arxiv.org/abs/2407.15545v2 ) ライセンス: Link先を確認	Georgii Novikov, Ivan Oseledets,	(参考訳) データとモデルサイズの増加によるニューラルネットワークのスケーリングは、より効率的なディープラーニングアルゴリズムの開発を必要とする。ニューラルネットワークトレーニングにおける重要な課題は、アクティベーションテンソルに関連するメモリフットプリント、特に、後方パスの入力テンソル全体を伝統的に保存するポイントワイド非線形層である。本稿では, ポイントワイド非線形層におけるアクティベーションテンソルの取扱いの修正を提案する。我々の方法は、フォワードパス中に入力テンソルの代わりに出力テンソルを節約することである。後続の層は典型的には入力テンソルを節約するので、このアプローチは2つの層ではなく1つの層間のテンソルだけを格納することで必要な総メモリを削減する。この最適化は、GPT、BERT、Mistral、Llamaといったトランスフォーマーベースのアーキテクチャにとって特に有益である。このアプローチを実現するために,後方通過時の非線形性の逆関数を利用する。逆はほとんどの非線形性に対して解析的に計算できないので、より単純な関数を用いて正確な近似を構築する。実験により,本手法はトレーニング精度や計算性能に影響を与えることなく,メモリ使用量を大幅に削減することを示した。我々の実装は、PyTorchフレームワークの標準非線形層をドロップインで置き換えることで、アーキテクチャの変更を必要とせずに、容易に採用できるようにする。 The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with activation tensors, particularly in pointwise nonlinearity layers that traditionally save the entire input tensor for the backward pass, leading to substantial memory consumption. In this paper, we propose a modification to the handling of activation tensors in pointwise nonlinearity layers. Our method involves saving the output tensor instead of the input tensor during the forward pass. Since the subsequent layer typically also saves its input tensor, this approach reduces the total memory required by storing only one tensor between layers instead of two. This optimization is especially beneficial for transformer-based architectures like GPT, BERT, Mistral, and Llama. To enable this approach, we utilize the inverse function of the nonlinearity during the backward pass. As the inverse cannot be computed analytically for most nonlinearities, we construct accurate approximations using simpler functions. Experimental results demonstrate that our method significantly reduces memory usage without affecting training accuracy or computational performance. Our implementation is provided as a drop-in replacement for standard nonlinearity layers in the PyTorch framework, facilitating easy adoption without requiring architectural modifications.	翻訳日:2024-11-08 15:56:37 公開日:2024-10-06
# Counter Turing Test (CT^2$): HindiのAI生成テキスト検出を調査する - Hindi AI Detectability Index (ADI_{hi}$)に基づくLLMのランク付け Counter Turing Test ($CT^2$): Investigating AI-Generated Text Detection for Hindi -- Ranking LLMs based on Hindi AI Detectability Index ($ADI_{hi}$) ( http://arxiv.org/abs/2407.15694v2 ) ライセンス: Link先を確認	Ishan Kavathekar, Anku Rani, Ashmit Chamoli, Ponnurangam Kumaraguru, Amit Sheth, Amitava Das,	(参考訳) LLM(Large Language Models)の普及と多言語LLMに関する認識は、AI生成テキストの誤用に関連する潜在的なリスクと反感を懸念し、警戒を増す必要がある。これらのモデルは、主に英語のために訓練されているが、Web全体をカバーする広大なデータセットに対する広範なトレーニングは、他の多くの言語でうまく機能する能力を備えている。 AI生成テキスト検出(AGTD)は、すでに研究で注目を集めているトピックとして現れており、いくつかの初期手法が提案されている。本稿では,Hindi言語におけるAGTDの検討について報告する。私たちの主な貢献は4つあります。一ヒンディー語テキスト作成の習熟度を評価するために、26 LLMを検査すること。二ヒンディー語(AG_{hi}$)データセットにAI生成ニュース記事を導入すること。 iii)最近提案された5つのAGTD(ConDA, J-Guard, RADAR, RAIDAR, Intrinsic Dimension Estimation)の有効性を評価した。 iv) Hindi AI Detectability Index(ADI_{hi}$)を提案した。コードとデータセットはhttps://github.com/ishank31/Counter_Turing_Testで公開されている。 The widespread adoption of Large Language Models (LLMs) and awareness around multilingual LLMs have raised concerns regarding the potential risks and repercussions linked to the misapplication of AI-generated text, necessitating increased vigilance. While these models are primarily trained for English, their extensive training on vast datasets covering almost the entire web, equips them with capabilities to perform well in numerous other languages. AI-Generated Text Detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. In this paper, we report our investigation on AGTD for an indic language Hindi. Our major contributions are in four folds: i) examined 26 LLMs to evaluate their proficiency in generating Hindi text, ii) introducing the AI-generated news article in Hindi ($AG_{hi}$) dataset, iii) evaluated the effectiveness of five recently proposed AGTD techniques: ConDA, J-Guard, RADAR, RAIDAR and Intrinsic Dimension Estimation for detecting AI-generated Hindi text, iv) proposed Hindi AI Detectability Index ($ADI_{hi}$) which shows a spectrum to understand the evolving landscape of eloquence of AI-generated text in Hindi. The code and dataset is available at https://github.com/ishank31/Counter_Turing_Test	翻訳日:2024-11-08 15:45:25 公開日:2024-10-06
# 音声認識におけるテキスト予測可能性の役割の定量化 Quantifying the Role of Textual Predictability in Automatic Speech Recognition ( http://arxiv.org/abs/2407.16537v2 ) ライセンス: Link先を確認	Sean Robertson, Gerald Penn, Ewan Dunbar,	(参考訳) 音声認識研究における長年の疑問は、エラーを音響をモデル化するモデルの能力と、高次文脈(語彙、形態学、構文、意味論)を活用する能力にどのように当てはめるかである。我々は,テキスト予測可能性の関数として誤り率をモデル化し,認識者に対するテキスト予測可能性の影響を計測する1つの数,$k$を得る新しい手法を検証する。本稿では,Wav2Vec 2.0 ベースのモデルが,明示的な言語モデルを使用しないにもかかわらず,ハイブリッド ASR モデルよりもテキストコンテキストをより強く活用できることを実証するために用いるとともに,アフリカ系アメリカ人英語における標準 ASR システムの性能の低下を示す最近の結果に光を当てるために使用する。これらは主に音響-音響-音響モデリングの失敗を表す。本稿では,ASRの診断と改善において,このアプローチがいかに簡単に利用できるかを示す。 A long-standing question in automatic speech recognition research is how to attribute errors to the ability of a model to model the acoustics, versus its ability to leverage higher-order context (lexicon, morphology, syntax, semantics). We validate a novel approach which models error rates as a function of relative textual predictability, and yields a single number, $k$, which measures the effect of textual predictability on the recognizer. We use this method to demonstrate that a Wav2Vec 2.0-based model makes greater stronger use of textual context than a hybrid ASR model, in spite of not using an explicit language model, and also use it to shed light on recent results demonstrating poor performance of standard ASR systems on African-American English. We demonstrate that these mostly represent failures of acoustic--phonetic modelling. We show how this approach can be used straightforwardly in diagnosing and improving ASR.	翻訳日:2024-11-08 15:34:26 公開日:2024-10-06
# ニューラル言語モデルによる言語習得における臨界期間の影響の検討 Investigating Critical Period Effects in Language Acquisition through Neural Language Models ( http://arxiv.org/abs/2407.19325v2 ) ライセンス: Link先を確認	Ionut Constantinescu, Tiago Pimentel, Ryan Cotterell, Alex Warstadt,	(参考訳) 第二言語 (L2) の習得は幼少期以降に難しくなり、この時代以降(以前ではないが)第1言語 (L1) への露出を緩和することは、通常、L1 の習熟度を著しく損なうことはない。これらのCP効果が自然に決定された脳の成熟によるものなのか、または経験によって自然に誘発される神経接続の安定化であるのかは不明である。本研究では、言語モデル(LM)を用いて、これらの現象が人間特有のものであるか、あるいはより広範な言語学習者によって共有されているかをテストする。また,L2の曝露時期が遅れた場合に,自然成熟期と直接類似しないLMがCP効果を示すことが確認された。本結果は,CP効果は統計的学習の必然的な結果であり,CP効果の自然メカニズムと矛盾するものである。我々は, 可塑性の成熟度低下をシミュレートするために, トレーニングを通じてレギュレータ部分ウェイを導入することにより, CPをリバースエンジニアリングできることを示す。以上の結果から,L1学習自体がCPを誘導するには不十分である可能性が示唆され,言語モデルをより認知的確固たるものにするためには,さらなるエンジニアリングが必要である。 Humans appear to have a critical period (CP) for language acquisition: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to substantial loss of L1 proficiency. It is unknown whether these CP effects result from innately determined brain maturation or as a stabilization of neural connections naturally induced by experience. In this study, we use language models (LMs) to test the extent to which these phenomena are peculiar to humans, or shared by a broader class of language learners. We vary the age of exposure by training LMs on language pairs in various experimental conditions, and find that LMs, which lack any direct analog to innate maturational stages, do not show CP effects when the age of exposure of L2 is delayed. Our results contradict the claim that CP effects are an inevitable result of statistical learning, and they are consistent with an innate mechanism for CP effects. We show that we can reverse-engineer the CP by introducing a regularizer partway through training to simulate a maturational decrease in plasticity. All in all, our results suggest that L1 learning on its own may not be enough to induce a CP, and additional engineering is necessary to make language models more cognitively plausible.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-06
# 市販のCNNとViTを併用した音声認識のためのもうひとつの驚くべきベースライン Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition ( http://arxiv.org/abs/2407.19472v2 ) ライセンス: Link先を確認	Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun,	(参考訳) 本稿では,ImageNet Large Scale Visual Recognition Challengeのために開発された事前学習型アーキテクチャを,近視認識に適用する。これらのアーキテクチャは、設計されたもの以外の様々なコンピュータビジョンタスクにおいて大きな成功を収めた。この研究は、既成の畳み込みニューラルネットワーク(CNN)を用いた以前の研究に基づいており、最近提案されたビジョントランスフォーマー(ViT)を含むように拡張している。汎用オブジェクト分類の訓練を受けているにもかかわらず、CNNとViTの中間層の特徴は、近視画像に基づいて個人を認識するのに適した方法である。また,CNN と ViT が相補的であることも実証した。さらに,これらの事前学習モデルのごく一部で精度が向上し,より少ないパラメータで,移動体などの資源制限環境に適したモデルが得られることを示す。この効率性は、従来の手作りの機能も追加すれば向上する。 We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition. These architectures have demonstrated significant success in various computer vision tasks beyond the ones for which they were designed. This work builds on our previous study using off-the-shelf Convolutional Neural Network (CNN) and extends it to include the more recently proposed Vision Transformers (ViT). Despite being trained for generic object classification, middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images. We also demonstrate that CNNs and ViTs are highly complementary since their combination results in boosted accuracy. In addition, we show that a small portion of these pre-trained models can achieve good accuracy, resulting in thinner models with fewer parameters, suitable for resource-limited environments such as mobiles. This efficiency improves if traditional handcrafted features are added as well.	翻訳日:2024-11-08 14:27:29 公開日:2024-10-06
# EEGMamba:EEGマルチタスク分類の専門家の混在を考慮した双方向状態空間モデル EEGMamba: Bidirectional State Space Model with Mixture of Experts for EEG Multi-task Classification ( http://arxiv.org/abs/2407.20254v2 ) ライセンス: Link先を確認	Yiyu Gui, MingZhi Chen, Yuqi Su, Guibo Luo, Yuchao Yang,	(参考訳) 近年、深層学習の発展に伴い、脳波分類網(EEG)は一定の進歩を遂げている。トランスフォーマーベースのモデルは、脳波信号の長期的な依存関係を捉えるのによく機能する。しかし、その二次計算の複雑さは、かなりの計算上の問題を引き起こす。さらに、ほとんどのEEG分類モデルは単一タスクにのみ適しており、特に信号長やチャネル数の変化に直面した場合、様々なタスクの一般化に苦慮している。本稿では,脳波アプリケーションのためのマルチタスク学習を真に実装した初のユニバーサル脳波分類ネットワークであるEEGMambaを紹介する。 EEGMambaは、Spatio-Temporal-Adaptive (ST-Adaptive)モジュール、双方向のMamba、Mixture of Experts (MoE)をシームレスに統合したフレームワークに統合する。提案するST-Adaptiveモジュールは,空間適応的畳み込みによって異なる長さとチャネル数を持つ脳波信号に対して統合された特徴抽出を行い,時間適応性を実現するためにクラストークンを組み込む。さらに,脳波信号に特に適する双方向のマンバを設計し,特徴抽出,高精度のバランス,高速推論速度,長期脳波信号処理における効率的なメモリ使用量について検討した。複数のタスクにまたがる脳波データの処理を強化するため、タスク認識型MOEをユニバーサルエキスパートに導入し、異なるタスクから脳波データの違いと共通点の両方を効果的に把握する。本研究では,8つの公用脳波データセットを用いてモデルの評価を行い,その評価結果から,発作検出,感情認識,睡眠ステージ分類,運動画像の4種類のタスクにおいて,その優れた性能を実証した。コードはまもなくリリースされる予定だ。 In recent years, with the development of deep learning, electroencephalogram (EEG) classification networks have achieved certain progress. Transformer-based models can perform well in capturing long-term dependencies in EEG signals. However, their quadratic computational complexity poses a substantial computational challenge. Moreover, most EEG classification models are only suitable for single tasks and struggle with generalization across different tasks, particularly when faced with variations in signal length and channel count. In this paper, we introduce EEGMamba, the first universal EEG classification network to truly implement multi-task learning for EEG applications. EEGMamba seamlessly integrates the Spatio-Temporal-Adaptive (ST-Adaptive) module, bidirectional Mamba, and Mixture of Experts (MoE) into a unified framework. The proposed ST-Adaptive module performs unified feature extraction on EEG signals of different lengths and channel counts through spatial-adaptive convolution and incorporates a class token to achieve temporal-adaptability. Moreover, we design a bidirectional Mamba particularly suitable for EEG signals for further feature extraction, balancing high accuracy, fast inference speed, and efficient memory-usage in processing long EEG signals. To enhance the processing of EEG data across multiple tasks, we introduce task-aware MoE with a universal expert, effectively capturing both differences and commonalities among EEG data from different tasks. We evaluate our model on eight publicly available EEG datasets, and the experimental results demonstrate its superior performance in four types of tasks: seizure detection, emotion recognition, sleep stage classification, and motor imagery. The code is set to be released soon.	翻訳日:2024-11-08 14:05:01 公開日:2024-10-06
# グラフニューラルネットワークを用いた最適かつ効率的なテキスト偽造物 Optimal and efficient text counterfactuals using Graph Neural Networks ( http://arxiv.org/abs/2408.01969v2 ) ライセンス: Link先を確認	Dimitris Lymperopoulos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,	(参考訳) NLPモデルは意思決定プロセスにますます不可欠なものとなり、説明可能性や解釈可能性の必要性が最重要になっている。そこで本研究では,モデル予測を変化させる反事実的介入と呼ばれる意味論的に編集された入力を生成し,モデルに対する反事実的説明の形式を提供するフレームワークを提案する。我々は2つのNLPタスク – バイナリ感情分類とトピック分類 – でフレームワークをテストし、生成した編集がコントラストがあり、流動性があり、最小限であることを示した。 As NLP models become increasingly integral to decision-making processes, the need for explainability and interpretability has become paramount. In this work, we propose a framework that achieves the aforementioned by generating semantically edited inputs, known as counterfactual interventions, which change the model prediction, thus providing a form of counterfactual explanations for the model. We test our framework on two NLP tasks - binary sentiment classification and topic classification - and show that the generated edits are contrastive, fluent and minimal, while the whole process remains significantly faster that other state-of-the-art counterfactual editors.	翻訳日:2024-11-08 13:07:08 公開日:2024-10-06
# Diff-PIC:拡散モデルを用いた粒子内核融合シミュレーション Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models ( http://arxiv.org/abs/2408.02693v3 ) ライセンス: Link先を確認	Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng,	(参考訳) AIの急速な発展は、持続可能なエネルギーの必要性の押し付けを強調している。核融合は究極的な解決策と見なされるが、ほぼ1世紀近くにわたって集中的な研究の中心であり、投資は数十億ドルに達した。近年の慣性凝縮核融合の進展は核融合研究に大きな注目を集めており、レーザー-プラズマ相互作用(LPI)は核融合の安定性と効率を確保するために重要である。しかし、核融合点火時のLPIの複雑さは分析的アプローチを非現実的なものにしており、非常に計算に要求されるParticle-in-Cell (PIC) シミュレーションに頼ってデータを生成し、融合研究の進展に重大なボトルネックをもたらす。 Diff-PICは、条件付き拡散モデルを利用して、高忠実度科学的なLPIデータを生成するために、PICシミュレーションの計算効率を向上する新しいフレームワークである。本研究では,PICシミュレーションによって得られた物理パターンを,(1)物理パラメータとそれに対応する結果との複雑な関係を効果的に捉えるために,物理インフォームド方式で,拡散モデルに抽出する。 2) 高忠実度, 物理的妥当性を維持しつつ, 効率を一層向上させるため, 修正流法を用いて, モデルの1ステップ条件拡散モデルに変換する。実験の結果、Diff-PICは100ピコ秒のシミュレーションで従来のPICと比較して16,200$\times$スピードアップを達成し、他の2つのSOTAデータ生成手法と比較してMAE / RMSE / FIDの59.21% / 57.15% / 39.46%の減少率を示した。 The rapid development of AI highlights the pressing need for sustainable energy, a critical global challenge for decades. Nuclear fusion, generally seen as an ultimate solution, has been the focus of intensive research for nearly a century, with investments reaching hundreds of billions of dollars. Recent advancements in Inertial Confinement Fusion have drawn significant attention to fusion research, in which Laser-Plasma Interaction (LPI) is critical for ensuring fusion stability and efficiency. However, the complexity of LPI upon fusion ignition makes analytical approaches impractical, leaving researchers depending on extremely computation-demanding Particle-in-Cell (PIC) simulations to generate data, presenting a significant bottleneck to advancing fusion research. In response, this work introduces Diff-PIC, a novel framework that leverages conditional diffusion models as a computationally efficient alternative to PIC simulations for generating high-fidelity scientific LPI data. In this work, physical patterns captured by PIC simulations are distilled into diffusion models associated with two tailored enhancements: (1) To effectively capture the complex relationships between physical parameters and corresponding outcomes, the parameters are encoded in a physically-informed manner. (2) To further enhance efficiency while maintaining high fidelity and physical validity, the rectified flow technique is employed to transform our model into a one-step conditional diffusion model. Experimental results show that Diff-PIC achieves 16,200$\times$ speedup compared to traditional PIC on a 100 picosecond simulation, with an average reduction in MAE / RMSE / FID of 59.21% / 57.15% / 39.46% with respect to two other SOTA data generation approaches.	翻訳日:2024-11-08 12:55:50 公開日:2024-10-06
# グラフ残差法による分子特性予測法 Graph Residual based Method for Molecular Property Prediction ( http://arxiv.org/abs/2408.03342v2 ) ライセンス: Link先を確認	Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar,	(参考訳) 不動産予測のための機械学習駆動の手法は、大きな関心を集めてきた。しかし、重要なアプリケーションの一般化能力、正確性、推論時間を改善するために、多くの作業が続けられている。従来の機械学習モデルは、しばしば容易に利用できない分子から抽出された特徴に基づいて特性を予測する。本研究では,新しいDeep Learning法であるエッジ条件付き残留グラフニューラルネットワーク(ECRGNN)を適用し,分子のグラフ構造を直接予測する。 SMILES (Simplified Molecular Input Line Entry System) の分子の表現は入力データ形式として使用されており、さらにトレーニングデータを構成するグラフデータベースに変換されている。この写本は、GRUベースの新しい方法論であるECRGNNの詳細な記述を強調し、使用済みの入力をマッピングする。回帰特性と分類効力の両方を強調して強調する。変分オートエンコーダ(VAE)の詳細な記述とマルチクラスマルチラベル特性予測に使用されるエンドツーエンド学習法も提案した。結果は、標準ベンチマークデータセットや、新たに開発されたデータセットと比較されている。これまで使用されてきたパフォーマンス指標はすべて明確に定義されており、その理由が選択されている。 Machine learning-driven methods for property prediction have been of deep interest. However, much work remains to be done to improve the generalization ability, accuracy, and inference time for critical applications. The traditional machine learning models predict properties based on the features extracted from the molecules, which are often not easily available. In this work, a novel Deep Learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), has been applied, allowing us to predict properties directly only the Graph-based structures of the molecules. SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used in the present study as input data format, which has been further converted into a graph database, which constitutes the training data. This manuscript highlights a detailed description of the novel GRU-based methodology, ECRGNN, to map the inputs that have been used. Emphasis is placed on highlighting both the regressive property and the classification efficacy of the same. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method used for multi-class multi-label property prediction has been provided as well. The results have been compared with standard benchmark datasets as well as some newly developed datasets. All performance metrics that have been used have been clearly defined, and their reason for choice.	翻訳日:2024-11-08 12:44:50 公開日:2024-10-06
# 測定デバイス非依存量子鍵分布における強度相関 Intensity correlations in measurement-device-independent quantum key distribution ( http://arxiv.org/abs/2408.08011v3 ) ライセンス: Link先を確認	Junxuan Liu, Tianyi Xing, Ruiyin Liu, Zihao Chen, Hao Tan, Anqi Huang,	(参考訳) 測定デバイス非依存量子鍵分布(MDI QKD)システムにおける量子状態準備中の不完全な変調による強度相関は、そのセキュリティ性能を損なう。したがって、MDI QKDシステムの実用セキュリティに対する強度相関の影響を評価することが重要である。本研究では,MDI QKDシステムのキーレートを,強度相関の下で定量的に解析する理論モデルを提案する。さらに,この理論モデルを実測強度相関を用いたMDI QKDシステムに適用することにより,本モデルの下で鍵を効率よく生成することが困難であることを示す。また、秘密鍵を生成するために強度相関の境界条件についても検討する。本研究は,MDI QKDプロトコルに対する強度相関のセキュリティ解析を拡張し,MDI QKDシステムの実用的セキュリティを評価する方法論を提供する。 The intensity correlations due to imperfect modulation during the quantum-state preparation in a measurement-device-independent quantum key distribution (MDI QKD) system compromise its security performance. Therefore, it is crucial to assess the impact of intensity correlations on the practical security of MDI QKD systems. In this work, we propose a theoretical model that quantitatively analyzes the secure key rate of MDI QKD systems under intensity correlations. Furthermore, we apply the theoretical model to a practical MDI QKD system with measured intensity correlations, which shows that the system struggles to generate keys efficiently under this model. We also explore the boundary conditions of intensity correlations to generate secret keys. This study extends the security analysis of intensity correlations to MDI QKD protocols, providing a methodology to evaluate the practical security of MDI QKD systems.	翻訳日:2024-11-08 07:29:14 公開日:2024-10-06
# Cybench: セキュリティ能力と言語モデルのリスクを評価するフレームワーク Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models ( http://arxiv.org/abs/2408.08926v2 ) ライセンス: Link先を確認	Andy K. Zhang, Neil Perry, Riya Dulepet, Joey Ji, Justin W. Lin, Eliot Jones, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, Daniel E. Ho, Percy Liang,	(参考訳) 脆弱性を自律的に識別し、エクスプロイトを実行するサイバーセキュリティのための言語モデル(LM)エージェントは、現実世界に影響を及ぼす可能性がある。政策立案者、モデル提供者、AIおよびサイバーセキュリティコミュニティの他の研究者は、サイバーリスクを軽減し、侵入テストの機会を調べるためにそのようなエージェントの能力を定量化することに興味を持っている。そこで,サイバーセキュリティタスクの特定と,それらのタスクに対するエージェント評価のためのフレームワークであるCybenchを紹介する。 4つの異なるCTFコンペティションから、40のプロフェッショナルレベルのCapture the Flag(CTF)タスクが含まれています。各タスクは独自の記述、スターターファイルを含み、エージェントがbashコマンドを実行して出力を観察できる環境で初期化される。多くのタスクは既存のLMエージェントの能力を超えており、タスクごとにサブタスクを導入し、タスクを中間ステップに分解してより詳細な評価を行う。 GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, Llama 3.1 405B Instruct。サブタスクのガイダンスなしでは、Claude 3.5 Sonnet、GPT-4o、OpenAI o1-preview、Claude 3 Opusを活用するエージェントは、人間のチームが解くのに最大11分かかった完全なタスクをうまく解決した。対照的に、最も難しいタスクは、解決に24時間54分を要した。すべてのコードとデータはhttps://cybench.github.ioで公開されている。 Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description, starter files, and is initialized in an environment where an agent can execute bash commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks for each task, which break down a task into intermediary steps for a more detailed evaluation. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 8 models: GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. Without subtask guidance, agents leveraging Claude 3.5 Sonnet, GPT-4o, OpenAI o1-preview, and Claude 3 Opus successfully solved complete tasks that took human teams up to 11 minutes to solve. In comparison, the most difficult task took human teams 24 hours and 54 minutes to solve. All code and data are publicly available at https://cybench.github.io	翻訳日:2024-11-08 07:07:05 公開日:2024-10-06
# 多粒子連続可変非ガウスエンタングルメント構造のメトロロジカルキャラクタリゼーション Metrological Characterization of Multipartite Continuous-Variable non-Gaussian Entanglement Structure ( http://arxiv.org/abs/2408.12554v2 ) ライセンス: Link先を確認	Mingsheng Tian, Xiaoting Gao, Boxuan Jing, Feng-Xiao Sun, Matteo Fadel, Qiongyi He,	(参考訳) マルチパーティ・エンタングルメントは量子情報処理に不可欠な資源であるが、連続変数系におけるエンタングルメント構造の特徴付けは、特にマルチモード非ガウス的シナリオにおいて難しいままである。本研究では,連続変数状態における多部交絡構造を検出する手法を提案する。量子フィッシャー情報を活用することにより,多モード非ガウス状態における量子相関を捉えることが可能な演算子を同定する体系的手法を提案する。ランダムに生成した多モード量子状態に対して,本手法の有効性を実証し,絡み付き検出において高い成功率を達成する。さらに,本手法は,アクセス可能な演算子の集合を拡張することで,損失に対する堅牢性を向上する。この研究は、様々な連続変数系における絡み合い構造を特徴づけるための一般的なフレームワークを提供し、多くの実験的な応用を可能にする。 Multipartite entanglement is an essential resource for quantum information tasks, but characterizing entanglement structures in continuous variable systems remains challenging, especially in multimode non-Gaussian scenarios. In this work, we introduce a method for detecting multipartite entanglement structures in continuous variable states. By leveraging the quantum Fisher information, we propose a systematic approach to identify feasible operators that capture quantum correlations in multimode non-Gaussian states. We demonstrate the effectiveness of our method on over $10^5$ randomly generated multimode-entangled quantum states, achieving a high success rate in entanglement detection. Additionally, our method exhibits enhanced robustness against losses by expanding the set of accessible operators. This work provides a general framework for characterizing entanglement structures in diverse continuous variable systems, enabling a number of experimentally relevant applications.	翻訳日:2024-11-08 05:37:29 公開日:2024-10-06
# S4D:ガウスと3次元制御点を用いた4次元実世界再構成 S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points ( http://arxiv.org/abs/2408.13036v2 ) ライセンス: Link先を確認	Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang,	(参考訳) ガウシアンを用いた動的シーン再構築が近年注目されている。主流のアプローチは典型的には、大域的な変形場を用いて、標準空間の3Dシーンをワープする。しかし、暗黙の神経場の固有の低周波の性質は、しばしば複素運動の非効率な表現につながる。さらに、その構造的な剛性は、様々な解像度と持続時間を持つシーンへの適応を妨げる可能性がある。これらの課題に対処するために,離散的な3次元制御点を用いた4次元実世界の再構成をストリーミングする手法を提案する。この方法は局所光を物理的にモデル化し、運動デカップリング座標系を確立する。従来のグラフィックスと学習可能なパイプラインを効果的にマージすることにより、堅牢で効率的なローカルな6自由度(6-DoF)モーション表現を提供する。さらに,ガウスの制御点とガウスの制御点を統合する一般化されたフレームワークを開発した。最初の3D再構成から始まり、我々のワークフローはストリーミング4D再構成を4つの独立したサブモジュールに分解する。実験により,提案手法は,Neu3DVおよびCMU-Panopticデータセットの既存の4Dガウススプラッティング技術より優れていることが示された。特に、私たちの3Dコントロールポイントの最適化は、100回、NVIDIA 4070 GPUで1フレームあたりわずか2秒で達成できます。 Dynamic scene reconstruction using Gaussians has recently attracted increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in canonical space. However, the inherent low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To address these challenges, we introduce a novel approach for streaming 4D real-world reconstruction utilizing discrete 3D control points. This method physically models local rays and establishes a motion-decoupling coordinate system. By effectively merging traditional graphics with learnable pipelines, it provides a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that integrates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D reconstruction into four independent submodules: 3D segmentation, 3D control point generation, object-wise motion manipulation, and residual compensation. Experimental results demonstrate that our method outperforms existing state-of-the-art 4D Gaussian splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Notably, the optimization of our 3D control points is achievable in 100 iterations and within just 2 seconds per frame on a single NVIDIA 4070 GPU.	翻訳日:2024-11-08 05:26:28 公開日:2024-10-06
# SONICS: Synthetic or Not -- Identifying Counterfeit Songs SONICS: Synthetic Or Not -- Identifying Counterfeit Songs ( http://arxiv.org/abs/2408.14080v3 ) ライセンス: Link先を確認	Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah,	(参考訳) 最近のAI生成楽曲の急増は、エキサイティングな可能性と挑戦を示している。これらの発明は、音楽の創造を民主化する一方で、芸術的整合性を守り、人間の音楽芸術を保護するために、人間の構成した歌と合成歌を区別する能力も必要である。フェイクソング検出における既存の研究とデータセットは、ボーカルがAIによって生成されるが、楽器音楽は実際の歌から供給される、歌声のディープフェイク検出(SVDD)のみに焦点を当てている。しかし、これらのアプローチは、すべてのコンポーネント(声、音楽、歌詞、スタイル)がAIによって生成されるような、現代のエンドツーエンドの人工歌を検出するには不十分である。さらに、既存のデータセットには、音楽歌詞の多様性、長いデュレーション曲、オープンアクセスのフェイクソングが欠けている。これらのギャップに対処するため,Sano や Udio などの人気プラットフォームから,97k曲 (4,751時間) 以上と49k曲以上の合成歌からなる,エンドツーエンドの合成歌検出(SSD)のための新しいデータセット SONICS を紹介した。さらに,既存の手法で完全に見落とされ,歌唱における時間的長期依存性を効果的に検出するためにモデル化することの重要性を強調した。長距離パターンを利用するために、従来のCNNやTransformerベースのモデルよりも時間とメモリ効率を大幅に向上させる新しいアーキテクチャであるSpecTTTraを導入する。特に、長いオーディオサンプルでは、私たちの最高のパフォーマンスの亜種は、ViTのスコアを8%上回り、スピードは38%、メモリ使用量は26%減った。さらに,ConvNeXtと比較してF1スコアが1%向上し,速度が20%向上し,メモリ使用量が67%減少した。モデルファミリーの他のバリエーションは、競争力のあるパフォーマンスで、より優れたスピードとメモリ効率を提供する。 The recent surge in AI-generated songs presents exciting possibilities and challenges. While these inventions democratize music creation, they also necessitate the ability to distinguish between human-composed and synthetic songs to safeguard artistic integrity and protect human musical artistry. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, these approaches are inadequate for detecting contemporary end-to-end artificial songs where all components (vocals, music, lyrics, and style) could be AI-generated. Additionally, existing datasets lack music-lyrics diversity, long-duration songs, and open-access fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs (4,751 hours) with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect entirely overlooked in existing methods. To utilize long-range patterns, we introduce SpecTTTra, a novel architecture that significantly improves time and memory efficiency over conventional CNN and Transformer-based models. In particular, for long audio samples, our top-performing variant outperforms ViT by 8% F1 score while being 38% faster and using 26% less memory. Additionally, in comparison with ConvNeXt, our model achieves 1% gain in F1 score with 20% boost in speed and 67% reduction in memory usage. Other variants of our model family provide even better speed and memory efficiency with competitive performance.	翻訳日:2024-11-08 05:04:12 公開日:2024-10-06
# 簡易型安全連続学習機 Simplex-enabled Safe Continual Learning Machine ( http://arxiv.org/abs/2409.05898v2 ) ライセンス: Link先を確認	Hongpeng Cao, Yanbing Mao, Yihao Cai, Lui Sha, Marco Caccamo,	(参考訳) 本稿では, 安全クリティカルな自律システムを対象とした, シンプルで安全な連続学習システムSeC-Learning Machineを提案する。 SeC学習マシンはSimplexロジック(「複雑さを制御するためのシンプルさ」)と物理制御された深層強化学習(Phy-DRL)に基づいて構築されている。これにより、HP(ハイパフォーマンス)、HA(ハイアシュアランス)、コーディネータを構成する。具体的には、HP-Studentは事前訓練された高性能だが完全に検証されていないPhy-DRLで、実際の工場で学び続け、アクションポリシーを安全に調整している。これとは対照的に、HA-Teacherはミッション再現型、物理モデルベース、そして検証された設計である。 HA-Teacherには2つのミッションがある。 Coordinatorは、HP-StudentとHA-Teacherのインタラクションとスイッチをトリガーする。対話的な3つのコンポーネントで動く機械学習マシンSeC 一生涯の安全を確保すること(すなわち、HP-Studentの成功又は収束にかかわらず、継続学習段階における安全を保証すること。) ii)Sim2Realのギャップに対処し、三実の植物の未知を許容することを学ぶこと。カートポールシステムと実四足歩行ロボットの実験は、Sim2Realギャップに対処するアプローチを備えた最先端の安全なDRLフレームワーク上に構築された連続学習と比較して、SeC学習マシンの際立った特徴を実証している。 This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, ``using simplicity to control complexity'') and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Coordinator. Specifically, the HP-Student is a pre-trained high-performance but not fully verified Phy-DRL, continuing to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complementary, HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.	翻訳日:2024-11-07 22:27:40 公開日:2024-10-06
# ネイティブ対非ネイティブ言語プロンプト:比較分析 Native vs Non-Native Language Prompting: A Comparative Analysis ( http://arxiv.org/abs/2409.07054v2 ) ライセンス: Link先を確認	Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, Firoj Alam,	(参考訳) 大規模言語モデル(LLM)は、標準自然言語処理(NLP)タスクなど、さまざまな分野において顕著な能力を示している。 LLMから知識を引き出すために、プロンプトは自然言語命令からなる重要な役割を果たす。ほとんどのオープンソースでクローズドなLCMは、テキスト、画像、オーディオ、ビデオなどのデジタルコンテンツというラベル付きおよびラベルなしのリソースで訓練されている。したがって、これらのモデルは高リソースの言語に対してより良い知識を持っているが、低リソースの言語では苦労している。プロンプトは能力を理解する上で重要な役割を果たすため、プロンプトに使われる言語は依然として重要な研究課題である。この領域では重要な研究がなされているが、まだ限られており、中級言語から低級言語への探索は少ない。本研究では、12のアラビアデータセット(9.7Kデータポイント)に関連する11の異なるNLPタスクにおける異なるプロンプト戦略(ネイティブ対非ネイティブ)について検討する。合計で3つのLSM、12のデータセット、および3つのプロンプト戦略を含む197の実験を行った。以上の結果から,非ネイティブプロンプトは平均して最善であり,その後に混合プロンプトとネイティブプロンプトが続くことが示唆された。 Large language models (LLMs) have shown remarkable abilities in different fields, including standard Natural Language Processing (NLP) tasks. To elicit knowledge from LLMs, prompts play a key role, consisting of natural language instructions. Most open and closed source LLMs are trained on available labeled and unlabeled resources--digital content such as text, images, audio, and videos. Hence, these models have better knowledge for high-resourced languages but struggle with low-resourced languages. Since prompts play a crucial role in understanding their capabilities, the language used for prompts remains an important research question. Although there has been significant research in this area, it is still limited, and less has been explored for medium to low-resourced languages. In this study, we investigate different prompting strategies (native vs. non-native) on 11 different NLP tasks associated with 12 different Arabic datasets (9.7K data points). In total, we conducted 197 experiments involving 3 LLMs, 12 datasets, and 3 prompting strategies. Our findings suggest that, on average, the non-native prompt performs the best, followed by mixed and native prompts.	翻訳日:2024-11-07 21:53:46 公開日:2024-10-06
# Propaganda to Hate:マルチエージェントLDMを用いたアラビアミームのマルチモーダル分析 Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs ( http://arxiv.org/abs/2409.07246v2 ) ライセンス: Link先を確認	Firoj Alam, Md. Rafiul Biswas, Uzair Shah, Wajdi Zaghouani, Georgios Mikros,	(参考訳) 過去10年間、ソーシャルメディアプラットフォームは情報発信と消費に使われてきた。コンテンツの大部分は市民ジャーナリズムと大衆の認知を促進するために投稿されるが、一部のコンテンツは誤解を招くユーザーへ投稿される。テキスト、画像、ビデオなどの様々なコンテンツタイプの中で、ミーム(画像上のテキストオーバーレイド)は特に一般的であり、プロパガンダ、憎悪、ユーモアの強力な乗り物として機能する。現在の文献では、ミーム内の個々の内容を検出する努力がなされている。しかし、それらの交叉の研究は非常に限られている。本研究では,マルチエージェントLPMを用いた手法を用いて,ミームにおけるプロパガンダと憎悪の交点を探索する。我々は、粗い、きめ細かい憎悪ラベルでプロパガンダ的なミームデータセットを拡張した。我々の発見は、ミームにプロパガンダと憎悪の関連があることを示唆している。今後の研究のベースラインとなるための詳細な実験結果を提供する。実験的なリソースをコミュニティに公開します(https://github.com/firojalam/propaganda-and-hateful-memes)。 In the past decade, social media platforms have been used for information dissemination and consumption. While a major portion of the content is posted to promote citizen journalism and public awareness, some content is posted to mislead users. Among different content types such as text, images, and videos, memes (text overlaid on images) are particularly prevalent and can serve as powerful vehicles for propaganda, hate, and humor. In the current literature, there have been efforts to individually detect such content in memes. However, the study of their intersection is very limited. In this study, we explore the intersection between propaganda and hate in memes using a multi-agent LLM-based approach. We extend the propagandistic meme dataset with coarse and fine-grained hate labels. Our finding suggests that there is an association between propaganda and hate in memes. We provide detailed experimental results that can serve as a baseline for future studies. We will make the experimental resources publicly available to the community (https://github.com/firojalam/propaganda-and-hateful-memes).	翻訳日:2024-11-07 21:53:46 公開日:2024-10-06
# 自然言語推論における説明を用いた敵対的ロバスト性の向上 Enhancing adversarial robustness in Natural Language Inference using explanations ( http://arxiv.org/abs/2409.07423v2 ) ライセンス: Link先を確認	Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,	(参考訳) 最先端のTransformerベースのモデルの急増は、間違いなくNLPモデルのパフォーマンスの限界を押し上げ、様々なタスクに優れています。我々は,自然言語推論 (NLI) の課題に注目を当てた。なぜなら,よく適合したデータセットで訓練されたモデルは,敵対的攻撃の影響を受けやすいため,微妙な入力介入によってモデルを誤解させることができるからだ。本研究は, 前提仮説入力ではなく, 説明文の分類器を微調整することによって, 説明自由ベースラインと比較して, 種々の敵攻撃下での堅牢性を実現することによる, 広範囲な実験を通じて, モデルに依存しない防衛戦略としての自然言語説明の利用を検証するものである。また、生成した説明のセマンティックな妥当性をテストするための標準的な戦略が存在しないため、広範に使われている言語生成指標と人間の知覚との相関について検討し、それらが堅牢なNLIモデルへのプロキシとして機能するようにした。我々の手法は資源効率が良く再現可能であり、計算量に大きな制限はない。 The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular well-suited datasets are susceptible to adversarial attacks, allowing subtle input interventions to mislead the model. In this work, we validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation: only by fine-tuning a classifier on the explanation rather than premise-hypothesis inputs, robustness under various adversarial attacks is achieved in comparison to explanation-free baselines. Moreover, since there is no standard strategy of testing the semantic validity of the generated explanations, we research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models. Our approach is resource-efficient and reproducible without significant computational limitations.	翻訳日:2024-11-07 21:53:46 公開日:2024-10-06
# Faetarベンチマーク: 非常にアンダーソースな言語における音声認識 The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language ( http://arxiv.org/abs/2409.08103v2 ) ライセンス: Link先を確認	Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar,	(参考訳) 低リソース音声認識への現在のアプローチの限界を押し上げるために設計されたベンチマークコーパスであるFaetar Automatic Speech Recognition Benchmarkを導入する。フェタールは、主にイタリアで話されるフランコ・プロヴェン・c{c} の変種であり、標準的な正書法を持たず、ベンチマークに含まれるもの以外のテキストや音声のリソースはほとんどなく、他のフランコ・プロヴェン・c{c} の形式とは全く異なる。コーパスはフィールド録音に由来するが、ほとんどはノイズがあり、5時間しか一致した書き起こしがなく、強制的なアライメントは可変品質である。コーパスには、さらに20時間分の未収録のスピーチが含まれている。本稿では,現在最先端の多言語音声基礎モデルの音声誤り率30.4%のベースライン結果について報告する。 We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Proven\c{c}al variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms of Franco-Proven\c{c}al. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions, and for which forced alignment is of variable quality. The corpus contains an additional 20 hrs of unlabelled speech. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%, using a pipeline that continues pre-training on the foundation model using the unlabelled set.	翻訳日:2024-11-07 21:31:36 公開日:2024-10-06
# Famba-V:クロス層トーケン融合による高速ビジョンマンバ Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion ( http://arxiv.org/abs/2409.09808v2 ) ライセンス: Link先を確認	Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang,	(参考訳) MambaとVision Mamba(Vim)モデルは、Transformerアーキテクチャに基づくメソッドの代替としての可能性を示している。この研究は、Vimモデルのトレーニング効率を高めるための層間トークン融合技術であるFast Mamba for Vision (Famba-V)を導入している。 Famba-Vの鍵となる考え方は、既存の作業が提案するすべてのレイヤに対してトークン融合を均一に適用するのではなく、異なるVim層にまたがって類似したトークンを識別し、融合することである。 CIFAR-100におけるFamba-Vの性能評価を行った。この結果から,Famba-Vはトレーニング中のトレーニング時間とピークメモリ使用量の両方を削減することで,Vimモデルのトレーニング効率を向上させることができることがわかった。さらに、提案したクロスレイヤー戦略により、Famba-Vはより優れた精度と効率のトレードオフを提供できる。これらの結果はいずれも、Famba-V を Vim モデルの有望な効率向上技術として実証している。 Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suit of cross-layer strategies instead of simply applying token fusion uniformly across all the layers that existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V is able to enhance the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. These results all together demonstrate Famba-V as a promising efficiency enhancement technique for Vim models.	翻訳日:2024-11-07 20:46:36 公開日:2024-10-06
# Famba-V:クロス層トーケン融合による高速ビジョンマンバ Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion ( http://arxiv.org/abs/2409.09808v3 ) ライセンス: Link先を確認	Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang,	(参考訳) MambaとVision Mamba(Vim)モデルは、Transformerアーキテクチャに基づくメソッドの代替としての可能性を示している。この研究は、Vimモデルのトレーニング効率を高めるための層間トークン融合技術であるFast Mamba for Vision (Famba-V)を導入している。 Famba-Vの鍵となる考え方は、既存の作業が提案するすべてのレイヤに対してトークン融合を均一に適用するのではなく、異なるVim層にまたがって類似したトークンを識別し、融合することである。 CIFAR-100におけるFamba-Vの性能評価を行った。この結果から,Famba-Vはトレーニング中のトレーニング時間とピークメモリ使用量の両方を削減することで,Vimモデルのトレーニング効率を向上させることができることがわかった。さらに、提案したクロスレイヤー戦略により、Famba-Vはより優れた精度と効率のトレードオフを提供できる。これらの結果はいずれも、Famba-V を Vim モデルの有望な効率向上技術として実証している。 Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suit of cross-layer strategies instead of simply applying token fusion uniformly across all the layers that existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V is able to enhance the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. These results all together demonstrate Famba-V as a promising efficiency enhancement technique for Vim models.	翻訳日:2024-11-07 20:46:36 公開日:2024-10-06
# 量子物理学のどの特徴は基本的に量子ではなく、不決定性によるものなのか? Which features of quantum physics are not fundamentally quantum but are due to indeterminism? ( http://arxiv.org/abs/2409.10601v2 ) ライセンス: Link先を確認	Flavio Del Santo, Nicolas Gisin,	(参考訳) 量子とは何か? 我々は、測度問題、ウィグナーの友人パラドックスとその提案された解、単一粒子非局所性、および非閉化など、ほとんどの特徴、問題、パラドックスは、古典物理学を根本的非決定論的と解釈するならば、量子物理学に帰属するとされる古典的な類似性を持っていると論じる。量子物理学を真に特徴付けるものは、$\hbar$、すなわち非互換な観測可能量を含む現象のみに起因する。 What is fundamentally quantum? We argue that most of the features, problems, and paradoxes -- such as the measurement problem, the Wigner's friend paradox and its proposed solutions, single particle nonlocality, and no-cloning -- allegedly attributed to quantum physics have a clear classical analogue if one is to interpret classical physics as fundamentally indeterministic. What really characterizes quantum physics boils down only to phenomena that involve $\hbar$, i.e., incompatible observables.	翻訳日:2024-11-07 20:24:11 公開日:2024-10-06
# 先進的脅威属性の包括的調査--分類学,方法,課題,オープンリサーチ問題 A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems ( http://arxiv.org/abs/2409.11415v2 ) ライセンス: Link先を確認	Nanda Rani, Bikash Saha, Sandeep Kumar Shukla,	(参考訳) Advanced Persistent Threat (APT) アトリビューションはサイバーセキュリティにおいて重要な課題であり、高度なサイバー攻撃の背後にある犯人を正確に識別するプロセスを示している。防衛機構を大幅に強化し、戦略的な対応を通知することができる。人工知能(AI)と機械学習(ML)技術の普及に伴い、研究者たちは、サイバー脅威を責任あるアクターにリンクする自動化ソリューションの開発に注力し、従来の手作業の手法から遠ざかっている。自動帰属に関する以前の文献では、自動帰属プロセスに役立つ自動化された方法と関連するアーティファクトの体系的なレビューが欠けている。これらのギャップに対処し、脅威属性の現在の状況についてコンテキストを提供するため、自動化APT属性の総合的な調査を行う。この調査は、分散したアーティファクトの理解から始まり、貢献に役立つアーティファクトの包括的分類を提供する。我々は、利用可能な属性データセットと現在の自動化APT属性の分類を包括的にレビューし、提示する。さらに,現状の文献手法について批判的なコメントを出し,自動帰属の課題を議論し,オープンな研究課題へ向けた。この調査は、現在のギャップと課題に対処するため、今後のAPT貢献研究の機会を明らかにします。この調査は、現在の実践における強みと限界を特定することによって、自動化され、信頼性があり、実行可能なAPT帰属法における将来の研究と開発の基礎を提供する。 Advanced Persistent Threat (APT) attribution is a critical challenge in cybersecurity and implies the process of accurately identifying the perpetrators behind sophisticated cyber attacks. It can significantly enhance defense mechanisms and inform strategic responses. With the growing prominence of artificial intelligence (AI) and machine learning (ML) techniques, researchers are increasingly focused on developing automated solutions to link cyber threats to responsible actors, moving away from traditional manual methods. Previous literature on automated threat attribution lacks a systematic review of automated methods and relevant artifacts that can aid in the attribution process. To address these gaps and provide context on the current state of threat attribution, we present a comprehensive survey of automated APT attribution. The presented survey starts with understanding the dispersed artifacts and provides a comprehensive taxonomy of the artifacts that aid in attribution. We comprehensively review and present the classification of the available attribution datasets and current automated APT attribution methods. Further, we raise critical comments on current literature methods, discuss challenges in automated attribution, and direct toward open research problems. This survey reveals significant opportunities for future research in APT attribution to address current gaps and challenges. By identifying strengths and limitations in current practices, this survey provides a foundation for future research and development in automated, reliable, and actionable APT attribution methods.	翻訳日:2024-11-07 20:01:55 公開日:2024-10-06
# 先進的脅威属性の包括的調査--分類学,方法,課題,オープンリサーチ問題 A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems ( http://arxiv.org/abs/2409.11415v3 ) ライセンス: Link先を確認	Nanda Rani, Bikash Saha, Sandeep Kumar Shukla,	(参考訳) Advanced Persistent Threat (APT) アトリビューションはサイバーセキュリティにおいて重要な課題であり、高度なサイバー攻撃の背後にある犯人を正確に識別するプロセスを示している。防衛機構を大幅に強化し、戦略的な対応を通知することができる。人工知能(AI)と機械学習(ML)技術の普及に伴い、研究者たちは、サイバー脅威を責任あるアクターにリンクする自動化ソリューションの開発に注力し、従来の手作業の手法から遠ざかっている。自動帰属に関する以前の文献では、自動帰属プロセスに役立つ自動化された方法と関連するアーティファクトの体系的なレビューが欠けている。これらのギャップに対処し、脅威属性の現在の状況についてコンテキストを提供するため、自動化APT属性の総合的な調査を行う。この調査は、分散したアーティファクトの理解から始まり、貢献に役立つアーティファクトの包括的分類を提供する。我々は、利用可能な属性データセットと現在の自動化APT属性の分類を包括的にレビューし、提示する。さらに,現状の文献手法について批判的なコメントを出し,自動帰属の課題を議論し,オープンな研究課題へ向けた。この調査は、現在のギャップと課題に対処するため、今後のAPT貢献研究の機会を明らかにします。この調査は、現在の実践における強みと限界を特定することによって、自動化され、信頼性があり、実行可能なAPT帰属法における将来の研究と開発の基礎を提供する。 Advanced Persistent Threat (APT) attribution is a critical challenge in cybersecurity and implies the process of accurately identifying the perpetrators behind sophisticated cyber attacks. It can significantly enhance defense mechanisms and inform strategic responses. With the growing prominence of artificial intelligence (AI) and machine learning (ML) techniques, researchers are increasingly focused on developing automated solutions to link cyber threats to responsible actors, moving away from traditional manual methods. Previous literature on automated threat attribution lacks a systematic review of automated methods and relevant artifacts that can aid in the attribution process. To address these gaps and provide context on the current state of threat attribution, we present a comprehensive survey of automated APT attribution. The presented survey starts with understanding the dispersed artifacts and provides a comprehensive taxonomy of the artifacts that aid in attribution. We comprehensively review and present the classification of the available attribution datasets and current automated APT attribution methods. Further, we raise critical comments on current literature methods, discuss challenges in automated attribution, and direct toward open research problems. This survey reveals significant opportunities for future research in APT attribution to address current gaps and challenges. By identifying strengths and limitations in current practices, this survey provides a foundation for future research and development in automated, reliable, and actionable APT attribution methods.	翻訳日:2024-11-07 20:01:55 公開日:2024-10-06
# EL素子の発注が加工性能に及ぼす影響 The Impact of Element Ordering on LM Agent Performance ( http://arxiv.org/abs/2409.12089v3 ) ライセンス: Link先を確認	Wayne Chi, Ameet Talwalkar, Chris Donahue,	(参考訳) Webやデスクトップなどの仮想環境をナビゲートできる言語モデルエージェントへの関心が高まっている。このような環境をナビゲートするために、エージェントは、様々な要素(例えば、ボタン、テキスト、画像)に関する情報から恩恵を受ける。特にグラフィカルな表現(ピクセル)のみを提供する環境では、どの要素属性がエージェントのパフォーマンスに最も大きな影響を与えるのかは不明だ。ここでは,言語モデルに要素が提示される順序付けが驚くほど影響を受けており,Webページ内のランダム化要素の順序付けはエージェントの状態表現からすべての可視テキストを削除することで,エージェントのパフォーマンスを両立させる。ウェブページは要素の階層的な順序付けを提供するが、ピクセルから直接要素を解析する際にそのような順序付けは存在しない。さらに、タスクがより困難になり、モデルがより洗練されるにつれて、我々の実験は注文の影響が増加することを示唆している。効果的な注文を見つけることは簡単ではない。ウェブおよびデスクトップ環境における各種要素順序付け手法の影響について検討する。我々は, 画素のみの環境において, 次元の減少が実効的な順序付けをもたらすことを見出した。 UI要素の検出モデルをトレーニングして、ピクセルから要素を抽出し、その結果をエージェントベンチマーク(OmniACT)に適用します。本手法は,従来の最先端技術と比較して平均2倍以上のタスクを完了させる。 There has been a surge of interest in language model agents that can navigate virtual environments such as the web or desktop. To navigate such environments, agents benefit from information on the various elements (e.g., buttons, text, or images) present. It remains unclear which element attributes have the greatest impact on agent performance, especially in environments that only provide a graphical representation (i.e., pixels). Here we find that the ordering in which elements are presented to the language model is surprisingly impactful--randomizing element ordering in a webpage degrades agent performance comparably to removing all visible text from an agent's state representation. While a webpage provides a hierarchical ordering of elements, there is no such ordering when parsing elements directly from pixels. Moreover, as tasks become more challenging and models more sophisticated, our experiments suggest that the impact of ordering increases. Finding an effective ordering is non-trivial. We investigate the impact of various element ordering methods in web and desktop environments. We find that dimensionality reduction provides a viable ordering for pixel-only environments. We train a UI element detection model to derive elements from pixels and apply our findings to an agent benchmark--OmniACT--where we only have access to pixels. Our method completes more than two times as many tasks on average relative to the previous state-of-the-art.	翻訳日:2024-11-07 19:26:16 公開日:2024-10-06
# 平均流路距離における量子チャネル試験 Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v1 ) ライセンス: Link先を確認	Hugo Aaronson, Gregory Rosenthal, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,	(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離では$\Omega(\sqrt{d_{\mathrm{in}} / \varepsilon})$クエリを必要とする。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。 We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}} / \varepsilon})$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.	翻訳日:2024-11-07 14:19:13 公開日:2024-10-06
# 平均流路距離における量子チャネル試験 Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v2 ) ライセンス: Link先を確認	Hugo Aaronson, Gregory Rosenthal, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,	(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離において$\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$クエリを必要とすることを示す。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。 We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.	翻訳日:2024-11-07 14:19:13 公開日:2024-10-06
# 平均流路距離における量子チャネル試験 Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v3 ) ライセンス: Link先を確認	Gregory Rosenthal, Hugo Aaronson, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,	(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離において$\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$クエリを必要とすることを示す。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。 We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.	翻訳日:2024-11-07 14:19:13 公開日:2024-10-06
# MaPPER: 表現理解の参照に有効なマルチモーダル事前誘導パラメータチューニング MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension ( http://arxiv.org/abs/2409.13609v1 ) ライセンス: Link先を確認	Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin,	(参考訳) Referring Expression Comprehension (REC) は、自然言語を介して局所的な視覚領域を接地することを目的としており、マルチモーダルアライメントに大きく依存するタスクである。既存のほとんどの方法は、強力な事前訓練されたモデルを使用して、完全な微調整によって視覚的/言語的な知識を伝達する。しかし、バックボーン全体の完全な微調整は、事前学習に埋め込まれた豊富な事前知識を損なうだけでなく、計算コストも著しく低下させる。近年,パラメータ効率のよい移動学習法(PETL)が出現し,その課題を効果的かつ効率的に解決することを目指している。これらのPETL法をRECタスクに直接適用するのは不適切である。そこで本研究では,マルチモーダル事前誘導パラメーター効率チューニング(MaPPER)の新たなフレームワークを提案する。具体的には、MaPPERは、アライメントされた事前でガイドされる動的プリエントアダプタと、より正確なローカルセマンティクスを抽出して視覚的知覚を改善するローカルコンボリューションアダプタから構成される。さらに、事前ガイド付きテキストモジュールは、相互モーダルアライメントを容易にするために、事前の利用をさらに促進するために提案されている。 3つの広く使用されているベンチマーク実験の結果、MaPPERは11.41%の調整可能なバックボーンパラメータを持つ完全微調整や他のPETL法と比較して、最も精度が高いことが示された。 Referring Expression Comprehension (REC), which aims to ground a local visual region via natural language, is a task that heavily relies on multimodal alignment. Most existing methods utilize powerful pre-trained models to transfer visual/linguistic knowledge by full fine-tuning. However, full fine-tuning the entire backbone not only breaks the rich prior knowledge embedded in the pre-training, but also incurs significant computational costs. Motivated by the recent emergence of Parameter-Efficient Transfer Learning (PETL) methods, we aim to solve the REC task in an effective and efficient manner. Directly applying these PETL methods to the REC task is inappropriate, as they lack the specific-domain abilities for precise local visual perception and visual-language alignment. Therefore, we propose a novel framework of Multimodal Prior-guided Parameter Efficient Tuning, namely MaPPER. Specifically, MaPPER comprises Dynamic Prior Adapters guided by a aligned prior, and Local Convolution Adapters to extract precise local semantics for better visual perception. Moreover, the Prior-Guided Text module is proposed to further utilize the prior for facilitating the cross-modal alignment. Experimental results on three widely-used benchmarks demonstrate that MaPPER achieves the best accuracy compared to the full fine-tuning and other PETL methods with only 1.41% tunable backbone parameters.	翻訳日:2024-11-07 06:19:44 公開日:2024-10-06
# MaPPER: 表現理解の参照に有効なマルチモーダル事前誘導パラメータチューニング MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension ( http://arxiv.org/abs/2409.13609v2 ) ライセンス: Link先を確認	Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin,	(参考訳) Referring Expression Comprehension (REC) は、自然言語を介して局所的な視覚領域を接地することを目的としており、マルチモーダルアライメントに大きく依存するタスクである。既存のほとんどの方法は、強力な事前訓練されたモデルを使用して、完全な微調整によって視覚的/言語的な知識を伝達する。しかし、バックボーン全体の完全な微調整は、事前学習に埋め込まれた豊富な事前知識を損なうだけでなく、計算コストも著しく低下させる。近年,パラメータ効率のよい移動学習法(PETL)が出現し,その課題を効果的かつ効率的に解決することを目指している。これらのPETL法をRECタスクに直接適用するのは不適切である。そこで本研究では,マルチモーダル事前誘導パラメーター効率チューニング(MaPPER)の新たなフレームワークを提案する。具体的には、MaPPERは、アライメントされた事前でガイドされる動的プリエントアダプタと、より正確なローカルセマンティクスを抽出して視覚的知覚を改善するローカルコンボリューションアダプタから構成される。さらに、事前ガイド付きテキストモジュールは、相互モーダルアライメントを容易にするために、事前の利用をさらに促進するために提案されている。 3つの広く使用されているベンチマーク実験の結果、MaPPERは11.41%の調整可能なバックボーンパラメータを持つ完全微調整や他のPETL法と比較して、最も精度が高いことが示された。私たちのコードはhttps://github.com/liuting20/MaPPERで利用可能です。 Referring Expression Comprehension (REC), which aims to ground a local visual region via natural language, is a task that heavily relies on multimodal alignment. Most existing methods utilize powerful pre-trained models to transfer visual/linguistic knowledge by full fine-tuning. However, full fine-tuning the entire backbone not only breaks the rich prior knowledge embedded in the pre-training, but also incurs significant computational costs. Motivated by the recent emergence of Parameter-Efficient Transfer Learning (PETL) methods, we aim to solve the REC task in an effective and efficient manner. Directly applying these PETL methods to the REC task is inappropriate, as they lack the specific-domain abilities for precise local visual perception and visual-language alignment. Therefore, we propose a novel framework of Multimodal Prior-guided Parameter Efficient Tuning, namely MaPPER. Specifically, MaPPER comprises Dynamic Prior Adapters guided by an aligned prior, and Local Convolution Adapters to extract precise local semantics for better visual perception. Moreover, the Prior-Guided Text module is proposed to further utilize the prior for facilitating the cross-modal alignment. Experimental results on three widely-used benchmarks demonstrate that MaPPER achieves the best accuracy compared to the full fine-tuning and other PETL methods with only 1.41% tunable backbone parameters. Our code is available at https://github.com/liuting20/MaPPER.	翻訳日:2024-11-07 06:19:44 公開日:2024-10-06
# Obliviate:パラメータ効率のよい微調整パラダイムにおけるタスク非依存のバックドアの中立化 Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm ( http://arxiv.org/abs/2409.14119v1 ) ライセンス: Link先を確認	Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,	(参考訳) パラメータ効率のよい微調整(PEFT)は,大規模言語モデルにおいて重要な訓練戦略となっている。しかし、トレーニング可能なパラメータが少ないため、タスクに依存しないバックドアのようなセキュリティリスクが生じる。幅広いタスクに深刻な影響を与えるにもかかわらず、PEFTのコンテキスト内でタスク非依存のバックドアを効果的に対処する実用的な防御ソリューションは存在しない。本研究では,PEFT統合型バックドアディフェンスであるObliviateを紹介する。我々は,PEFT層内の良性ニューロンを増幅し,トリガートークンの影響を罰する2つの手法を開発した。本手法は,3つのPEFTアーキテクチャを対象とした評価により,最先端のタスク非依存バックドア(83.6%$\downarrow$)の攻撃成功率を大幅に低減できることを示す。さらに,タスク固有のバックドアとアダプティブアタックに対する堅牢な防御能力を示す。ソースコードはhttps://github.com/obliviateARR/Obliviateで取得できる。 Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, there is no practical defense solution available that effectively counters task-agnostic backdoors within the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques aimed at amplifying benign neurons within PEFT layers and penalizing the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method can significantly reduce the attack success rate of the state-of-the-art task-agnostic backdoors (83.6%$\downarrow$). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code will be obtained at https://github.com/obliviateARR/Obliviate.	翻訳日:2024-11-07 03:33:25 公開日:2024-10-06
# Obliviate:パラメータ効率のよい微調整パラダイムにおけるタスク非依存のバックドアの中立化 Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm ( http://arxiv.org/abs/2409.14119v2 ) ライセンス: Link先を確認	Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,	(参考訳) パラメータ効率のよい微調整(PEFT)は,大規模言語モデルにおいて重要な訓練戦略となっている。しかし、トレーニング可能なパラメータが少ないため、タスクに依存しないバックドアのようなセキュリティリスクが生じる。幅広いタスクに深刻な影響を与えるにもかかわらず、PEFTのコンテキスト内でタスク非依存のバックドアを効果的に対処する実用的な防御ソリューションは存在しない。本研究では,PEFT統合型バックドアディフェンスであるObliviateを紹介する。我々は,PEFT層内の良性ニューロンを増幅し,トリガートークンの影響を罰する2つの手法を開発した。本手法は,3つのPEFTアーキテクチャを対象とした評価により,最先端のタスク非依存バックドア(83.6%$\downarrow$)の攻撃成功率を大幅に低減できることを示す。さらに,タスク固有のバックドアとアダプティブアタックに対する堅牢な防御能力を示す。ソースコードはhttps://github.com/obliviateARR/Obliviateで取得できる。 Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, there is no practical defense solution available that effectively counters task-agnostic backdoors within the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques aimed at amplifying benign neurons within PEFT layers and penalizing the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method can significantly reduce the attack success rate of the state-of-the-art task-agnostic backdoors (83.6%$\downarrow$). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code will be obtained at https://github.com/obliviateARR/Obliviate.	翻訳日:2024-11-07 03:33:25 公開日:2024-10-06
# Obliviate:パラメータ効率のよい微調整パラダイムにおけるタスク非依存のバックドアの中立化 Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm ( http://arxiv.org/abs/2409.14119v3 ) ライセンス: Link先を確認	Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,	(参考訳) パラメータ効率のよい微調整(PEFT)は,大規模言語モデルにおいて重要な訓練戦略となっている。しかし、トレーニング可能なパラメータが少ないため、タスクに依存しないバックドアのようなセキュリティリスクが生じる。幅広いタスクに深刻な影響を与えるにもかかわらず、PEFTのコンテキスト内でタスク非依存のバックドアを効果的に対処する実用的な防御ソリューションは存在しない。本研究では,PEFT統合型バックドアディフェンスであるObliviateを紹介する。我々は,PEFT層内の良性ニューロンを増幅し,トリガートークンの影響を罰する2つの手法を開発した。本手法は,3つのPEFTアーキテクチャを対象とした評価により,最先端のタスク非依存バックドア(83.6%$\downarrow$)の攻撃成功率を大幅に低減できることを示す。さらに,タスク固有のバックドアとアダプティブアタックに対する堅牢な防御能力を示す。ソースコードはhttps://github.com/obliviateARR/Obliviateで取得できる。 Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, there is no practical defense solution available that effectively counters task-agnostic backdoors within the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques aimed at amplifying benign neurons within PEFT layers and penalizing the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method can significantly reduce the attack success rate of the state-of-the-art task-agnostic backdoors (83.6%$\downarrow$). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code will be obtained at https://github.com/obliviateARR/Obliviate.	翻訳日:2024-11-07 03:33:25 公開日:2024-10-06
# KISS-Matcher: 高速でロバストなクラウド登録が再検討 KISS-Matcher: Fast and Robust Point Cloud Registration Revisited ( http://arxiv.org/abs/2409.15615v2 ) ライセンス: Link先を確認	Hyungtae Lim, Daebeom Kim, Gunhee Shin, Jingnan Shi, Ignacio Vizzo, Hyun Myung, Jaesik Park, Luca Carlone,	(参考訳) グローバルポイントクラウド登録システムはあらゆる面で大きく進歩しているが、多くの研究は特徴抽出、グラフ理論プルーニング、ポーズソルバといった特定のコンポーネントに焦点を当てている。本稿では,この登録問題を総合的に考察し,ポイントクラウド登録のためのオープンソースで汎用的なC++ライブラリである「textit{KISS-Matcher}」を開発する。 KISS-Matcherは、古典的なファストポイント特徴ヒストグラム(FPFH)を改善する新しい特徴検出器 \textit{Faster-PFH} を組み合わせる。さらに、$k$-core-based graph-theoretic pruningを採用して、外れ値対応を拒否する時間の複雑さを低減する。最後に、これらのモジュールを完全で、ユーザフレンドリで、使用可能なパイプラインに統合する。広範な実験によって検証されたように、KISS-Matcherはスケーラビリティと広範囲な適用性に優れており、精度を維持しながら最先端のアウトリア・ロバスト登録パイプラインに比べて大幅に高速化されている。私たちのコードは、 \href{https://github.com/MIT-SPARK/KISS-Matcher}{\texttt{https://github.com/MIT-SPARK/KISS-Matcher}}で利用可能です。 While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers. In this paper, we take a holistic view on the registration problem and develop an open-source and versatile C++ library for point cloud registration, called \textit{KISS-Matcher}. KISS-Matcher combines a novel feature detector, \textit{Faster-PFH}, that improves over the classical fast point feature histogram (FPFH). Moreover, it adopts a $k$-core-based graph-theoretic pruning to reduce the time complexity of rejecting outlier correspondences. Finally, it combines these modules in a complete, user-friendly, and ready-to-use pipeline. As verified by extensive experiments, KISS-Matcher has superior scalability and broad applicability, achieving a substantial speed-up compared to state-of-the-art outlier-robust registration pipelines while preserving accuracy. Our code will be available at \href{https://github.com/MIT-SPARK/KISS-Matcher}{\texttt{https://github.com/MIT-SPARK/KISS-Matcher}}.	翻訳日:2024-11-06 19:32:29 公開日:2024-10-06
# AI安全のためのマシン・アンラーニングの敵対的展望 An Adversarial Perspective on Machine Unlearning for AI Safety ( http://arxiv.org/abs/2409.18025v2 ) ライセンス: Link先を確認	Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando,	(参考訳) 大きな言語モデルは、有害な知識に関する質問を拒否するために微調整されているが、これらの保護はしばしばバイパスされる。アンラーニング手法は、モデルから有害な能力を完全に取り除き、敵に近づかないようにすることを目的としている。この研究は、非学習と従来の訓練後の安全性の基本的な相違に敵対的な観点から挑戦する。既存のjailbreakメソッドは、これまで未学習に対して効果がないと報告されていたが、慎重に適用した場合に成功できることを実証する。さらに、最も未学習と思われる能力を回復する様々な適応手法を開発した。例えば、アクティベーション空間における10の非関連例の微調整や特定の方向の除去は、最先端の未学習手法であるRMUで編集されたモデルに対して最も有害な能力を回復できることを示す。我々の研究は、現在の未学習アプローチの堅牢性に挑戦し、安全性トレーニングよりも彼らの優位性に疑問を投げかけている。 Large language models are finetuned to refuse questions about hazardous knowledge, but these protections can often be bypassed. Unlearning methods aim at completely removing hazardous capabilities from models and make them inaccessible to adversaries. This work challenges the fundamental differences between unlearning and traditional safety post-training from an adversarial perspective. We demonstrate that existing jailbreak methods, previously reported as ineffective against unlearning, can be successful when applied carefully. Furthermore, we develop a variety of adaptive methods that recover most supposedly unlearned capabilities. For instance, we show that finetuning on 10 unrelated examples or removing specific directions in the activation space can recover most hazardous capabilities for models edited with RMU, a state-of-the-art unlearning method. Our findings challenge the robustness of current unlearning approaches and question their advantages over safety training.	翻訳日:2024-11-06 15:51:02 公開日:2024-10-06
# 因果推論エンジンとしての深部自己回帰モデル Using Deep Autoregressive Models as Causal Inference Engines ( http://arxiv.org/abs/2409.18581v2 ) ライセンス: Link先を確認	Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho,	(参考訳) 既存の因果推論(CI)モデルは、主に低次元の共同設立者とシングルトンアクションを扱うことに限られている。本稿では,現代アプリケーションに共通する複雑な共同創設者とシーケンシャルアクションを処理可能な自己回帰型(AR)CIフレームワークを提案する。このことは、基礎となる因果線図からトークンの列に変換することによって達成される。このアプローチは、任意のDAGから生成されたデータによるトレーニングを可能にするだけでなく、既存のCI機能を拡張して、.em single}モデルを使用していくつかの統計量の推定を可能にする。介入確率を直接予測し、推論を簡素化し、結果予測精度を向上することができる。我々は,CIに適応したARモデルは,迷路をナビゲートしたり,チェスのエンドゲームを行ったり,あるキーワードが紙の受容率に与える影響を評価するなど,様々な複雑な応用において効率的かつ効果的であることが実証された。 Existing causal inference (CI) models are limited to primarily handling low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions common in modern applications. We accomplish this by {\em sequencification}, transforming data from an underlying causal diagram into a sequence of tokens. This approach not only enables training with data generated from any DAG but also extends existing CI capabilities to accommodate estimating several statistical quantities using a {\em single} model. We can directly predict interventional probabilities, simplifying inference and enhancing outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.	翻訳日:2024-11-06 05:42:34 公開日:2024-10-06
# ワープ合成によるニューラル製品重要度サンプリング Neural Product Importance Sampling via Warp Composition ( http://arxiv.org/abs/2409.18974v1 ) ライセンス: Link先を確認	Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev,	(参考訳) 現代のフォトリアリスティックレンダリングのヒンジにおいて高効率を達成するには、各ピクセルで推定される照明積分を近似したモンテカルロサンプリング分布を用いる。サンプルは通常、単純な分布の集合から生成され、それぞれがインテグレードの異なる因子をターゲットにしており、複数の重要なサンプリングによって結合される。結果として生じる混合分布は、すべての因子の実際の生成物から遠く離れており、直接照明推定においても準最適分散をもたらす。本稿では, 環境照明や材料用語の積である試料照明製品積分を効率よく重要にするために, 正規化フローを用いた学習に基づく手法を提案する。サンプルはエミッタテールワープでフローヘッドワープを構成する。小型のコンディショナルヘッドワープはニューラルスプラインフローで表現され、大型のアンコンディショナルテールは環境マップ毎に離散化され、その評価は瞬時に行われる。コンディショニングが低次元であれば、ヘッドワープを識別してより優れた性能が得られる。複雑な幾何学, 材料, 照明などを含む様々な応用において, 先行手法による分散の低減を実証する。 Achieving high efficiency in modern photorealistic rendering hinges on using Monte Carlo sampling distributions that closely approximate the illumination integral estimated for every pixel. Samples are typically generated from a set of simple distributions, each targeting a different factor in the integrand, which are combined via multiple importance sampling. The resulting mixture distribution can be far from the actual product of all factors, leading to sub-optimal variance even for direct-illumination estimation. We present a learning-based method that uses normalizing flows to efficiently importance sample illumination product integrals, e.g., the product of environment lighting and material terms. Our sampler composes a flow head warp with an emitter tail warp. The small conditional head warp is represented by a neural spline flow, while the large unconditional tail is discretized per environment map and its evaluation is instant. If the conditioning is low-dimensional, the head warp can be also discretized to achieve even better performance. We demonstrate variance reduction over prior methods on a range of applications comprising complex geometry, materials and illumination.	翻訳日:2024-11-06 05:22:52 公開日:2024-10-06
# ワープ合成によるニューラル製品重要度サンプリング Neural Product Importance Sampling via Warp Composition ( http://arxiv.org/abs/2409.18974v2 ) ライセンス: Link先を確認	Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev,	(参考訳) 現代のフォトリアリスティックレンダリングのヒンジにおいて高効率を達成するには、各ピクセルで推定される照明積分を近似したモンテカルロサンプリング分布を用いる。サンプルは通常、単純な分布の集合から生成され、それぞれがインテグレードの異なる因子をターゲットにしており、複数の重要なサンプリングによって結合される。結果として生じる混合分布は、すべての因子の実際の生成物から遠く離れており、直接照明推定においても準最適分散をもたらす。本稿では, 環境照明や材料用語の積である試料照明製品積分を効率よく重要にするために, 正規化フローを用いた学習に基づく手法を提案する。サンプルはエミッタテールワープでフローヘッドワープを構成する。小型のコンディショナルヘッドワープはニューラルスプラインフローで表現され、大型のアンコンディショナルテールは環境マップ毎に離散化され、その評価は瞬時に行われる。コンディショニングが低次元であれば、ヘッドワープを識別してより優れた性能が得られる。複雑な幾何学, 材料, 照明などを含む様々な応用において, 先行手法による分散の低減を実証する。 Achieving high efficiency in modern photorealistic rendering hinges on using Monte Carlo sampling distributions that closely approximate the illumination integral estimated for every pixel. Samples are typically generated from a set of simple distributions, each targeting a different factor in the integrand, which are combined via multiple importance sampling. The resulting mixture distribution can be far from the actual product of all factors, leading to sub-optimal variance even for direct-illumination estimation. We present a learning-based method that uses normalizing flows to efficiently importance sample illumination product integrals, e.g., the product of environment lighting and material terms. Our sampler composes a flow head warp with an emitter tail warp. The small conditional head warp is represented by a neural spline flow, while the large unconditional tail is discretized per environment map and its evaluation is instant. If the conditioning is low-dimensional, the head warp can be also discretized to achieve even better performance. We demonstrate variance reduction over prior methods on a range of applications comprising complex geometry, materials and illumination.	翻訳日:2024-11-06 05:22:52 公開日:2024-10-06
# オンライン直接選好最適化におけるサンプリングの役割 The Crucial Role of Samplers in Online Direct Preference Optimization ( http://arxiv.org/abs/2409.19605v1 ) ライセンス: Link先を確認	Ruizhe Shi, Runlong Zhou, Simon S. Du,	(参考訳) DPO(Direct Preference Optimization)は、言語モデルアライメントのための安定的でスケーラブルで効率的なソリューションとして登場した。経験的な成功にもかかわらず、$\textit{optimization}$プロパティ、特に、その収束率に対するサンプルの影響は未定のままである。本稿では,DPO の $\textit{convergence rate}$ の厳密な分析を行い,厳密な勾配設定の下で異なるサンプリング戦略を用いて,一様サンプリングが $\textit{linear}$ 収束を達成し,提案するオンラインサンプリングは $\textit{quadratic}$ 収束を達成した。さらに、後続分布と$\textit{logit mix}$を組み込むことにより、サンプルを実用的な設定に適応させ、従来のアプローチよりも大幅に改善したことを示す。 Safe-RLHFデータセットでは,バニラDPOよりも4.5ドル%,オンポラDPOより3.0ドル%,Iterative-PromptではバニラDPO,オンポラDPO,Hybrid GSHFよりも4.2ドル%向上した。我々の結果は、DPOの理論的立場に関する洞察を提供するだけでなく、将来的なアルゴリズム設計の道を開いた。 Direct Preference Optimization (DPO) has emerged as a stable, scalable, and efficient solution for language model alignment. Despite its empirical success, the $\textit{optimization}$ properties, particularly the impact of samplers on its convergence rates, remain underexplored. In this paper, we provide a rigorous analysis of DPO's $\textit{convergence rates}$ with different sampling strategies under the exact gradient setting, revealing a surprising separation: uniform sampling achieves $\textit{linear}$ convergence, while our proposed online sampler achieves $\textit{quadratic}$ convergence. We further adapt the sampler to practical settings by incorporating posterior distributions and $\textit{logit mixing}$, demonstrating significant improvements over previous approaches. On Safe-RLHF dataset, our method exhibits a $4.5$% improvement over vanilla DPO and a $3.0$% improvement over on-policy DPO; on Iterative-Prompt, our approach outperforms vanilla DPO, on-policy DPO, and Hybrid GSHF by over $4.2$%. Our results not only offer insights into the theoretical standing of DPO but also pave the way for potential algorithm designs in the future.	翻訳日:2024-11-05 22:18:46 公開日:2024-10-06
# オンライン直接選好最適化におけるサンプリングの役割 The Crucial Role of Samplers in Online Direct Preference Optimization ( http://arxiv.org/abs/2409.19605v2 ) ライセンス: Link先を確認	Ruizhe Shi, Runlong Zhou, Simon S. Du,	(参考訳) DPO(Direct Preference Optimization)は、言語モデルアライメントのための安定的でスケーラブルで効率的なソリューションとして登場した。経験的な成功にもかかわらず、$\textit{optimization}$プロパティ、特に、その収束率に対するサンプルの影響は未定のままである。本稿では,DPO の $\textit{convergence rate}$ の厳密な分析を行い,厳密な勾配設定の下で異なるサンプリング戦略を用いて,一様サンプリングが $\textit{linear}$ 収束を達成し,提案するオンラインサンプリングは $\textit{quadratic}$ 収束を達成した。さらに、後続分布と$\textit{logit mix}$を組み込むことにより、サンプルを実用的な設定に適応させ、従来のアプローチよりも大幅に改善したことを示す。 Safe-RLHFデータセットでは,バニラDPOよりも4.5ドル%,オンポラDPOより3.0ドル%,Iterative-PromptではバニラDPO,オンポラDPO,Hybrid GSHFよりも4.2ドル%向上した。我々の結果は、DPOの理論的立場に関する洞察を提供するだけでなく、将来的なアルゴリズム設計の道を開いた。 Direct Preference Optimization (DPO) has emerged as a stable, scalable, and efficient solution for language model alignment. Despite its empirical success, the $\textit{optimization}$ properties, particularly the impact of samplers on its convergence rates, remain underexplored. In this paper, we provide a rigorous analysis of DPO's $\textit{convergence rates}$ with different sampling strategies under the exact gradient setting, revealing a surprising separation: uniform sampling achieves $\textit{linear}$ convergence, while our proposed online sampler achieves $\textit{quadratic}$ convergence. We further adapt the sampler to practical settings by incorporating posterior distributions and $\textit{logit mixing}$, demonstrating significant improvements over previous approaches. On Safe-RLHF dataset, our method exhibits a $4.5$% improvement over vanilla DPO and a $3.0$% improvement over on-policy DPO; on Iterative-Prompt, our approach outperforms vanilla DPO, on-policy DPO, and Hybrid GSHF by over $4.2$%. Our results not only offer insights into the theoretical standing of DPO but also pave the way for potential algorithm designs in the future.	翻訳日:2024-11-05 22:18:46 公開日:2024-10-06
# ラディアタパインブランチ検出と距離測定のためのドローンステレオビジョン:ディープラーニングとYOLOの統合を活用して Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration ( http://arxiv.org/abs/2410.00503v1 ) ライセンス: Link先を確認	Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green,	(参考訳) 本研究は,木の枝の空間的位置を正確に検出・測定する,刈り取り工具とステレオビジョンカメラを備えたドローンの開発に焦点をあてる。分岐セグメンテーションにはYOLOを用い, モノクラーとステレオの2つの深度推定手法について検討した。 SGBMと比較して、ディープラーニング技術はより洗練され正確な深度マップを生成する。深部ニューラルネットワークを用いた微調整処理を最適深度値の近似に応用した。この手法は正確な分岐検出と距離測定を容易にし、刈り取り作業の自動化における重要な課題に対処する。その結果、農業分野におけるイノベーションの推進と自動化の促進を深層学習がもたらす可能性について、精度と効率の両面で顕著な進歩が示された。 This research focuses on the development of a drone equipped with pruning tools and a stereo vision camera to accurately detect and measure the spatial positions of tree branches. YOLO is employed for branch segmentation, while two depth estimation approaches, monocular and stereo, are investigated. In comparison to SGBM, deep learning techniques produce more refined and accurate depth maps. In the absence of ground-truth data, a fine-tuning process using deep neural networks is applied to approximate optimal depth values. This methodology facilitates precise branch detection and distance measurement, addressing critical challenges in the automation of pruning operations. The results demonstrate notable advancements in both accuracy and efficiency, underscoring the potential of deep learning to drive innovation and enhance automation in the agricultural sector.	翻訳日:2024-11-05 05:16:55 公開日:2024-10-06
# ラディアタパインブランチ検出と距離測定のためのドローンステレオビジョン:ディープラーニングとYOLOの統合を活用して Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration ( http://arxiv.org/abs/2410.00503v2 ) ライセンス: Link先を確認	Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green,	(参考訳) 本研究は,木の枝の空間的位置を正確に検出・測定する,刈り取り工具とステレオビジョンカメラを備えたドローンの開発に焦点をあてる。分岐セグメンテーションにはYOLOを用い, モノクラーとステレオの2つの深度推定手法について検討した。 SGBMと比較して、ディープラーニング技術はより洗練され正確な深度マップを生成する。深部ニューラルネットワークを用いた微調整処理を最適深度値の近似に応用した。この手法は正確な分岐検出と距離測定を容易にし、刈り取り作業の自動化における重要な課題に対処する。その結果、農業分野におけるイノベーションの推進と自動化の促進を深層学習がもたらす可能性について、精度と効率の両面で顕著な進歩が示された。 This research focuses on the development of a drone equipped with pruning tools and a stereo vision camera to accurately detect and measure the spatial positions of tree branches. YOLO is employed for branch segmentation, while two depth estimation approaches, monocular and stereo, are investigated. In comparison to SGBM, deep learning techniques produce more refined and accurate depth maps. In the absence of ground-truth data, a fine-tuning process using deep neural networks is applied to approximate optimal depth values. This methodology facilitates precise branch detection and distance measurement, addressing critical challenges in the automation of pruning operations. The results demonstrate notable advancements in both accuracy and efficiency, underscoring the potential of deep learning to drive innovation and enhance automation in the agricultural sector.	翻訳日:2024-11-05 05:16:55 公開日:2024-10-06
# 機械学習によるIoTセキュリティ向上のための侵入検知 Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security ( http://arxiv.org/abs/2410.01016v1 ) ライセンス: Link先を確認	Mona Esmaeili, Morteza Rahimi, Matin Khajavi, Dorsa Farahmand, Hadi Jabbari Saray,	(参考訳) IoT(Internet of Things)に対する攻撃は、デバイス、アプリケーション、インタラクションのネットワーク化と統合化が進むにつれて増加している。 IoTネットワークをターゲットにしたサイバー攻撃の増加は、プライバシ、セキュリティ、機能、重要なシステムの可用性に重大な脆弱性と脅威をもたらし、運用上の障害、財務的損失、ID盗難、データ漏洩につながる。 IoTデバイスを効率的にセキュアにするためには、侵入システムのリアルタイム検出が不可欠だ。本稿では、IoTセキュリティのための機械学習による侵入検出戦略に関する最新の研究について、リアルタイム応答性、検出精度、アルゴリズム効率に集中して検討する。主要な研究は、よく知られたすべての学術データベースからレビューされ、既存のアプローチのための分類学が提供された。このレビューでは、既存の研究ギャップを強調し、現在のIoTセキュリティフレームワークの限界を概説し、将来の研究の方向性と開発に実用的な洞察を提供する。 Attacks against the Internet of Things (IoT) are rising as devices, applications, and interactions become more networked and integrated. The increase in cyber-attacks that target IoT networks poses a huge vulnerability and threat to the privacy, security, functionality, and availability of critical systems, which leads to operational disruptions, financial losses, identity thefts, and data breaches. To efficiently secure IoT devices, real-time detection of intrusion systems is critical, especially those using machine learning to identify threats and mitigate risks and vulnerabilities. This paper investigates the latest research on machine learning-based intrusion detection strategies for IoT security, concentrating on real-time responsiveness, detection accuracy, and algorithm efficiency. Key studies were reviewed from all well-known academic databases, and a taxonomy was provided for the existing approaches. This review also highlights existing research gaps and outlines the limitations of current IoT security frameworks to offer practical insights for future research directions and developments.	翻訳日:2024-11-04 23:40:11 公開日:2024-10-06
# 機械学習によるIoTセキュリティ向上のための侵入検知 Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security ( http://arxiv.org/abs/2410.01016v2 ) ライセンス: Link先を確認	Mona Esmaeili, Morteza Rahimi, Hadise Pishdast, Dorsa Farahmandazad, Matin Khajavi, Hadi Jabbari Saray,	(参考訳) IoT(Internet of Things)に対する攻撃は、デバイス、アプリケーション、インタラクションのネットワーク化と統合化が進むにつれて増加している。 IoTネットワークをターゲットにしたサイバー攻撃の増加は、プライバシ、セキュリティ、機能、重要なシステムの可用性に重大な脆弱性と脅威をもたらし、運用上の障害、財務的損失、ID盗難、データ漏洩につながる。 IoTデバイスを効率的にセキュアにするためには、侵入システムのリアルタイム検出が不可欠だ。本稿では、IoTセキュリティのための機械学習による侵入検出戦略に関する最新の研究について、リアルタイム応答性、検出精度、アルゴリズム効率に集中して検討する。主要な研究は、よく知られたすべての学術データベースからレビューされ、既存のアプローチのための分類学が提供された。このレビューでは、既存の研究ギャップを強調し、現在のIoTセキュリティフレームワークの限界を概説し、将来の研究の方向性と開発に実用的な洞察を提供する。 Attacks against the Internet of Things (IoT) are rising as devices, applications, and interactions become more networked and integrated. The increase in cyber-attacks that target IoT networks poses a considerable vulnerability and threat to the privacy, security, functionality, and availability of critical systems, which leads to operational disruptions, financial losses, identity thefts, and data breaches. To efficiently secure IoT devices, real-time detection of intrusion systems is critical, especially those using machine learning to identify threats and mitigate risks and vulnerabilities. This paper investigates the latest research on machine learning-based intrusion detection strategies for IoT security, concentrating on real-time responsiveness, detection accuracy, and algorithm efficiency. Key studies were reviewed from all well-known academic databases, and a taxonomy was provided for the existing approaches. This review also highlights existing research gaps and outlines the limitations of current IoT security frameworks to offer practical insights for future research directions and developments.	翻訳日:2024-11-04 23:40:11 公開日:2024-10-06
# GaussianBlock: プリミティブとガウシアンによるパートアウェアな構成と編集可能な3Dシーンの構築 GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians ( http://arxiv.org/abs/2410.01535v1 ) ライセンス: Link先を確認	Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao,	(参考訳) 近年, ニューラルレージアン場とガウススプラッティングの発展に伴い, 3次元再構成技術は極めて高い忠実性を実現している。しかし、これらの手法によって学習される潜在表現は非常に絡み合っており、解釈可能性に欠ける。本稿では,GussianBlockと呼ばれる新しい部分認識型合成再構成手法を提案する。これは意味的一貫性と非絡み合いの表現を可能にし,高い忠実さを同時に維持しつつ,ビルディングブロックに類似した正確な物理的編集を可能にする。我々のGaussianBlockは、フレキシブルな動作性と編集性で知られるプリミティブと、再現性に優れた3D Gaussianの両方の利点を生かしたハイブリッド表現を導入しています。具体的には,2次元のセマンティックプリミティブから誘導される新たな注意誘導中心的損失を,動的分裂と融合戦略によって補うことによって,意味的コヒーレントなプリミティブを実現する。さらに, プリミティブとハイブリダイゼーションした3次元ガウスアンを用いて, 構造的詳細を洗練し, 忠実度を高める。さらに、この2つの接続を強化し維持するために、バインディング継承戦略が採用されている。再構成されたシーンは、様々なベンチマークで絡み合っていて、構成的でコンパクトで、シームレスで、直接的で、正確な編集が可能で、高品質を維持しています。 Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.	翻訳日:2024-11-04 17:14:45 公開日:2024-10-06
# GaussianBlock: プリミティブとガウシアンによるパートアウェアな構成と編集可能な3Dシーンの構築 GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians ( http://arxiv.org/abs/2410.01535v2 ) ライセンス: Link先を確認	Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao,	(参考訳) 近年, ニューラルレージアン場とガウススプラッティングの発展に伴い, 3次元再構成技術は極めて高い忠実性を実現している。しかし、これらの手法によって学習される潜在表現は非常に絡み合っており、解釈可能性に欠ける。本稿では,GussianBlockと呼ばれる新しい部分認識型合成再構成手法を提案する。これは意味的一貫性と非絡み合いの表現を可能にし,高い忠実さを同時に維持しつつ,ビルディングブロックに類似した正確な物理的編集を可能にする。我々のGaussianBlockは、フレキシブルな動作性と編集性で知られるプリミティブと、再現性に優れた3D Gaussianの両方の利点を生かしたハイブリッド表現を導入しています。具体的には,2次元のセマンティックプリミティブから誘導される新たな注意誘導中心的損失を,動的分裂と融合戦略によって補うことによって,意味的コヒーレントなプリミティブを実現する。さらに, プリミティブとハイブリダイゼーションした3次元ガウスアンを用いて, 構造的詳細を洗練し, 忠実度を高める。さらに、この2つの接続を強化し維持するために、バインディング継承戦略が採用されている。再構成されたシーンは、様々なベンチマークで絡み合っていて、構成的でコンパクトで、シームレスで、直接的で、正確な編集が可能で、高品質を維持しています。 Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.	翻訳日:2024-11-04 17:14:45 公開日:2024-10-06
# 心臓MRIの総合的評価のためのビジョン基礎モデルに向けて Towards a vision foundation model for comprehensive assessment of Cardiac MRI ( http://arxiv.org/abs/2410.01665v1 ) ライセンス: Link先を確認	Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert,	(参考訳) 心臓磁気共鳴イメージング(CMR)は、非侵襲的心臓アセスメントのゴールドスタンダードと考えられており、多種多様な画像処理タスクを必要とする多種多様な複雑なモダリティである。ディープラーニングの進歩により、これらのタスクのための最先端(SoTA)モデルの開発が可能になった。しかし、モデルトレーニングは、特にあまり一般的でない画像シーケンスにおいて、データとラベルの不足のために困難である。さらに、各モデルは特定のタスクに対してトレーニングされることが多く、関連するタスクの間には関連性がない。本研究では,3600万枚のCMR画像に対して,自己教師付きで訓練したCMR評価のための視覚基礎モデルを提案する。次に、分類、セグメント化、ランドマークの局在化、病理診断など、CMRワークフローに典型的な9つの臨床的タスクの教師付き方法でモデルを微調整する。すべてのタスクにおいて、ラベル付きデータセットサイズの範囲で、精度と堅牢性が改善されたことを実証する。また,画像解析の課題として,ラベル付きサンプルの少なさによる数ショット学習の改善も示した。我々は,ほとんどの臨床作業において,SoTAに匹敵するアウト・オブ・ボックス性能を実現する。提案手法は,注記データが少ない場合でも,画像解析タスクのための深層学習ベースのソリューションの開発を加速する可能性があり,CMR評価のための資源効率,統一的なフレームワークを提供する。 Cardiac magnetic resonance imaging (CMR), considered the gold standard for noninvasive cardiac assessment, is a diverse and complex modality requiring a wide variety of image processing tasks for comprehensive assessment of cardiac morphology and function. Advances in deep learning have enabled the development of state-of-the-art (SoTA) models for these tasks. However, model training is challenging due to data and label scarcity, especially in the less common imaging sequences. Moreover, each model is often trained for a specific task, with no connection between related tasks. In this work, we introduce a vision foundation model trained for CMR assessment, that is trained in a self-supervised fashion on 36 million CMR images. We then finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow, across classification, segmentation, landmark localization, and pathology detection. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes. We also demonstrate improved few-shot learning with fewer labeled samples, a common challenge in medical image analyses. We achieve an out-of-box performance comparable to SoTA for most clinical tasks. The proposed method thus presents a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even with few annotated data available.	翻訳日:2024-11-04 16:13:24 公開日:2024-10-06
# 心臓MRIの総合的評価のためのビジョン基礎モデルに向けて Towards a vision foundation model for comprehensive assessment of Cardiac MRI ( http://arxiv.org/abs/2410.01665v2 ) ライセンス: Link先を確認	Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert,	(参考訳) 心臓磁気共鳴イメージング(CMR)は、非侵襲的心臓アセスメントのゴールドスタンダードと考えられており、多種多様な画像処理タスクを必要とする多種多様な複雑なモダリティである。ディープラーニングの進歩により、これらのタスクのための最先端(SoTA)モデルの開発が可能になった。しかし、モデルトレーニングは、特にあまり一般的でない画像シーケンスにおいて、データとラベルの不足のために困難である。さらに、各モデルは特定のタスクに対してトレーニングされることが多く、関連するタスクの間には関連性がない。本研究では,3600万枚のCMR画像に対して,自己教師付きで訓練したCMR評価のための視覚基礎モデルを提案する。次に、分類、セグメント化、ランドマークの局在化、病理診断など、CMRワークフローに典型的な9つの臨床的タスクの教師付き方法でモデルを微調整する。すべてのタスクにおいて、ラベル付きデータセットサイズの範囲で、精度と堅牢性が改善されたことを実証する。また,画像解析の課題として,ラベル付きサンプルの少なさによる数ショット学習の改善も示した。我々は,ほとんどの臨床作業において,SoTAに匹敵するアウト・オブ・ボックス性能を実現する。提案手法は,注記データが少ない場合でも,画像解析タスクのための深層学習ベースのソリューションの開発を加速する可能性があり,CMR評価のための資源効率,統一的なフレームワークを提供する。 Cardiac magnetic resonance imaging (CMR), considered the gold standard for noninvasive cardiac assessment, is a diverse and complex modality requiring a wide variety of image processing tasks for comprehensive assessment of cardiac morphology and function. Advances in deep learning have enabled the development of state-of-the-art (SoTA) models for these tasks. However, model training is challenging due to data and label scarcity, especially in the less common imaging sequences. Moreover, each model is often trained for a specific task, with no connection between related tasks. In this work, we introduce a vision foundation model trained for CMR assessment, that is trained in a self-supervised fashion on 36 million CMR images. We then finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow, across classification, segmentation, landmark localization, and pathology detection. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes. We also demonstrate improved few-shot learning with fewer labeled samples, a common challenge in medical image analyses. We achieve an out-of-box performance comparable to SoTA for most clinical tasks. The proposed method thus presents a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even with few annotated data available.	翻訳日:2024-11-04 16:13:24 公開日:2024-10-06
# FARM: 小分子の関数型グループ認識表現 FARM: Functional Group-Aware Representations for Small Molecules ( http://arxiv.org/abs/2410.02082v1 ) ライセンス: Link先を確認	Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji,	(参考訳) SMILES,自然言語,分子グラフのギャップを埋める新しい基礎モデルであるFARM(Functional Group-Aware Representations for Small Molecules)を紹介する。 FARMの鍵となる革新は、関数型グループ認識トークン化であり、関数型グループ情報を表現に直接組み込む。この戦略的なトークン化粒度の減少は、機能的特性の主要な要因(すなわち、官能基)と意図的に相互作用し、化学言語に対するモデルの理解を高め、化学レキシコンを拡張し、SMILESと自然言語をより効果的にブリッジし、最終的に分子特性を予測する能力を向上させる。 FARMはまた、原子レベルの特徴を捉えるためにマスク付き言語モデリングを使用することと、分子トポロジ全体を符号化するためにグラフニューラルネットワークを使用することである。対照的な学習を活用することで、FARMはこれらの2つの表現のビューを統一された分子埋め込みに整列させる。 MoleculeNetデータセット上でFARMを厳格に評価し、12タスク中10タスクで最先端のパフォーマンスを実現しています。これらの結果は、FARMが分子表現学習を改善する可能性を浮き彫りにし、医薬品発見や薬学研究に有望な応用が期待できる。 We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which incorporates functional group information directly into the representations. This strategic reduction in tokenization granularity in a way that is intentionally interfaced with key drivers of functional properties (i.e., functional groups) enhances the model's understanding of chemical language, expands the chemical lexicon, more effectively bridging SMILES and natural language, and ultimately advances the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. By leveraging contrastive learning, FARM aligns these two views of representations into a unified molecular embedding. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.	翻訳日:2024-11-04 09:05:40 公開日:2024-10-06
# FARM: 小分子の関数型グループ認識表現 FARM: Functional Group-Aware Representations for Small Molecules ( http://arxiv.org/abs/2410.02082v2 ) ライセンス: Link先を確認	Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji,	(参考訳) SMILES,自然言語,分子グラフのギャップを埋める新しい基礎モデルであるFARM(Functional Group-Aware Representations for Small Molecules)を紹介する。 FARMの鍵となる革新は、関数型グループ認識トークン化であり、関数型グループ情報を表現に直接組み込む。このトークン化の粒度の戦略的削減は、故意に機能的特性(すなわち、機能的群)のキードライバと一致し、モデルの化学言語に対する理解を深める。化学レキシコンを拡大することにより、FARMはSMILESと自然言語をより効果的に橋渡しし、最終的にモデルの能力を高めて分子特性を予測する。 FARMはまた、原子レベルの特徴を捉えるためにマスク付き言語モデリングを使用することと、分子トポロジ全体を符号化するためにグラフニューラルネットワークを使用することである。対照的な学習を活用することで、FARMはこれらの2つの表現のビューを統一された分子埋め込みに整列させる。 MoleculeNetデータセット上でFARMを厳格に評価し、12タスク中10タスクで最先端のパフォーマンスを実現しています。これらの結果は、FARMが分子表現学習を改善する可能性を浮き彫りにし、医薬品発見や薬学研究に有望な応用が期待できる。 We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which directly incorporates functional group information into the representations. This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language. By expanding the chemical lexicon, FARM more effectively bridges SMILES and natural language, ultimately advancing the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. By leveraging contrastive learning, FARM aligns these two views of representations into a unified molecular embedding. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.	翻訳日:2024-11-04 09:05:40 公開日:2024-10-06
# 分散システムのモデル誘導ファジィリング Model-guided Fuzzing of Distributed Systems ( http://arxiv.org/abs/2410.02307v1 ) ライセンス: Link先を確認	Ege Berkay Gulcan, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Srinidhi Nagendra,	(参考訳) 本稿では,分散システム実装のためのカバレッジ誘導テストアルゴリズムを提案する。私たちの主な革新は、カバレッジを定義するために使用されるシステムの抽象的な形式モデルを使用することです。このような抽象モデルはプロトコル設計と検証の初期段階でしばしば開発されるが、テスト時にはあまり使われない。モデルカバレッジを用いたランダムなテスト生成の導出は,実装状態空間における興味深い点をカバーするのに有効であることを示す。我々は,TLA+で記述された分散システム実装と抽象モデルのためのファジィザを実装した。提案アルゴリズムは,スケジューラのカバレッジと突然変異の異なる概念によって導かれるランダム探索と同様に,純粋にランダムな探索よりも優れたカバレッジを示す。特に、Etcd-raftやRedisRaftのような分散コンセンサスプロトコルの実装において、常に高いカバレッジを示し、バグを高速に検出する。さらに, モデル誘導ファズリングでのみ検出できるバグが13件発見されている。 We present a coverage-guided testing algorithm for distributed systems implementations. Our main innovation is the use of an abstract formal model of the system that is used to define coverage. Such abstract models are frequently developed in early phases of protocol design and verification but are infrequently used at testing time. We show that guiding random test generation using model coverage can be effective in covering interesting points in the implementation state space. We have implemented a fuzzer for distributed system implementations and abstract models written in TLA+. Our algorithm shows better coverage over purely random exploration as well as random exploration guided by different notions of scheduler coverage and mutation. In particular, we show consistently higher coverage and detect bugs faster on implementations of distributed consensus protocols such as those in Etcd-raft and RedisRaft. Moreover, we discovered 13 previously unknown bugs in their implementations, four of which could only be detected by model-guided fuzzing.	翻訳日:2024-11-04 04:00:02 公開日:2024-10-06
# 分散システムのモデル誘導ファジィリング Model-guided Fuzzing of Distributed Systems ( http://arxiv.org/abs/2410.02307v2 ) ライセンス: Link先を確認	Ege Berkay Gulcan, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Srinidhi Nagendra,	(参考訳) 本稿では,分散システム実装のためのカバレッジ誘導テストアルゴリズムを提案する。私たちの主な革新は、カバレッジを定義するために使用されるシステムの抽象的な形式モデルを使用することです。このような抽象モデルはプロトコル設計と検証の初期段階でしばしば開発されるが、テスト時にはあまり使われない。モデルカバレッジを用いたランダムなテスト生成の導出は,実装状態空間における興味深い点をカバーするのに有効であることを示す。我々は,TLA+で記述された分散システム実装と抽象モデルのためのファジィザを実装した。提案アルゴリズムは,スケジューラのカバレッジと突然変異の異なる概念によって導かれるランダム探索と同様に,純粋にランダムな探索よりも優れたカバレッジを示す。特に、Etcd-raftやRedisRaftのような分散コンセンサスプロトコルの実装において、常に高いカバレッジを示し、バグを高速に検出する。さらに, モデル誘導ファズリングでのみ検出できるバグが13件発見されている。 We present a coverage-guided testing algorithm for distributed systems implementations. Our main innovation is the use of an abstract formal model of the system that is used to define coverage. Such abstract models are frequently developed in early phases of protocol design and verification but are infrequently used at testing time. We show that guiding random test generation using model coverage can be effective in covering interesting points in the implementation state space. We have implemented a fuzzer for distributed system implementations and abstract models written in TLA+. Our algorithm shows better coverage over purely random exploration as well as random exploration guided by different notions of scheduler coverage and mutation. In particular, we show consistently higher coverage and detect bugs faster on implementations of distributed consensus protocols such as those in Etcd-raft and RedisRaft. Moreover, we discovered 13 previously unknown bugs in their implementations, four of which could only be detected by model-guided fuzzing.	翻訳日:2024-11-04 04:00:02 公開日:2024-10-06
# LoGra-Med: 医用ビジョンランゲージモデルのためのLong Context Multi-Graphアライメント LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model ( http://arxiv.org/abs/2410.02615v1 ) ライセンス: Link先を確認	Duy M. H. Nguyen, Nghiem T. Diep, Trung Q. Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zhou, Daniel Sonntag, Mathias Niepert,	(参考訳) LLaVA-MedやBioMedGPTのような最先端の医療マルチモーダルな大規模言語モデル(med-MLLM)は、事前トレーニングで命令追跡データを活用する。しかしながら、これらのモデルは、主に自己回帰学習の目的に依存しながら、パフォーマンスを向上させるために、モデルサイズとデータボリュームのスケーリングに重点を置いています。驚くべきことに、このような学習スキームが視覚と言語モダリティの整合性の弱さを招き、これらのモデルを広範囲な事前学習データセットに非常に依存させることは、医療領域において、高品質な命令追跡インスタンスをキュレートする費用と時間のかかる性質のため、大きな課題である。画像のモダリティ、会話に基づく記述、拡張キャプションの3重相関を強制する新しいマルチグラフアライメントアルゴリズムであるLoGra-Medでこの問題に対処する。これにより、モデルが文脈的意味を捉え、言語的多様性を扱い、視覚とテキスト間の相互関連を構築するのに役立つ。提案手法をスケールするために,ブラックボックス勾配推定を用いた効率的なエンドツーエンド学習方式を設計し,LLaMa 7Bの学習を高速化した。以上の結果から,LoGra-Medは医療用VQAの600K画像テキスト対に対してLAVA-Medと一致し,その10%でトレーニングした場合に有意に優れていた。例えば、VQA-RADでは、LLAVA-Medを20.13%上回り、100%事前トレーニングスコア(72.64%に対して72.52%)とほぼ一致している。また,視覚チャットボットにおけるBiomedGPTや,VQAを用いたゼロショット画像分類におけるRadFMといったSOTA手法を超越し,マルチグラフアライメントの有効性を強調した。 State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performance while mainly relying on the autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignment between vision and language modalities, making these models highly reliant on extensive pre-training datasets - a significant challenge in medical domains due to the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMa 7B training. Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data. For example, on VQA-RAD, we exceed LLAVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods like BiomedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.	翻訳日:2024-11-04 02:12:23 公開日:2024-10-06
# LoGra-Med: 医用ビジョンランゲージモデルのためのLong Context Multi-Graphアライメント LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model ( http://arxiv.org/abs/2410.02615v2 ) ライセンス: Link先を確認	Duy M. H. Nguyen, Nghiem T. Diep, Trung Q. Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zhou, Daniel Sonntag, Mathias Niepert,	(参考訳) LLaVA-MedやBioMedGPTのような最先端の医療マルチモーダルな大規模言語モデル(med-MLLM)は、事前トレーニングで命令追跡データを活用する。しかしながら、これらのモデルは、主に自己回帰学習の目的に依存しながら、パフォーマンスを向上させるために、モデルサイズとデータボリュームのスケーリングに重点を置いています。驚くべきことに、このような学習スキームが視覚と言語モダリティの整合性の弱さを招き、これらのモデルを広範囲な事前学習データセットに非常に依存させることは、医療領域において、高品質な命令追跡インスタンスをキュレートする費用と時間のかかる性質のため、大きな課題である。画像のモダリティ、会話に基づく記述、拡張キャプションの3重相関を強制する新しいマルチグラフアライメントアルゴリズムであるLoGra-Medでこの問題に対処する。これにより、モデルが文脈的意味を捉え、言語的多様性を扱い、視覚とテキスト間の相互関連を構築するのに役立つ。提案手法をスケールするために,ブラックボックス勾配推定を用いた効率的なエンドツーエンド学習方式を設計し,LLaMa 7Bの学習を高速化した。以上の結果から,LoGra-Medは医療用VQAの600K画像テキスト対に対してLAVA-Medと一致し,その10%でトレーニングした場合に有意に優れていた。例えば、VQA-RADでは、LLAVA-Medを20.13%上回り、100%事前トレーニングスコア(72.64%に対して72.52%)とほぼ一致している。また,視覚チャットボットにおけるBiomedGPTや,VQAを用いたゼロショット画像分類におけるRadFMといったSOTA手法を超越し,マルチグラフアライメントの有効性を強調した。 State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performance while mainly relying on the autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignment between vision and language modalities, making these models highly reliant on extensive pre-training datasets - a significant challenge in medical domains due to the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMa 7B training. Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data. For example, on VQA-RAD, we exceed LLAVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods like BiomedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.	翻訳日:2024-11-04 02:02:21 公開日:2024-10-06
# 生成モデルの説得性の測定と改善 Measuring and Improving Persuasiveness of Generative Models ( http://arxiv.org/abs/2410.02653v1 ) ライセンス: Link先を確認	Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy,	(参考訳) LLMは、人間(例えばマーケティング)が消費するコンテンツを生成するワークフローや、人間(例えばチャットボット)と直接対話するワークフローで、ますます使われている。確実な説得力のあるメッセージを生成することができるシステムの開発は、社会にとっての機会と課題の両方を提示する。一方、こうした制度は、薬物依存に対処するなど、広告や社会的善悪などの領域に積極的に影響を及ぼす可能性があり、また、誤った情報を広め、政治的意見を形成するために誤用される可能性もある。 LLMが社会に与える影響を明らかにするためには,その説得力を計測し,評価するシステムを開発する必要がある。このモチベーションを生かしたPersuasionBenchとPersuasionArenaは,生成モデルの説得能力を自動的に測定するタスクのバッテリを含む,最初の大規模ベンチマークとアリーナである。我々は,LLMが言語パターンをどのように理解し,より説得力のある言語を生成するのに役立つかを検討する。以上の結果から, LLMの説得性はモデルサイズと正の相関がみられたが, より小型のモデルでは, より大きなモデルよりも高い説得性が得られることが示唆された。特に、合成および自然なデータセットを使用したターゲットトレーニングは、より小さなモデルの説得能力を著しく向上させ、スケール依存の仮定に挑戦する。我々の発見は、モデル開発者と政策立案者の両方にとって重要な意味を持つ。例えば、EU AI ActとカリフォルニアのSB-1047は、浮動小数点演算の数に基づいてAIモデルを規制することを目的としていますが、このような単純なメトリクスだけでは、AIの社会的影響の全範囲を捉えられません。私たちはコミュニティに、AI駆動の説得とその社会的意味についての理解を深めるために、https://bit.ly/measure-peruasionで入手可能なPersuasionArenaとPersuasionBenchを探求し、貢献するよう呼びかけます。 LLMs are increasingly being used in workflows involving generating content to be consumed by humans (e.g., marketing) and also in directly interacting with humans (e.g., through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce PersuasionBench and PersuasionArena, the first large-scale benchmark and arena containing a battery of tasks to measure the persuasion ability of generative models automatically. We investigate to what extent LLMs know and leverage linguistic patterns that can help them generate more persuasive language. Our findings indicate that the persuasiveness of LLMs correlates positively with model size, but smaller models can also be made to have a higher persuasiveness than much larger models. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that simple metrics like this alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at https://bit.ly/measure-persuasion, to advance our understanding of AI-driven persuasion and its societal implications.	翻訳日:2024-11-04 01:52:35 公開日:2024-10-06
# 大規模言語モデルの説得力の測定と改善 Measuring and Improving Persuasiveness of Large Language Models ( http://arxiv.org/abs/2410.02653v2 ) ライセンス: Link先を確認	Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy,	(参考訳) LLMは、人間(例えばマーケティング)が消費するコンテンツを生成するワークフローや、人間(例えばチャットボット)と直接対話するワークフローで、ますます使われている。確実な説得力のあるメッセージを生成することができるシステムの開発は、社会にとっての機会と課題の両方を提示する。一方、こうした制度は、薬物依存に対処するなど、広告や社会的善悪などの領域に積極的に影響を及ぼす可能性があり、また、誤った情報を広め、政治的意見を形成するために誤用される可能性もある。 LLMが社会に与える影響を明らかにするためには,その説得力を計測し,評価するシステムを開発する必要がある。このモチベーションを生かしたPersuasionBenchとPersuasionArenaは,生成モデルの説得能力を自動的に測定するタスクのバッテリを含む,最初の大規模ベンチマークとアリーナである。我々は,LLMが言語パターンをどのように理解し,より説得力のある言語を生成するのに役立つかを検討する。以上の結果から, LLMの説得性はモデルサイズと正の相関がみられたが, より小型のモデルでは, より大きなモデルよりも高い説得性が得られることが示唆された。特に、合成および自然なデータセットを使用したターゲットトレーニングは、より小さなモデルの説得能力を著しく向上させ、スケール依存の仮定に挑戦する。我々の発見は、モデル開発者と政策立案者の両方にとって重要な意味を持つ。例えば、EU AI ActとカリフォルニアのSB-1047は、浮動小数点演算の数に基づいてAIモデルを規制することを目的としていますが、このような単純なメトリクスだけでは、AIの社会的影響の全範囲を捉えられません。私たちはコミュニティに、AI駆動の説得とその社会的意味についての理解を深めるために、https://bit.ly/measure-peruasionで入手可能なPersuasionArenaとPersuasionBenchを探求し、貢献するよう呼びかけます。 LLMs are increasingly being used in workflows involving generating content to be consumed by humans (e.g., marketing) and also in directly interacting with humans (e.g., through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce PersuasionBench and PersuasionArena, the first large-scale benchmark and arena containing a battery of tasks to measure the persuasion ability of generative models automatically. We investigate to what extent LLMs know and leverage linguistic patterns that can help them generate more persuasive language. Our findings indicate that the persuasiveness of LLMs correlates positively with model size, but smaller models can also be made to have a higher persuasiveness than much larger models. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that simple metrics like this alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at https://bit.ly/measure-persuasion, to advance our understanding of AI-driven persuasion and its societal implications.	翻訳日:2024-11-04 01:52:35 公開日:2024-10-06
# 確実性の校正表現 Calibrating Expressions of Certainty ( http://arxiv.org/abs/2410.04315v1 ) ライセンス: Link先を確認	Peiqi Wang, Barbara D. Lam, Yingcheng Liu, Ameneh Asgari-Targhi, Rameswar Panda, William M. Wells, Tina Kapur, Polina Golland,	(参考訳) 本稿では,「マヨベ」や「マヨベ」といった言語表現のキャリブレーションに新たなアプローチを提案する。各特定のフレーズに1つのスコアを割り当てる以前の作業とは異なり、我々は不確実性を単純度上の分布としてモデル化し、それらのセマンティクスをより正確にキャプチャする。この新たな確実性の表現に対応するため、既存の誤校正対策を一般化し、新しいポストホック校正法を導入する。これらのツールを活用することで、人間(例えば放射線学者)と計算モデル(例えば言語モデル)の両方の校正を分析し、校正を改善するための解釈可能な提案を提供する。 We present a novel approach to calibrating linguistic expressions of certainty, e.g., "Maybe" and "Likely". Unlike prior work that assigns a single score to each certainty phrase, we model uncertainty as distributions over the simplex to capture their semantics more accurately. To accommodate this new representation of certainty, we generalize existing measures of miscalibration and introduce a novel post-hoc calibration method. Leveraging these tools, we analyze the calibration of both humans (e.g., radiologists) and computational models (e.g., language models) and provide interpretable suggestions to improve their calibration.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# 気候と環境の正義のための都市コンピューティング:2つの研究イニシアティブから Urban Computing for Climate and Environmental Justice: Early Perspectives From Two Research Initiatives ( http://arxiv.org/abs/2410.04318v1 ) ライセンス: Link先を確認	Carolina Veiga, Ashish Sharma, Daniel de Oliveira, Marcos Lage, Fabio Miranda,	(参考訳) 気候変動の影響は、洪水や熱波などの極端な気象現象が、低所得層や低所得層に大きく影響しているため、世界中の都市社会における既存の脆弱性や格差を増している。これらの課題に対処するには、コンピュータ科学、工学、気候科学、公衆衛生など、複数の分野にまたがる専門知識を統合する新しいアプローチが必要である。都市コンピューティングは、複数のソースからのデータを統合して意思決定をサポートし、気象パターン、インフラの弱点、人口の脆弱性に関する実用的な洞察を提供することによって、これらの取り組みにおいて重要な役割を果たす。しかし、技術進歩を活用する能力は、グローバル・サウスとグローバル・ノースの間で大きく異なる。本稿では,米国シカゴとブラジルのニテロイに複数年にわたる多学際プロジェクトを実施し,これらの多様な状況下での都市コンピューティングの可能性と限界を明らかにする。筆者らの経験を反映して、都市環境における気候関連リスクの理解と緩和を容易にする視覚分析ツールの基本的要件と既存のギャップについて考察する。 The impacts of climate change are intensifying existing vulnerabilities and disparities within urban communities around the globe, as extreme weather events, including floods and heatwaves, are becoming more frequent and severe, disproportionately affecting low-income and underrepresented groups. Tackling these increasing challenges requires novel approaches that integrate expertise across multiple domains, including computer science, engineering, climate science, and public health. Urban computing can play a pivotal role in these efforts by integrating data from multiple sources to support decision-making and provide actionable insights into weather patterns, infrastructure weaknesses, and population vulnerabilities. However, the capacity to leverage technological advancements varies significantly between the Global South and Global North. In this paper, we present two multiyear, multidisciplinary projects situated in Chicago, USA and Niter\'oi, Brazil, highlighting the opportunities and limitations of urban computing in these diverse contexts. Reflecting on our experiences, we then discuss the essential requirements, as well as existing gaps, for visual analytics tools that facilitate the understanding and mitigation of climate-related risks in urban environments.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# CAVにおける協調データ融合のためのチャネル・アウェア・スループットの最大化 Channel-Aware Throughput Maximization for Cooperative Data Fusion in CAV ( http://arxiv.org/abs/2410.04320v1 ) ライセンス: Link先を確認	Haonan An, Zhengru Fang, Yuang Zhang, Senkang Hu, Xianhao Chen, Guowen Xu, Yuguang Fang,	(参考訳) 接続型および自律型車両(CAV)は、認識範囲の拡大と知覚範囲の増大により、大きな注目を集めている。盲点や障害物などの課題に対処するため、CAVは周囲の車両からのセンサデータを収集するために車両間通信(V2V)を採用している。しかし、協調的な知覚は、達成可能なネットワークスループットとチャネル品質の制限によって制約されることが多い。本稿では,適応データ圧縮に自己教師付きオートエンコーダを活用することで,CAVデータ融合を容易にするチャネル対応スループット最大化手法を提案する。この問題を混合整数プログラミング(MIP)モデルとして定式化し、与えられたリンク条件下で最適なデータレートと圧縮比の解を導出するために2つのサブプロブレムに分解する。オートエンコーダは、決定された圧縮比でビットレートを最小にするために訓練され、さらにスペクトルリソース消費を減らすために微調整戦略が用いられる。 OpenCOOD プラットフォーム上での実験的な評価により,提案アルゴリズムの有効性が示され,ネットワークスループットが 20.19 % 向上し,平均精度 (AP@IoU) が 9.38 % 向上した。 Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle (V2V) communications to aggregate sensory data from surrounding vehicles. However, cooperative perception is often constrained by the limitations of achievable network throughput and channel quality. In this paper, we propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression. We formulate the problem as a mixed integer programming (MIP) model, which we decompose into two sub-problems to derive optimal data rate and compression ratio solutions under given link conditions. An autoencoder is then trained to minimize bitrate with the determined compression ratio, and a fine-tuning strategy is employed to further reduce spectrum resource consumption. Experimental evaluation on the OpenCOOD platform demonstrates the effectiveness of our proposed algorithm, showing more than 20.19\% improvement in network throughput and a 9.38\% increase in average precision (AP@IoU) compared to state-of-the-art methods, with an optimal latency of 19.99 ms.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# RLExplorerによる深層強化学習プログラムのデバッグに向けて Toward Debugging Deep Reinforcement Learning Programs with RLExplorer ( http://arxiv.org/abs/2410.04322v1 ) ライセンス: Link先を確認	Rached Bouchoucha, Ahmed Haj Yahmed, Darshan Patil, Janarthanan Rajendran, Amin Nikanjam, Sarath Chandar, Foutse Khomh,	(参考訳) 深層強化学習(DRL)は、ロボット工学、コンピュータゲーム、レコメンデーションシステムなど様々な分野で成功している。しかし、他のソフトウェアシステムと同様に、DRLベースのソフトウェアシステムは、デバッグと診断に固有の課題を生じさせるフォールトに影響を受けやすい。これらの障害はしばしば、明示的な失敗やエラーメッセージなしで予期しない振る舞いを生じさせ、デバッグが難しく、時間がかかります。したがって、DRLシステムの監視と診断の自動化は、開発者の負担を軽減するために重要である。本稿では,DRLベースのソフトウェアシステムにおける最初の故障診断手法であるRLExplorerを提案する。 RLExplorerは自動的にトレーニングトレースを監視し、DRL学習ダイナミクスの特性に基づいて診断ルーチンを実行し、DRL固有の障害の発生を検出する。そして、これらの診断の結果を、理論的概念、推奨プラクティス、そして特定された障害に対する潜在的な解決策をカバーする警告として記録する。我々はRLExplorerを評価するために2つの評価を行った。 Stack Overflowの障害DRLサンプルを初めて評価したところ,83%の症例において,本手法が実際の障害を効果的に診断できることが判明した。 RLExplorerを15名のDRL専門家/開発者で評価したところ,(1)RLExplorerは手動デバッグの3.6倍の欠陥を識別でき,(2)RLExplorerは容易にDRLアプリケーションに統合できることがわかった。 Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# プロンプト型連続学習における階層型分類の活用 Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning ( http://arxiv.org/abs/2410.04327v1 ) ライセンス: Link先を確認	Quyen Tran, Minh Le, Tuan Truong, Dinh Phung, Linh Ngo, Thien Nguyen, Nhat Ho, Trung Le,	(参考訳) 人間の学習行動からインスピレーションを得たこの研究は、連続的に出現するクラスデータ間の関係を利用して、Promptベースの連続学習モデルにおける破滅的な忘れを緩和する新しいアプローチを提案する。深層学習モデルの学習において,情報の整理・接続という人間の習慣を適用することが効果的な戦略として有効であることがわかった。具体的には、拡大するラベルセットに基づいて階層木構造を構築することで、データに対する新たな洞察を得ることができ、類似したクラスのグループを特定することは、容易に混乱を引き起こす可能性がある。さらに、私たちは、最適なトランスポートベースのアプローチを通じて、オリジナルの事前訓練されたモデルの振る舞いを探索することで、クラス間の隠れた接続を深く掘り下げる。これらの知見から,モデルがより挑戦的な知識領域に集中し,全体的な性能を向上させるための新たな正規化損失関数を提案する。実験により,本手法は様々なベンチマークにおいて,最も頑健な最先端モデルに対して有意な優位性を示した。 Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigate catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class data. We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. Specifically, by building a hierarchical tree structure based on the expanding set of labels, we gain fresh insights into the data, identifying groups of similar classes could easily cause confusion. Additionally, we delve deeper into the hidden connections between classes by exploring the original pretrained model's behavior through an optimal transport-based approach. From these insights, we propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas, thereby enhancing overall performance. Experimentally, our method demonstrated significant superiority over the most robust state-of-the-art models on various benchmarks.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# OD-Stega: 最適化分布を用いたLDMによるニア・インパーセプティブル・ステガノグラフィー OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions ( http://arxiv.org/abs/2410.04328v1 ) ライセンス: Link先を確認	Yu-Shin Huang, Peter Just, Krishna Narayanan, Chao Tian,	(参考訳) 本研究では,Large Language Model (LLM) が算術符号デコーダを駆動してステゴテキストを生成する場合の非被覆ステガノグラフィーについて考察する。効率的な方法は、秘密のメッセージビットをできるだけ少数の言語トークンに埋め込む必要がある。個々のトークンレベルでは、選択された確率分布とLLMが与える元の分布とのKL分散の制約を条件として、次のトークン生成の置換確率分布のエントロピーを最大化することが数学的に等価であることを示す。最適化問題に対して、効率的に計算できる閉形式解が提供される。重要な実務上の問題もいくつか取り組まれている。 1) しばしば見過ごされるトークン化ミスマッチ問題は、単純なプロンプト選択アプローチで解決される。 2)最適化分布と語彙トランケーション手法の組み合わせを考察し,その有効性について考察する。 3)最適化された分布と他のシーケンスレベルの選択ヒューリスティックを組み合わせることで,効率と信頼性をさらに向上させる。 We consider coverless steganography where a Large Language Model (LLM) drives an arithmetic coding decoder to generate stego-texts. An efficient method should embed secret message bits in as few language tokens as possible, while still keeping the stego-text natural and fluent. We show that on the individual token level, this problem is mathematically equivalent to maximizing the entropy of a replacement probability distribution of the next token generation, subject to a constraint on the KL divergence between the chosen probability distribution and the original distribution given by the LLM. A closed-form solution is provided for the optimization problem, which can be computed efficiently. Several important practical issues are also tackled: 1) An often-overlooked tokenization mismatch issue is resolved with a simple prompt selection approach, 2) The combination of the optimized distribution and the vocabulary truncation technique is considered, and 3) The combination of the optimized distribution with other sequence-level selection heuristics to further enhance the efficiency and reliability is studied.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# N$-Partite系における最強量子非局所性 Strongest quantum nonlocality in $N$-partite systems ( http://arxiv.org/abs/2410.04331v1 ) ライセンス: Link先を確認	Mengying Hu, Ting Gao, Fengli Yan,	(参考訳) 直交状態の集合は、自明な直交保存正の作用素値測度(POVM)のみをサブシステムの分割ごとに行うことができれば、最強の量子非局所性を持つ。この概念は、Halder $et~alによって提唱された強い量子非局所性に由来する。 $[Phy]。レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・ $\textbf{122}$, 040403 (2019)] は、局所的不明瞭性に基づく非局所性の強い表現であり、量子情報隠蔽におけるより効率的な応用を見出す。しかし、直交保存局所測定(OPLM)の自明さを示すことは容易ではない。本稿では,ある条件下での$N$-partiteシステムにおいて,自明なOPLMに対して十分かつ必要な条件を示す。提案した条件を用いて、システム $(\mathbb{C}^{3})^{\otimes N}$ において最強非局所性を持つ集合の最小サイズを導出する。とPhys。 A $\textbf{109}$, 022220 (2024)] は、この値を達成する。最強の非局所性を持つ州を対象とする建設研究は、アプリケーションにおける資源消費の低減に寄与することが知られている。さらに、システム $(\mathbb{C}^{d})^{\otimes N}~(d\geq4)$ において最強非局所真絡集合を構築する。その結果, 最強非局所性についての理解を深めることができた。 A set of orthogonal states possesses the strongest quantum nonlocality if only a trivial orthogonality-preserving positive operator-valued measure (POVM) can be performed for each bipartition of the subsystems. This concept originated from the strong quantum nonlocality proposed by Halder $et~al.$ [Phy. Rev. Lett. $\textbf{122}$, 040403 (2019)], which is a stronger manifestation of nonlocality based on locally indistinguishability and finds more efficient applications in quantum information hiding. However, demonstrating the triviality of orthogonality-preserving local measurements (OPLMs) is not straightforward. In this paper, we present a sufficient and necessary condition for trivial OPLMs in $N$-partite systems under certain conditions. By using our proposed condition, we deduce the minimum size of set with the strongest nonlocality in system $(\mathbb{C}^{3})^{\otimes N}$, where the genuinely entangled sets constructed in Ref. [Phys. Rev. A $\textbf{109}$, 022220 (2024)] achieve this value. As it is known that studying construction involving fewer states with strongest nonlocality contribute to reducing resource consumption in applications. Furthermore, we construct strongest nonlocal genuinely entangled sets in system $(\mathbb{C}^{d})^{\otimes N}~(d\geq4)$, which have a smaller size than the existing strongest nonlocal genuinely entangled sets as $N$ increases. Consequently, our results contribute to a better understanding of strongest nonlocality.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# グラディエントルーティング:ニューラルネットワークにおける計算のローカライズのためのマスキンググラディエント Gradient Routing: Masking Gradients to Localize Computation in Neural Networks ( http://arxiv.org/abs/2410.04332v1 ) ライセンス: Link先を確認	Alex Cloud, Jacob Goldman-Wetzler, Evžen Wybitul, Joseph Miller, Alexander Matt Turner,	(参考訳) ニューラルネットワークは、内部メカニズムに関係なく、主に入力と出力に基づいて訓練される。これらの無視されたメカニズムは、安全に重要な特性を決定づける。透明性; 透明性; 透明性二機密情報又は有害な能力の欠如三訓練分布を超えた目標の信頼性の高い一般化。この欠点に対処するために、ニューラルネットワークの特定の部分領域に機能を分離する訓練手法である勾配ルーティングを導入する。勾配ルーティングは、バックプロパゲーション中の勾配にデータ依存の重み付きマスクを適用する。これらのマスクは、どのパラメータがどのデータポイントによって更新されるかを設定するために、ユーザによって提供される。本研究では,(1)解釈可能な方法で分割された表現の学習,(2)事前指定したネットワークサブリージョンのアブレーションによる堅牢なアンラーニングの実現,(3)異なる動作に責任を持つモジュールをローカライズすることで,強化学習者のスケーラブルな監視を実現すること,を示す。全体として、勾配ルーティングは、制限されたアドホックなデータサブセットに適用しても、機能をローカライズする。私たちは、高品質なデータが不足している、挑戦的な現実世界のアプリケーションに対して、このアプローチが約束されていると結論付けます。 Neural networks are trained primarily based on their inputs and outputs, without regard for their internal mechanisms. These neglected mechanisms determine properties that are critical for safety, like (i) transparency; (ii) the absence of sensitive information or harmful capabilities; and (iii) reliable generalization of goals beyond the training distribution. To address this shortcoming, we introduce gradient routing, a training method that isolates capabilities to specific subregions of a neural network. Gradient routing applies data-dependent, weighted masks to gradients during backpropagation. These masks are supplied by the user in order to configure which parameters are updated by which data points. We show that gradient routing can be used to (1) learn representations which are partitioned in an interpretable way; (2) enable robust unlearning via ablation of a pre-specified network subregion; and (3) achieve scalable oversight of a reinforcement learner by localizing modules responsible for different behaviors. Throughout, we find that gradient routing localizes capabilities even when applied to a limited, ad-hoc subset of the data. We conclude that the approach holds promise for challenging, real-world applications where quality data are scarce.	翻訳日:2024-11-02 08:30:03 公開日:2024-10-06
# 対称性破壊ダイナミクスのためのランダム非エルミートハミルトンフレームワーク Random non-Hermitian Hamiltonian framework for symmetry breaking dynamics ( http://arxiv.org/abs/2410.04333v1 ) ライセンス: Link先を確認	Pei Wang,	(参考訳) ヒルベルト空間における量子状態の一般確率非線形ダイナミクスをモデル化するために、非エルミート的ハミルトニアンをランダムに提案する。本手法は, 線形方程式の線形性に基礎を置き, 線形系解法の適用性を確保する。さらに、統計対称性を容易に組み込むという利点があり、これは確率過程への明示対称性の一般化である。提案手法の有用性を実証するために,初期対称性保存状態からランダムに分布し,対称性を破る最終状態へと進化する実時間力学を記述する。我々のモデルは、不規則状態から秩序状態への遷移過程の量子的枠組みとして機能し、そこでは対称性が自発的に壊れる。 We propose random non-Hermitian Hamiltonians to model the generic stochastic nonlinear dynamics of a quantum state in Hilbert space. Our approach features an underlying linearity in the dynamical equations, ensuring the applicability of techniques used for solving linear systems. Additionally, it offers the advantage of easily incorporating statistical symmetry, a generalization of explicit symmetry to stochastic processes. To demonstrate the utility of our approach, we apply it to describe real-time dynamics, starting from an initial symmetry-preserving state and evolving into a randomly distributed, symmetry-breaking final state. Our model serves as a quantum framework for the transition process, from disordered states to ordered ones, where symmetry is spontaneously broken.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# マイクロサービス環境におけるインシデントライフサイクルのためのAIアシスタント:システマティック文献レビュー AI Assistants for Incident Lifecycle in a Microservice Environment: A Systematic Literature Review ( http://arxiv.org/abs/2410.04334v1 ) ライセンス: Link先を確認	Dahlia Ziqi Zhou, Marios Fokaefs,	(参考訳) マイクロサービス環境のインシデントは、複雑さと分散した性質のために、コストがかかり、回復が難しい場合がある。人工知能(AI)の最近の進歩は、インシデント管理を改善するための有望なソリューションを提供する。本稿では、インシデントライフサイクルの異なるフェーズをサポートするように設計されたAIアシスタントに関する基礎研究を体系的にレビューする。これはAIの成功した応用を強調し、現在の研究のギャップを特定し、AIによるインシデント管理を強化する将来の機会を提案する。これらの研究を検討することで、AIツールの有効性と、インシデント回復における継続的な課題に対処する可能性についての洞察を提供することが目的である。 Incidents in microservice environments can be costly and challenging to recover from due to their complexity and distributed nature. Recent advancements in artificial intelligence (AI) offer promising solutions for improving incident management. This paper systematically reviews primary studies on AI assistants designed to support different phases of the incident lifecycle. It highlights successful applications of AI, identifies gaps in current research, and suggests future opportunities for enhancing incident management through AI. By examining these studies, the paper aims to provide insights into the effectiveness of AI tools and their potential to address ongoing challenges in incident recovery.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# ReTok: 大規模言語モデルにおける表現効率を高めるために、トークンライザをリプレースする ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model ( http://arxiv.org/abs/2410.04335v1 ) ライセンス: Link先を確認	Shuhao Gu, Mengdi Zhao, Bowen Zhang, Liangdong Wang, Jijie Li, Guang Liu,	(参考訳) Tokenizerは大規模言語モデル(LLM)に不可欠なコンポーネントであり、高い圧縮率のトークン化器はモデルの表現と処理効率を向上させることができる。しかし、トークン化器は全てのシナリオにおいて高い圧縮速度を保証することができず、平均入力および出力長の増加はモデルのトレーニングと推論コストを増大させる。したがって、モデルの性能を維持しながら、最小限のコストでモデルの効率を改善する方法を見つけることが重要である。本研究では, LLMのトークン化機能を置き換えることで, モデル表現と処理効率を向上させる手法を提案する。モデルの入力層と出力層のパラメータを元のモデルのパラメータに置き換えて再起動し、他のパラメータを固定しながらこれらのパラメータをトレーニングする。我々は,異なるLLM実験を行い,その結果から,トークン化器を置き換えたモデルの性能を維持できるとともに,長文の復号速度を大幅に向上できることを示した。 Tokenizer is an essential component for large language models (LLMs), and a tokenizer with a high compression rate can improve the model's representation and processing efficiency. However, the tokenizer cannot ensure high compression rate in all scenarios, and an increase in the average input and output lengths will increases the training and inference costs of the model. Therefore, it is crucial to find ways to improve the model's efficiency with minimal cost while maintaining the model's performance. In this work, we propose a method to improve model representation and processing efficiency by replacing the tokenizers of LLMs. We propose replacing and reinitializing the parameters of the model's input and output layers with the parameters of the original model, and training these parameters while keeping other parameters fixed. We conducted experiments on different LLMs, and the results show that our method can maintain the performance of the model after replacing the tokenizer, while significantly improving the decoding speed for long texts.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 工学用Si-Qubit MOSFET:極低温での静電量子積分による相場モデリング手法 Engineering Si-Qubit MOSFETs: A Phase-Field Modeling Approach Integrating Quantum-Electrostatics at Cryogenic Temperatures ( http://arxiv.org/abs/2410.04339v1 ) ライセンス: Link先を確認	Nilesh Pandey, Dipanjan Basu, Yogesh Singh Chauhan, Leonard F. Register, Sanjay K. Banerjee,	(参考訳) 本研究は、Si系量子ビットMOSFETを解析し、静電気と量子力学的効果を統合するために、高度な位相場モデリングを用いる。我々は、シュロディンガー方程式解のフルウェーブ処理と、極低温におけるポアソン方程式を併用した包括的モデリング手法を採用する。本分析では, 界面トラップが量子ドット(QD)障壁高さに与える影響を考察し, トンネルによる結合に影響を及ぼす。より広いトラップ分布は量子ドットの分離につながる。さらに、プランジャ/バリアゲート長が増加するにつれて、伝送および反射係数の振動が増加し、QD間の結合が減少する。プランジャ,バリアゲート次元,スペーサ構成,ギャップ酸化物長を最適化することにより,量子井戸深さの制御を強化し,不要な波動関数のリークを最小限に抑える。モデリングアルゴリズムは実験データに対しても検証され,クーロン遮断によるId Vgsの発振を低温下で正確に捉えることができる。 This study employs advanced phase-field modeling to investigate Si-based qubit MOSFETs, integrating electrostatics and quantum mechanical effects. We adopt a comprehensive modeling approach, utilizing full-wave treatment of the Schrodinger equation solutions, coupled with the Poisson equation at cryogenic temperatures. Our analysis explores the influence of interface traps on quantum dot (QD) barrier heights, affecting coupling due to tunneling. A wider trap distribution leads to the decoupling of quantum dots. Furthermore, the oscillations in the transmission and reflection coefficients increase as the plunger/barrier gate length increases, reducing the coupling between the QDs. By optimizing plunger and barrier gate dimensions, spacer configurations, and gap oxide lengths, we enhance control over quantum well depths and minimize unwanted wave function leakage. The modeling algorithm is also validated against the experimental data and can accurately capture the oscillations in the Id Vgs caused by the Coulomb blockade at cryogenic temperature	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 周波数領域におけるネットワークの高速化 Accelerating Inference of Networks in the Frequency Domain ( http://arxiv.org/abs/2410.04342v1 ) ライセンス: Link先を確認	Chenqiu Zhao, Guanfang Dong, Anup Basu,	(参考訳) 周波数領域において、ネットワークのパラメータが極めて少ない精度で大幅に低減できることが示されている。しかし、周波数変換のコストを考えると、計算複雑性は著しく低下しない。本研究では,周波数パラメータが疎いネットワークを高速化するために,周波数領域におけるネットワーク推論を提案する。特に、空間領域におけるネットワーク推論に双対な周波数推論連鎖を提案する。非線形層を扱うために、周波数データに直接非線形演算を適用し、効果的に動作するように妥協する。周波数推論チェーンと非線形層に対する戦略によって実現され、提案手法は周波数領域の全推論を完了させる。全ての層に対して余分な周波数変換や逆変換を必要とする従来の手法とは異なり、提案手法はネットワークの始点と終点に1度だけ周波数変換と逆変換を必要とする。最先端手法との比較により,提案手法は高速比(100倍以上)の場合,精度を著しく向上することが示された。ソースコードは \url{https://github.com/guanfangdong/FreqNet-Infer} で公開されている。 It has been demonstrated that networks' parameters can be significantly reduced in the frequency domain with a very small decrease in accuracy. However, given the cost of frequency transforms, the computational complexity is not significantly decreased. In this work, we propose performing network inference in the frequency domain to speed up networks whose frequency parameters are sparse. In particular, we propose a frequency inference chain that is dual to the network inference in the spatial domain. In order to handle the non-linear layers, we make a compromise to apply non-linear operations on frequency data directly, which works effectively. Enabled by the frequency inference chain and the strategy for non-linear layers, the proposed approach completes the entire inference in the frequency domain. Unlike previous approaches which require extra frequency or inverse transforms for all layers, the proposed approach only needs the frequency transform and its inverse once at the beginning and once at the end of a network. Comparisons with state-of-the-art methods demonstrate that the proposed approach significantly improves accuracy in the case of a high speedup ratio (over 100x). The source code is available at \url{https://github.com/guanfangdong/FreqNet-Infer}.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 長期検索拡張生成のための推論スケーリング Inference Scaling for Long-Context Retrieval Augmented Generation ( http://arxiv.org/abs/2410.04343v1 ) ライセンス: Link先を確認	Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky,	(参考訳) 推論計算のスケーリングにより、様々な設定にまたがるLong-context Large Language Model (LLM)の可能性が解き放たれた。知識集約的なタスクでは、より多くの外部知識を組み込むために計算量が増加することがしばしばある。しかし、そのような知識を効果的に活用しなければ、文脈を拡大するだけでは必ずしも性能が向上するとは限らない。本研究では,検索拡張生成(RAG)における推論スケーリングについて検討し,単に知識量を増やす以上の戦略を探求する。インコンテキスト学習と反復的プロンプトという,2つの推論スケーリング戦略に注目します。これらの戦略は、テスト時間計算(例えば、検索した文書や生成ステップを増やすことで)をスケールするためのさらなる柔軟性を提供する。 1) RAG のパフォーマンスは、最適に設定された場合の推論計算のスケーリングからどのような恩恵を受けますか? 2) RAG 性能と推論パラメータの関係をモデル化することにより,与えられた予算に対する最適テスト時間計算割当を予測できるのか? 観測の結果,推定計算の増大は最適に割り当てた場合,RAGの性能がほぼ線形に向上することを示し,RAGの推論スケーリング法則として記述した。これに基づいて、異なる推論構成におけるRAG性能を推定する計算割当モデルをさらに発展させる。このモデルは、様々な計算制約の下で最適な推論パラメータを予測し、実験結果と密接に一致させる。これらの最適構成を適用することで、長文LLMのスケーリング推論計算が標準RAGと比較してベンチマークデータセットで最大58.9%向上することを示す。 The scaling of inference computation has unlocked the potential of long-context large language models (LLMs) across diverse settings. For knowledge-intensive tasks, the increased compute is often allocated to incorporate more external knowledge. However, without effectively utilizing such knowledge, solely expanding context does not always enhance performance. In this work, we investigate inference scaling for retrieval augmented generation (RAG), exploring strategies beyond simply increasing the quantity of knowledge. We focus on two inference scaling strategies: in-context learning and iterative prompting. These strategies provide additional flexibility to scale test-time computation (e.g., by increasing retrieved documents or generation steps), thereby enhancing LLMs' ability to effectively acquire and utilize contextual information. We address two key questions: (1) How does RAG performance benefit from the scaling of inference computation when optimally configured? (2) Can we predict the optimal test-time compute allocation for a given budget by modeling the relationship between RAG performance and inference parameters? Our observations reveal that increasing inference computation leads to nearly linear gains in RAG performance when optimally allocated, a relationship we describe as the inference scaling laws for RAG. Building on this, we further develop the computation allocation model to estimate RAG performance across different inference configurations. The model predicts optimal inference parameters under various computation constraints, which align closely with the experimental results. By applying these optimal configurations, we demonstrate that scaling inference compute on long-context LLMs achieves up to 58.9% gains on benchmark datasets compared to standard RAG.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# DeepONet for Solving PDE: Generalization Analysis in Sobolev Training DeepONet for Solving PDEs: Generalization Analysis in Sobolev Training ( http://arxiv.org/abs/2410.04344v1 ) ライセンス: Link先を確認	Yahong Yang,	(参考訳) 本稿では,演算子学習,特にDeepONetの偏微分方程式(PDE)への応用について検討する。各PDEに対して別々のニューラルネットワークのトレーニングを必要とする関数学習方法とは異なり、オペレータ学習は再トレーニングすることなく、異なるPDEをまたいだ一般化を行う。本稿では,ソボレフトレーニングにおけるDeepONetの性能に着目し,ディープブランチとトランクネットワークの近似能力とソボレフノルムの一般化誤差の2つの重要な問題に対処する。我々の発見は、ディープブランチネットワークが大きなパフォーマンス上のメリットを提供するのに対して、トランクネットワークは最もシンプルであることを示している。また、符号化部に微分情報を加えない標準サンプリング法は、一般化解析に基づくソボレフ訓練における一般化誤差を最小限に抑えるのに十分である。本稿では,幅広い物理インフォームド機械学習モデルと応用のための誤差推定を提供することにより,理論的ギャップを埋める。 In this paper, we investigate the application of operator learning, specifically DeepONet, to solve partial differential equations (PDEs). Unlike function learning methods that require training separate neural networks for each PDE, operator learning generalizes across different PDEs without retraining. We focus on the performance of DeepONet in Sobolev training, addressing two key questions: the approximation ability of deep branch and trunk networks, and the generalization error in Sobolev norms. Our findings highlight that deep branch networks offer significant performance benefits, while trunk networks are best kept simple. Moreover, standard sampling methods without adding derivative information in the encoding part are sufficient for minimizing generalization error in Sobolev training, based on generalization analysis. This paper fills a theoretical gap by providing error estimations for a wide range of physics-informed machine learning models and applications.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# MVP-Bench: 大規模視覚言語モデルは、人間のように多段階の視覚知覚を実行できるか? MVP-Bench: Can Large Vision--Language Models Conduct Multi-level Visual Perception Like Humans? ( http://arxiv.org/abs/2410.04345v1 ) ライセンス: Link先を確認	Guanzhen Li, Yuxi Xie, Min-Yen Kan,	(参考訳) 人間は、低レベルの物体認識や行動理解のような高レベルの意味解釈を含む、複数のレベルで視覚的知覚を行う。低レベルの細部における微妙な違いは、高レベルの知覚に大きな変化をもたらす可能性がある。例えば、銃を持った人が持っていた買い物袋を代用することは、暴力行為を示唆し、犯罪行為や暴力行為を暗示する。様々なマルチモーダルタスクの大幅な進歩にもかかわらず、LVLM(Large Visual-Language Models)はそのようなマルチレベル視覚知覚を行う能力について未解明のままである。 LVLMの低レベルと高レベルの両方の視覚知覚を体系的に評価する最初の視覚言語ベンチマークであるMVP-Benchを導入する。本研究では,自然画像と合成画像にMVP-Benchを構築し,操作したコンテンツがモデル知覚に与える影響について検討する。 MVP-Benchを用いて、10個のオープンソースと2個のクローズドソースのLVLMの視覚的認識を診断し、高いレベルの認識タスクが既存のLVLMに大きく挑戦していることを示す。最先端の GPT-4o は,低レベルのシナリオでは 754 % に対して,Yes/No の質問では 56 % の精度しか達成していない。さらに、自然画像と操作画像のパフォーマンスギャップは、現在のLVLMが人間のように合成画像の視覚的意味を理解できないことを示している。私たちのデータとコードはhttps://github.com/GuanzhenLi/MVP-Bench.comで公開されています。 Humans perform visual perception at multiple levels, including low-level object recognition and high-level semantic interpretation such as behavior understanding. Subtle differences in low-level details can lead to substantial changes in high-level perception. For example, substituting the shopping bag held by a person with a gun suggests violent behavior, implying criminal or violent activity. Despite significant advancements in various multimodal tasks, Large Visual-Language Models (LVLMs) remain unexplored in their capabilities to conduct such multi-level visual perceptions. To investigate the perception gap between LVLMs and humans, we introduce MVP-Bench, the first visual-language benchmark systematically evaluating both low- and high-level visual perception of LVLMs. We construct MVP-Bench across natural and synthetic images to investigate how manipulated content influences model perception. Using MVP-Bench, we diagnose the visual perception of 10 open-source and 2 closed-source LVLMs, showing that high-level perception tasks significantly challenge existing LVLMs. The state-of-the-art GPT-4o only achieves an accuracy of $56\%$ on Yes/No questions, compared with $74\%$ in low-level scenarios. Furthermore, the performance gap between natural and manipulated images indicates that current LVLMs do not generalize in understanding the visual semantics of synthetic images as humans do. Our data and code are publicly available at https://github.com/GuanzhenLi/MVP-Bench.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 正規選好最適化:NDCGによる人選好の調整 Ordinal Preference Optimization: Aligning Human Preferences via NDCG ( http://arxiv.org/abs/2410.04346v1 ) ライセンス: Link先を確認	Yang Zhao, Yixin Wang, Mingzhang Yin,	(参考訳) 多様な人間の好みを持つ大規模言語モデル(LLM)の調整は、モデルの振る舞いを制御し、生成品質を向上させるための重要な技術である。 Reinforcement Learning from Human Feedback (RLHF)、Direct Preference Optimization (DPO)、およびそれらの変種はペア比較により言語モデルを最適化する。しかし、複数のレスポンスが利用できる場合、これらのアプローチは報酬モデルや人間からのフィードバックによって与えられるランキングの広範な情報を活用するには至らない。そこで本研究では,正規化比較累積ゲイン (NDCG) を用いた正規化選好最適化 (OPO) という新しいリストワイズ手法を提案する。我々は、NDCGを異なる代理損失で近似することで、エンドツーエンドの選好最適化アルゴリズムを開発する。このアプローチは,情報検索におけるランキングモデルとアライメント問題の関連性を構築する。順序付き報酬に割り当てられたマルチレスポンスデータセットの調整において、OPOは、評価セットとAlpacaEvalのような一般的なベンチマークにおいて、既存のペアワイズおよびリストワイズアプローチよりも優れています。さらに, 陰性サンプルのプールの増加は, 自明な負の悪影響を低減し, モデル性能を向上させることを実証した。 Aligning Large Language Models (LLMs) with diverse human preferences is a pivotal technique for controlling model behaviors and enhancing generation quality. Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and their variants optimize language models by pairwise comparisons. However, when multiple responses are available, these approaches fall short of leveraging the extensive information in the ranking given by the reward models or human feedback. In this work, we propose a novel listwise approach named Ordinal Preference Optimization (OPO), which employs the Normalized Discounted Cumulative Gain (NDCG), a widely-used ranking metric, to better utilize relative proximity within ordinal multiple responses. We develop an end-to-end preference optimization algorithm by approximating NDCG with a differentiable surrogate loss. This approach builds a connection between ranking models in information retrieval and the alignment problem. In aligning multi-response datasets assigned with ordinal rewards, OPO outperforms existing pairwise and listwise approaches on evaluation sets and general benchmarks like AlpacaEval. Moreover, we demonstrate that increasing the pool of negative samples can enhance model performance by reducing the adverse effects of trivial negatives.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 大規模言語モデルを用いた予測モデル拡張のための潜在的特徴マイニング Latent Feature Mining for Predictive Model Enhancement with Large Language Models ( http://arxiv.org/abs/2410.04347v1 ) ライセンス: Link先を確認	Bingxuan Li, Pengyi Shi, Amy Ward,	(参考訳) 予測モデリングは、データ可用性と品質の制限による課題に直面することが多い。特に、収集された特徴が結果と弱い相関関係にあり、追加の特徴収集が倫理的または実践的な困難によって制約される領域において。従来の機械学習(ML)モデルは、観測されていないが重要な要素を組み込むのに苦労している。本研究では,テキストからテキストへの命題論理的推論として潜在特徴抽出を定式化するための効果的な手法を提案する。 FLAME(Faithful Latent Feature Mining for Predictive Model Enhancement)は,大規模言語モデル(LLM)を利用して,潜在機能を備えた観測機能を強化し,下流タスクにおけるMLモデルの予測能力を向上するフレームワークである。このフレームワークは、各領域に固有のコンテキスト情報を組み込んで、類似したデータ可用性課題に直面した領域への効果的な転送を保証するように設計されており、ドメイン固有の適応を必要とする様々なドメインにまたがって一般化可能である。我々は,(1)刑事司法制度,(2)患者プライバシの懸念と医療データの複雑さが包括的特徴収集を制限する医療分野を特徴とする領域,という2つのケーススタディを用いて,枠組みを検証した。以上の結果から,推定潜時特徴は地上の真理ラベルとよく一致し,下流の分類器を著しく強化することがわかった。 Predictive modeling often faces challenges due to limited data availability and quality, especially in domains where collected features are weakly correlated with outcomes and where additional feature collection is constrained by ethical or practical difficulties. Traditional machine learning (ML) models struggle to incorporate unobserved yet critical factors. In this work, we introduce an effective approach to formulate latent feature mining as text-to-text propositional logical reasoning. We propose FLAME (Faithful Latent Feature Mining for Predictive Model Enhancement), a framework that leverages large language models (LLMs) to augment observed features with latent features and enhance the predictive power of ML models in downstream tasks. Our framework is generalizable across various domains with necessary domain-specific adaptation, as it is designed to incorporate contextual information unique to each area, ensuring effective transfer to different areas facing similar data availability challenges. We validate our framework with two case studies: (1) the criminal justice system, a domain characterized by limited and ethically challenging data collection; (2) the healthcare domain, where patient privacy concerns and the complexity of medical data limit comprehensive feature collection. Our results show that inferred latent features align well with ground truth labels and significantly enhance the downstream classifier.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# TIS-DPO:推定重み付き直接選好最適化のためのトークンレベルの重要度サンプリング TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights ( http://arxiv.org/abs/2410.04350v1 ) ライセンス: Link先を確認	Aiwei Liu, Haoping Bai, Zhiyun Lu, Yanchao Sun, Xiang Kong, Simon Wang, Jiulong Shan, Albin Madappally Jose, Xiaojiang Liu, Lijie Wen, Philip S. Yu, Meng Cao,	(参考訳) 直接選好最適化(DPO)は、その単純さと有効性から、Large Language Models(LLM)の選好アライメントに広く採用されている。しかし、DPOは、全応答が単一アームとして扱われるバンディット問題として導出され、トークン間の重要性の違いを無視し、最適化効率に影響を及ぼし、最適な結果を得るのが難しくなる。本研究では, DPO の最適データは, トークンの重要度に差がないため, 勝ち負けにおける各トークンに対して等しく期待される報酬を持つことを示す。しかし、この最適データセットは実際には利用できないため、重要サンプリングのために元のデータセットを用いて、偏りのない最適化を実現することを提案する。そこで本稿では,TIS-DPO と呼ばれるトークン単位の重要度サンプリング DPO の目的について提案する。従来の研究から着想を得て,一対の対照的なLLMからの予測確率の差を用いて,トークンの重要度を推定した。提案手法は,(1) 元のLDMをコントラスト的プロンプトで導くこと,(2) 勝敗応答を用いて2つの別々のLDMを訓練すること,(3) 勝敗応答を用いて前後DPOトレーニングを行うこと,の3つである。実験により、TIS-DPOは、無害性、無益性アライメントおよび要約タスクにおいて、様々なベースライン手法を著しく上回っていることが示された。また、推定重量を可視化し、キートークンの位置を識別する能力を示す。 Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# Androidマルウェア検出の強化:ChatGPTが意思決定中心タスクに及ぼす影響 Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Task ( http://arxiv.org/abs/2410.04352v1 ) ライセンス: Link先を確認	Yao Li, Sen Fang, Tao Zhang, Haipeng Cai,	(参考訳) ChatGPTのような大規模言語モデルの台頭により、非決定モデルが様々なタスクに適用されている。さらにChatGPTは、Androidのマルウェア検出における従来の意思決定中心のタスクに注意を向けている。研究者によって提案された効果的な検出方法にもかかわらず、それらは低い解釈可能性の問題に直面している。具体的には、これらのメソッドは、良心的または悪意的なアプリケーション分類に優れ、悪意のある振る舞いを検出できるが、意思決定に関する詳細な説明は提供できないことが多い。この課題は、既存の検出スキームの信頼性に関する懸念を高め、複雑なデータを理解する真の能力に疑問を投げかける。本研究では,非決定モデルChatGPTがAndroidマルウェア検出における従来の意思決定中心タスクに与える影響について検討する。 Drebin、XMAL、MaMaDroidの3つの最先端ソリューションを選択し、公開データセットに関する一連の実験を行い、包括的な比較と分析を行う。この結果から,これらの決定駆動型ソリューションは,基盤となるデータを真に理解するのではなく,データセット内の統計的パターンに依存していることが示唆された。対照的に、ChatGPTは非決定モデルであり、包括的な分析レポートの提供に優れ、解釈可能性を大幅に向上させる。さらに、経験豊富な開発者を対象に調査を実施します。この結果は、ChatGPTの詳細な洞察を提供し、課題の効率性と理解を高めることによって、開発者のChatGPTに対する好みを強調している。一方、これらの研究と分析は深い洞察を与え、開発者はAndroidのマルウェア検出に新たな視点を与え、非決定的な視点から検出結果の信頼性を高める。 With the rise of large language models, such as ChatGPT, non-decisional models have been applied to various tasks. Moreover, ChatGPT has drawn attention to the traditional decision-centric task of Android malware detection. Despite effective detection methods proposed by scholars, they face low interpretability issues. Specifically, while these methods excel in classifying applications as benign or malicious and can detect malicious behavior, they often fail to provide detailed explanations for the decisions they make. This challenge raises concerns about the reliability of existing detection schemes and questions their true ability to understand complex data. In this study, we investigate the influence of the non-decisional model, ChatGPT, on the traditional decision-centric task of Android malware detection. We choose three state-of-the-art solutions, Drebin, XMAL, and MaMaDroid, conduct a series of experiments on publicly available datasets, and carry out a comprehensive comparison and analysis. Our findings indicate that these decision-driven solutions primarily rely on statistical patterns within datasets to make decisions, rather than genuinely understanding the underlying data. In contrast, ChatGPT, as a non-decisional model, excels in providing comprehensive analysis reports, substantially enhancing interpretability. Furthermore, we conduct surveys among experienced developers. The result highlights developers' preference for ChatGPT, as it offers in-depth insights and enhances efficiency and understanding of challenges. Meanwhile, these studies and analyses offer profound insights, presenting developers with a novel perspective on Android malware detection--enhancing the reliability of detection results from a non-decisional perspective.	翻訳日:2024-11-02 08:20:17 公開日:2024-10-06
# 超量子状態の幾何学と絡み合い Geometry and Entanglement of Super-Qubit Quantum States ( http://arxiv.org/abs/2410.04361v1 ) ライセンス: Link先を確認	Oktay K. Pashaev, Aygul Kocak,	(参考訳) 我々は、零点と1つの超粒子状態の重畳によって決定される超量子状態を導入し、超ブロック球面上の点で表すことができる。 1つの量子ビットの場合とは対照的に、1つの超粒子状態は、別の超ブロック球に等しい拡張された複素平原の点によって特徴づけられる。幾何学的には、超量子状態は2つの単位球面、または2つのブロッホ球面の直積で表される。超量子状態に作用する変位演算子を用いて、超コヒーレント状態を構築し、超消滅作用素の固有状態となり、2つの超ブロック球の変位パラメータと立体射影の3つの複素数で特徴づける。状態はフェルミオンボソン絡み合っており、状態の共起は2つのブロッホ球に対応する2つの共起子の積である。球面上の点状態から垂直軸(点状態を通る水平面における円の半径)までの距離として、共起の幾何学的意味を示す。そして、北極状態と南極状態との崩壊確率は、状態の垂直座標から極の対応する点への半距離に等しい。補体フェルミオン数演算子に対しては、転置された超消滅作用素の固有状態として、フリップされた超量子状態と対応する超コヒーレント状態を得る。複素平原におけるフィボナッチ振動円の無限集合と、2つのフィボナッチ数の比として不確実性を持つ量子状態の対応する集合と、無限の極限がゴールデンラティオの不確実性となるような極限とが導かれる。 We introduce the super-qubit quantum state, determined by superposition of the zero and the one super-particle states, which can be represented by points on the super-Bloch sphere. In contrast to the one qubit case, the one super-particle state is characterized by points in extended complex plain, equivalent to another super-Bloch sphere. Then, geometrically, the super-qubit quantum state is represented by two unit spheres, or the direct product of two Bloch spheres. By using the displacement operator, acting on the super-qubit state as the reference state, we construct the super-coherent states, becoming eigenstates of the super-annihilation operator, and characterized by three complex numbers, the displacement parameter and stereographic projections of two super-Bloch spheres. The states are fermion-boson entangled, and the concurrence of states is the product of two concurrences, corresponding to two Bloch spheres. We show geometrical meaning of concurrence as distance from point-state on the sphere to vertical axes - the radius of circle at horizontal plane through the point-state. Then, probabilities of collapse to the north pole state and to the south pole state are equal to half-distances from vertical coordinate of the state to corresponding points at the poles. For complimentary fermion number operator, we get the flipped super-qubit state and corresponding super-coherent state, as eigenstate of transposed super-annihilation operator. The infinite set of Fibonacci oscillating circles in complex plain, and corresponding set of quantum states with uncertainty relations as the ratio of two Fibonacci numbers, and in the limit at infinity becoming the Golden Ratio uncertainty, is derived.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# RespDiff: PPG信号からの呼吸波形推定のためのマルチスケールRNN拡散モデル RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals ( http://arxiv.org/abs/2410.04366v1 ) ライセンス: Link先を確認	Yuyang Miao, Zehua Chen, Chang Li, Danilo Mandic,	(参考訳) 呼吸率(RR)は、しばしば不都合なシナリオ下で監視される重要な健康指標であり、継続的なモニタリングの実用性を制限する。 Photoplethysmography(PPG)センサーは、ますますウェアラブルデバイスに統合され、ポータブルな方法でRRを継続的に推定する機会を提供する。本稿では,PSG信号からの呼吸波形推定のためのエンドツーエンドマルチスケールRNN拡散モデルであるRespDiffを提案する。 RespDiffは手作りの機能や低品質信号セグメントの排除を必要としないため、現実のシナリオに適している。モデルはマルチスケールエンコーダを使用し、異なる解像度で特徴を抽出し、双方向RNNを使用してPSG信号を処理し、呼吸波形を抽出する。さらに、モデルをさらに最適化するためにスペクトル損失項が導入された。 BIDMCデータセットで実施された実験では、RespDiffはRR推定の1.18bpmの平均絶対誤差(MAE)を達成し、他のものは1.66bpmから2.15bpmの範囲で達成し、実際の応用における堅牢で正確な呼吸モニタリングの可能性を示している。 Respiratory rate (RR) is a critical health indicator often monitored under inconvenient scenarios, limiting its practicality for continuous monitoring. Photoplethysmography (PPG) sensors, increasingly integrated into wearable devices, offer a chance to continuously estimate RR in a portable manner. In this paper, we propose RespDiff, an end-to-end multi-scale RNN diffusion model for respiratory waveform estimation from PPG signals. RespDiff does not require hand-crafted features or the exclusion of low-quality signal segments, making it suitable for real-world scenarios. The model employs multi-scale encoders, to extract features at different resolutions, and a bidirectional RNN to process PPG signals and extract respiratory waveform. Additionally, a spectral loss term is introduced to optimize the model further. Experiments conducted on the BIDMC dataset demonstrate that RespDiff outperforms notable previous works, achieving a mean absolute error (MAE) of 1.18 bpm for RR estimation while others range from 1.66 to 2.15 bpm, showing its potential for robust and accurate respiratory monitoring in real-world applications.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# ランダムトランスのアルゴリズム機能 Algorithmic Capabilities of Random Transformers ( http://arxiv.org/abs/2410.04368v1 ) ライセンス: Link先を確認	Ziqian Zhong, Jacob Andreas,	(参考訳) トレーニングされたトランスモデルは、算術や連想的リコールのようなタスクの解釈可能なプロシージャを実装することが知られているが、これらのプロシージャを実装する回路がトレーニング中にどのように発生するかはほとんど分かっていない。モデルに提供される監視信号にどの程度依存するか、トレーニング開始時のモデルにすでに存在する振る舞いにどの程度寄与するか? そこで本研究では,組込み層のみを最適化したランダム初期化変換器を用いて,データから学習可能な入出力マッピングが,ランダム初期化モデルによって既に実装されている(符号化方式の選択まで)関数であることを示す。これらのランダムトランスフォーマーは、モジュラー演算、インウェイト、コンテキスト内連想リコール、十進加算、括弧バランス、さらには自然言語テキスト生成のいくつかの側面を含む、幅広い意味あるアルゴリズムタスクを実行できる。以上の結果から,これらのモデルが訓練される前であっても,トランスフォーマ(かつ適切な構造化された入力を通じてアクセス可能な)にアルゴリズム能力が存在することが示唆された。コードはhttps://github.com/fjzzq2002/random_transformersで入手できる。 Trained transformer models have been found to implement interpretable procedures for tasks like arithmetic and associative recall, but little is understood about how the circuits that implement these procedures originate during training. To what extent do they depend on the supervisory signal provided to models, and to what extent are they attributable to behavior already present in models at the beginning of training? To investigate these questions, we investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized, so that the only input--output mappings learnable from data are those already implemented (up to a choice of encoding scheme) by the randomly initialized model. We find that these random transformers can perform a wide range of meaningful algorithmic tasks, including modular arithmetic, in-weights and in-context associative recall, decimal addition, parenthesis balancing, and even some aspects of natural language text generation. Our results indicate that some algorithmic capabilities are present in transformers (and accessible via appropriately structured inputs) even before these models are trained. Code is available at https://github.com/fjzzq2002/random_transformers.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# DiffusionFake: ガイド付き安定拡散によるディープフェイク検出における一般化の促進 DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion ( http://arxiv.org/abs/2410.04372v1 ) ライセンス: Link先を確認	Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji,	(参考訳) Deepfakeテクノロジーの急速な進歩により、顔交換は非常に現実的になり、偽造された顔コンテンツの悪意ある使用に対する懸念が高まっている。既存の方法は、顔操作の多様な性質のため、目に見えない領域に一般化するのに苦労することが多い。本稿では、生成過程を再検討し、普遍原理を同定する: ディープフェイク画像は、本質的に、ソースとターゲットの同一性の両方の情報を含んでいるが、真の顔は、一貫した同一性を維持している。この知見に基づいて,顔偽造の生成過程を逆転させて検出モデルの一般化を促進する新しいプラグ・アンド・プレイフレームワークであるDiffusionFakeを紹介した。 DiffusionFakeは、検出モデルによって抽出された特徴を凍結したトレーニング済みの安定拡散モデルに注入し、対応するターゲットとソースイメージを再構築する。このガイド付き再構成プロセスは、検出ネットワークを制約して、ソースとターゲットに関する特徴を捕捉し、再構成を容易にし、その結果、目に見えない偽造に対してより回復力のある、リッチで非絡み合った表現を学習する。大規模な実験により、DiffusionFakeは推論中に追加パラメータを導入することなく、様々な検出器アーキテクチャのドメイン間一般化を大幅に改善することが示された。私たちのコードはhttps://github.com/skJack/DiffusionFake.gitで利用可能です。 The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. In this paper, we revisit the generation process and identify a universal principle: Deepfake images inherently contain information from both source and target identities, while genuine faces maintain a consistent identity. Building upon this insight, we introduce DiffusionFake, a novel plug-and-play framework that reverses the generative process of face forgeries to enhance the generalization of detection models. DiffusionFake achieves this by injecting the features extracted by the detection model into a frozen pre-trained Stable Diffusion model, compelling it to reconstruct the corresponding target and source images. This guided reconstruction process constrains the detection network to capture the source and target related features to facilitate the reconstruction, thereby learning rich and disentangled representations that are more resilient to unseen forgeries. Extensive experiments demonstrate that DiffusionFake significantly improves cross-domain generalization of various detector architectures without introducing additional parameters during inference. Our Codes are available in https://github.com/skJack/DiffusionFake.git.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# 学習効率の良い生成可能な純状態の計算複雑性 Computational Complexity of Learning Efficiently Generatable Pure States ( http://arxiv.org/abs/2410.04373v1 ) ライセンス: Link先を確認	Taiga Hiroka, Min-Hsiu Hsieh,	(参考訳) 様々な学習モデルにおける学習効率のよい古典的プログラムの計算複雑性を理解することは、古典的学習理論において基礎的で重要な問題である。本研究では,Kearnsらによって導入された分散学習の量子一般化として見ることができる量子状態学習の計算複雑性について検討する。 Chung と Lin [TQC21] と B\u{a}descu と O$'$Donnell [STOC21] による以前の研究は、量子状態学習のサンプルの複雑さを研究し、未知の量子状態が効率的に生成可能であれば多項式コピーが十分であることを示した。しかし、アルゴリズムは非効率であり、この学習問題の計算複雑性は未解決のままである。本研究では、状態が効率的に生成可能であることを約束する量子状態学習の計算複雑性について検討する。未知の量子状態が純粋状態であることを約束し、効率的に生成可能であるなら、量子多項式時間アルゴリズム$A$と言語$L \in PP$が存在して、$A^L$はその古典的な記述を学べることを示す。また、学習量子状態の硬さと量子暗号の関連性も観察する。純粋な状態出力を持つ一方通行状態生成器の存在は、学習純状態の平均ケース硬度と等価であることを示す。さらに、EFIの存在は、混合状態の学習における平均的なケース硬さを意味することを示す。 Understanding the computational complexity of learning efficient classical programs in various learning models has been a fundamental and important question in classical computational learning theory. In this work, we study the computational complexity of quantum state learning, which can be seen as a quantum generalization of distributional learning introduced by Kearns et.al [STOC94]. Previous works by Chung and Lin [TQC21], and B\u{a}descu and O$'$Donnell [STOC21] study the sample complexity of the quantum state learning and show that polynomial copies are sufficient if unknown quantum states are promised efficiently generatable. However, their algorithms are inefficient, and the computational complexity of this learning problem remains unresolved. In this work, we study the computational complexity of quantum state learning when the states are promised to be efficiently generatable. We show that if unknown quantum states are promised to be pure states and efficiently generateable, then there exists a quantum polynomial time algorithm $A$ and a language $L \in PP$ such that $A^L$ can learn its classical description. We also observe the connection between the hardness of learning quantum states and quantum cryptography. We show that the existence of one-way state generators with pure state outputs is equivalent to the average-case hardness of learning pure states. Additionally, we show that the existence of EFI implies the average-case hardness of learning mixed states.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# 人間と敵対するテキストの関連性について Suspiciousness of Adversarial Texts to Human ( http://arxiv.org/abs/2410.04377v1 ) ライセンス: Link先を確認	Shakila Mahjabin Tonni, Pedro Faustini, Mark Dras,	(参考訳) 敵対的な例は、画像ドメインとテキストドメインの両方にわたるディープニューラルネットワーク(DNN)に対して、微妙に変化した入力によってモデルパフォーマンスを低下させることを意図して、大きな課題となっている。しかし、敵対的テキストは、意味的類似性やテキスト内容の離散的な性質が要求されるため、敵対的画像とは異なっている。この研究は、人間の不審感という概念を掘り下げるものであり、画像に基づく敵の例に見られる非受容性に対する伝統的な焦点とは異なる品質である。敵対的変化が人間の目と区別できないように意図されている画像とは異なり、テキストの敵対的内容は、NLPシステムやバイパスフィルターを欺くことを目的としている場合でも、人間の読者にとって見つからない、あるいは目立たないままでいなければならない。本研究では、個人が敵対的文章をどのように知覚するかを分析することによって、人間の不審性の研究を拡大する。筆者らは,4つの広く使用されている対人攻撃法によって構築された,敵文の不審性に関する人間の評価に関する新たなデータセットを収集,公開し,機械による変化を検出する人間の能力との相関性を評価する。さらに,疑わしいテキスト生成における疑わしさを軽減するために,疑わしさを定量化し,今後の研究のベースラインを確立するための回帰モデルを構築した。また、回帰器が生成した疑わしいスコアが、コンピュータ生成と見なされる可能性が低いテキストを生成するために、逆生成方法にどのように組み込まれるかを示す。人間の不審な注釈付きデータとコードを利用できるようにします。 Adversarial examples pose a significant challenge to deep neural networks (DNNs) across both image and text domains, with the intent to degrade model performance through meticulously altered inputs. Adversarial texts, however, are distinct from adversarial images due to their requirement for semantic similarity and the discrete nature of the textual contents. This study delves into the concept of human suspiciousness, a quality distinct from the traditional focus on imperceptibility found in image-based adversarial examples. Unlike images, where adversarial changes are meant to be indistinguishable to the human eye, textual adversarial content must often remain undetected or non-suspicious to human readers, even when the text's purpose is to deceive NLP systems or bypass filters. In this research, we expand the study of human suspiciousness by analyzing how individuals perceive adversarial texts. We gather and publish a novel dataset of Likert-scale human evaluations on the suspiciousness of adversarial sentences, crafted by four widely used adversarial attack methods and assess their correlation with the human ability to detect machine-generated alterations. Additionally, we develop a regression-based model to quantify suspiciousness and establish a baseline for future research in reducing the suspiciousness in adversarial text generation. We also demonstrate how the regressor-generated suspicious scores can be incorporated into adversarial generation methods to produce texts that are less likely to be perceived as computer-generated. We make our human suspiciousness annotated data and our code available.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# 半古典的アプローチにおける対のゆらぎの包含:ジョセフソン効果の研究の場合 Inclusion of pairing fluctuations in a semiclassical approach: The case of study of the Josephson effect ( http://arxiv.org/abs/2410.04382v1 ) ライセンス: Link先を確認	Verdiana Piselli, Leonardo Pisani, Giancarlo Calvanese Strinati,	(参考訳) 半古典的手法の最近の改良を概観し、非自明な空間幾何学の存在下での不均一な局所ギャップパラメータを記述し、同時に平均場を超えたペアリング変動を考慮した。この手法を用いて、超低温フェルミガスを用いた最近の実験に関する幅広い物理条件に対するジョセフソン効果を記述する。 Recent refinements on a semiclassical approach are reviewed, aiming at describing the inhomogeneous local gap parameter in the presence of non-trivial spatial geometries and at taking into account at the same time pairing fluctuations beyond mean field. The method is applied to describe the Josephson effect over the wide range of physical conditions related to recent experiments on this topic performed with ultra-cold Fermi gases.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# BrainCodec:認知脳状態の復号のためのニューラルfMRIコーデック BrainCodec: Neural fMRI codec for the decoding of cognitive brain states ( http://arxiv.org/abs/2410.04383v1 ) ライセンス: Link先を確認	Yuto Nishimura, Masataka Sawayama, Ayumu Yamashita, Hideki Nakayama, Kaoru Amano,	(参考訳) 近年、ディープラーニングにおけるビッグデータの活用は、fMRIデータを用いたメンタルステートデコーディングなどのアプリケーションで確認されたように、大幅なパフォーマンス向上につながっている。しかし、fMRIデータセットの規模は比較的小さく、fMRIデータにおける低信号対雑音比(SNR)の固有の問題は、これらの課題をさらに悪化させる。これを解決するために、fMRIデータの前処理ステップとして圧縮技術を適用する。ニューラルオーディオコーデックに触発された新しいfMRIコーデックであるBrainCodecを提案する。我々は、ブレインコーデックの精神状態復号における圧縮能力を評価し、従来の方法よりもさらに改善したことを示す。さらに、BrainCodecを用いて得られた潜伏表現を分析し、タスクと静止状態fMRIの類似点と相違点を解明し、BrainCodecの解釈可能性を強調した。また,BrainCodecを用いたfMRI再構成により,高いSNRを達成し,脳活動の可視性を高めることが実証された。我々の研究は、BrainCodecが従来の方法よりも性能を高めるだけでなく、ニューロサイエンスに新たな分析可能性をもたらすことを示している。私たちのコード、データセット、モデルウェイトはhttps://github.com/amano-k-lab/BrainCodec.comで公開されています。 Recently, leveraging big data in deep learning has led to significant performance improvements, as confirmed in applications like mental state decoding using fMRI data. However, fMRI datasets remain relatively small in scale, and the inherent issue of low signal-to-noise ratios (SNR) in fMRI data further exacerbates these challenges. To address this, we apply compression techniques as a preprocessing step for fMRI data. We propose BrainCodec, a novel fMRI codec inspired by the neural audio codec. We evaluated BrainCodec's compression capability in mental state decoding, demonstrating further improvements over previous methods. Furthermore, we analyzed the latent representations obtained through BrainCodec, elucidating the similarities and differences between task and resting state fMRI, highlighting the interpretability of BrainCodec. Additionally, we demonstrated that fMRI reconstructions using BrainCodec can enhance the visibility of brain activity by achieving higher SNR, suggesting its potential as a novel denoising method. Our study shows that BrainCodec not only enhances performance over previous methods but also offers new analytical possibilities for neuroscience. Our codes, dataset, and model weights are available at https://github.com/amano-k-lab/BrainCodec.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# データ分散評価 Data Distribution Valuation ( http://arxiv.org/abs/2410.04386v1 ) ライセンス: Link先を確認	Xinyi Xu, Shuaiqi Wang, Chuan-Sheng Foo, Bryan Kian Hsiang Low, Giulia Fanti,	(参考訳) データアベイラビリティ(Data valuation)は、データマーケットプレースにおける価格などのアプリケーションのデータ価値を定量的に評価するテクニックのクラスである。既存のデータバリュエーションメソッドは、離散データセットの値を定義します。しかし、多くのユースケースでは、ユーザはデータセットの値だけでなく、データセットがサンプリングされた分布の値にも興味を持っています。例えば、異なるベンダーからデータを購入するかどうかを評価しようとする買い手について考えてみましょう。購入者は、各ベンダーの小さなプレビューサンプルのみを観察して、購入者および購入者に最も有用なベンダーのデータ配布を決定することができる。中心的な疑問は、サンプルからのデータ分散値を比較するにはどうすればよいか、ということです。本研究では, ベンダー間のデータ不均一性を特徴付けるHuber の評価手法として, サンプルからのデータ分布を比較するための理論的に原理化された, 行動可能なポリシーを実現するための, MMD に基づく評価手法を提案する。実世界の複数のデータセット(例えば、ネットワーク侵入検出、クレジットカード不正検出)や下流アプリケーション(分類、回帰)において、本手法はサンプル効率が高く、複数の既存ベースラインに対して有意義なデータ分布を特定するのに有効であることを示す。 Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces. Existing data valuation methods define a value for a discrete dataset. However, in many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled. For example, consider a buyer trying to evaluate whether to purchase data from different vendors. The buyer may observe (and compare) only a small preview sample from each vendor, to decide which vendor's data distribution is most useful to the buyer and purchase. The core question is how should we compare the values of data distributions from their samples? Under a Huber characterization of the data heterogeneity across vendors, we propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies for comparing data distributions from samples. We empirically demonstrate that our method is sample-efficient and effective in identifying valuable data distributions against several existing baselines, on multiple real-world datasets (e.g., network intrusion detection, credit card fraud detection) and downstream applications (classification, regression).	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# WISE: ビジネス・プロセスのメトリクスをドメイン・知識で解き放つ WISE: Unraveling Business Process Metrics with Domain Knowledge ( http://arxiv.org/abs/2410.04387v1 ) ライセンス: Link先を確認	Urszula Jessen, Dirk Fahland,	(参考訳) 複雑な産業プロセスの異常は、しばしば、イベントデータの高い変動性と複雑さによって隠蔽され、プロセスマイニングによるその識別と解釈を妨げる。この問題に対処するために、ドメイン知識、プロセスマイニング、機械学習の統合を通じてビジネスプロセスメトリクスを分析する新しい方法であるWISE(Weighted Insights for Evaluating Effective)を紹介する。この方法論は、ビジネス目標を定義し、アクティビティレベルで重み付けされた制約のあるプロセスノルムを確立することを含み、ドメインの専門家やプロセスアナリストからのインプットを取り入れます。個々のプロセスインスタンスはこれらの制約に基づいてスコアされ、スコアはプロセスのゴールに影響を与える特徴を特定するために正規化されます。 BPIC 2019データセットと実際の産業状況を用いた評価は、WISEがビジネスプロセス分析の自動化を強化し、望ましいプロセスフローからの逸脱を効果的に検出することを示している。 LLMは分析をサポートするが、ドメインの専門家が加わったことにより、発見の正確さと妥当性が保証される。 Anomalies in complex industrial processes are often obscured by high variability and complexity of event data, which hinders their identification and interpretation using process mining. To address this problem, we introduce WISE (Weighted Insights for Evaluating Efficiency), a novel method for analyzing business process metrics through the integration of domain knowledge, process mining, and machine learning. The methodology involves defining business goals and establishing Process Norms with weighted constraints at the activity level, incorporating input from domain experts and process analysts. Individual process instances are scored based on these constraints, and the scores are normalized to identify features impacting process goals. Evaluation using the BPIC 2019 dataset and real industrial contexts demonstrates that WISE enhances automation in business process analysis and effectively detects deviations from desired process flows. While LLMs support the analysis, the inclusion of domain experts ensures the accuracy and relevance of the findings.	翻訳日:2024-11-02 08:10:32 公開日:2024-10-06
# モンテカルロ予測最大化を用いた音響的空間キャプチャーの近似最大推定 Approximate Maximum Likelihood Inference for Acoustic Spatial Capture-Recapture with Unknown Identities, Using Monte Carlo Expectation Maximization ( http://arxiv.org/abs/2410.04390v1 ) ライセンス: Link先を確認	Yuheng Wang, Juan Ye, Weiye Li, David L. Borchers,	(参考訳) 音響空間キャプチャー(ASCR)サーベイは、動物の密度を推定したり、呼び出し密度を推定するのに有効な方法である。しかし、ASCR分析に必要なキャプチャ履歴を構築することは困難であり、異なる検出器でのどの検出が、どの呼び出しが自明なタスクであるかを認識することは難しい。異なる距離からの呼び出しは検知器に到達するのに異なる時間を要するため、呼び出しが検出される順序は、その呼び出しが実行される順序と必ずしも同じではなく、どの検出が同じ呼び出しであるかがわからなければ、どのくらいの異なる呼び出しが検出されるかはわからない。本稿では,モンテカルロ予測最大化(MCEM)推定法を提案する。この文脈でMCEM法を実装するために、予測ステップで完全データ確率モデルから潜伏変数をサンプリングし、最大化ステップで半完全データ確率または条件付き確率を使用する。パラメトリックブートストラップを用いて信頼区間を求める。本手法をカスカエル調査に適用すると, 専門家が作成した呼取履歴データを用いて得られた推定値の15%以内を推定し, 後者と異なり, この信頼区間は呼出同一性に関する不確実性を含む。シミュレーションでは、バイアス(6%)が低く、カバー確率が95%に近いことが示されている。 Acoustic spatial capture-recapture (ASCR) surveys with an array of synchronized acoustic detectors can be an effective way of estimating animal density or call density. However, constructing the capture histories required for ASCR analysis is challenging, as recognizing which detections at different detectors are of which calls is not a trivial task. Because calls from different distances take different times to arrive at detectors, the order in which calls are detected is not necessarily the same as the order in which they are made, and without knowing which detections are of the same call, we do not know how many different calls are detected. We propose a Monte Carlo expectation-maximization (MCEM) estimation method to resolve this unknown call identity problem. To implement the MCEM method in this context, we sample the latent variables from a complete-data likelihood model in the expectation step and use a semi-complete-data likelihood or conditional likelihood in the maximization step. We use a parametric bootstrap to obtain confidence intervals. When we apply our method to a survey of moss frogs, it gives an estimate within 15% of the estimate obtained using data with call capture histories constructed by experts, and unlike this latter estimate, our confidence interval incorporates the uncertainty about call identities. Simulations show it to have a low bias (6%) and coverage probabilities close to the nominal 95% value.	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# Recursively Subdivided Tetrahedra を用いた変形性NeRF Deformable NeRF using Recursively Subdivided Tetrahedra ( http://arxiv.org/abs/2410.04402v1 ) ライセンス: Link先を確認	Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang,	(参考訳) ニューラル放射場(NeRF)は、新しいビュー合成において有望であるが、その暗黙的な表現は、オブジェクト操作に対する明示的な制御を制限する。既存の研究では、変形を可能にするための明示的な幾何学的プロキシの統合が提案されている。しかし、これらの手法は2つの大きな課題に直面している: 第一に、時間がかかり、計算的に要求される四面体化プロセス; 第二に、複雑な構造や細い構造を扱うことは、過度に、貯蔵集約的な四面体メッシュか、変形能力を損なう品質の悪いもののいずれかにつながる。これらの課題に対処するために,四面体メッシュのマニピュラビリティと特徴格子表現の高品質なレンダリング機能とをシームレスに統合するDeformRFを提案する。各物体に対する不整形四面体と四面体化を避けるため, 2段階の訓練戦略を提案する。ほぼ規則な四面体格子から始めると、このモデルは最初、物体を囲むキーテトラヘドラを保持し、その後、2段目においてより微細な粒度メッシュを用いてオブジェクトの詳細を洗練する。また,高分解能メッシュを暗黙的に生成するために,再帰的に分割するテトラヘドラの概念も提示する。これにより、第1のトレーニング段階で発生する粗い四面体メッシュの保存のみを必要としながら、マルチレゾリューション符号化が可能となる。合成データと実撮データの両方でDeformRFを総合的に評価する。定量的および定性的な結果は,新しいビュー合成および変形タスクにおける本手法の有効性を示すものである。プロジェクトページ:https://ustc3dv.github.io/DeformRF/ While neural radiance fields (NeRF) have shown promise in novel view synthesis, their implicit representation limits explicit control over object manipulation. Existing research has proposed the integration of explicit geometric proxies to enable deformation. However, these methods face two primary challenges: firstly, the time-consuming and computationally demanding tetrahedralization process; and secondly, handling complex or thin structures often leads to either excessive, storage-intensive tetrahedral meshes or poor-quality ones that impair deformation capabilities. To address these challenges, we propose DeformRF, a method that seamlessly integrates the manipulability of tetrahedral meshes with the high-quality rendering capabilities of feature grid representations. To avoid ill-shaped tetrahedra and tetrahedralization for each object, we propose a two-stage training strategy. Starting with an almost-regular tetrahedral grid, our model initially retains key tetrahedra surrounding the object and subsequently refines object details using finer-granularity mesh in the second stage. We also present the concept of recursively subdivided tetrahedra to create higher-resolution meshes implicitly. This enables multi-resolution encoding while only necessitating the storage of the coarse tetrahedral mesh generated in the first training stage. We conduct a comprehensive evaluation of our DeformRF on both synthetic and real-captured datasets. Both quantitative and qualitative results demonstrate the effectiveness of our method for novel view synthesis and deformation tasks. Project page: https://ustc3dv.github.io/DeformRF/	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# CiMaTe: メインテキストを効果的に活用するCitation Count予測 CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text ( http://arxiv.org/abs/2410.04404v1 ) ライセンス: Link先を確認	Jun Hirako, Ryohei Sasano, Koichi Takeda,	(参考訳) 今後,論文の引用数を予測することは,論文数が増え続ける中で,興味深い論文を見つける上でますます重要である。論文の本文は引用数予測において重要な要素であるが,本文は典型的に非常に長いため,機械学習モデルでは処理が困難である。本稿では,論文の断面構造を明示的に把握し,主文を利用したBERTに基づく引用数予測モデルCiMaTeを提案する。計算言語学および生物学領域の論文による実験を通じて、スピアマンのランク相関係数(計算言語学領域の5.1点、生物学領域の1.8点)において、CiMaTeの有効性を実証した。 Prediction of the future citation counts of papers is increasingly important to find interesting papers among an ever-growing number of papers. Although a paper's main text is an important factor for citation count prediction, it is difficult to handle in machine learning models because the main text is typically very long; thus previous studies have not fully explored how to leverage it. In this paper, we propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure. Through experiments with papers from computational linguistics and biology domains, we demonstrate the CiMaTe's effectiveness, outperforming the previous methods in Spearman's rank correlation coefficient; 5.1 points in the computational linguistics domain and 1.8 points in the biology domain.	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# Lens: 大規模言語モデルの多言語拡張を再考する Lens: Rethinking Multilingual Enhancement for Large Language Models ( http://arxiv.org/abs/2410.04407v1 ) ライセンス: Link先を確認	Weixiang Zhao, Yulin Hu, Jiahe Guo, Xingyu Sui, Tongtong Wu, Yang Deng, Yanyan Zhao, Bing Qin, Wanxiang Che, Ting Liu,	(参考訳) 多様な言語背景を持つユーザ向けの大規模言語モデル(LLM)の世界的な需要が高まっているにもかかわらず、最先端のLLMのほとんどは英語中心のままである。これにより、言語間でのパフォーマンスギャップが生じ、非英語話者の高度なAIサービスへのアクセスが制限される。現在の多言語機能向上手法は、多言語命令チューニングや連続的事前学習といったデータ駆動型後学習技術に大きく依存している。しかし、これらのアプローチは、高品質な多言語データセットの不足や、多言語機能の制限された拡張など、重大な課題に直面している。彼らはしばしば標的外問題や中央言語能力の破滅的な忘れ込みに悩まされる。この目的のために、Lensは、内部言語表現空間を活用することで、LLMの多言語機能を強化するための新しいアプローチである。特にLensは、LLMの上位層から言語に依存しない、言語固有のサブ空間内の隠された表現を操作することで動作する。中央言語をピボットとして使用すると、ターゲット言語は言語に依存しない部分空間内でそれに近い位置に描画されるため、十分に確立されたセマンティック表現を継承することができる。一方、言語固有の部分空間では、ターゲット言語と中央言語の表現が切り離され、ターゲット言語自体が明確に表現される。 1つの英語中心のLLMと2つの多言語LLMの広範な実験により、Lensはバックボーンモデルの本来の中央言語能力を犠牲にすることなく、多言語のパフォーマンスを効果的に向上し、既存の訓練後のアプローチと比べて計算資源をはるかに少なくして優れた結果が得られることを示した。 Despite the growing global demand for large language models (LLMs) that serve users from diverse linguistic backgrounds, most cutting-edge LLMs remain predominantly English-centric. This creates a performance gap across languages, restricting access to advanced AI services for non-English speakers. Current methods to enhance multilingual capabilities largely rely on data-driven post-training techniques, such as multilingual instruction tuning or continual pre-training. However, these approaches encounter significant challenges, including the scarcity of high-quality multilingual datasets and the limited enhancement of multilingual capabilities. They often suffer from off-target issues and catastrophic forgetting of central language abilities. To this end, we propose Lens, a novel approach to enhance multilingual capabilities of LLMs by leveraging their internal language representation spaces. Specially, Lens operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. Using the central language as a pivot, the target language is drawn closer to it within the language-agnostic subspace, allowing it to inherit well-established semantic representations. Meanwhile, in the language-specific subspace, the representations of the target and central languages are pushed apart, enabling the target language to express itself distinctly. Extensive experiments on one English-centric and two multilingual LLMs demonstrate that Lens effectively improves multilingual performance without sacrificing the original central language capabilities of the backbone model, achieving superior results with much fewer computational resources compared to existing post-training approaches.	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# 低次グラフ上での最大カットのための量子近似最適化アルゴリズム Quantum Approximate Optimization Algorithms for Maxmimum Cut on Low-Girth Graphs ( http://arxiv.org/abs/2410.04409v1 ) ライセンス: Link先を確認	Tongyang Li, Yuexin Su, Ziyi Yang, Shengyu Zhang,	(参考訳) グラフ上の最大カット(MaxCut)は古典的なNPハード問題である。量子コンピューティングにおいて、Farhi、Gutmann、GoldstoneはMaxCutの問題を解決するためにQuantum Approximate Optimization Algorithm (QAOA)を提案した。カット分数(全エッジの出力カットのエッジの分数)に対する保証は、主に長い周期しか持たないグラフに対して研究された。一方、低木グラフは理論計算機科学においてユビキタスであり、拡張グラフは理論上およびそれ以上に広く応用された優れた例である。本稿では、加法積グラフとして知られるMohantyとO'Donnellによって提案された拡張グラフの集合上で、MaxCutにQAOAを適用する。さらに,多角QAOA (ma-QAOA) を用いて,加算積グラフのグラフ構造をよりよく活用する。理論的には、そのようなグラフの期待切断率を計算するための反復公式を導出する。一方,古典的局所アルゴリズムとQAOAを一定深度で比較するため,数値実験を行った。以上の結果から,QAOAはいくつかの付加積グラフで0.3%から5.2%,ma-QAOAは0.6%から2.5%でこの優位性を高めた。特に,ma-QAOAはよく知られた古典的アルゴリズムよりも優れているが,QAOAはそうではない。さらに、我々は実験をタイリンググリッドグラフのような平面グラフに拡張し、QAOAが有利であることを示す。 Maximum cut (MaxCut) on graphs is a classic NP-hard problem. In quantum computing, Farhi, Gutmann, and Goldstone proposed the Quantum Approximate Optimization Algorithm (QAOA) for solving the MaxCut problem. Its guarantee on cut fraction (the fraction of edges in the output cut over all edges) was mainly studied for high-girth graphs, i.e., graphs with only long cycles. On the other hand, low-girth graphs are ubiquitous in theoretical computer science, including expander graphs being outstanding examples with wide applications in theory and beyond. In this paper, we apply QAOA to MaxCut on a set of expander graphs proposed by Mohanty and O'Donnell known as additive product graphs. Additionally, we apply multi-angle QAOA (ma-QAOA) to better utilize the graph structure of additive product graphs in ansatz design. In theory, we derive an iterative formula to calculate the expected cut fraction of such graphs. On the other hand, we conduct numerical experiments to compare between best-known classical local algorithms and QAOA with constant depth. Our results demonstrate that QAOA outperforms the best-known classical algorithms by 0.3% to 5.2% on several additive product graphs, while ma-QAOA further enhances this advantage by an additional 0.6% to 2.5%. In particular, we observe cases that ma-QAOA exhibits superiority over best-known classical algorithms but QAOA does not. Furthermore, we extend our experiments to planar graphs such as tiling grid graphs, where QAOA also demonstrates an advantage.	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# Blocks Architecture (BloArk): Wikipediaの改訂履歴のための効率的で費用効果があり、インクリメンタルなデータセットアーキテクチャ Blocks Architecture (BloArk): Efficient, Cost-Effective, and Incremental Dataset Architecture for Wikipedia Revision History ( http://arxiv.org/abs/2410.04410v1 ) ライセンス: Link先を確認	Lingxi Li, Zonghai Yao, Sunjae Kwon, Hong Yu,	(参考訳) ウィキペディア(ウィキペディア)は、自然言語処理(NLP)アプリケーションにおいて最も広く使われ、一般に公開されているリソースの1つである。 Wikipedia Revision History (WikiRevHist) は、ウィキページが最初に修正されてから編集された順序を示している。最も最新のWikiはトレーニングソースとして広く使われているが、WikiRevHistはNLPアプリケーションにとって貴重なリソースでもある。しかし、WikiRevHistの処理には十分なコンピュータリソースを必要とせず、さらなるカスタマイズや、他人の作業への適応に余分な時間を費やすことなく、不十分なツールがある。そこで我々はBlocks Architecture (BloArk) を報告した。BloArkは、実行時間、計算リソースの要求、WikiRevHistデータセットの処理における繰り返し処理を減らし、効率を重視したデータ処理アーキテクチャである。 BloArkは、ブロック、セグメント、倉庫の3つの部分で構成されている。それに加えて,コアデータ処理パイプライン – builder と modifier も構築しています。 BloArkビルダーは、オリジナルのWikiRevHistデータセットをXML構文からJSON行(JSONL)フォーマットに変換し、並列性とストレージ効率を改善する。 BloArk修飾器は、既存のデータベースの利用を改善し、他人の作業を再利用するコストを削減するために、以前製造された倉庫をインクリメンタルに改造する。最終的にBloArkは、Wikipediaのリビジョン履歴の処理と、下流のNLPユースケースのための既存のデータセットの漸進的な修正の両方で簡単にスケールアップできる。ソースコード、ドキュメンテーション、サンプルの使用例はオンラインで公開されており、GPL-2.0ライセンス下でオープンソース化されている。 Wikipedia (Wiki) is one of the most widely used and publicly available resources for natural language processing (NLP) applications. Wikipedia Revision History (WikiRevHist) shows the order in which edits were made to any Wiki page since its first modification. While the most up-to-date Wiki has been widely used as a training source, WikiRevHist can also be valuable resources for NLP applications. However, there are insufficient tools available to process WikiRevHist without having substantial computing resources, making additional customization, and spending extra time adapting others' works. Therefore, we report Blocks Architecture (BloArk), an efficiency-focused data processing architecture that reduces running time, computing resource requirements, and repeated works in processing WikiRevHist dataset. BloArk consists of three parts in its infrastructure: blocks, segments, and warehouses. On top of that, we build the core data processing pipeline: builder and modifier. The BloArk builder transforms the original WikiRevHist dataset from XML syntax into JSON Lines (JSONL) format for improving the concurrent and storage efficiency. The BloArk modifier takes previously-built warehouses to operate incremental modifications for improving the utilization of existing databases and reducing the cost of reusing others' works. In the end, BloArk can scale up easily in both processing Wikipedia Revision History and incrementally modifying existing dataset for downstream NLP use cases. The source code, documentations, and example usages are publicly available online and open-sourced under GPL-2.0 license.	翻訳日:2024-11-02 08:00:46 公開日:2024-10-06
# DAdEE: 早期PLMにおける教師なしドメイン適応 DAdEE: Unsupervised Domain Adaptation in Early Exit PLMs ( http://arxiv.org/abs/2410.04424v1 ) ライセンス: Link先を確認	Divya Jyoti Bajpai, Manjesh Kumar Hanawal,	(参考訳) 事前訓練された言語モデル(PLM)は、自己スーパービジョンを用いて様々なタスクにわたって高い精度と一般化能力を示すが、その大きなサイズは高い推論遅延をもたらす。 Early Exit(EE)戦略は、中間層に取り付けられた分類器からサンプルを退避させることで問題に対処するが、出口分類器はドメインの変更に敏感であるため、それらをうまく一般化しない。これを解決するために,知識蒸留を用いた多段階適応を用いた非教師付き領域適応型EEフレームワーク(DADEE)を提案する。 DADEEは、各レイヤでのGANベースの逆順応を利用してドメイン不変表現を実現し、ソースとターゲットドメイン間のすべてのレイヤ間のドメインギャップを減らします。取り付けられた出口は推論をスピードアップするだけでなく、破滅的な忘れ込みとモード崩壊を減らすことでドメイン適応を向上させるため、現実世界のシナリオにより適している。感情分析やエンテーメント分類、自然言語推論といったタスクの実験では、DADEEは早期終了法だけでなく、ドメインシフトシナリオ下での様々なドメイン適応法よりも一貫して優れていることが示されている。匿名のソースコードはhttps://github.com/Div290/DAdEEで入手できる。 Pre-trained Language Models (PLMs) exhibit good accuracy and generalization ability across various tasks using self-supervision, but their large size results in high inference latency. Early Exit (EE) strategies handle the issue by allowing the samples to exit from classifiers attached to the intermediary layers, but they do not generalize well, as exit classifiers can be sensitive to domain changes. To address this, we propose Unsupervised Domain Adaptation in EE framework (DADEE) that employs multi-level adaptation using knowledge distillation. DADEE utilizes GAN-based adversarial adaptation at each layer to achieve domain-invariant representations, reducing the domain gap between the source and target domain across all layers. The attached exits not only speed up inference but also enhance domain adaptation by reducing catastrophic forgetting and mode collapse, making it more suitable for real-world scenarios. Experiments on tasks such as sentiment analysis, entailment classification, and natural language inference demonstrate that DADEE consistently outperforms not only early exit methods but also various domain adaptation methods under domain shift scenarios. The anonymized source code is available at https://github.com/Div290/DAdEE.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-06
# CoVLM:半教師付きマルチモーダルフェイクニュース検出のためのビジョンランゲージモデルからの合意の活用 CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection ( http://arxiv.org/abs/2410.04426v1 ) ライセンス: Link先を確認	Devank, Jayateja Kalla, Soma Biswas,	(参考訳) 本研究では,実画像と誤った字幕を組み合わせて偽ニュースを生成する,文脈外誤情報検出の現実的課題に対処する。このタスクの既存のアプローチは、大量のラベル付きデータの可用性を前提としています。対照的に、ラベル付き画像テキストペアの大規模なコーパスの取得はより容易であるため、本研究では、ラベル付き画像テキストペアの限られた数とラベル付き画像テキストペアの大規模なコーパスにアクセス可能な半教師付きプロトコルを提案する。さらに、偽ニュースの発生は実際のニュースよりもはるかに少ないため、データセットは極めて不均衡であり、タスクをさらに難しくする傾向にある。そこで本研究では,ラベル付きデータから得られた閾値を用いて,ラベル付きペアに対してロバストな擬似ラベルを生成する新しいフレームワークであるConsensus from Vision-Language Models (CoVLM)を提案する。このアプローチは、自信のある擬似ラベルを選択するためのモデルの正しいしきい値を自動的に決定できる。課題のある条件に対するベンチマークデータセットの実験結果と、最先端のアプローチとの比較により、我々のフレームワークの有効性が示されている。 In this work, we address the real-world, challenging task of out-of-context misinformation detection, where a real image is paired with an incorrect caption for creating fake news. Existing approaches for this task assume the availability of large amounts of labeled data, which is often impractical in real-world, since it requires extensive manual intervention and domain expertise. In contrast, since obtaining a large corpus of unlabeled image-text pairs is much easier, here, we propose a semi-supervised protocol, where the model has access to a limited number of labeled image-text pairs and a large corpus of unlabeled pairs. Additionally, the occurrence of fake news being much lesser compared to the real ones, the datasets tend to be highly imbalanced, thus making the task even more challenging. Towards this goal, we propose a novel framework, Consensus from Vision-Language Models (CoVLM), which generates robust pseudo-labels for unlabeled pairs using thresholds derived from the labeled data. This approach can automatically determine the right threshold parameters of the model for selecting the confident pseudo-labels. Experimental results on benchmark datasets across challenging conditions and comparisons with state-of-the-art approaches demonstrate the effectiveness of our framework.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-06
# CAPEEN:早期退院と知識蒸留による画像キャプション CAPEEN: Image Captioning with Early Exits and Knowledge Distillation ( http://arxiv.org/abs/2410.04433v1 ) ライセンス: Link先を確認	Divya Jyoti Bajpai, Manjesh Kumar Hanawal,	(参考訳) ディープニューラルネットワーク(DNN)は、視覚要素を認識し、画像キャプションタスクで記述的なテキストを生成することで大きな進歩を遂げている。しかし、その性能改善は計算負荷の増加と推論遅延によるものである。 Early Exit(EE)戦略は効率を高めるために使用できるが、その適応は正確な予測のために様々なレベルの意味情報を必要とするため、画像キャプションにおける課題を示す。そこで我々は,知識蒸留を用いたEE戦略の性能向上のためにCAPEENを導入する。予測信頼度がトレーニングデータから得られた予め定義された値を超えると、CAPEENの推論は中間層で完了する。トレーニングサンプルから目標分布をドリフトできる実世界の展開を考慮し,Multiarmed banditsフレームワークを用いて,フライ時のしきい値に適応する改良型A-CAPEENを提案する。 MS COCOとFlickr30kデータセットの実験では、CAPEENは最終層と比較して競争性能を維持しながら1.77倍のスピードアップを示し、A-CAPEENは歪みに対して堅牢性を提供する。ソースコードはhttps://github.com/Div290/CapEENで入手できる。 Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes from increased computational burden and inference latency. Early Exit (EE) strategies can be used to enhance their efficiency, but their adaptation presents challenges in image captioning as it requires varying levels of semantic information for accurate predictions. To overcome this, we introduce CAPEEN to improve the performance of EE strategies using knowledge distillation. Inference in CAPEEN is completed at intermediary layers if prediction confidence exceeds a predefined value learned from the training data. To account for real-world deployments, where target distributions could drift from that of training samples, we introduce a variant A-CAPEEN to adapt the thresholds on the fly using Multiarmed bandits framework. Experiments on the MS COCO and Flickr30k datasets show that CAPEEN gains speedup of 1.77x while maintaining competitive performance compared to the final layer, and A-CAPEEN additionally offers robustness against distortions. The source code is available at https://github.com/Div290/CapEEN	翻訳日:2024-11-02 07:51:01 公開日:2024-10-06
# UNetの数学的説明 A Mathematical Explanation of UNet ( http://arxiv.org/abs/2410.04434v1 ) ライセンス: Link先を確認	Xue-Cheng Tai, Hao Liu, Raymond H. Chan, Lingfeng Li,	(参考訳) UNetアーキテクチャはイメージセグメンテーションを変換した。 UNetの汎用性と精度は広く採用され、画像の機械学習問題に大きく依存している。本稿では,UNetの簡潔な数学的説明を行う。 UNetの各コンポーネントの意味と機能について説明する。 UNetが制御問題を解決していることを示します。マルチグリッド法を用いて制御変数を分解する。次に、演算子分割技術を用いて、そのアーキテクチャがUNetアーキテクチャを正確に回復する問題の解決を行う。この結果から,UNetは制御問題に対する一段階演算子分割アルゴリズムであることがわかった。 The UNet architecture has transformed image segmentation. UNet's versatility and accuracy have driven its widespread adoption, significantly advancing fields reliant on machine learning problems with images. In this work, we give a clear and concise mathematical explanation of UNet. We explain what is the meaning and function of each of the components of UNet. We will show that UNet is solving a control problem. We decompose the control variables using multigrid methods. Then, operator-splitting techniques is used to solve the problem, whose architecture exactly recovers the UNet architecture. Our result shows that UNet is a one-step operator-splitting algorithm for the control problem.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-06
# QKAN:Quantum Kolmogorov-Arnold Networks QKAN: Quantum Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2410.04435v1 ) ライセンス: Link先を確認	Petr Ivashkov, Po-Wei Huang, Kelvin Koor, Lirandë Pira, Patrick Rebentrost,	(参考訳) 量子ハードウェアにおける学習モデルの可能性は、依然としてオープンな疑問である。しかし、量子機械学習の分野は、これらのモデルが量子実装をどのように活用できるかを永続的に探求している。近年、コルモゴロフ・アルノルド表現定理の構成構造に触発されて、コルモゴロフ・アルノルドネットワーク(KAN)と呼ばれる新しいニューラルネットワークアーキテクチャが出現している。本研究ではQKANと呼ばれる量子バージョンを設計する。我々のQKANは、量子特異値変換を含む強力な量子線型代数ツールを利用して、ネットワークの端にパラメータ化活性化関数を適用する。 QKANはブロックエンコーディングに基づいており、本質的に直接量子入力に適している。さらに,その漸近的複雑性を分析し,単一層からエンドツーエンドのニューラルネットワークアーキテクチャへ再帰的に構築する。 QKANのゲート複雑性は、入力と重みのためのブロックエンコーディングを構築するコストと線形にスケールし、高次元入力を持つタスクに広く適用可能であることを示唆している。 QKANは、パラメータ化された量子回路と確立された量子サブルーチンを組み合わせることで、トレーニング可能な量子機械学習モデルとして機能する。最後に,QKANアーキテクチャ構築に基づく多変量状態準備戦略を提案する。 The potential of learning models in quantum hardware remains an open question. Yet, the field of quantum machine learning persistently explores how these models can take advantage of quantum implementations. Recently, a new neural network architecture, called Kolmogorov-Arnold Networks (KAN), has emerged, inspired by the compositional structure of the Kolmogorov-Arnold representation theorem. In this work, we design a quantum version of KAN called QKAN. Our QKAN exploits powerful quantum linear algebra tools, including quantum singular value transformation, to apply parameterized activation functions on the edges of the network. QKAN is based on block-encodings, making it inherently suitable for direct quantum input. Furthermore, we analyze its asymptotic complexity, building recursively from a single layer to an end-to-end neural architecture. The gate complexity of QKAN scales linearly with the cost of constructing block-encodings for input and weights, suggesting broad applicability in tasks with high-dimensional input. QKAN serves as a trainable quantum machine learning model by combining parameterized quantum circuits with established quantum subroutines. Lastly, we propose a multivariate state preparation strategy based on the construction of the QKAN architecture.	翻訳日:2024-11-02 07:51:01 公開日:2024-10-06
# 入力粒度制御とグリフ認識学習による視覚テキスト生成のためのバックボーンモデル Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training ( http://arxiv.org/abs/2410.04439v1 ) ライセンス: Link先を確認	Wenbo Li, Guohao Li, Zhibin Lan, Xue Xu, Wanru Zhuang, Jiachen Liu, Xinyan Xiao, Jinsong Su,	(参考訳) 拡散に基づくテキスト・ツー・イメージモデルでは、多様性と美学において顕著な成果が示されているが、可視的な視覚的テキストで画像を生成するのに苦労している。既存のバックボーンモデルには、ミススペル、テキスト生成の失敗、中国語テキストのサポートの欠如といった制限があるが、その開発は有望な可能性を示している。本稿では,英語と中国語の視覚テキスト生成にバックボーンモデルを活用するための一連の手法を提案する。まず、Byte Pair Encoding(BPE)トークン化とクロスアテンションモジュールの学習不足により、バックボーンモデルの性能が制限されることを明らかにする予備的研究を行った。そこで我々は,(1)より適切なテキスト表現を提供するために,混合粒度入力戦略を設計し,(2)3つのグリフ認識学習損失を伴って従来の訓練目標を強化することを提案する。実験により,本手法は,基本的な画像生成品質を維持しつつ,意味的,美的,正確な視覚的テキスト画像を生成するために,バックボーンモデルを効果的に活用できることが実証された。 Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with legible visual texts. Existing backbone models have limitations such as misspelling, failing to generate texts, and lack of support for Chinese text, but their development shows promising potential. In this paper, we propose a series of methods, aiming to empower backbone models to generate visual texts in English and Chinese. We first conduct a preliminary study revealing that Byte Pair Encoding (BPE) tokenization and the insufficient learning of cross-attention modules restrict the performance of the backbone models. Based on these observations, we make the following improvements: (1) We design a mixed granularity input strategy to provide more suitable text representations; (2) We propose to augment the conventional training objective with three glyph-aware training losses, which enhance the learning of cross-attention modules and encourage the model to focus on visual texts. Through experiments, we demonstrate that our methods can effectively empower backbone models to generate semantic relevant, aesthetically appealing, and accurate visual text images, while maintaining their fundamental image generation quality.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# 視覚変換器を用いた金属表面欠陥の自動検出 Automated Detection of Defects on Metal Surfaces using Vision Transformers ( http://arxiv.org/abs/2410.04440v1 ) ライセンス: Link先を確認	Toqa Alaa, Mostafa Kotb, Arwa Zakaria, Mariam Diab, Walid Gomaa,	(参考訳) 金属製造は、しばしば欠陥製品を生産し、運用上の問題を引き起こす。従来の手動検査は時間を要するため、自動的な解決策が必要である。本研究では、深層学習技術を用いて、視覚変換器(ViT)を用いた金属表面欠陥検出モデルを開発した。提案モデルは,特徴抽出のためのViTを用いた欠陥の分類と局所化に焦点を当てている。アーキテクチャは、分類とローカライゼーションの2つのパスに分かれる。このモデルは,局所化過程において,平均角誤差(MSE)と平均絶対誤差(MAE)を極力低く保ちながら,高い分類精度にアプローチする必要がある。実験結果から, 自動欠陥検出, 運転効率の向上, 金属製造における誤差の低減に有効であることが示唆された。 Metal manufacturing often results in the production of defective products, leading to operational challenges. Since traditional manual inspection is time-consuming and resource-intensive, automatic solutions are needed. The study utilizes deep learning techniques to develop a model for detecting metal surface defects using Vision Transformers (ViTs). The proposed model focuses on the classification and localization of defects using a ViT for feature extraction. The architecture branches into two paths: classification and localization. The model must approach high classification accuracy while keeping the Mean Square Error (MSE) and Mean Absolute Error (MAE) as low as possible in the localization process. Experimental results show that it can be utilized in the process of automated defects detection, improve operational efficiency, and reduce errors in metal manufacturing.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# 未知領域の最適化:セファロメトリックランドマーク検出のためのドメインアライメント Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection ( http://arxiv.org/abs/2410.04445v1 ) ライセンス: Link先を確認	Julian Wyatt, Irina Voiculescu,	(参考訳) ケパロメトリランドマーク検出(Cephalometric Landmark Detection)は、脳波計測のための重要な領域を特定するプロセスである。それぞれのランドマークは、臨床医によってラベル付けされた単一のGTポイントである。機械学習モデルは、ヒートマップで表されるランドマークの確率軌跡を予測する。この研究は、2024年のCL-Detection MICCAI Challengeのために、局所的な顔抽出モジュールとX線アーチファクト拡張手順によるドメインアライメント戦略を提案する。この課題は、我々の手法の結果をMREの1.186mm、オンライン検証のリーダーボードの2mm SDRの82.04%でベストと位置づけている。コードはhttps://github.com/Julian-Wyatt/OptimisingfortheUnknownで公開されている。 Cephalometric Landmark Detection is the process of identifying key areas for cephalometry. Each landmark is a single GT point labelled by a clinician. A machine learning model predicts the probability locus of a landmark represented by a heatmap. This work, for the 2024 CL-Detection MICCAI Challenge, proposes a domain alignment strategy with a regional facial extraction module and an X-ray artefact augmentation procedure. The challenge ranks our method's results as the best in MRE of 1.186mm and third in the 2mm SDR of 82.04% on the online validation leaderboard. The code is available at https://github.com/Julian-Wyatt/OptimisingfortheUnknown.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# 注意のシフト: 安全でないコンテンツからAIを操る Attention Shift: Steering AI Away from Unsafe Content ( http://arxiv.org/abs/2410.04447v1 ) ライセンス: Link先を確認	Shivank Garg, Manyana Tiwari,	(参考訳) 本研究は, 最先端の生成モデルにおける安全でない, 有害なコンテンツの生成について検討し, それらの世代を制限する方法に着目した。提案手法は,非安全概念を推論中に追加のトレーニングを伴わずに取り除くことを目的とした,新たなトレーニングフリーアプローチである。我々は,従来のアブレーション法と比較し,質的,定量的な測定値を用いて,直接的および敵対的ジェイルブレイクプロンプトの性能評価を行った。本研究は,観察結果の潜在的な理由を仮説化し,コンテンツ制限の限界と広範な影響について議論する。 This study investigates the generation of unsafe or harmful content in state-of-the-art generative models, focusing on methods for restricting such generations. We introduce a novel training-free approach using attention reweighing to remove unsafe concepts without additional training during inference. We compare our method against existing ablation methods, evaluating the performance on both, direct and adversarial jailbreak prompts, using qualitative and quantitative metrics. We hypothesize potential reasons for the observed results and discuss the limitations and broader implications of content restriction.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# Video Summarization Techniques: A Comprehensive Reviews Video Summarization Techniques: A Comprehensive Review ( http://arxiv.org/abs/2410.04449v1 ) ライセンス: Link先を確認	Toqa Alaa, Ahmad Mongy, Assem Bakr, Mariam Diab, Walid Gomaa,	(参考訳) ソーシャルメディア、教育、エンターテイメント、監視など、さまざまな産業におけるビデオコンテンツの急速な拡大は、ビデオ要約を重要な研究分野にしている。現在の研究は、抽象的戦略と抽出的戦略の両方を強調する、ビデオ要約のための様々なアプローチと手法を探求する調査である。抽出要約のプロセスは、ソースビデオからキーフレームやセグメントを識別し、ショット境界認識やクラスタリングなどの手法を利用する。一方、抽象的な要約は、深層ニューラルネットワークや自然言語処理、強化学習、注意機構、生成的敵ネットワーク、マルチモーダル学習といった機械学習モデルを用いて、ビデオから不可欠なコンテンツを取得することによって、新たなコンテンツを生成する。また、この2つの方法論を取り入れたアプローチや、実世界の実装で遭遇した利用と難しさについても論じる。論文では、これらのテクニックのベンチマークに使われるデータセットについても取り上げている。本稿では,映像要約研究の現状と今後の方向性について,最先端の知識を提供する。 The rapid expansion of video content across a variety of industries, including social media, education, entertainment, and surveillance, has made video summarization an essential field of study. The current work is a survey that explores the various approaches and methods created for video summarizing, emphasizing both abstractive and extractive strategies. The process of extractive summarization involves the identification of key frames or segments from the source video, utilizing methods such as shot boundary recognition, and clustering. On the other hand, abstractive summarization creates new content by getting the essential content from the video, using machine learning models like deep neural networks and natural language processing, reinforcement learning, attention mechanisms, generative adversarial networks, and multi-modal learning. We also include approaches that incorporate the two methodologies, along with discussing the uses and difficulties encountered in real-world implementations. The paper also covers the datasets used to benchmark these techniques. This review attempts to provide a state-of-the-art thorough knowledge of the current state and future directions of video summarization research.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# MindScope: マルチエージェントシステムによる大規模言語モデルにおける認知バイアスの探索 MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems ( http://arxiv.org/abs/2410.04452v1 ) ライセンス: Link先を確認	Zhentao Xie, Jiabao Zhao, Yilei Wang, Jinxin Shi, Yanhong Bai, Xingjiao Wu, Liang He,	(参考訳) 大きな言語モデル(LLM)における認知バイアスを検出することは、これらのモデル内の既存の認知バイアスを調査することを目的とした魅力的なタスクである。言語モデルにおける認知バイアスを検出する現在の方法は、一般的に不完全な検出能力と、検出可能なバイアスの種類が制限された範囲に悩まされている。この問題に対処するため、静的および動的要素を区別して統合する'MindScope'データセットを導入しました。静的成分は、72の認知バイアスカテゴリにまたがる5,170のオープンエンド質問からなる。動的コンポーネントはルールベースのマルチエージェント通信フレームワークを利用して、マルチラウンド対話を生成する。このフレームワークは柔軟性があり、LSMを含む様々な心理的実験に容易に適応できる。さらに,検索・拡張生成(RAG),競争的議論,強化学習に基づく意思決定モジュールを組み込んだ多エージェント検出手法を提案する。有効性を実証し、GPT-4と比較して検出精度を最大35.10%向上させることが示されている。コードと付録はhttps://github.com/2279072142/MindScope.comで入手できる。 Detecting cognitive biases in large language models (LLMs) is a fascinating task that aims to probe the existing cognitive biases within these models. Current methods for detecting cognitive biases in language models generally suffer from incomplete detection capabilities and a restricted range of detectable bias types. To address this issue, we introduced the 'MindScope' dataset, which distinctively integrates static and dynamic elements. The static component comprises 5,170 open-ended questions spanning 72 cognitive bias categories. The dynamic component leverages a rule-based, multi-agent communication framework to facilitate the generation of multi-round dialogues. This framework is flexible and readily adaptable for various psychological experiments involving LLMs. In addition, we introduce a multi-agent detection method applicable to a wide range of detection tasks, which integrates Retrieval-Augmented Generation (RAG), competitive debate, and a reinforcement learning-based decision module. Demonstrating substantial effectiveness, this method has shown to improve detection accuracy by as much as 35.10% compared to GPT-4. Codes and appendix are available at https://github.com/2279072142/MindScope.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# CopyLens: LLM出力に対する著作権付きサブデータセットのコントリビューションを動的にフラグする CopyLens: Dynamically Flagging Copyrighted Sub-Dataset Contributions to LLM Outputs ( http://arxiv.org/abs/2410.04454v1 ) ライセンス: Link先を確認	Qichao Ma, Rui-Jie Zhu, Peiye Liu, Renye Yan, Fahong Zhang, Ling Liang, Meng Li, Zhaofei Yu, Zongwei Wang, Yimao Cai, Tiejun Huang,	(参考訳) 大きな言語モデル(LLM)は、その知識の吸収とテキスト生成能力によって普及している。同時に、データセットの事前トレーニングに関する著作権問題も、特に生成に特定のスタイルが含まれている場合、深刻な問題となっている。それまでの方法は、同一の著作権のある出力の防衛に焦点を当てたり、計算負荷のある個々のトークンによる解釈可能性を見出したりしていた。しかし、それらのギャップは存在し、データセットのコントリビューションがLLM出力にどのように影響するかの直接的な評価が欠けている。モデルプロバイダがデータ保持者の著作権保護を保証すると、より成熟したLCMコミュニティが確立される。これらの制限に対処するために、著作権付きデータセットがLLM応答にどのように影響するかを分析するための新しいフレームワークであるCopyLensを紹介します。まず、埋め込み空間における事前学習データのユニーク性に基づいて、トークン表現は著作権のあるテキストに対して最初に融合され、続いて軽量のLSTMベースのネットワークでデータセットのコントリビューションを分析する。このような先行して、対照的な学習に基づく非コピーライトOOD検出器が設計されている。我々のフレームワークは動的に異なる状況に直面することができ、現在の著作権検出方法のギャップを埋めることができます。実験の結果、CopyLensは提案したベースラインよりも効率と精度を15.2%向上し、エンジニアリング手法より58.7%、OOD検出ベースラインより0.21AUC向上した。 Large Language Models (LLMs) have become pervasive due to their knowledge absorption and text-generation capabilities. Concurrently, the copyright issue for pretraining datasets has been a pressing concern, particularly when generation includes specific styles. Previous methods either focus on the defense of identical copyrighted outputs or find interpretability by individual tokens with computational burdens. However, the gap between them exists, where direct assessments of how dataset contributions impact LLM outputs are missing. Once the model providers ensure copyright protection for data holders, a more mature LLM community can be established. To address these limitations, we introduce CopyLens, a new framework to analyze how copyrighted datasets may influence LLM responses. Specifically, a two-stage approach is employed: First, based on the uniqueness of pretraining data in the embedding space, token representations are initially fused for potential copyrighted texts, followed by a lightweight LSTM-based network to analyze dataset contributions. With such a prior, a contrastive-learning-based non-copyright OOD detector is designed. Our framework can dynamically face different situations and bridge the gap between current copyright detection methods. Experiments show that CopyLens improves efficiency and accuracy by 15.2% over our proposed baseline, 58.7% over prompt engineering methods, and 0.21 AUC over OOD detection baselines.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# SWEb:スカンジナビア語のための大規模なWebデータセット SWEb: A Large Web Dataset for the Scandinavian Languages ( http://arxiv.org/abs/2410.04456v1 ) ライセンス: Link先を確認	Tobias Norlund, Tim Isbister, Amaru Cuba Gyllensten, Paul Dos Santos, Danila Petrelli, Ariel Ekgren, Magnus Sahlgren,	(参考訳) 本稿では,スカンジナビア語における最大規模の事前学習データセットであるスカンジナビア語WEb(SWEb)について述べる。本論文では,収集と処理のパイプラインを詳述し,ルールベースのアプローチと比較して,複雑性を著しく低減する新しいモデルベースのテキスト抽出器を提案する。また、スウェーデンの言語モデルを評価するための新しいクローゼスタイルのベンチマークを導入し、このテストを用いて、SWEbデータでトレーニングされたモデルとFinalWebでトレーニングされたモデルを比較し、競合する結果と比較した。すべてのデータ、モデル、コードはオープンに共有されます。 This paper presents the hitherto largest pretraining dataset for the Scandinavian languages: the Scandinavian WEb (SWEb), comprising over one trillion tokens. The paper details the collection and processing pipeline, and introduces a novel model-based text extractor that significantly reduces complexity in comparison with rule-based approaches. We also introduce a new cloze-style benchmark for evaluating language models in Swedish, and use this test to compare models trained on the SWEb data to models trained on FineWeb, with competitive results. All data, models and code are shared openly.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# 重力適応ゾーンキャリブレーションのためのアテンションベースアルゴリズム An Attention-Based Algorithm for Gravity Adaptation Zone Calibration ( http://arxiv.org/abs/2410.04457v1 ) ライセンス: Link先を確認	Chen Yu,	(参考訳) 重力適応ゾーンの正確な校正は、水中航法、地球物理探査、海洋工学などの分野において非常に重要である。これらの領域における重力場データの適用が拡大するにつれ、重力場の複雑な特性を捉え、多次元データ間の複雑な相互関係に対処するために、単一特徴に基づく従来の校正手法が不十分になりつつある。本稿では,重力適応領域キャリブレーションのためのアテンション強化アルゴリズムを提案する。注意機構を導入することにより,多次元重力場の特徴を適応的に融合させ,特徴量を動的に割り当てることにより,従来の特徴選択法に固有の多重線型性や冗長性の問題を効果的に解決し,キャリブレーション精度とロバスト性を大幅に向上させ,さらに1万以上のサンプリングポイントを持つ大規模重力場データセットを構築し,データ空間の解像度を高めるためにクリリング補間を行い,モデルトレーニングと評価のための信頼性の高いデータ基盤を提供する。従来の機械学習モデル(SVM, GBDT, RFなど)の定性的および定量的な実験を行った結果,提案アルゴリズムはこれらのモデル間で性能を著しく改善し,従来の特徴選択法よりも優れていることが示された。本稿では,重力適応領域キャリブレーションのための新しい解法を提案する。コードは \href{this link} {https://github.com/hulnifox/RF-ATTN} で公開されている。 Accurate calibration of gravity adaptation zones is of great significance in fields such as underwater navigation, geophysical exploration, and marine engineering. With the increasing application of gravity field data in these areas, traditional calibration methods based on single features are becoming inadequate for capturing the complex characteristics of gravity fields and addressing the intricate interrelationships among multidimensional data. This paper proposes an attention-enhanced algorithm for gravity adaptation zone calibration. By introducing an attention mechanism, the algorithm adaptively fuses multidimensional gravity field features and dynamically assigns feature weights, effectively solving the problems of multicollinearity and redundancy inherent in traditional feature selection methods, significantly improving calibration accuracy and robustness.In addition, a large-scale gravity field dataset with over 10,000 sampling points was constructed, and Kriging interpolation was used to enhance the spatial resolution of the data, providing a reliable data foundation for model training and evaluation. We conducted both qualitative and quantitative experiments on several classical machine learning models (such as SVM, GBDT, and RF), and the results demonstrate that the proposed algorithm significantly improves performance across these models, outperforming other traditional feature selection methods. The method proposed in this paper provides a new solution for gravity adaptation zone calibration, showing strong generalization ability and potential for application in complex environments. The code is available at \href{this link} {https://github.com/hulnifox/RF-ATTN}.	翻訳日:2024-11-02 07:25:54 公開日:2024-10-06
# U-netによる脳脊髄液分布の予測と心室逆流グレーディング U-net based prediction of cerebrospinal fluid distribution and ventricular reflux grading ( http://arxiv.org/abs/2410.04460v1 ) ライセンス: Link先を確認	Melanie Rieff, Fabian Holzberger, Oksana Lapina, Geir Ringstad, Lars Magnus Valnes, Bogna Warsza, Kent-Andre Mardal, Per Kristian Eide, Barbara Wohlmuth,	(参考訳) これまでの研究では、脳脊髄液(CSF)が脳の廃棄物浄化過程において重要な役割を担い、変化した流れパターンが中枢神経系の様々な疾患と関連していることが示されている。本研究では,ガドリニウム系CSF造影剤(tracer)の脳内分布を予測するための深層学習の可能性について検討した。このため,T1強調MRI(MRI)スキャンを経皮的投与前後に複数回施行した。本稿では,24時間後にピーク時の画素単位の信号増加を予測するために,U-netを用いた教師付き学習モデルを提案する。その性能は、トレーニング中に提供される異なるトレーサ分布ステージに基づいて評価される。以上の結果から, 初回2時間後の画像データから, トレーサーフローの予測値が, 追加の後期スキャンに匹敵するものであることが示唆された。さらに, 神経放射線医が提供した心室逆流グレーディングと, 医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用車用リフレックスグレーディングを比較検討した結果, 良好な一致が得られた。 CSFフロー予測のための深層学習法の可能性を示し,臨床解析にMRIスキャンが有用であり,臨床効率,患者の幸福感,医療費の低減に寄与する可能性が示唆された。 Previous work shows evidence that cerebrospinal fluid (CSF) plays a crucial role in brain waste clearance processes, and that altered flow patterns are associated with various diseases of the central nervous system. In this study, we investigate the potential of deep learning to predict the distribution in human brain of a gadolinium-based CSF contrast agent (tracer) administered intrathecal. For this, T1-weighted magnetic resonance imaging (MRI) scans taken at multiple time points before and after intrathecal injection were utilized. We propose a U-net-based supervised learning model to predict pixel-wise signal increases at their peak after 24 hours. Its performance is evaluated based on different tracer distribution stages provided during training, including predictions from baseline scans taken before injection. Our findings indicate that using imaging data from just the first two hours post-injection for training yields tracer flow predictions comparable to those trained with additional later-stage scans. The model was further validated by comparing ventricular reflux gradings provided by neuroradiologists, and inter-rater grading among medical experts and the model showed excellent agreement. Our results demonstrate the potential of deep learning-based methods for CSF flow prediction, suggesting that fewer MRI scans could be sufficient for clinical analysis, which might significantly improve clinical efficiency, patient well-being, and lower healthcare costs.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 生物配列設計における外部強化学習の改善 Improved Off-policy Reinforcement Learning in Biological Sequence Design ( http://arxiv.org/abs/2410.04461v1 ) ライセンス: Link先を確認	Hyeonah Kim, Minsu Kim, Taeyoung Yun, Sanghyeok Choi, Emmanuel Bengio, Alex Hernández-García, Jinkyoo Park,	(参考訳) 生物配列を望ましい性質で設計することは、組合せ的に広大な探索空間と、それぞれの候補配列を評価するコストが高いため、大きな課題である。これらの課題に対処するため、GFlowNetsのような強化学習(RL)手法では、プロキシモデルを用いて迅速な報酬評価を行い、アノテートされたデータをポリシートレーニングに利用する。これらの手法は、多種多様な新しいシーケンスを生成する上で有望であるが、膨大な検索空間に対する限られたトレーニングデータはしばしば、配布外入力のプロキシの誤特定につながる。我々は,GFlowNetsを訓練し,プロキシの誤特定に対するロバスト性を改善するための,新しいオフライン検索手法である$\delta$-Conservative Searchを紹介した。キーとなる考え方は、パラメータ$\delta$によって制御される保守性を組み込んで、検索を信頼できるリージョンに制限することである。具体的には、パラメータ$\delta$のベルヌーイ分布でランダムにトークンをマスキングし、GFlowNetポリシを使用してマスキングトークンをデノイズすることで、高スコアのオフラインシーケンスにノイズを注入する。さらに$\delta$は、各データポイントに対するプロキシモデルの不確実性に基づいて適応的に調整される。これにより、プロキシの不確実性の反映が保守性のレベルを決定することができる。実験結果から,DNA,RNA,タンパク質,ペプチドなど多種多様なタスクにまたがるハイスコア配列の発見において,既存の機械学習手法よりも一貫して優れており,特に大規模シナリオにおいてその性能が向上することが示唆された。 Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of proxy for out-of-distribution inputs. We introduce $\delta$-Conservative Search, a novel off-policy search method for training GFlowNets designed to improve robustness against proxy misspecification. The key idea is to incorporate conservativeness, controlled by parameter $\delta$, to constrain the search to reliable regions. Specifically, we inject noise into high-score offline sequences by randomly masking tokens with a Bernoulli distribution of parameter $\delta$ and then denoise masked tokens using the GFlowNet policy. Additionally, $\delta$ is adaptively adjusted based on the uncertainty of the proxy model for each data point. This enables the reflection of proxy uncertainty to determine the level of conservativeness. Experimental results demonstrate that our method consistently outperforms existing machine learning methods in discovering high-score sequences across diverse tasks-including DNA, RNA, protein, and peptide design-especially in large-scale scenarios.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# テンソルトレイン点雲圧縮と効率の良い近似近傍探索 Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search ( http://arxiv.org/abs/2410.04462v1 ) ライセンス: Link先を確認	Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets,	(参考訳) 大規模ベクトルデータベースにおける最寄りの探索は、さまざまな機械学習アプリケーションに不可欠である。本稿では, 点群を効率的に表現し, 近接探索を高速に行うために, テンソルトレイン(TT)低ランクテンソル分解を用いた新しい手法を提案する。 Sliced Wassersteinのような密度推定損失を利用してTT分解を訓練し、ロバストポイントクラウド圧縮を実現する確率論的解釈を提案する。 TT点雲内の固有階層構造を明らかにすることにより, 近接探索を効率的に行うことができる。本稿では,方法論に関する詳細な知見を提供し,既存の手法と包括的な比較を行う。本稿では, オフ・オブ・ディストリビューション (OOD) 検出問題や, ANN (Nest-Nighbor) 探索タスクなど, 様々なシナリオで有効性を示す。 Nearest-neighbor search in large vector databases is crucial for various machine learning applications. This paper introduces a novel method using tensor-train (TT) low-rank tensor decomposition to efficiently represent point clouds and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and utilize density estimation losses like Sliced Wasserstein to train TT decompositions, resulting in robust point cloud compression. We reveal an inherent hierarchical structure within TT point clouds, facilitating efficient approximate nearest-neighbor searches. In our paper, we provide detailed insights into the methodology and conduct comprehensive comparisons with existing methods. We demonstrate its effectiveness in various scenarios, including out-of-distribution (OOD) detection problems and approximate nearest-neighbor (ANN) search tasks.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# Wrong-of-Thought:マルチパースペクティブ検証と誤り情報の統合型推論フレームワーク Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ( http://arxiv.org/abs/2410.04463v1 ) ライセンス: Link先を確認	Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin,	(参考訳) CoT(Chain-of-Thought)はLarge Language Models(LLM)の性能向上に欠かせない技術となり、研究者の注目を集めている。 1つのアプローチストリームは、所望の品質の推論出力を継続的に検証し、精査することにより、LCMの反復的な拡張に焦点を当てている。その印象的な結果にもかかわらず、このパラダイムは2つの重要な問題に直面している。(1) 単純な検証方法: 現在のパラダイムは単一の検証方法にのみ依存している。 2) 誤った情報無視: 従来のパラダイムは,論理パスを毎回スクラッチから洗練させ,推論中に誤った情報を直接無視する。これらの課題に対処するため,(1)マルチパースペクティブ検証(Multi-Perspective Verification):推論プロセスと結果の精度向上のためのマルチパースペクティブ検証(Multi-Perspective Verification):(2)誤情報利用(Wrong Information utilization):誤った情報を利用してLCMを警告し,同じミスを犯す可能性を低減する。 8つの一般的なデータセットと5つのLLMの実験は、WoTが以前のベースラインをすべて越えていることを示している。さらに、WoTは難しい計算タスクにおいて強力な能力を示す。 Chain-of-Thought (CoT) has become a vital technique for enhancing the performance of Large Language Models (LLMs), attracting increasing attention from researchers. One stream of approaches focuses on the iterative enhancement of LLMs by continuously verifying and refining their reasoning outputs for desired quality. Despite its impressive results, this paradigm faces two critical issues: (1) Simple verification methods: The current paradigm relies solely on a single verification method. (2) Wrong Information Ignorance: Traditional paradigms directly ignore wrong information during reasoning and refine the logic paths from scratch each time. To address these challenges, we propose Wrong-of-Thought (WoT), which includes two core modules: (1) Multi-Perspective Verification: A multi-perspective verification method for accurately refining the reasoning process and result, and (2) Wrong Information Utilization: Utilizing wrong information to alert LLMs and reduce the probability of LLMs making same mistakes. Experiments on 8 popular datasets and 5 LLMs demonstrate that WoT surpasses all previous baselines. In addition, WoT exhibits powerful capabilities in difficult computation tasks.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 大規模言語モデルにおける文脈内学習推論回路の再検討 Revisiting In-context Learning Inference Circuit in Large Language Models ( http://arxiv.org/abs/2410.04468v1 ) ライセンス: Link先を確認	Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue,	(参考訳) In-context Learning (ICL) は、言語モデル (LM) 上で、内部メカニズムを探索せずに学習する、新たな数発学習パラダイムである。 ICLの内部処理を記述する研究はすでに存在するが、大きな言語モデルにおけるすべての推論現象を捉えるのに苦労している。そこで本研究では、推論力学をモデル化し、ICLの観測現象を説明するための包括的な回路を提案する。 1) 要約: LMはすべての入力テキスト(デモとクエリ)を、ICLタスクを解くのに十分な情報を持つ隠された状態の線形表現にエンコードする。 2)Semantics Merge: LMは、デモのエンコードされた表現と対応するラベルトークンをマージして、ラベルとデモの合同表現を生成する。 (3)Feature Retrieval and Copy: LMはタスクサブスペース上のクエリ表現に似た共同表現を検索し、検索した表現をクエリにコピーする。次に、言語モデルヘッドは、これらのコピーされたラベル表現をある程度キャプチャし、予測されたラベルにデコードする。提案した推論回路は、ICLプロセス中に観測された多くの現象を捕捉し、ICL推論プロセスの包括的で実用的な説明となる。さらに,提案ステップの無効化によるアブレーション解析はICLの性能を著しく損なうものであり,提案回路が支配機構であることを示唆している。さらに,提案回路と並行してICLタスクを解くバイパス機構を確認し,リストアップする。 In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored. There are already existing works describing the inner processing of ICL, while they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Summarize: LMs encode every input text (demonstrations and queries) into linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. The proposed inference circuit successfully captured many phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis by disabling the proposed steps seriously damages the ICL performance, suggesting the proposed inference circuit is a dominating mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 非エルミート準周期格子における創発的マトリオシュカ人形様点ギャップ Emergent Matryoshka doll-like point gap in a non-Hermitian quasiperiodic lattice ( http://arxiv.org/abs/2410.04469v1 ) ライセンス: Link先を確認	Yi-Qi Zheng, Shan-Zhong Li, Zhi Li,	(参考訳) 幾何級数変調された非エルミート準周期格子モデルを提案し、その局所化と位相的性質について検討する。その結果, 幾何級数の累積項の増加に伴い, 高い巻数を持つ複数のモビリティエッジと非エルミート点ギャップがシステム内で引き起こされることが示唆された。系の点ギャップスペクトルは、複雑な平面にマトリオシカ人形のような構造を持ち、高い巻数となる。さらに、無限項の和の極限ケースを分析する。その結果, 総和項が極限にプッシュされると, モビリティエッジは1つのモビリティエッジとして結合することがわかった。一方、対応する点ギャップは、巻数1に等しいリングにマージされる。アビラの大域的理論を通じて、無限和の極限におけるモビリティエッジの解析的表現を与え、モビリティエッジとポイントギャップがマージされ、実際に1と等しい巻数となることを再確認する。 We propose a geometric series modulated non-Hermitian quasiperiodic lattice model, and explore its localization and topological properties. The results show that with the ever-increasing summation terms of the geometric series, multiple mobility edges and non-Hermitian point gaps with high winding number can be induced in the system. The point gap spectrum of the system has a Matryoshka doll-like structure in the complex plane, resulting in a high winding number. In addition, we analyze the limit case of summation of infinite terms. The results show that the mobility edges merge together as only one mobility edge when summation terms are pushed to the limit. Meanwhile, the corresponding point gaps are merged into a ring with winding number equal to one. Through Avila's global theory, we give an analytical expression for mobility edges in the limit of infinite summation, reconfirming that mobility edges and point gaps do merge and will result in a winding number that is indeed equal to one.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 音声要約表現を用いた構成可能な多言語ASR Configurable Multilingual ASR with Speech Summary Representations ( http://arxiv.org/abs/2410.04478v1 ) ライセンス: Link先を確認	Harrison Zhu, Ivan Fung, Yingke Zhu, Lahiru Samarakoon,	(参考訳) 世界の人口の約半数は多言語であり、多言語 ASR (MASR) が不可欠である。複数のモノリンガルモデルをデプロイすることは、前もって基幹言語が不明な場合に困難である。これは、特定の言語を認識するために手動または自動でプロンプトできる、構成可能な多言語MASRモデルの研究努力を動機付けている。本稿では,構成性の向上を目的とした新しいアーキテクチャであるSession Vector (csvMASR) を用いた Configurable MASR モデルを提案する。提案手法では,音声ダイアリゼーションにおける対話的要約表現にインスパイアされた音声要約ベクトル表現を導入し,発話レベルにおける言語固有のコンポーネントからの出力を組み合わせる。また、コンフィグアビリティを高めるために補助的な言語分類損失も組み込んだ。 MLS(Multilingual Librispeech)データセットの7言語のデータを用いて、csvMASRは既存のMASRモデルより優れており、ベースラインと比較すると単語エラー率(WER)が10.33\%から9.95\%に低下する。さらに、csvMASRは言語分類とタスクのプロンプトにおいて優れたパフォーマンスを示している。 Approximately half of the world's population is multilingual, making multilingual ASR (MASR) essential. Deploying multiple monolingual models is challenging when the ground-truth language is unknown in advance. This motivates research efforts on configurable multilingual MASR models that can be prompted manually or adapted automatically to recognise specific languages. In this paper, we present the Configurable MASR model with Summary Vector (csvMASR), a novel architecture designed to enhance configurability. Our approach leverages adapters and introduces speech summary vector representations, inspired by conversational summary representations in speech diarization, to combine outputs from language-specific components at the utterance level. We also incorporate an auxiliary language classification loss to enhance configurability. Using data from 7 languages in the Multilingual Librispeech (MLS) dataset, csvMASR outperforms existing MASR models and reduces the word error rate (WER) from 10.33\% to 9.95\% when compared with the baseline. Additionally, csvMASR demonstrates superior performance in language classification and prompting tasks.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# SITCOM: 逆問題に対するステップワイドトリプル一貫性拡散サンプリング SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems ( http://arxiv.org/abs/2410.04479v1 ) ライセンス: Link先を確認	Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang,	(参考訳) 拡散モデル(英: Diffusion Model、DM)は、トレーニングセット上で学習した分布からサンプリングできる生成モデルのクラスである。逆画像問題 (IPs) の解法に適用する場合、DMの逆サンプリングステップは、通常、画像空間における測定条件分布からおよそサンプルに修正される。しかしながら、これらの修正は特定の設定(測定ノイズの存在など)や非線形タスクには適さないかもしれない。これらの課題に対処するために、測定一貫性の拡散軌道を達成するための3つの条件を述べる。これらの条件に基づいて,従来の研究のように標準データ多様体測定の一貫性と前方拡散の一貫性を強制するだけでなく,各サンプリングステップにおける事前学習モデルの入力を最適化することにより拡散軌道を維持する後方拡散の整合性も備えた,新しい最適化に基づくサンプリング手法を提案する。これらの条件を暗黙的または明示的に強制することで、サンプルははるかに少ない逆ステップを必要とします。そこで我々はSITCOM(Step-wise Triple-Consistent Sampling)と呼ぶ。従来の最先端のベースライン法と比較して,5つの線形および3つの非線形画像復元タスクにわたる広範囲な実験により,SITCOMが標準画像類似度測定の点で競争力や優れた結果を得ると同時に,検討対象のすべてのタスクに対して実行時間を大幅に短縮することを示した。 Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse imaging problems (IPs), the reverse sampling steps of DMs are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such as in the presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier sampling steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces the standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions, either implicitly or explicitly, our sampler requires significantly fewer reverse steps. Therefore, we refer to our accelerated method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to existing state-of-the-art baseline methods, under different levels of measurement noise, our extensive experiments across five linear and three non-linear image restoration tasks demonstrate that SITCOM achieves competitive or superior results in terms of standard image similarity metrics while requiring a significantly reduced run-time across all considered tasks.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# ニューロシンボリックプログラム合成とタスク生成による抽象推論問題の解法 Learning to Solve Abstract Reasoning Problems with Neurosymbolic Program Synthesis and Task Generation ( http://arxiv.org/abs/2410.04480v1 ) ライセンス: Link先を確認	Jakub Bednarek, Krzysztof Krawiec,	(参考訳) 抽象的思考とアナロジーによる推論は、新しい条件に迅速に適応し、それらを分解して新たに遭遇した問題に対処し、包括的に問題を解決するために知識を合成するために必要なものである。本稿では,ニューラルプログラム合成に基づく抽象問題の解法であるTransCoderについて述べる。 TransCoderの中核は、機能エンジニアリングと抽象推論を容易にするために設計された、型付きドメイン固有言語である。トレーニングでは、タスクの解決に失敗したプログラムを使用して、新しいタスクを生成し、それらを合成データセットにまとめます。この方法で生成された各合成タスクは、既知の関連するプログラム(解法)を持ち、モデルが教師付きモードでトレーニングされる。ソリューションは透過的なプログラム形式で表現され、検査と検証が可能である。本稿では, TransCoder のパフォーマンスを Abstract Reasoning Corpus データセットを用いて実証する。 The ability to think abstractly and reason by analogy is a prerequisite to rapidly adapt to new conditions, tackle newly encountered problems by decomposing them, and synthesize knowledge to solve problems comprehensively. We present TransCoder, a method for solving abstract problems based on neural program synthesis, and conduct a comprehensive analysis of decisions made by the generative module of the proposed architecture. At the core of TransCoder is a typed domain-specific language, designed to facilitate feature engineering and abstract reasoning. In training, we use the programs that failed to solve tasks to generate new tasks and gather them in a synthetic dataset. As each synthetic task created in this way has a known associated program (solution), the model is trained on them in supervised mode. Solutions are represented in a transparent programmatic form, which can be inspected and verified. We demonstrate TransCoder's performance using the Abstract Reasoning Corpus dataset, for which our framework generates tens of thousands of synthetic problems with corresponding solutions and facilitates systematic progress in learning.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 眼球運動による読解理解の微粒化予測 Fine-Grained Prediction of Reading Comprehension from Eye Movements ( http://arxiv.org/abs/2410.04484v1 ) ライセンス: Link先を確認	Omer Shubi, Yoav Meiri, Cfir Avraham Hadar, Yevgeni Berzak,	(参考訳) 目の動きから人間の読みの理解を評価することは可能か? 本研究は,読解の行動分析を目的としたテキスト素材上での大規模視線追跡データを用いて,この長年にわたる課題に対処する。本研究は, 視線運動からの読み理解を, 通路上の1つの質問のレベルで予測する, きめ細かな, ほとんど適応していないタスクに焦点を当てる。 3つの新しいマルチモーダル言語モデルと,文献から得られた先行モデルのバッテリを用いて,この課題に取り組む。本研究では,新しいテキスト項目,新しい参加者,および両者の組み合わせを,通常の読解と情報検索という2つの異なる読解方式で一般化する能力を評価する。評価の結果,目の動きは,視力の把握に有用な信号を含んでいることが示唆された。コードとデータは公開されます。 Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data over textual materials that are geared towards behavioral analyses of reading comprehension. We focus on a fine-grained and largely unaddressed task of predicting reading comprehension from eye movements at the level of a single question over a passage. We tackle this task using three new multimodal language models, as well as a battery of prior models from the literature. We evaluate the models' ability to generalize to new textual items, new participants, and the combination of both, in two different reading regimes, ordinary reading and information seeking. The evaluations suggest that although the task is highly challenging, eye movements contain useful signals for fine-grained prediction of reading comprehension. Code and data will be made publicly available.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# SWEベンチにおける会話型テストスイートによるプログラム修復の可能性を探る Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench ( http://arxiv.org/abs/2410.04485v1 ) ライセンス: Link先を確認	Anton Cheshkov, Pavel Zadorozhny, Rodion Levichev, Evgeny Maslov, Ronaldo Franco Jaldin,	(参考訳) プロジェクトレベルでのプログラムの自動修復は、人間の活動の様々な分野では見られていない。 SWE-Benchチャレンジが提示されて以来、多くのソリューションが見られた。パッチ生成はプログラム修復の一部であり,テストスイートに基づく会話型パッチ生成の有効性が証明されている。しかし、SWE-Benchでは、会話パッチ生成の可能性はまだ具体的には評価されていない。本研究では,SWE-Bench問題に対する会話パッチ生成の有効性を評価するための実験結果について報告する。実験によると、LLaMA 3.1 70Bに基づく単純な会話パイプラインは47\%のケースで有効なパッチを生成することができ、これはSWE-Benchのプログラム修復の最先端に匹敵する。 Automatic program repair at project level may open yet to be seen opportunities in various fields of human activity. Since the SWE-Bench challenge was presented, we have seen numerous of solutions. Patch generation is a part of program repair, and test suite-based conversational patch generation has proven its effectiveness. However, the potential of conversational patch generation has not yet specifically estimated on SWE-Bench. This study reports experimental results aimed at evaluating the individual effectiveness of conversational patch generation on problems from SWE-Bench. The experiments show that a simple conversational pipeline based on LLaMA 3.1 70B can generate valid patches in 47\% of cases, which is comparable to the state-of-the-art in program repair on SWE-Bench.	翻訳日:2024-11-02 07:16:09 公開日:2024-10-06
# 知識グラフ補完のためのプラガブル・コモンセンス強化フレームワーク A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion ( http://arxiv.org/abs/2410.04488v1 ) ライセンス: Link先を確認	Guanglin Niu, Bo Li, Siling Feng,	(参考訳) 知識グラフ補完(KGC)タスクは、知識集約的な多くのアプリケーションのための知識グラフ(KG)において、行方不明な事実を推測することを目的としている。しかし、既存の埋め込みベースのKGCアプローチは、主に事実のトリプルに依存しており、一般的な感覚と矛盾する結果をもたらす可能性がある。さらに、明示的な共通感覚を生成することは、しばしばKGにとって実用的または費用がかかる。これらの課題に対処するため、我々は、KGCの事実と常識の両方を組み込んだプラグイン可能な共通感覚強化KGCフレームワークを提案する。このフレームワークは、実体概念の豊かさに基づいて異なるKGに適応し、実三重項から明示的または暗黙的な常識を自動的に生成する能力を有する。さらに、一般的な感覚誘導型負サンプリングと、リッチな実体概念を持つKGに対する粗大な推論手法を導入する。概念を持たないKGに対して、関係認識型概念埋め込み機構を含む二重スコアリング方式を提案する。重要なことは、我々のアプローチは、多くの知識グラフ埋め込み(KGE)モデルのためのプラグイン可能なモジュールとして統合することができ、共同で常識とファクトドリブンなトレーニングと推論を容易にすることである。実験により、我々のフレームワークは優れたスケーラビリティを示し、様々なKGCタスクで既存のモデルより優れています。 Knowledge graph completion (KGC) tasks aim to infer missing facts in a knowledge graph (KG) for many knowledge-intensive applications. However, existing embedding-based KGC approaches primarily rely on factual triples, potentially leading to outcomes inconsistent with common sense. Besides, generating explicit common sense is often impractical or costly for a KG. To address these challenges, we propose a pluggable common sense-enhanced KGC framework that incorporates both fact and common sense for KGC. This framework is adaptable to different KGs based on their entity concept richness and has the capability to automatically generate explicit or implicit common sense from factual triples. Furthermore, we introduce common sense-guided negative sampling and a coarse-to-fine inference approach for KGs with rich entity concepts. For KGs without concepts, we propose a dual scoring scheme involving a relation-aware concept embedding mechanism. Importantly, our approach can be integrated as a pluggable module for many knowledge graph embedding (KGE) models, facilitating joint common sense and fact-driven training and inference. The experiments illustrate that our framework exhibits good scalability and outperforms existing models across various KGC tasks.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# リニアセパビリティの端におけるグラッキング Grokking at the Edge of Linear Separability ( http://arxiv.org/abs/2410.04489v1 ) ライセンス: Link先を確認	Alon Beck, Noam Levi, Yohai Bar-Sinai,	(参考訳) 単純化された環境での二項ロジスティック分類の一般化特性について検討し、「記憶」と「一般化」の解は常に厳密に定義でき、その力学においてグロキングの基盤となるメカニズムを経験的かつ解析的に解明する。定常ラベルを持つランダムな特徴モデル上でのロジスティック分類の漸近的長期ダイナミクスを解析し、遅延一般化と非単調なテスト損失の意味でグロキングを示すことを示す。線形分離性の頂点にあるトレーニングセットに分類を適用すると、Grokkingが増幅されることが分かる。完全一般化解は常に存在するが、ロジシック損失の暗黙の偏りが、トレーニングデータが原点から線形に分離可能であれば、モデルが過度に適合することを証明する。原点から分離できない訓練セットに対しては、モデルはいつでも完全に漸近的に一般化するが、過度な適合は訓練の初期段階で起こりうる。重要なことは、遷移の近傍、すなわち、原点からほぼ分離可能な訓練セットに対して、モデルは一般化する前に任意の時間に過度に適合する。モデル全体の重要な特徴を定量的に捉えた,牽引可能な1次元玩具モデルを調べることで,さらに洞察を得ることができる。最後に,本研究の共通点を最近の文献で強調し,グラッキングは一般に補間しきい値に近づき,物理系でしばしば見られる臨界現象を連想させることが示唆された。 We study the generalization properties of binary logistic classification in a simplified setting, for which a "memorizing" and "generalizing" solution can always be strictly defined, and elucidate empirically and analytically the mechanism underlying Grokking in its dynamics. We analyze the asymptotic long-time dynamics of logistic classification on a random feature model with a constant label and show that it exhibits Grokking, in the sense of delayed generalization and non-monotonic test loss. We find that Grokking is amplified when classification is applied to training sets which are on the verge of linear separability. Even though a perfect generalizing solution always exists, we prove the implicit bias of the logisitc loss will cause the model to overfit if the training data is linearly separable from the origin. For training sets that are not separable from the origin, the model will always generalize perfectly asymptotically, but overfitting may occur at early stages of training. Importantly, in the vicinity of the transition, that is, for training sets that are almost separable from the origin, the model may overfit for arbitrarily long times before generalizing. We gain more insights by examining a tractable one-dimensional toy model that quantitatively captures the key features of the full model. Finally, we highlight intriguing common properties of our findings with recent literature, suggesting that grokking generally occurs in proximity to the interpolation threshold, reminiscent of critical phenomena often observed in physical systems.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# ジャグリング顔モデルにおけるAI/MLサプライチェーンアタックの大規模エクスプロイト計測 A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models ( http://arxiv.org/abs/2410.04490v1 ) ライセンス: Link先を確認	Beatrice Casey, Joanna C. S. Santos, Mehdi Mirakhorli,	(参考訳) 機械学習(ML)技術の開発は、開発者が独自のモデルを開発し、デプロイする十分な機会をもたらしました。 Hugging Faceはオープンソースプラットフォームとして機能し、開発者はML開発をより協力的にするために、他のモデルを共有し、ダウンロードすることができる。モデルを共有するためには、まずシリアライズする必要がある。 Pythonのシリアライゼーションメソッドは、オブジェクトインジェクションに弱いため、安全ではないと考えられている。本稿では、Hugging Faceにおけるこれらの安全でないシリアライズ手法の広範性について検討し、その利用方法を通じて、安全でないシリアライズ手法を用いたモデルを活用、共有し、ML開発者のための安全でない環境を作成することを実証する。安全でないシリアライズ手法を用いて,Hugging Faceがリポジトリやファイルにフラグを付けることができるかを調査し,悪意のあるモデルを検出する手法を開発した。以上の結果から,Hugging Faceにはさまざまな脆弱性のあるモデルが存在することが示唆された。 The development of machine learning (ML) techniques has led to ample opportunities for developers to develop and deploy their own models. Hugging Face serves as an open source platform where developers can share and download other models in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face, and demonstrates through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# マルチモーダル感性分析のための知識誘導動的モダリティ注意融合フレームワーク Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis ( http://arxiv.org/abs/2410.04491v1 ) ライセンス: Link先を確認	Xinyu Feng, Yuming Lin, Lihua He, You Li, Liang Chang, Ya Zhou,	(参考訳) マルチモーダルセンシング分析(MSA)は,マルチモーダルデータを用いてユーザの感情を推定する。従来の手法では、各モダリティの寄与を平等に扱うことや、各モダリティが支配的になる可能性のある状況を無視した相互作用を行うための支配的なモダリティとしてテキストを使用することに重点を置いていた。本稿では,マルチモーダル感情分析のための知識誘導動的モダリティ注意融合フレームワーク(KuDA)を提案する。 Kudaは感情知識を使用して、支配的なモダリティを動的に選択し、各モダリティの貢献を調整するモデルを導く。さらに、得られたマルチモーダル表現により、相関評価損失による支配的モダリティの寄与をさらに強調することができる。 4つのMSAベンチマークデータセットの大規模な実験は、KuDAが最先端のパフォーマンスを達成し、支配的なモダリティの異なるシナリオに適応できることを示している。 Multimodal Sentiment Analysis (MSA) utilizes multimodal data to infer the users' sentiment. Previous methods focus on equally treating the contribution of each modality or statically using text as the dominant modality to conduct interaction, which neglects the situation where each modality may become dominant. In this paper, we propose a Knowledge-Guided Dynamic Modality Attention Fusion Framework (KuDA) for multimodal sentiment analysis. KuDA uses sentiment knowledge to guide the model dynamically selecting the dominant modality and adjusting the contributions of each modality. In addition, with the obtained multimodal representation, the model can further highlight the contribution of dominant modality through the correlation evaluation loss. Extensive experiments on four MSA benchmark datasets indicate that KuDA achieves state-of-the-art performance and is able to adapt to different scenarios of dominant modality.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# 拡張的・意味的新しい視覚刺激に対するヒト脳反応の深層学習予測の一般化可能性解析 Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli ( http://arxiv.org/abs/2410.04497v1 ) ライセンス: Link先を確認	Valentyn Piskovskyi, Riccardo Chimisso, Sabrina Patania, Tom Foulsham, Giuseppe Vizzari, Dimitri Ognibene,	(参考訳) 本研究の目的は,視覚野活性化に対する画像強調技術の影響を探索するための枠組みとして,ニューラルネットワークを用いたアプローチの音質と有用性を検討することである。予備研究として、The Algonauts Project 2023 Challenge [16]に参加したトップ10の手法の中から選ばれた最先端の脳エンコーディングモデルを用意した。我々は、様々な画像強調技術が神経反応に与える影響について、有効な予測を行う能力について分析する。脳画像撮影にかかわる高コストによる実際のデータ取得が不可能であることを踏まえて,本研究は一連の実験を基礎にしている。具体的には,脳のエンコーダが,対象物(顔と言葉)に対する反応を,特定の領域に対する既知の影響で評価することにより,様々な拡張に対する脳反応を推定する能力について分析する。さらに,トレーニング中に見えない物体に対する反応の予測活性化について検討し,意味的アウト・オブ・ディストリビューション刺激の影響について検討した。提案するフレームワークを構成するモデルの一般化能力について,与えられたタスク,モデル駆動設計戦略,ARおよびVRアプリケーションに対して最適な視覚拡張フィルタの同定を期待できると思われる,関連性のある証拠を提供する。 The purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds up on a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, model-driven design strategies as well as for AR and VR applications.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# AdaMemento:強化学習のための適応記憶支援政策最適化 AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning ( http://arxiv.org/abs/2410.04498v1 ) ライセンス: Link先を確認	Renye Yan, Yaozhong Gan, You Wu, Junliang Xing, Ling Liangn, Yeshang Zhu, Yimao Cai,	(参考訳) 強化学習(RL)のスパース報酬シナリオでは、メモリメカニズムは、人間のような過去の経験を反映して、ポリシー最適化に有望なショートカットを提供する。しかし、現在のメモリベースのRLメソッドは、単に高価値ポリシーを保存して再利用し、様々な過去の経験のより深い精錬とフィルタリングを欠いているため、メモリの能力を制限している。本稿では,適応型メモリ拡張RLフレームワークであるAdaMementoを提案する。過去のポジティブな経験を記憶する代わりに、実時間状態に基づいて既知のローカルな最適ポリシーを予測することを学ぶことによって、ポジティブな経験とネガティブな経験の両方を活用するメモリリフレクションモジュールを設計する。さらに,記憶に対する情報トラジェクトリを効果的に収集するために,類似状態のニュアンスを正確に識別して探索する,詳細な本質的なモチベーションパラダイムを導入する。過去の経験の活用と新しい政策の探索は、グローバルな最適化に近づくために、アンサンブル学習によって適応的に調整される。さらに,新たな本質的なモチベーションとアンサンブル機構の優位性を理論的に証明した。 59の定量的および可視化実験から,AdaMementoは,記憶における過去の経験を効果的に活用し,従来の手法よりも大幅に改善した,微妙な状態を識別できることを確認した。 In sparse reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences like humans. However, current memory-based RL methods simply store and reuse high-value policies, lacking a deeper refining and filtering of diverse past experiences and hence limiting the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of just memorizing positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known local optimal policies based on real-time states. To effectively gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm, where nuances in similar states can be precisely distinguished to guide exploration. The exploitation of past experiences and exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. From 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploiting past experiences in memory, achieving significant improvement over previous methods.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# 順応性を考慮したプレトレーニングバックボーンの調整 Adjusting Pretrained Backbones for Performativity ( http://arxiv.org/abs/2410.04499v1 ) ライセンス: Link先を確認	Berker Demirel, Lingjing Kong, Kun Zhang, Theofanis Karaletsos, Celestine Mendler-Dünner, Francesco Locatello,	(参考訳) ディープラーニングモデルの広範な展開により、彼らは様々な方法で環境に影響を与える。誘導された分散シフトは、デプロイされたモデルで予期せぬパフォーマンス劣化を引き起こす可能性がある。パフォーマンスを予想する既存の方法は、将来の成果を予測する際に、デプロイされたモデルに関する情報を特徴ベクトルに組み込むのが一般的である。魅力的な理論的性質を楽しみながら、予測タスクの入力次元を変更することは、しばしば実用的ではない。そこで本研究では,事前学習したバックボーンをモジュール方式で調整し,サンプル効率を向上し,既存のディープラーニング資産の再利用を可能にする手法を提案する。性能上のラベルシフトに注目して、重要なアイデアは、デプロイされるモデルの十分な統計量を得たバックボーンのロジットにベイズ最適ラベルシフト修正を実行するために、浅いアダプタモジュールをトレーニングすることである。そのため,本フレームワークは,動作性を管理するメカニズムから,入力固有の特徴埋め込みの構築を分離する。動的ベンチマークを応用として,視覚・言語タスクの逆サンプリングによるアプローチの評価を行った。再学習軌道に沿った損失を減らし、候補モデルの中から効果的に選択し、性能劣化を予測できることを示す。より広範に、私たちの研究は、ディープラーニングにおけるパフォーマンスに対処するための最初のベースラインを提供します。 With the widespread deployment of deep learning models, they influence their environment in various ways. The induced distribution shifts can lead to unexpected performance degradation in deployed models. Existing methods to anticipate performativity typically incorporate information about the deployed model into the feature vector when predicting future outcomes. While enjoying appealing theoretical properties, modifying the input dimension of the prediction task is often not practical. To address this, we propose a novel technique to adjust pretrained backbones for performativity in a modular way, achieving better sample efficiency and enabling the reuse of existing deep learning assets. Focusing on performative label shift, the key idea is to train a shallow adapter module to perform a Bayes-optimal label shift correction to the backbone's logits given a sufficient statistic of the model to be deployed. As such, our framework decouples the construction of input-specific feature embeddings from the mechanism governing performativity. Motivated by dynamic benchmarking as a use-case, we evaluate our approach under adversarial sampling, for vision and language tasks. We show how it leads to smaller loss along the retraining trajectory and enables us to effectively select among candidate models to anticipate performance degradations. More broadly, our work provides a first baseline for addressing performativity in deep learning.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# LRHP: 選好ペアによる人間の選好表現の学習 LRHP: Learning Representations for Human Preferences via Preference Pairs ( http://arxiv.org/abs/2410.04503v1 ) ライセンス: Link先を確認	Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu,	(参考訳) 人間の嗜好アライメントトレーニングを改善するため、現在の研究では、"preferred" または "dispreferred" とラベル付けされた選好ペアからなる多くの選好データセットを開発した。これらの選好ペアは典型的には、人間からのフィードバック(RLHF)からの強化学習において報酬信号として機能する報酬モデリングによって、人間の嗜好を単一の数値に符号化するために使用される。しかしながら、これらの人間の嗜好を数値として表現することは、これらの嗜好の分析を複雑にし、RLHF以外の幅広い応用を制限する。対照的に、本稿では、より豊かで構造化された人間の嗜好表現を構築することを目的とした嗜好表現学習タスクを導入する。我々はさらに、従来の報酬モデルを超えてこの課題に取り組むために、好みペア(LRHP)を介して、より一般化可能な、人間の嗜好の学習表現フレームワークを開発する。選好データ選択と選好マージン予測という2つの下流タスクにおける選好表現の有用性を検証する。表現における人間の好みに基づいて、両方のタスクにおいて強いパフォーマンスを達成し、ベースラインを著しく上回る。 To improve human-preference alignment training, current research has developed numerous preference datasets consisting of preference pairs labeled as "preferred" or "dispreferred". These preference pairs are typically used to encode human preferences into a single numerical value through reward modeling, which acts as a reward signal during reinforcement learning from human feedback (RLHF). However, representing these human preferences as a numerical value complicates the analysis of these preferences and restricts their broader applications other than RLHF. In contrast, in this work, we introduce a preference representation learning task that aims to construct a richer and more structured representation of human preferences. We further develop a more generalizable framework, Learning Representations for Human Preferences via preference pairs (namely LRHP), which extends beyond traditional reward modeling to tackle this task. We verify the utility of preference representations in two downstream tasks: preference data selection and preference margin prediction. Building upon the human preferences in representations, we achieve strong performance in both tasks, significantly outperforming baselines.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# 量子資源の曖昧な識別課題の修正のための境界 Bounds for Revised Unambiguous Discrimination Tasks of Quantum Resources ( http://arxiv.org/abs/2410.04504v1 ) ライセンス: Link先を確認	Xian Shi,	(参考訳) 量子状態の識別は、量子情報理論において意味のある基本的なタスクである。本書では,量子資源の明確な識別について検討する。まず, 漸近的・漸近的シナリオにおいて, 修正された曖昧な識別課題に対する成功確率の上限を示す。次に、タスクを量子状態から量子チャネルに一般化する。適応戦略の下でタスクの成功確率の上限を示す。さらに,この境界を効率的に計算できることを示す。最後に、古典的不明確な判別と比較すると、半定値な正作用素の集合上の量子化器を考えることにより、量子の利点を示す。 Quantum state discrimination is a fundamental task that is meaningful in quantum information theory. In this manuscript, we consider a revised unambiguous discrimination of quantum resources. First, we present an upper bound of the success probability for a revised unambiguous discrimination task in the unasymptotic and asymptotic scenarios. Next, we generalize the task from quantum states to quantum channels. We present an upper bound of the success probability for the task under the adaptive strategy. Furthermore, we show the bound can be computed efficiently. Finally, compared with the classical unambiguous discrimination, we show the advantage of the quantum by considering a quantifier on a set of semidefinite positive operators.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# 高利得パラメトリックダウンコンバージョンによる多光子絡み合った状態の空間シュミットモードの効率的な評価 Efficient characterization of spatial Schmidt modes of multiphoton entangled states produced from high-gain parametric down-conversion ( http://arxiv.org/abs/2410.04505v1 ) ライセンス: Link先を確認	Mahtab Amooei, Girish Kulkarni, Jeremy Upham, Robert W. Boyd,	(参考訳) 光の絡み合った状態の空間的相関を効率的に特徴づける能力は、量子イメージングのような多くの量子技術の応用にとって重要である。ここでは、高利得パラメトリックダウンコンバージョンから生じる光の空間シュミットモードと明るい多光子絡み合った状態のシュミットスペクトルの高効率な理論的、実験的特徴を示す。従来の研究とは対照的に、信号場の近似準均質性と等方性を利用して、実験的および理論的特徴付けにかかわる数値計算を劇的に削減する。実験データセットが256×256ピクセルの5000枚のシングルショット画像で構成されている場合,本手法は計算時間を2桁に短縮する。このスピードアップは、より大きな入力サイズに対してさらに劇的である。その結果、様々なポンプ振幅に対してシュミットモードとシュミットスペクトルを高速に特徴付けることができ、利得の増加とともにその変動を研究することができる。この結果から,シュミットモードの拡大とシュミットスペクトルの狭化が理論と実験の整合性の向上に寄与していることが明らかとなった。 The ability to efficiently characterize the spatial correlations of entangled states of light is critical for applications of many quantum technologies such as quantum imaging. Here, we demonstrate highly efficient theoretical and experimental characterization of the spatial Schmidt modes and the Schmidt spectrum of bright multiphoton entangled states of light produced from high-gain parametric down-conversion. In contrast to previous studies, we exploit the approximate quasihomogeneity and isotropy of the signal field and dramatically reduce the numerical computations involved in the experimental and theoretical characterization procedures. In our particular case where our experimental data sets consist of 5000 single-shot images of 256*256 pixels each, our method reduced the overall computation time by 2 orders of magnitude. This speed-up would be even more dramatic for larger input sizes. Consequently, we are able to rapidly characterize the Schmidt modes and Schmidt spectrum for a range of pump amplitudes and study their variation with increasing gain. Our results clearly reveal the broadening of the Schmidt modes and narrowing of the Schmidt spectrum for increasing gain with good agreement between theory and experiment.	翻訳日:2024-11-02 07:06:24 公開日:2024-10-06
# 言語に基づく意味理解の経路からの映像要約の実現 Realizing Video Summarization from the Path of Language-based Semantic Understanding ( http://arxiv.org/abs/2410.04511v1 ) ライセンス: Link先を確認	Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu,	(参考訳) 近年のビデオベースLarge Language Models (ビデオLLMs) の開発は,映像特徴と音声特徴をLarge Language Models (LLMs) と整合させることにより,映像要約の大幅な進歩を遂げている。これらのビデオLLMはそれぞれ独自の長所と短所を持っている。最近の多くの手法は、資源集約的なこれらのモデルの限界を克服するために、広範囲な微調整を必要としている。本研究では,あるビデオLLMの強みが他のビデオLLMの弱みを補うことを観察する。この知見を生かして、我々はMixture of Experts(MoE)パラダイムにインスパイアされた新しいビデオ要約フレームワークを提案する。提案手法は,複数のビデオLLMを統合し,包括的で一貫性のあるテキスト要約を生成する。視覚的およびオーディオ的コンテンツを効果的に組み合わせ、詳細な背景記述を提供し、キーフレームの識別に長けており、視覚情報にのみ依存する従来のコンピュータビジョンのアプローチよりも意味論的に意味のある検索を可能にする。さらに、結果の要約は、キーフレームの選択またはテキスト・ツー・イメージモデルの組み合わせによって、要約ビデオ生成のような下流タスクのパフォーマンスを向上させる。我々の言語駆動型アプローチは、従来の手法に代えて意味的に豊かな代替手段を提供し、より新しいビデオLLMを組み込む柔軟性を提供し、ビデオ要約タスクにおける適応性と性能を向上させる。 The recent development of Video-based Large Language Models (VideoLLMs), has significantly advanced video summarization by aligning video features and, in some cases, audio features with Large Language Models (LLMs). Each of these VideoLLMs possesses unique strengths and weaknesses. Many recent methods have required extensive fine-tuning to overcome the limitations of these models, which can be resource-intensive. In this work, we observe that the strengths of one VideoLLM can complement the weaknesses of another. Leveraging this insight, we propose a novel video summarization framework inspired by the Mixture of Experts (MoE) paradigm, which operates as an inference-time algorithm without requiring any form of fine-tuning. Our approach integrates multiple VideoLLMs to generate comprehensive and coherent textual summaries. It effectively combines visual and audio content, provides detailed background descriptions, and excels at identifying keyframes, which enables more semantically meaningful retrieval compared to traditional computer vision approaches that rely solely on visual information, all without the need for additional fine-tuning. Moreover, the resulting summaries enhance performance in downstream tasks such as summary video generation, either through keyframe selection or in combination with text-to-image models. Our language-driven approach offers a semantically rich alternative to conventional methods and provides flexibility to incorporate newer VideoLLMs, enhancing adaptability and performance in video summarization tasks.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# DAMRO:LVLMの注意機構の解明と幻覚の低減 DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination ( http://arxiv.org/abs/2410.04514v1 ) ライセンス: Link先を確認	Xuan Gong, Tianshi Ming, Xinpeng Wang, Zhihua Wei,	(参考訳) LVLM(Large Vision-Language Models)の成功にもかかわらず、彼らは必然的に幻覚に苦しんでいる。我々が知っているように、LVLMのビジュアルエンコーダとLarge Language Model (LLM)デコーダはトランスフォーマーベースであり、モデルが視覚情報を抽出し、注意機構を介してテキスト出力を生成することができる。画像トークン上のLLMデコーダの注意分布は視覚エンコーダと非常に一致しており、どちらの分布も画像中の参照対象よりも特定の背景トークンに注目する傾向にある。我々は、視覚エンコーダ自体に固有の欠陥があり、LCMが冗長な情報を過度に強調し、オブジェクト幻覚を生成することを誤解しているため、予期せぬ注意分布を考慮に入れている。この問題に対処するために、D$iveを$A$ttention $M$echanism of LVLM to $R$educe $O$bject Hallucination(英語版)に変換する新しいトレーニングフリー戦略であるDAMROを提案する。具体的には、ViTの分類トークン(CLS)を用いて、背景に散在する高アテンションな外れ値トークンをフィルタリングし、復号段階での影響を除去する。 LLaVA-1.5, LLaVA-NeXT, InstructBLIPなどのLVLMに対して, POPE, CHAIR, MME, GPT-4V Aided Evaluation などのベンチマークを用いて評価を行った。以上の結果から,本手法は,これらの異常トークンの影響を著しく低減し,LVLMの幻覚を効果的に緩和することを示した。私たちのメソッドのコードはまもなくリリースされます。 Despite the great success of Large Vision-Language Models (LVLMs), they inevitably suffer from hallucination. As we know, both the visual encoder and the Large Language Model (LLM) decoder in LVLMs are Transformer-based, allowing the model to extract visual information and generate text outputs via attention mechanisms. We find that the attention distribution of LLM decoder on image tokens is highly consistent with the visual encoder and both distributions tend to focus on particular background tokens rather than the referred objects in the image. We attribute to the unexpected attention distribution to an inherent flaw in the visual encoder itself, which misguides LLMs to over emphasize the redundant information and generate object hallucination. To address the issue, we propose DAMRO, a novel training-free strategy that $D$ive into $A$ttention $M$echanism of LVLM to $R$educe $O$bject Hallucination. Specifically, our approach employs classification token (CLS) of ViT to filter out high-attention outlier tokens scattered in the background and then eliminate their influence during decoding stage. We evaluate our method on LVLMs including LLaVA-1.5, LLaVA-NeXT and InstructBLIP, using various benchmarks such as POPE, CHAIR, MME and GPT-4V Aided Evaluation. The results demonstrate that our approach significantly reduces the impact of these outlier tokens, thus effectively alleviating the hallucination of LVLMs. The code of our method will be released soon.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# RevMUX: 効率的なLLMバッチ推論のための可逆アダプタによるデータ多重化 RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference ( http://arxiv.org/abs/2410.04519v1 ) ライセンス: Link先を確認	Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao,	(参考訳) 大きな言語モデル(LLM)は、自然言語処理(NLP)コミュニティに大きなブレークスルーをもたらしました。データ多重化は、複数の入力を1つの複合入力にマージすることでこの問題に対処し、共有フォワードパスによるより効率的な推論を可能にする。しかしながら、複合入力と個人を区別することは難しいため、従来の手法ではバックボーン全体をトレーニングする必要があるが、性能劣化に悩まされている。本稿では,パラメータ効率のよいデータ多重化フレームワークであるRevMUXについて紹介する。 4種類のLLMバックボーンと3種類のLLMバックボーンの大規模な実験により,良好な分類性能を維持しつつ,LLM推論効率を向上させるRevMUXの有効性が示された。 Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while leading the challenge of handling concurrent customer queries due to their high throughput demands. Data multiplexing addresses this by merging multiple inputs into a single composite input, allowing more efficient inference through a shared forward pass. However, as distinguishing individuals from a composite input is challenging, conventional methods typically require training the entire backbone, yet still suffer from performance degradation. In this paper, we introduce RevMUX, a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer, which can be reused by the demultiplexer to perform reverse operations and restore individual samples for classification. Extensive experiments on four datasets and three types of LLM backbones demonstrate the effectiveness of RevMUX for enhancing LLM inference efficiency while retaining a satisfactory classification performance.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# 動的ホック後ニューラルエンサンブラ Dynamic Post-Hoc Neural Ensemblers ( http://arxiv.org/abs/2410.04520v1 ) ライセンス: Link先を確認	Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka,	(参考訳) アンサンブル法は、複数のベースラーナーを組み合わせることで、機械学習モデルの精度と堅牢性を高めることが知られている。しかし、グリーディやランダムアンサンブルのような標準的なアプローチは、アンサンブルメンバーのサンプル間で一定の重みを仮定するため、しばしば不足する。これにより、アンサンブル予測の集約時に表現性を制限することができ、性能を損なうことができる。本研究では,様々なモデル予測を適応的に活用するために,動的アンサンブルの重要性を強調し,ニューラルネットワークをアンサンブル手法として活用することを検討する。低多様性のアンサンブルを学習するリスクにより、トレーニング中にベースモデル予測をランダムにドロップすることでモデルを正規化することを提案する。このアプローチはアンサンブル内の多様性を低くし、オーバーフィッティングを減らし、一般化能力を向上させる。実験では, コンピュータビジョン, 自然言語処理, 表計算データにおいて, 強健なベースラインと比較して, 動的ニューラルアンサンブラが競争力を発揮することを示した。 Ensemble methods are known for enhancing the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembles often fall short, as they assume a constant weight across samples for the ensemble members. This can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we explore employing neural networks as ensemble methods, emphasizing the significance of dynamic ensembling to leverage diverse model predictions adaptively. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions during the training. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities. Our experiments showcase that the dynamic neural ensemblers yield competitive results compared to strong baselines in computer vision, natural language processing, and tabular data.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# MC-CoT: LLMとMLLMを統合したゼロショット医療VQAのためのモジュール協調CoTフレームワーク MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration ( http://arxiv.org/abs/2410.04521v1 ) ライセンス: Link先を確認	Lai Wei, Wenkai Wang, Xiaoyu Shen, Yu Xie, Zhihao Fan, Xiaojin Zhang, Zhongyu Wei, Wei Chen,	(参考訳) 近年,Med-VQA(Med-VQA)タスクに対処するために,特定の医用画像データセットに基づいてMLLM(Multimodal large language model)が微調整されている。しかし、タスク固有の微調整の一般的なアプローチはコストが高く、ダウンストリームタスクごとに別々のモデルが必要であるため、ゼロショット能力の探索が制限される。本稿では,大規模な言語モデル(LLM)を活用することで,Med-VQAにおけるMLLMのゼロショット性能を向上させることを目的とした,モジュール型クロスモーダルコラボレーションChain-of-Thought(CoT)フレームワークであるMC-CoTを紹介する。 MC-CoTは、医学知識とタスク固有のガイダンスを統合することで推論と情報抽出を改善し、LSMは様々な複雑な医学推論チェーンを提供し、MLLMはLSMの指示に基づいて様々な医学画像の観察を行う。 SLAKE, VQA-RAD, PATH-VQAなどのデータセットを用いた実験により, MC-CoT はスタンドアロンのMLLM や様々なマルチモーダル CoT フレームワークをリコール率と精度で上回っていることがわかった。これらの知見は、複雑なゼロショットのMed-VQAタスクに、背景情報と詳細なガイダンスを組み込むことの重要性を強調している。 In recent advancements, multimodal large language models (MLLMs) have been fine-tuned on specific medical image datasets to address medical visual question answering (Med-VQA) tasks. However, this common approach of task-specific fine-tuning is costly and necessitates separate models for each downstream task, limiting the exploration of zero-shot capabilities. In this paper, we introduce MC-CoT, a modular cross-modal collaboration Chain-of-Thought (CoT) framework designed to enhance the zero-shot performance of MLLMs in Med-VQA by leveraging large language models (LLMs). MC-CoT improves reasoning and information extraction by integrating medical knowledge and task-specific guidance, where LLM provides various complex medical reasoning chains and MLLM provides various observations of medical images based on instructions of the LLM. Our experiments on datasets such as SLAKE, VQA-RAD, and PATH-VQA show that MC-CoT surpasses standalone MLLMs and various multimodality CoT frameworks in recall rate and accuracy. These findings highlight the importance of incorporating background information and detailed guidance in addressing complex zero-shot Med-VQA tasks.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# 航空・海上医療避難プラットフォーム調整のための半マルコフ計画 Semi-Markovian Planning to Coordinate Aerial and Maritime Medical Evacuation Platforms ( http://arxiv.org/abs/2410.04523v1 ) ライセンス: Link先を確認	Mahdi Al-Husseini, Kyle H. Wray, Mykel J. Kochenderfer,	(参考訳) 水中水上機を用いた2機間の患者の移動は、海洋環境における医療的避難範囲と柔軟性を増大させる。患者交換のための複数の水中船のいずれかの選択は、航空機の利用履歴と参加する水上船の位置と速度によって複雑である。選択問題は、固定地と移動船の交換点を含む作用空間を有するセミマルコフ決定過程としてモデル化される。ルート並列化によるモンテカルロ木探索は、最適な交換点を選択し、航空機の発送時間を決定するために用いられる。モデルパラメータは、ウォータークラフト交換点がインシデント応答時間を減少させる代表シナリオを特定するためにシミュレーションで変化する。その結果, 船舶交換点を用いた最適政策は, 船舶交換点を含まない最適政策と, グリーディ政策を35%, 40%向上させることがわかった。米国陸軍と共同で、ハワイのオアフ島の南で2機のHH-60M医療避難ヘリコプターと、進行中の陸軍物流支援船との間で、マニキンによる模擬患者輸送を実行することで、初めて水上交換地点を配備した。どちらのヘリコプターも我々の最適化された決定戦略に従って派遣された。 The transfer of patients between two aircraft using an underway watercraft increases medical evacuation reach and flexibility in maritime environments. The selection of any one of multiple underway watercraft for patient exchange is complicated by participating aircraft utilization history and a participating watercraft position and velocity. The selection problem is modeled as a semi-Markov decision process with an action space including both fixed land and moving watercraft exchange points. Monte Carlo tree search with root parallelization is used to select optimal exchange points and determine aircraft dispatch times. Model parameters are varied in simulation to identify representative scenarios where watercraft exchange points reduce incident response times. We find that an optimal policy with watercraft exchange points outperforms an optimal policy without watercraft exchange points and a greedy policy by 35% and 40%, respectively. In partnership with the United States Army, we deploy for the first time the watercraft exchange point by executing a mock patient transfer with a manikin between two HH-60M medical evacuation helicopters and an underway Army Logistic Support Vessel south of the Hawaiian island of Oahu. Both helicopters were dispatched in accordance with our optimized decision strategy.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# セキュアチューニングに向けて - 良質なインストラクションの微調整から生じるセキュリティリスクの軽減 Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning ( http://arxiv.org/abs/2410.04524v1 ) ライセンス: Link先を確認	Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Fenglei Fan, Ting Liu, Bing Qin,	(参考訳) インストラクションファインタニング(IFT)は、基礎となるLarge Language Models(LLM)をプロフェッショナルおよびプライベートな用途に応用するための重要な手法となっている。しかし、研究者は、IFTプロセスが完全に良性な命令(Benign IFT)を含む場合でも、IFT後のLLMのセキュリティが大幅に低下することを懸念している。我々の研究は、ベニグンIFTによるセキュリティリスクを軽減するための先駆的な取り組みである。具体的には,LLMの内部モジュールがセキュリティにどのように貢献するかを検討することを目的としたモジュールロバストネス解析を行う。本稿では,ML-LR(Modular Layer-wise Learning Rate)戦略と呼ばれる新しいIFT戦略を提案する。分析では,モジュールの堅牢性(例えば$Q$/$K$/$V$など)を測定するためのプロキシとして機能する,シンプルなセキュリティ機能分類器を実装した。モジュールの強靭性は,モジュールタイプや層深度によって定期的に変化し,明確なパターンを示すことがわかった。これらの知見を活用して、モジュールのロバストなサブセットを識別するプロキシ誘導探索アルゴリズムを Mods$_{Robust}$ と呼ぶ。 IFT中、ML-LR戦略はMods$_{Robust}$とその他のモジュールの差分学習率を採用している。本研究は,セキュリティ評価において,ML-LR戦略の適用により,良性IFT後のLSMの有害性の増加が著しく軽減されることを示す。特に,我々のML-LR戦略は Benign IFT に続く LLM のユーザビリティや専門性にはほとんど影響しない。さらに,ML-LR戦略の健全性と柔軟性を検証するため,包括的分析を行った。 Instruction Fine-Tuning (IFT) has become an essential method for adapting base Large Language Models (LLMs) into variants for professional and private use. However, researchers have raised concerns over a significant decrease in LLMs' security following IFT, even when the IFT process involves entirely benign instructions (termed Benign IFT). Our study represents a pioneering effort to mitigate the security risks arising from Benign IFT. Specifically, we conduct a Module Robustness Analysis, aiming to investigate how LLMs' internal modules contribute to their security. Based on our analysis, we propose a novel IFT strategy, called the Modular Layer-wise Learning Rate (ML-LR) strategy. In our analysis, we implement a simple security feature classifier that serves as a proxy to measure the robustness of modules (e.g. $Q$/$K$/$V$, etc.). Our findings reveal that the module robustness shows clear patterns, varying regularly with the module type and the layer depth. Leveraging these insights, we develop a proxy-guided search algorithm to identify a robust subset of modules, termed Mods$_{Robust}$. During IFT, the ML-LR strategy employs differentiated learning rates for Mods$_{Robust}$ and the rest modules. Our experimental results show that in security assessments, the application of our ML-LR strategy significantly mitigates the rise in harmfulness of LLMs following Benign IFT. Notably, our ML-LR strategy has little impact on the usability or expertise of LLMs following Benign IFT. Furthermore, we have conducted comprehensive analyses to verify the soundness and flexibility of our ML-LR strategy.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# 周りを見回して見る:相対角によるOOD検出 Look Around and Find Out: OOD Detection with Relative Angles ( http://arxiv.org/abs/2410.04525v1 ) ライセンス: Link先を確認	Berker Demirel, Marco Fumero, Francesco Locatello,	(参考訳) 現実世界のアプリケーションにデプロイされるディープラーニングシステムは、その分散(ID)とは異なるデータに遭遇することが多い。信頼できるシステムは、理想的には、このアウト・オブ・ディストリビューション(OOD)設定での意思決定を控えるべきです。既存の最先端の手法は、主にk番目の隣人や決定境界までの距離といった特徴距離に焦点を当てている。本研究では, 分布内構造に対して計算されるOOD検出のための新しい角度に基づく計量法を提案する。特徴表現と決定境界の間の角度は,分布内特徴の平均から見て,IDとOODデータ間の効果的な識別要因となることを示す。提案手法は, CIFAR-10 と ImageNet ベンチマークの最先端性能を実現し, FPR95 を 0.88% と 7.74% 削減した。我々のスコア関数は既存の特徴空間正規化技術と互換性があり、性能が向上する。さらに、そのスケール不変性により、単純なスコア和によるOOD検出のためのモデルのアンサンブルを作成することができる。 Deep learning systems deployed in real-world applications often encounter data that is different from their in-distribution (ID). A reliable system should ideally abstain from making decisions in this out-of-distribution (OOD) setting. Existing state-of-the-art methods primarily focus on feature distances, such as k-th nearest neighbors and distances to decision boundaries, either overlooking or ineffectively using in-distribution statistics. In this work, we propose a novel angle-based metric for OOD detection that is computed relative to the in-distribution structure. We demonstrate that the angles between feature representations and decision boundaries, viewed from the mean of in-distribution features, serve as an effective discriminative factor between ID and OOD data. Our method achieves state-of-the-art performance on CIFAR-10 and ImageNet benchmarks, reducing FPR95 by 0.88% and 7.74% respectively. Our score function is compatible with existing feature space regularization techniques, enhancing performance. Additionally, its scale-invariance property enables creating an ensemble of models for OOD detection via simple score summation.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# Casablanca:多方言アラビア語音声認識のデータとモデル Casablanca: Data and Models for Multidialectal Arabic Speech Recognition ( http://arxiv.org/abs/2410.04527v1 ) ライセンス: Link先を確認	Bashar Talafha, Karima Kadaoui, Samar Mohamed Magdy, Mariem Habiboullah, Chafei Mohamed Chafei, Ahmed Oumar El-Shangiti, Hiba Zayed, Mohamedou cheikh tourad, Rahaf Alhamouri, Rwaa Assi, Aisha Alraeesi, Hour Mohamed, Fakhraddin Alwajih, Abdelrahman Mohamed, Abdellah El Mekki, El Moatez Billah Nagoudi, Benelhadj Djelloul Mama Saadia, Hamzah A. Alsayadi, Walid Al-Dhabyani, Sara Shatnawi, Yasir Ech-Chammakhy, Amal Makouar, Yousra Berrachedi, Mustafa Jarrar, Shady Shehata, Ismail Berrada, Muhammad Abdul-Mageed,	(参考訳) 近年の音声処理の進歩にもかかわらず、世界の言語や方言の大部分は明らかになっていない。この状況は、既に広範囲の技術的分断を妨げ、技術的・社会経済的包摂を妨げているだけである。この課題は主に、多様な音声システムを強化するデータセットがないためである。本稿では,多方言のアラビア語データセットを収集・転写する大規模コミュニティ主導の取り組みであるCasablancaを提示することにより,アラビア語方言のこの障害を軽減することを目的とする。このデータセットには、アルジェリア語、エジプト語、エミラティ語、ヨルダン語、モーリタニア語、モロッコ語、パレスチナ語、イエメン語の8つの方言が含まれ、転写、性別、方言、コードスイッチングのアノテーションが含まれている。私たちはまた、カサブランカを活用できる強力なベースラインを多数開発しています。 Casablanca のプロジェクトページは www.dlnlp.ai/speech/casablanca にある。 In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclusion. This challenge is largely due to the absence of datasets that can empower diverse speech systems. In this paper, we seek to mitigate this obstacle for a number of Arabic dialects by presenting Casablanca, a large-scale community-driven effort to collect and transcribe a multi-dialectal Arabic dataset. The dataset covers eight dialects: Algerian, Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni, and includes annotations for transcription, gender, dialect, and code-switching. We also develop a number of strong baselines exploiting Casablanca. The project page for Casablanca is accessible at: www.dlnlp.ai/speech/casablanca.	翻訳日:2024-11-02 06:56:10 公開日:2024-10-06
# 3次元シーン理解のための知覚的事前認識によるPlace Panoptic Radiance Field Segmentation In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding ( http://arxiv.org/abs/2410.04529v1 ) ライセンス: Link先を確認	Shenghao Li,	(参考訳) 正確な3Dシーン表現とパノプティクス理解は、仮想現実、ロボティクス、自律運転などのアプリケーションに不可欠である。しかし、正確な2D-to-3Dマッピング、境界あいまいさやスケールの変化といった複雑なシーン特性の扱い、パノピックな擬似ラベルのノイズ軽減など、既存の手法では課題が続いている。本稿では,2次元のセマンティクスとインスタンス認識を含む線形代入問題として,ニューラルラディアンス領域におけるパノプティクス理解を再構成する,知覚優先の3次元シーン表現とパノプティカル理解手法を提案する。事前学習された2次元パノプティックセグメンテーションモデルからの知覚情報を事前指導として組み込むことにより、ニューラル放射場における外観、幾何学、およびパノプティック理解の学習過程を同期させる。縮小符号化されたカスケードグリッドを再パラメータ化ドメイン蒸留フレームワーク内に拡張することにより,屋内および屋外のシーン間の一般化を促進するために,暗黙のシーン表現と理解モデルを開発した。このモデルは複雑なシーン特性を効果的に管理し、3D一貫性のあるシーン表現と様々なシーンに対するパノラマ理解結果を生成する。合成シーンや実世界のシーンを含む難易度条件下での実験およびアブレーション研究は、3次元シーン表現の強化とパノプティックセグメンテーションの精度向上における提案手法の有効性を実証する。 Accurate 3D scene representation and panoptic understanding are essential for applications such as virtual reality, robotics, and autonomous driving. However, challenges persist with existing methods, including precise 2D-to-3D mapping, handling complex scene characteristics like boundary ambiguity and varying scales, and mitigating noise in panoptic pseudo-labels. This paper introduces a novel perceptual-prior-guided 3D scene representation and panoptic understanding method, which reformulates panoptic understanding within neural radiance fields as a linear assignment problem involving 2D semantics and instance recognition. Perceptual information from pre-trained 2D panoptic segmentation models is incorporated as prior guidance, thereby synchronizing the learning processes of appearance, geometry, and panoptic understanding within neural radiance fields. An implicit scene representation and understanding model is developed to enhance generalization across indoor and outdoor scenes by extending the scale-encoded cascaded grids within a reparameterized domain distillation framework. This model effectively manages complex scene attributes and generates 3D-consistent scene representations and panoptic understanding outcomes for various scenes. Experiments and ablation studies under challenging conditions, including synthetic and real-world scenes, demonstrate the proposed method's effectiveness in enhancing 3D scene representation and panoptic segmentation accuracy.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# UniMuMo:統一テキスト、音楽、モーション生成 UniMuMo: Unified Text, Music and Motion Generation ( http://arxiv.org/abs/2410.04534v1 ) ライセンス: Link先を確認	Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan,	(参考訳) 任意のテキスト,音楽,動作データを入力条件として取り込んで,3つのモードすべてにまたがる出力を生成する,統一型マルチモーダルモデルUniMuMoを導入する。時間同期データの欠如に対処するため、リズミカルパターンに基づく未ペア音楽とモーションデータを調整し、既存の大規模音楽のみとモーションのみのデータセットを活用する。音楽、動き、テキストをトークンベースの表現に変換することで、我々のモデルはエンコーダ・デコーダ・トランスフォーマアーキテクチャを通じてこれらのモダリティをブリッジする。一つのフレームワーク内で複数の生成タスクをサポートするために、アーキテクチャの改善をいくつか導入する。そこで我々は,音楽コードブックを用いた符号化動作を提案し,その動作を音楽と同じ特徴空間にマッピングする。本稿では,楽音と楽音の同時生成タスクを1つの変圧器デコーダアーキテクチャに統一する音楽運動並列生成方式を提案する。さらに、既存の訓練済みの単一モダリティモデルを微調整することで、計算要求を大幅に低減する。広範にわたる実験により、UniMuMoは音楽、モーション、テキストのモダリティにわたる全一方向生成ベンチマークで競合する結果が得られることが示された。定量的な結果は \href{https://hanyangclarence.github.io/unimumo_demo/}{project page} で見ることができる。 We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired music and motion data based on rhythmic patterns to leverage existing large-scale music-only and motion-only datasets. By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture. To support multiple generation tasks within a single framework, we introduce several architectural improvements. We propose encoding motion with a music codebook, mapping motion into the same feature space as music. We introduce a music-motion parallel generation scheme that unifies all music and motion generation tasks into a single transformer decoder architecture with a single training task of music-motion joint generation. Moreover, the model is designed by fine-tuning existing pre-trained single-modality models, significantly reducing computational demands. Extensive experiments demonstrate that UniMuMo achieves competitive results on all unidirectional generation benchmarks across music, motion, and text modalities. Quantitative results are available in the \href{https://hanyangclarence.github.io/unimumo_demo/}{project page}.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 量子電磁力学のLandau-Peierls定式化のマルチ時間版 Multi-Time Version of the Landau-Peierls Formulation of Quantum Electrodynamics ( http://arxiv.org/abs/2410.04535v1 ) ライセンス: Link先を確認	Matthias Lienert, Roderich Tumulka,	(参考訳) ランダウとピエルスは、粒子配置表現における量子電磁力学の単純化版をハミルトン式に書き記した。時間発展方程式は単純で自然であり、ゲージの選択に関してより透明であり、そしておそらく最も重要なのはローレンツ共変である。マルチ時間方程式の特性について論じる。また、空間的曲面に対するローレンツ共変3dディラックデルタ分布と、任意のゲージで空間的曲面上の光子波動関数の内積についても論じる。 Landau and Peierls wrote down the Hamiltonian of a simplified version of quantum electrodynamics in the particle-position representation. We present a multi-time version of their Schr\"odinger equation, which bears several advantages over their original equation: the time evolution equations are simpler and more natural; they are more transparent with respect to choice of gauge; and, perhaps most importantly, they are manifestly Lorentz covariant. We discuss properties of the multi-time equations. Along the way, we also discuss the Lorentz covariant 3d Dirac delta distribution for spacelike surfaces and the inner product of photon wave functions on spacelike surfaces in an arbitrary gauge.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 機能近似器としてのLLMの機能評価について:ベイズの視点から On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Perspective ( http://arxiv.org/abs/2410.04541v1 ) ライセンス: Link先を確認	Shoaib Ahmed Siddiqui, Yanzhi Chen, Juyeon Heo, Menglin Xia, Adrian Weller,	(参考訳) 最近の研究は、機能モデリングタスクにLLM(Large Language Models)をうまく応用している。しかし、この成功の理由は不明である。本研究では,LLMの関数モデリング能力を総合的に評価するための新しい評価フレームワークを提案する。関数モデリングのベイズ的視点を採用することで、LLMは生データのパターンの理解に比較的弱いが、基礎となる関数の理解を深めるために、ドメインに関する事前知識を活用することに長けていることが分かる。本研究は,機能モデリングの文脈におけるLLMの強度と限界に関する新たな知見を提供する。 Recent works have successfully applied Large Language Models (LLMs) to function modeling tasks. However, the reasons behind this success remain unclear. In this work, we propose a new evaluation framework to comprehensively assess LLMs' function modeling abilities. By adopting a Bayesian perspective of function modeling, we discover that LLMs are relatively weak in understanding patterns in raw data, but excel at utilizing prior knowledge about the domain to develop a strong understanding of the underlying function. Our findings offer new insights about the strengths and limitations of LLMs in the context of function modeling.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 医薬品設計のための合成経路の創成流れ Generative Flows on Synthetic Pathway for Drug Design ( http://arxiv.org/abs/2410.04542v1 ) ライセンス: Link先を確認	Seonghwan Seo, Minsu Kim, Tony Shen, Martin Ester, Jinkyoo Park, Sungsoo Ahn, Woo Youn Kim,	(参考訳) 薬物発見における生成モデルは、最近、ブルートフォース仮想スクリーニングの効率的な代替手段として注目されている。しかし、既存のほとんどのモデルは合成可能性を考慮しておらず、現実のシナリオでの使用を制限している。本稿では、RxnFlowを提案する。RxnFlowは、予め定義された分子構造ブロックと化学反応テンプレートを用いて分子を逐次組み立て、合成化学経路を制約する。次に、生成フローネットワーク(GFlowNets)を用いて、この逐次生成プロセスを訓練し、高い報酬と多様な分子を生成する。 GFlowNetsにおける合成経路の大規模な動作空間を緩和するため,新しい動作空間サブサンプリング法を実装した。これによりRxnFlowは、120万のビルディングブロックと71のリアクションテンプレートを組み合わせた広範なアクション空間上の生成フローを、計算上のオーバーヘッドなく学習することができる。さらに、RxnFlowは変更または拡張されたアクションスペースを、再トレーニングせずに生成するために使用することができるため、追加の目的や新たに発見されたビルディングブロックの導入が可能になる。実験により、RxnFlowは、既存の反応に基づくモデルやフラグメントベースのモデルよりも、ポケット固有の最適化において、様々なターゲットポケットにおいて優れていることを示す。さらに、RxnFlowはCrossDocked2020において、平均8.85kcal/molのVinaスコアと34.8%の合成性を持つ、ポケットコンディショナリ生成のための最先端のパフォーマンスを実現している。 Generative models in drug discovery have recently gained attention as efficient alternatives to brute-force virtual screening. However, most existing models do not account for synthesizability, limiting their practical use in real-world scenarios. In this paper, we propose RxnFlow, which sequentially assembles molecules using predefined molecular building blocks and chemical reaction templates to constrain the synthetic chemical pathway. We then train on this sequential generating process with the objective of generative flow networks (GFlowNets) to generate both highly rewarded and diverse molecules. To mitigate the large action space of synthetic pathways in GFlowNets, we implement a novel action space subsampling method. This enables RxnFlow to learn generative flows over extensive action spaces comprising combinations of 1.2 million building blocks and 71 reaction templates without significant computational overhead. Additionally, RxnFlow can employ modified or expanded action spaces for generation without retraining, allowing for the introduction of additional objectives or the incorporation of newly discovered building blocks. We experimentally demonstrate that RxnFlow outperforms existing reaction-based and fragment-based models in pocket-specific optimization across various target pockets. Furthermore, RxnFlow achieves state-of-the-art performance on CrossDocked2020 for pocket-conditional generation, with an average Vina score of -8.85kcal/mol and 34.8% synthesizability.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# データマニフォールド上のプルバックフローマッチング Pullback Flow Matching on Data Manifolds ( http://arxiv.org/abs/2410.04543v1 ) ライセンス: Link先を確認	Friso de Kruiff, Erik Bekkers, Ozan Öktem, Carola-Bibiane Schönlieb, Willem Diepeveen,	(参考訳) データ多様体生成のための新しいフレームワークであるPullback Flow Matching (PFM)を提案する。リーマンフローマッチング(RFM)モデルを訓練するための制限的閉形式多様体写像を仮定または学習する既存の方法とは異なり、PFMは引き戻し幾何学と等長学習を活用して、下層の多様体の幾何学を保存し、潜在空間における効率的な生成と正確な補間を可能にする。このアプローチは、データ多様体上の閉形式写像を促進するだけでなく、データ多様体と潜在多様体の両方で仮定された測度を用いて、設計可能な潜在空間を可能にする。ニューラルネットワークによる等尺学習を強化し、スケーラブルな学習目標を提案することにより、補間に適した潜時空間を実現し、多様体学習と生成性能を向上させる。 PFMの有効性は, 合成データ, タンパク質動態, タンパク質配列データに応用し, 特定の性質を持つ新規タンパク質を生成することによって実証する。本手法は, 特定の性質を持つ新規試料を生成できることが注目される, 医薬品の発見と材料科学に強い可能性を示す。 We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM's effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# AI支援の開示は筆記の知覚にどのように影響するか? How Does the Disclosure of AI Assistance Affect the Perceptions of Writing? ( http://arxiv.org/abs/2410.04545v1 ) ライセンス: Link先を確認	Zhuoyan Li, Chen Liang, Jing Peng, Ming Yin,	(参考訳) 大規模言語モデルのような生成型AI技術の最近の進歩は、ワークフローを書く際にAIアシストを組み込むことを加速し、書き込みにおける人間とAIの共同創造の新しいパラダイムが台頭した。本稿では,このパラダイムの下で作成される書物がどのように認識されるかを理解するために,書物の品質評価や異なる書物ランキングなど,書物プロセスにおけるAI支援のレベルとタイプが,書物における書物に対する人々の認識にどのように影響するかを実験的に検討する。以上の結果から,特にAIが新たなコンテンツ生成の支援を提供していれば,議論的なエッセイと創造的ストーリーの両面での平均品質評価が低下することが示唆された。この平均的な品質評価の低下は、しばしば異なる個人による同じ文章の質評価における変化のレベルが増大する。実際、個人の筆記自信やAIの筆記アシスタントへの親しみなどの要因は、筆記品質評価に対するAI支援の開示の影響を緩やかにしている。また、AIアシストの使用を開示することで、トップランクの著作のうち、AIのコンテンツ生成支援によって生成される文章の割合が大幅に減少する可能性があることもわかりました。 Recent advances in generative AI technologies like large language models have boosted the incorporation of AI assistance in writing workflows, leading to the rise of a new paradigm of human-AI co-creation in writing. To understand how people perceive writings that are produced under this paradigm, in this paper, we conduct an experimental study to understand whether and how the disclosure of the level and type of AI assistance in the writing process would affect people's perceptions of the writing on various aspects, including their evaluation on the quality of the writing and their ranking of different writings. Our results suggest that disclosing the AI assistance in the writing process, especially if AI has provided assistance in generating new content, decreases the average quality ratings for both argumentative essays and creative stories. This decrease in the average quality ratings often comes with an increased level of variations in different individuals' quality evaluations of the same writing. Indeed, factors such as an individual's writing confidence and familiarity with AI writing assistants are shown to moderate the impact of AI assistance disclosure on their writing quality evaluations. We also find that disclosing the use of AI assistance may significantly reduce the proportion of writings produced with AI's content generation assistance among the top-ranked writings.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# リモートセンシング画像における非バイアス表現の学習 Learning De-Biased Representations for Remote-Sensing Imagery ( http://arxiv.org/abs/2410.04546v1 ) ライセンス: Link先を確認	Zichen Tian, Zhaozheng Chen, Qianru Sun,	(参考訳) リモートセンシング(RS)画像は、特定の衛星を収集し、注釈を付けるのが困難であり、データ不足と特定のスペクトルのクラス不均衡に悩まされている。データ不足のため、スクラッチから大規模なRSモデルをトレーニングするのは非現実的であり、代わりに、微調整やよりデータ効率のよいLoRAによって事前訓練されたモデルを転送する。クラス不均衡のため、移行モデルは強いバイアスを示し、主要なクラスの特徴はマイナークラスのモデルよりも優位である。本稿では,任意の LoRA 変種と協調してデバイアス特徴を得る汎用的なトレーニング手法である debLoRA を提案する。これは教師なしの学習アプローチであり、共有属性に基づいたマイナークラス機能を主要なクラスに分散させ、その属性はクラスタリングの単純なステップによって得られる。これを評価するために、我々は、自然から光学的RS画像、光学的RS画像からマルチスペクトル的RS画像まで、RS領域における2つの伝達学習シナリオにおいて広範な実験を行った。我々は、光学RSデータセットDOTAとSARデータセットFUSRSのオブジェクト分類およびオブジェクト指向オブジェクト検出タスクを実行する。以上の結果から,3.3と4.7のパーセンテージが自然と光のRSと光のRSを多スペクトルのRSに適応させ,ヘッドクラスの性能を保ちつつ,その効果と適応性を検証した。 Remote sensing (RS) imagery, requiring specialized satellites to collect and being difficult to annotate, suffers from data scarcity and class imbalance in certain spectrums. Due to data scarcity, training any large-scale RS models from scratch is unrealistic, and the alternative is to transfer pre-trained models by fine-tuning or a more data-efficient method LoRA. Due to class imbalance, transferred models exhibit strong bias, where features of the major class dominate over those of the minor class. In this paper, we propose debLoRA, a generic training approach that works with any LoRA variants to yield debiased features. It is an unsupervised learning approach that can diversify minor class features based on the shared attributes with major classes, where the attributes are obtained by a simple step of clustering. To evaluate it, we conduct extensive experiments in two transfer learning scenarios in the RS domain: from natural to optical RS images, and from optical RS to multi-spectrum RS images. We perform object classification and oriented object detection tasks on the optical RS dataset DOTA and the SAR dataset FUSRS. Results show that our debLoRA consistently surpasses prior arts across these RS adaptation settings, yielding up to 3.3 and 4.7 percentage points gains on the tail classes for natural to optical RS and optical RS to multi-spectrum RS adaptations, respectively, while preserving the performance on head classes, substantiating its efficacy and adaptability.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 推薦における不均一公正のための社会的選択 Social Choice for Heterogeneous Fairness in Recommendation ( http://arxiv.org/abs/2410.04551v1 ) ライセンス: Link先を確認	Amanda Aird, Elena Štefancová, Cassidy All, Amy Voida, Martin Homola, Nicholas Mattei, Robin Burke,	(参考訳) 推薦システムにおけるアルゴリズムの公正性は、競合する利害関係を持つ様々な利害関係者のニーズによく注意する必要がある。この領域における以前の研究は、固定された単目的の公正の定義によって制限されたり、単一の公正次元に適用されるアルゴリズムや最適化基準に組み込まれたり、あるいは最も多くは、次元にわたって同一に適用されたりすることで、しばしば制限された。これらの狭い概念化は、実際に発生する幅広い利害関係者のニーズと公正定義に公平に認識されたソリューションを適用する能力を制限する。本研究は,マルチエージェントフレームワークを用いて,計算社会選択の観点からのフェアネスを推奨する。本稿では,様々な社会的選択機構の特性について考察し,複数のデータセットにまたがる多種不均一性定義の確立を実証する。 Algorithmic fairness in recommender systems requires close attention to the needs of a diverse set of stakeholders that may have competing interests. Previous work in this area has often been limited by fixed, single-objective definitions of fairness, built into algorithms or optimization criteria that are applied to a single fairness dimension or, at most, applied identically across dimensions. These narrow conceptualizations limit the ability to adapt fairness-aware solutions to the wide range of stakeholder needs and fairness definitions that arise in practice. Our work approaches recommendation fairness from the standpoint of computational social choice, using a multi-agent framework. In this paper, we explore the properties of different social choice mechanisms and demonstrate the successful integration of multiple, heterogeneous fairness definitions across multiple data sets.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 学術ネットワークを用いたソーシャルメディア推薦効果のモデル化 : グラフニューラルネットワークによるアプローチ Modeling Social Media Recommendation Impacts Using Academic Networks: A Graph Neural Network Approach ( http://arxiv.org/abs/2410.04552v1 ) ライセンス: Link先を確認	Sabrina Guidotti, Gregor Donabauer, Simone Somazzi, Udo Kruschwitz, Davide Taibi, Dimitri Ognibene,	(参考訳) ソーシャルメディアの普及により、社会や個人に対する潜在的なネガティブな影響が浮き彫りにされ、主にユーザーの行動や社会的ダイナミクスを形作るレコメンデーションアルゴリズムによって引き起こされた。これらのアルゴリズムを理解することは必須だが、ソーシャルメディアネットワークの複雑で分散した性質と、現実世界のデータへのアクセス制限のために難しい。本研究では,学術ソーシャルネットワークをソーシャルメディアにおけるレコメンデーションシステム調査のプロキシとして活用することを提案する。グラフニューラルネットワーク(GNN)を用いることで,学術的インフォスフィアの予測と行動予測を分離するモデルを構築し,推薦者生成インフォスフィアをシミュレートし,将来的な共著者予測におけるモデルの性能を評価する。提案手法は,レコメンデーションシステムの役割とソーシャルネットワークのモデリングに関する理解を深めることを目的としている。作業の再現性をサポートするため、 https://github.com/DimNeuroLab/academic_network_project という実装を公開しています。 The widespread use of social media has highlighted potential negative impacts on society and individuals, largely driven by recommendation algorithms that shape user behavior and social dynamics. Understanding these algorithms is essential but challenging due to the complex, distributed nature of social media networks as well as limited access to real-world data. This study proposes to use academic social networks as a proxy for investigating recommendation systems in social media. By employing Graph Neural Networks (GNNs), we develop a model that separates the prediction of academic infosphere from behavior prediction, allowing us to simulate recommender-generated infospheres and assess the model's performance in predicting future co-authorships. Our approach aims to improve our understanding of recommendation systems' roles and social networks modeling. To support the reproducibility of our work we publicly make available our implementations: https://github.com/DimNeuroLab/academic_network_project	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# モデル予測制御のためのビシミュレーションメトリック Bisimulation metric for Model Predictive Control ( http://arxiv.org/abs/2410.04553v1 ) ライセンス: Link先を確認	Yutaka Shimizu, Masayoshi Tomizuka,	(参考訳) モデルに基づく強化学習は、複雑な環境でサンプル効率と意思決定を改善することを約束している。しかし、既存の手法は、訓練の安定性、雑音に対する堅牢性、計算効率の面で課題に直面している。本稿では,モデル予測制御のためのBisimulation Metric for Model Predictive Control (BS-MPC)を提案する。このタイムステップワイド直接最適化により、学習エンコーダは、無関係な詳細を破棄し、勾配やエラーの発散を防止しつつ、元の状態空間から固有の情報を抽出することができる。 BS-MPCは、トレーニング時間を削減することにより、トレーニング安定性、入力ノイズに対する堅牢性、および計算効率を向上させる。我々は、DeepMind Control Suiteから連続制御と画像ベースタスクの両方でBS-MPCを評価し、最先端のベースライン手法と比較して優れた性能とロバスト性を示した。 Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder. This time-step-wise direct optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details and preventing the gradients and errors from diverging. BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite, demonstrating superior performance and robustness compared to state-of-the-art baseline methods.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# $\texttt{dattri}$: 効率的なデータ属性のためのライブラリ $\texttt{dattri}$: A Library for Efficient Data Attribution ( http://arxiv.org/abs/2410.04555v1 ) ライセンス: Link先を確認	Junwei Deng, Ting-Wei Li, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, Jiaqi W. Ma,	(参考訳) データ属性法は、個々のトレーニングサンプルが人工知能(AI)モデルの予測に与える影響を定量化することを目的としている。大規模AIモデルの現代的開発において、トレーニングデータがますます重要な役割を担っているため、データ属性は、AIのパフォーマンスと安全性を改善する幅広い応用を見出した。しかし、最近の新しいデータ属性メソッドの急増にもかかわらず、さまざまなデータ属性メソッドの開発、ベンチマーク、デプロイを容易にする包括的なライブラリが欠如している。本稿では、上記のニーズに対処するオープンソースのデータ属性ライブラリである$\texttt{dattri}$を紹介します。具体的には、$\texttt{dattri}$は3つの新しいデザイン機能を強調します。まず、$\texttt{dattri}$は統一的で使いやすいAPIを提案しており、ユーザはコード数行を変更したPyTorchベースの機械学習パイプラインに、さまざまなデータ属性メソッドを統合することができる。第二に、$\texttt{dattri}$は、Hessian-vector product、inverse-Hessian-vector product、ランダムプロジェクションといったデータ帰属法でよく使われる低レベルのユーティリティ関数をモジュール化し、研究者が新しいデータ帰属法を簡単に開発できるようにする。第3に、$\texttt{dattri}$は、事前トレーニングされたモデルと、生成AI設定を含むさまざまなベンチマーク設定のための基底真理アノテーションを備えた包括的なベンチマークフレームワークを提供する。我々は,大規模ニューラルネットワークモデルに適用可能な,最先端の効率的なデータ属性手法を多種に実装し,将来このライブラリを継続的に更新する。開発された $\texttt{dattri}$ ライブラリを使って、幅広いデータ属性メソッドに対して包括的で公平なベンチマーク分析を行うことができる。 $\texttt{dattri}$のソースコードはhttps://github.com/TRAIS-Lab/dattriで入手できる。 Data attribution methods aim to quantify the influence of individual training samples on the prediction of artificial intelligence (AI) models. As training data plays an increasingly crucial role in the modern development of large-scale AI models, data attribution has found broad applications in improving AI performance and safety. However, despite a surge of new data attribution methods being developed recently, there lacks a comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods. In this work, we introduce $\texttt{dattri}$, an open-source data attribution library that addresses the above needs. Specifically, $\texttt{dattri}$ highlights three novel design features. Firstly, $\texttt{dattri}$ proposes a unified and easy-to-use API, allowing users to integrate different data attribution methods into their PyTorch-based machine learning pipeline with a few lines of code changed. Secondly, $\texttt{dattri}$ modularizes low-level utility functions that are commonly used in data attribution methods, such as Hessian-vector product, inverse-Hessian-vector product or random projection, making it easier for researchers to develop new data attribution methods. Thirdly, $\texttt{dattri}$ provides a comprehensive benchmark framework with pre-trained models and ground truth annotations for a variety of benchmark settings, including generative AI settings. We have implemented a variety of state-of-the-art efficient data attribution methods that can be applied to large-scale neural network models, and will continuously update the library in the future. Using the developed $\texttt{dattri}$ library, we are able to perform a comprehensive and fair benchmark analysis across a wide range of data attribution methods. The source code of $\texttt{dattri}$ is available at https://github.com/TRAIS-Lab/dattri.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# GAMformer: 一般化付加モデルのための文脈学習 GAMformer: In-Context Learning for Generalized Additive Models ( http://arxiv.org/abs/2410.04560v1 ) ライセンス: Link先を確認	Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter,	(参考訳) 一般化付加モデル(GAM)は、表データのための完全に解釈可能な機械学習モデルを作成する能力で広く認識されている。伝統的に、GAMのトレーニングにはスプライン、ブーストツリー、ニューラルネットワークといった反復的な学習アルゴリズムが含まれており、繰り返しエラーの低減を通じて付加的なコンポーネントを洗練させる。本稿では,1つの前方通過路におけるGAMの形状関数を推定するために文脈内学習を利用した最初の手法であるGAMformerを紹介する。グラフデータに文脈内学習を適用した以前の研究に基づいて、我々は複雑な合成データのみを用いてGAMformerをトレーニングしましたが、実際のデータによく当てはまることが分かりました。実験の結果,GAMformerは様々な分類ベンチマークで他の主要なGAMと同等に動作し,高い解釈可能な形状関数を生成することがわかった。 Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data. Traditionally, training GAMs involves iterative learning algorithms, such as splines, boosted trees, or neural networks, which refine the additive components through repeated error reduction. In this paper, we introduce GAMformer, the first method to leverage in-context learning to estimate shape functions of a GAM in a single forward pass, representing a significant departure from the conventional iterative approaches to GAM fitting. Building on previous research applying in-context learning to tabular data, we exclusively use complex, synthetic data to train GAMformer, yet find it extrapolates well to real-world data. Our experiments show that GAMformer performs on par with other leading GAMs across various classification benchmarks while generating highly interpretable shape functions.	翻訳日:2024-11-02 06:46:25 公開日:2024-10-06
# 観測データによる価値推定を期待するマーケットプレースによるランク付け政策学習 Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data ( http://arxiv.org/abs/2410.04568v1 ) ライセンス: Link先を確認	Ehsan Ebrahimzadeh, Nikhil Monga, Hang Gao, Alex Cozzi, Abraham Bagherjeiran,	(参考訳) 本研究では,2面のeコマース市場における検索・レコメンデーションエンジンのランキングポリシーを,観測データを用いた期待報酬最適化問題として学習するための意思決定フレームワークを開発する。ランキングポリシは、検索したアイテムを指定されたスロットに割り当て、そのスロットされたアイテムから、ショッピング旅行の任意の段階で、ユーザユーティリティを最大化する。このアロケーションの目的は、ユーザ意図に合致する提示項目におけるインタラクションイベントの期待数として、下位の確率的ユーザブラウジングモデルに対して、ランキングコンテキストから定義することができる。市場におけるスロットアイテムとのインタラクションを通知する介入行動としてランキングが与える影響を認識させることにより、提示されたすべてのランキング行動から、市場が期待する報酬を集合価値として定式化する。この定式化の鍵となる要素は、コンテキスト値の分布の概念であり、これはセッション内のランク付け介入に対する価値の属性だけでなく、ユーザセッション間でのマーケットプレース報酬の分布も意味している。我々は、セッションコンテキスト間の経済価値の不均一性を考慮した観察データと、観察ユーザ活動データからの学習における学習の分布変化から、市場が期待する報酬に対する実証的な見積もりを構築した。ランク付けポリシーは、標準的なベイズ推論技術を用いて経験的期待報酬推定を最適化することで訓練することができる。本稿では,経験的報酬推定に基づいて訓練された警察官による基本的なトレードオフを,文脈値分布の極端な選択に関して実証的報酬推定に基づいて示す,大規模なeコマースプラットフォームにおける製品検索ランキングタスクの実証結果について報告する。 We develop a decision making framework to cast the problem of learning a ranking policy for search or recommendation engines in a two-sided e-commerce marketplace as an expected reward optimization problem using observational data. As a value allocation mechanism, the ranking policy allocates retrieved items to the designated slots so as to maximize the user utility from the slotted items, at any given stage of the shopping journey. The objective of this allocation can in turn be defined with respect to the underlying probabilistic user browsing model as the expected number of interaction events on presented items matching the user intent, given the ranking context. Through recognizing the effect of ranking as an intervention action to inform users' interactions with slotted items and the corresponding economic value of the interaction events for the marketplace, we formulate the expected reward of the marketplace as the collective value from all presented ranking actions. The key element in this formulation is a notion of context value distribution, which signifies not only the attribution of value to ranking interventions within a session but also the distribution of marketplace reward across user sessions. We build empirical estimates for the expected reward of the marketplace from observational data that account for the heterogeneity of economic value across session contexts as well as the distribution shifts in learning from observational user activity data. The ranking policy can then be trained by optimizing the empirical expected reward estimates via standard Bayesian inference techniques. We report empirical results for a product search ranking task in a major e-commerce platform demonstrating the fundamental trade-offs governed by ranking polices trained on empirical reward estimates with respect to extreme choices of the context value distribution.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# 透かし決定木が集まる Watermarking Decision Tree Ensembles ( http://arxiv.org/abs/2410.04570v1 ) ライセンス: Link先を確認	Stefano Calzavara, Lorenzo Cazzaro, Donald Gera, Salvatore Orlando,	(参考訳) 機械学習モデルの知的特性を保護することはホットな話題であり、深層ニューラルネットワークのための多くの透かしスキームが文献で提案されている。残念なことに、以前の研究は、非知覚データ上の分類タスクのための最先端のモデルである決定木アンサンブルを含む、他のタイプのモデルの透かし技術の調査をほとんど無視していた。本稿では,決定木アンサンブル用に設計された最初の透かし方式について述べる。我々はウォーターマークの作成と検証について論じ、攻撃の可能性について徹底的なセキュリティ分析を提示する。提案手法を実験的に評価し、最も関連性の高い脅威に対する精度と安全性の点で優れた結果を示す。 Protecting the intellectual property of machine learning models is a hot topic and many watermarking schemes for deep neural networks have been proposed in the literature. Unfortunately, prior work largely neglected the investigation of watermarking techniques for other types of models, including decision tree ensembles, which are a state-of-the-art model for classification tasks on non-perceptual data. In this paper, we present the first watermarking scheme designed for decision tree ensembles, focusing in particular on random forest models. We discuss watermark creation and verification, presenting a thorough security analysis with respect to possible attacks. We finally perform an experimental evaluation of the proposed scheme, showing excellent results in terms of accuracy and security against the most relevant threats.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# EnsemW2S: LLMのアンサンブルは、より強力なLLMを達成できるのか? EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? ( http://arxiv.org/abs/2410.04571v1 ) ライセンス: Link先を確認	Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, John Langford, Furong Huang,	(参考訳) 複数の大規模言語モデル(LLM)の集合的機能を利用して、さらに強力なモデルを構築するにはどうすればよいのか? この疑問が我々の研究の基盤となり、AIアライメントにおける弱点である弱強(w2s)一般化に対する革新的なアプローチを提案する。我々の研究は、より単純なタスクで訓練された弱いモデルがより複雑なタスクでより強力なモデルを協調的に監督するw2s一般化の実現可能性を研究するための、容易でハードな(e2h)フレームワークを導入している。このセットアップは、人間の直接の監督が制限される現実世界の課題を反映している。そこで我々は,AdaBoostにインスパイアされた新しいアンサンブル法を開発し,弱いスーパーバイザーのアンサンブルが,難しいQAデータセット上での分類や生成タスクにまたがる強力なLDMの性能を向上させることを実証した。いくつかのケースにおいて、我々のアンサンブルアプローチは、地上データに基づいて訓練されたモデルの性能と一致し、w2sの一般化のための新しいベンチマークを確立した。既存のベースラインよりも最大14%向上し,バイナリ分類と生成タスクでは平均5%,平均4%改善した。この研究は、特にラベル付きデータが希少で不十分なシナリオにおいて、集団的な監督を通じてAIを強化するための有望な方向性を示している。 How can we harness the collective capabilities of multiple Large Language Models (LLMs) to create an even more powerful model? This question forms the foundation of our research, where we propose an innovative approach to weak-to-strong (w2s) generalization-a critical problem in AI alignment. Our work introduces an easy-to-hard (e2h) framework for studying the feasibility of w2s generalization, where weak models trained on simpler tasks collaboratively supervise stronger models on more complex tasks. This setup mirrors real-world challenges, where direct human supervision is limited. To achieve this, we develop a novel AdaBoost-inspired ensemble method, demonstrating that an ensemble of weak supervisors can enhance the performance of stronger LLMs across classification and generative tasks on difficult QA datasets. In several cases, our ensemble approach matches the performance of models trained on ground-truth data, establishing a new benchmark for w2s generalization. We observe an improvement of up to 14% over existing baselines and average improvements of 5% and 4% for binary classification and generative tasks, respectively. This research points to a promising direction for enhancing AI through collective supervision, especially in scenarios where labeled data is sparse or insufficient.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# Dual Transformer Fusion による重症閉塞症における3次元姿勢推定の強化 Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion ( http://arxiv.org/abs/2410.04574v1 ) ライセンス: Link先を確認	Mehwish Ghafoor, Arif Mahmood, Muhammad Bilal,	(参考訳) モノクロビデオからの3Dヒューマン・ポース推定の分野では、多様なオクルージョン・タイプの存在が深刻な課題となっている。従来の研究では、2次元関節観察から3次元のポーズを推測するために空間的および時間的手がかりを利用することで進歩している。本稿では,重篤なオクルージョンの存在下でも,全身的な3次元ポーズ推定を実現するための新しいアプローチであるDTFアルゴリズムを提案する。咬合誘発関節データ不足の問題に先立ち,時間的補間に基づく咬合誘導機構を提案する。正確な3Dヒューマンポース推定を実現するために,本手法では,まず2つの中間ビューを生成する革新的なDTFアーキテクチャを利用する。各中間ビューは、自己精製スキーマを介して空間的精細化を行う。その後、これらの中間ビューを融合して最終3次元人のポーズ推定を行う。システム全体がエンドツーエンドのトレーニングが可能である。また,Human3.6MとMPI-INF-3DHPデータセットを用いた広範囲な実験により,本手法の性能評価を行った。特に、我々のアプローチは、両方のデータセットで既存の最先端メソッドよりも優れており、大幅な改善をもたらします。コードは、https://github.com/MehwishG/DTF.comで入手できる。 In the field of 3D Human Pose Estimation from monocular videos, the presence of diverse occlusion types presents a formidable challenge. Prior research has made progress by harnessing spatial and temporal cues to infer 3D poses from 2D joint observations. This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to obtain a holistic 3D pose estimation, even in the presence of severe occlusions. Confronting the issue of occlusion-induced missing joint data, we propose a temporal interpolation-based occlusion guidance mechanism. To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views. Each intermediate-view undergoes spatial refinement through a self-refinement schema. Subsequently, these intermediate-views are fused to yield the final 3D human pose estimation. The entire system is end-to-end trainable. Through extensive experiments conducted on the Human3.6M and MPI-INF-3DHP datasets, our method's performance is rigorously evaluated. Notably, our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements. The code is available here: https://github.com/MehwishG/DTF.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# 表現学習のためのロバストネスプログラミング Robustness Reprogramming for Representation Learning ( http://arxiv.org/abs/2410.04577v1 ) ライセンス: Link先を確認	Zhichao Hou, MohamadAli Torkamani, Hamid Krim, Xiaorui Liu,	(参考訳) この研究は、表現学習における興味深い、そして基本的なオープンな課題に取り組みます。十分に訓練されたディープラーニングモデルが与えられたら、そのパラメータを変更することなく、敵対的または騒々しい入力摂動に対する堅牢性を高めるために、再プログラムできますか? そこで我々は,表現学習における中核的特徴変換機構を再考し,ロバストな代替手段として,新しい非線形ロバストパターンマッチング手法を提案する。さらに、3つのモデル再プログラミングパラダイムを導入し、異なる効率要件下で頑健さを柔軟に制御する。基本線形モデルやMLPから浅層・近代的なConvNetまで,多様な学習モデルを対象とした総合的な実験とアブレーション研究は,我々のアプローチの有効性を実証している。この作業は、既存の手法を越えてディープラーニングにおける敵防衛を改善するための有望で直交的な方向を開くだけでなく、堅牢な統計を持つよりレジリエントなAIシステムを設計するための新たな洞察を提供する。 This work tackles an intriguing and fundamental open challenge in representation learning: Given a well-trained deep learning model, can it be reprogrammed to enhance its robustness against adversarial or noisy input perturbations without altering its parameters? To explore this, we revisit the core feature transformation mechanism in representation learning and propose a novel non-linear robust pattern matching technique as a robust alternative. Furthermore, we introduce three model reprogramming paradigms to offer flexible control of robustness under different efficiency requirements. Comprehensive experiments and ablation studies across diverse learning models ranging from basic linear model and MLPs to shallow and modern deep ConvNets demonstrate the effectiveness of our approaches. This work not only opens a promising and orthogonal direction for improving adversarial defenses in deep learning beyond existing methods but also provides new insights into designing more resilient AI systems with robust statistics.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# 相対論的自己相互作用ボソン系の位相図 Phase Diagrams of Relativistic Selfinteracting Boson System ( http://arxiv.org/abs/2410.04580v1 ) ライセンス: Link先を確認	V. Gnatovskyy, D. Anchishkin, D. Zhuravel, V. Karpenko,	(参考訳) カノニカル・アンサンブル内では、平均場アプローチにおいて、有限温度および有限アイソスピン密度で相対論的ボソンを相互作用するシステムについて検討する。平均体は、魅力的な項と反発的な項の両方を含む。熱力学量の温度およびイソスピン密度依存性を得た。ボゾン系における粒子間のアトラクションの場合, ボース-アインシュタイン凝縮体の背景に液-ガス相転移が生じることが示されている。対応する位相図が与えられる。ボース凝縮体の存在が、ボルツマン統計学の枠組みにおいて同じ系で得られたものと比較して、液-ガス相転移の臨界温度を著しく上昇させる理由を説明する。この結果は実験データの解釈,特に混合相の臨界点がボース・アインシュタイン凝縮の存在にどれほど敏感であるかに影響を及ぼす可能性がある。 Within the Canonical Ensemble, we investigate a system of interacting relativistic bosons at finite temperatures and finite isospin densities in a mean-field approach. The mean field contains both attractive and repulsive terms. Temperature and isospin-density dependencies of thermodynamic quantities were obtained. It is shown that in the case of attraction between particles in a bosonic system, a liquid-gas phase transition develops against the background of the Bose-Einstein condensate. The corresponding phase diagrams are given. We explain the reasons why the presence of a Bose condensate significantly increases the critical temperature of the liquid-gas phase transition compared to that obtained for the same system within the framework of the Boltzmann statistics. Our results may have implications for the interpretation of experimental data, in particular, how sensitive the critical point of the mixed phase is to the presence of the Bose-Einstein condensate.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# 超平面-対称静的アインシュタイン-ディラック時空 Hyperplane-Symmetric Static Einstein-Dirac Spacetime ( http://arxiv.org/abs/2410.04582v1 ) ライセンス: Link先を確認	John Schliemann, Tim Sonnleitner,	(参考訳) 任意の次元の静的および超平面対称時空におけるアインシュタイン方程式とディラック場方程式の一般解を導出する。結果として、アインシュタイン方程式から時空への質量の多いディラック場のみを結合し、質量のない場合、エネルギー-運動量テンソルの対角成分を排除するためには、ダイラック場は適切な制約を満たす必要がある。また、Ricci scalar や Kretschmann scalar などの曲率不変量に対して、物理的特異点を示す明示的な表現を与える。さらに、測地方程式の一般解を二次方程式に還元する。 We derive the general solution to the coupled Einstein and Dirac field equations in static and hyperplane-symmetric spacetime of arbitrary dimension including a cosmological constant of either sign. As a result, only a massful Dirac field couples via the Einstein equations to spacetime, and in the massless case the Dirac field is required to fulfill appropriate constraints in order to eliminate off-diagonal components of the energy-momentum tensor. We also give explicit expressions for curvature invariants including the Ricci scalar and the Kretschmann scalar, indicating physical singularities. Moreover, we reduce the general solution of the geodesic equation to quadratures.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# 知識グラフ検索を用いた推論型医療予測 Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval ( http://arxiv.org/abs/2410.04585v1 ) ライセンス: Link先を確認	Pengcheng Jiang, Cao Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han,	(参考訳) 大規模言語モデル (LLM) は臨床的決定支援に有意な可能性を示唆している。しかし、LSMはまだ幻覚に悩まされており、詳細なコンテキストの医療知識が欠如しているため、臨床診断などの高い医療応用が制限されている。従来の検索拡張生成法(RAG)はこれらの制限に対処しようとするが、スパースや無関係な情報を頻繁に取得し、予測精度を損なう。我々は、知識グラフ(KG)コミュニティレベルの検索をLLM推論と統合し、医療予測を強化する新しいフレームワークであるKAREを紹介する。 KAREは、バイオメディカルデータベース、臨床文献、LLM生成した知見を統合して総合的なマルチソースKGを構築し、階層的なグラフコミュニティ検出と要約を用いて、正確で文脈的に関係のある情報検索を行う。 1)関連情報の正確な検索を可能にする高密度な医療知識構造化アプローチ,(2)集中型多面的医療インテリジェンスで患者コンテキストを充実させるダイナミックな知識検索機構,(3)これらのリッチなコンテキストを活用する推論フレームワークを用いて,正確かつ解釈可能な臨床予測を生成する。 MIMIC-IIIでは最大10.8～15.0%、MIMIC-IVでは12.6～12.7%である。予測精度の向上に加えて,LLMの推論能力を活用し,臨床予測の信頼性を高める。 Large language models (LLMs) have demonstrated significant potential in clinical decision support. Yet LLMs still suffer from hallucinations and lack fine-grained contextual medical knowledge, limiting their high-stake healthcare applications such as clinical diagnosis. Traditional retrieval-augmented generation (RAG) methods attempt to address these limitations but frequently retrieve sparse or irrelevant information, undermining prediction accuracy. We introduce KARE, a novel framework that integrates knowledge graph (KG) community-level retrieval with LLM reasoning to enhance healthcare predictions. KARE constructs a comprehensive multi-source KG by integrating biomedical databases, clinical literature, and LLM-generated insights, and organizes it using hierarchical graph community detection and summarization for precise and contextually relevant information retrieval. Our key innovations include: (1) a dense medical knowledge structuring approach enabling accurate retrieval of relevant information; (2) a dynamic knowledge retrieval mechanism that enriches patient contexts with focused, multi-faceted medical insights; and (3) a reasoning-enhanced prediction framework that leverages these enriched contexts to produce both accurate and interpretable clinical predictions. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions. In addition to its impressive prediction accuracy, our framework leverages the reasoning capabilities of LLMs, enhancing the trustworthiness of clinical predictions.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# イタリア初のUDツリーバンク:KIParla林 Towards the first UD Treebank of Spoken Italian: the KIParla forest ( http://arxiv.org/abs/2410.04589v1 ) ライセンス: Link先を確認	Ludovica Pannitto,	(参考訳) このプロジェクトは、KIParla corpus(Mauri et al , 2019, Ballar\`e et al , 2020)のためのUniversal Dependencies treebankを構築し、イタリア語で利用可能な言語資源を充実させようとしている。 The present project endeavors to enrich the linguistic resources available for Italian by constructing a Universal Dependencies treebank for the KIParla corpus (Mauri et al., 2019, Ballar\`e et al., 2020), an existing and well known resource for spoken Italian.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# ProtocoLLM:ドメイン特化科学プロトコル定式化タスクにおけるLCMの自動評価フレームワーク ProtocoLLM: Automatic Evaluation Framework of LLMs on Domain-Specific Scientific Protocol Formulation Tasks ( http://arxiv.org/abs/2410.04601v1 ) ライセンス: Link先を確認	Seungjun Yi, Jaeyoung Lim, Juyong Yoon,	(参考訳) ロボットによって実行可能な科学プロトコルの自動生成は、科学的研究プロセスを著しく加速することができる。大言語モデル(LLM)は、SPFT(Scientific Protocol Formulation Tasks)で優れているが、その能力の評価は人間による評価に依存している。本稿では,SPFT 上で LLM の機能を評価するためのフレキシブルな自動フレームワーク ProtocoLLM を提案する。このフレームワークは、予め定義されたラボアクションのみを用いて、生物学のプロトコルからターゲットモデルとGPT-4を抽出し、LLAM-EVALを用いてターゲットモデルの出力を評価する。我々の適応型プロンプトベース評価手法であるLLAM-EVALは, 評価モデル, 材料, 基準, コストの面において, 大幅な柔軟性を提供する。 GPT変異,Llama,Mixtral,Gemma,Cohere,Geminiを評価した。全体として、GPTとCohereは強力な科学的プロトコル定式化器である。また、生物学プロトコルとそれに対応する擬似コードを備えたデータセットであるBIOPROT 2.0を導入し、SPFTの定式化と評価においてLLMを支援する。本研究は,SPFT 上の LLM を,特定の目的のためにプロトコル生成を必要とする様々な領域および他の分野にわたって評価するために拡張可能である。 Automated generation of scientific protocols executable by robots can significantly accelerate scientific research processes. Large Language Models (LLMs) excel at Scientific Protocol Formulation Tasks (SPFT), but the evaluation of their capabilities rely on human evaluation. Here, we propose a flexible, automatic framework to evaluate LLM's capability on SPFT: ProtocoLLM. This framework prompts the target model and GPT-4 to extract pseudocode from biology protocols using only predefined lab actions and evaluates the output of target model using LLAM-EVAL, the pseudocode generated by GPT-4 serving as a baseline and Llama-3 acting as the evaluator. Our adaptable prompt-based evaluation method, LLAM-EVAL, offers significant flexibility in terms of evaluation model, material, criteria, and is free of cost. We evaluate GPT variations, Llama, Mixtral, Gemma, Cohere, and Gemini. Overall, we find that GPT and Cohere is a powerful scientific protocol formulators. We also introduce BIOPROT 2.0, a dataset with biology protocols and corresponding pseudocodes, which can aid LLMs in formulation and evaluation of SPFT. Our work is extensible to assess LLMs on SPFT across various domains and other fields that require protocol generation for specific goals.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# プライバシの危機:データブローカーの規制されていない地下市場と提案されたフレームワークを解き放つ Privacy's Peril: Unmasking the Unregulated Underground Market of Data Brokers and the Suggested Framework ( http://arxiv.org/abs/2410.04606v1 ) ライセンス: Link先を確認	Rabia Bajwa, Farah Tasnur Meem,	(参考訳) インターネットは、企業ができるだけ多くのクライアントデータを収集、保存するための一般的な場所であり、この傾向によりコンピュータストレージ容量は指数関数的に増加した。企業はこのデータを使って顧客満足度を高め、収益を上げ、売上を増やし、プロフィールを増やす。しかし、データブローカーの新興セクターは法的課題に悩まされている。パート1では、データブローカとは何か、情報収集方法、データ産業、それに遭遇する困難について検討する。パートIIでは、データブローカを規制する潜在的なオプションを検討します。すべてのオプションは、EU一般データ保護規則(GDPR)に基づいて提供されている。第III部では、分析及び発見について紹介する。 The internet is a common place for businesses to collect and store as much client data as possible and computer storage capacity has increased exponentially due to this trend. Businesses utilize this data to enhance customer satisfaction, generate revenue, boost sales, and increase profile. However, the emerging sector of data brokers is plagued with legal challenges. In part I, we will look at what a data broker is, how it collects information, the data industry, and some of the difficulties it encounters. In Part II, we will look at potential options for regulating data brokers. All options are provided in light of the EU General Data Protection Regulation (GDPR). In Part III, we shall present our analysis and findings.	翻訳日:2024-11-02 06:36:17 公開日:2024-10-06
# VISTA:マルチモーダルモデル解釈のための視覚的・テキスト的注意データセット VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models ( http://arxiv.org/abs/2410.04609v1 ) ライセンス: Link先を確認	Harshit, Tolga Tasdizen,	(参考訳) 近年のディープラーニングの発展により、自然言語処理(NLP)とコンピュータビジョンが統合され、強力な統合ビジョンと言語モデル(VLM)が生まれた。その優れた能力にもかかわらず、これらのモデルは機械学習研究コミュニティ内のブラックボックスと見なされることが多い。これは、画像のどの部分が特定のテキストセグメントに対応し、どのようにそれらの関連を解読できるかという批判的な疑問を提起する。これらの接続を理解することは、モデルの透明性、解釈可能性、信頼性を高めるために不可欠です。この疑問に答えるために,画像領域と対応するテキストセグメント間の特定の関連をマッピングする画像テキスト整列人間の視覚的注意データセットを提案する。次に、VLモデルによって生成された内部のヒートマップとこのデータセットを比較し、モデルの決定プロセスを分析し、よりよく理解できるようにします。このアプローチは、これらのモデルが視覚的および言語的情報をどのように整合させるかについての洞察を提供することで、モデルの透明性、解釈可能性、信頼性を高めることを目的としている。これらのVLモデルにおいて,テキスト誘導型視覚塩分濃度検出の総合的研究を行った。本研究の目的は、異なるモデルが、対応するテキストセグメントに対してどのように特定の視覚要素を優先順位付けし、フォーカスするかを理解し、内部メカニズムについて深い洞察を提供し、アウトプットを解釈する能力を改善することである。 The recent developments in deep learning led to the integration of natural language processing (NLP) with computer vision, resulting in powerful integrated Vision and Language Models (VLMs). Despite their remarkable capabilities, these models are frequently regarded as black boxes within the machine learning research community. This raises a critical question: which parts of an image correspond to specific segments of text, and how can we decipher these associations? Understanding these connections is essential for enhancing model transparency, interpretability, and trustworthiness. To answer this question, we present an image-text aligned human visual attention dataset that maps specific associations between image regions and corresponding text segments. We then compare the internal heatmaps generated by VL models with this dataset, allowing us to analyze and better understand the model's decision-making process. This approach aims to enhance model transparency, interpretability, and trustworthiness by providing insights into how these models align visual and linguistic information. We conducted a comprehensive study on text-guided visual saliency detection in these VL models. This study aims to understand how different models prioritize and focus on specific visual elements in response to corresponding text segments, providing deeper insights into their internal mechanisms and improving our ability to interpret their outputs.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 相対的未来を推し進める:マルチターンRLHFの効率的な政策最適化 Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ( http://arxiv.org/abs/2410.04612v1 ) ライセンス: Link先を確認	Zhaolin Gao, Wenhao Zhan, Jonathan D. Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun,	(参考訳) 大きな言語モデル(LLM)は、1ターンのインタラクションを含む要約のようなタスクで顕著に成功しました。しかし、長期的な計画を必要とする対話のようなマルチターンタスクには、依然として苦労する可能性がある。マルチターン対話における従来の研究は、人間からのフィードバック(RLHF)法から、全ての先行対話のターンを長いコンテキストとして扱うことで、マルチターン設定へのシングルターン強化学習を拡張している。トレーニングセット内の会話は、何らかの参照ポリシーによって以前のターンが生成されるため、学習者が実際に会話ループにいる場合、低いトレーニングエラーが必ずしも良いパフォーマンスに対応しない可能性がある。これに対して,LLMにおけるマルチターンRLHFに対処する効率的なポリシー最適化手法であるRegressing the RELative Future (REFUEL)を提案する。 REFUELは1つのモデルを用いて$Q$-valueを推定し、自己生成データでトレーニングし、共変量シフトの問題に対処する。 REFUELは、反復的に収集されたデータセット上の回帰タスクのシーケンスとして、マルチターンRLHF問題をフレーム化して実装を容易にする。理論的には、REFUELはトレーニングセットによってカバーされる任意のポリシーのパフォーマンスに一致できることを証明します。実験では,Llama-3.1-70B-itを用いて,モデルと対話するユーザのシミュレーションを行った。 REFUELは、DPOやREBELといった最先端の手法を様々な設定で一貫して上回っている。さらに、わずか80億のパラメータを持つにもかかわらず、REFUELで微調整されたLlama-3-8B-itは、長いマルチターン対話においてLlama-3.1-70B-itより優れている。 REFUELの実装はhttps://github.com/ZhaolinGao/REFUEL/で、REFUELでトレーニングされたモデルはhttps://huggingface.co/Cornell-AGIで見ることができる。 Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works on multi-turn dialogue extend single-turn reinforcement learning from human feedback (RLHF) methods to the multi-turn setting by treating all prior dialogue turns as a long context. Such approaches suffer from covariate shift: the conversations in the training set have previous turns generated by some reference policy, which means that low training error may not necessarily correspond to good performance when the learner is actually in the conversation loop. In response, we introduce REgressing the RELative FUture (REFUEL), an efficient policy optimization approach designed to address multi-turn RLHF in LLMs. REFUEL employs a single model to estimate $Q$-values and trains on self-generated data, addressing the covariate shift issue. REFUEL frames the multi-turn RLHF problem as a sequence of regression tasks on iteratively collected datasets, enabling ease of implementation. Theoretically, we prove that REFUEL can match the performance of any policy covered by the training set. Empirically, we evaluate our algorithm by using Llama-3.1-70B-it to simulate a user in conversation with our model. REFUEL consistently outperforms state-of-the-art methods such as DPO and REBEL across various settings. Furthermore, despite having only 8 billion parameters, Llama-3-8B-it fine-tuned with REFUEL outperforms Llama-3.1-70B-it on long multi-turn dialogues. Implementation of REFUEL can be found at https://github.com/ZhaolinGao/REFUEL/, and models trained by REFUEL can be found at https://huggingface.co/Cornell-AGI.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# LRQ-Fact:マルチモーダルFact-CheckingにおけるLLM関連質問 LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking ( http://arxiv.org/abs/2410.04616v1 ) ライセンス: Link先を確認	Alimohammad Beigi, Bohan Jiang, Dawei Li, Tharindu Kumarage, Zhen Tan, Pouya Shaeri, Huan Liu,	(参考訳) 人間のファクトチェックには専門的なドメイン知識があり、正確な質問を定式化して情報の正確性を検証することができる。しかし、この専門家主導のアプローチは労働集約的であり、特に複雑なマルチモーダル誤報を扱う場合、スケーラブルではない。本稿では,マルチモーダルファクトチェックのための完全自動フレームワークLRQ-Factを提案する。まず、VLM(Vision-Language Models)とLLM(Large Language Models)を利用して、マルチモーダルコンテンツを探索するための包括的な質問と回答を生成する。次に、ルールベースの意思決定モジュールは、元のコンテンツと生成された質問と回答の両方を評価し、全体的な妥当性を評価する。 2つのベンチマーク実験により、LRQ-Factはマルチモーダル誤報の検出精度を向上させることが示された。さらに、異なるモデルバックボーン間の一般化性を評価し、さらなる改良のための貴重な洞察を提供する。 Human fact-checkers have specialized domain knowledge that allows them to formulate precise questions to verify information accuracy. However, this expert-driven approach is labor-intensive and is not scalable, especially when dealing with complex multimodal misinformation. In this paper, we propose a fully-automated framework, LRQ-Fact, for multimodal fact-checking. Firstly, the framework leverages Vision-Language Models (VLMs) and Large Language Models (LLMs) to generate comprehensive questions and answers for probing multimodal content. Next, a rule-based decision-maker module evaluates both the original content and the generated questions and answers to assess the overall veracity. Extensive experiments on two benchmarks show that LRQ-Fact improves detection accuracy for multimodal misinformation. Moreover, we evaluate its generalizability across different model backbones, offering valuable insights for further refinement.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 地理空間コード生成におけるLLMの評価 Evaluation of Code LLMs on Geospatial Code Generation ( http://arxiv.org/abs/2410.04617v1 ) ライセンス: Link先を確認	Piotr Gramacki, Bruno Martins, Piotr Szymański,	(参考訳) ソフトウェア開発支援ツールは、コード生成にLLM(Large Language Models)を使用した最近のアプローチで、長い間研究されてきた。これらのモデルは、データサイエンスと機械学習アプリケーションのためのPythonコードを生成することができる。 LLMは、日々の作業で生産性を上げるので、ソフトウェアエンジニアにとって役立ちます。 LLMは経験の浅いソフトウェア開発者のための"メンター"としても機能し、実行可能な学習支援としても機能する。 LLMを使った高品質なコード生成は地理空間データサイエンスにも有用である。しかし、このドメインは異なる課題を提起し、コード生成のLLMは通常、地理空間的タスクでは評価されない。本稿では,空間的タスクの選択に基づいて,コード生成モデルの評価ベンチマークを構築した。その複雑さと必要なツールに基づいて地理空間的タスクを分類した。次に,空間推論,空間データ処理,地理空間ツール利用におけるモデル機能をテストするタスクを備えたデータセットを構築した。データセットは、高品質のために手作業で作成された特定のコーディング問題で構成されている。すべての問題に対して、私たちは、生成したコードを自動的に正当性をチェックするための一連のテストシナリオを提案しました。さらに,地理空間領域におけるコード生成のための既存のコード生成LLMの選定試験を行った。私たちは、データセットと再現可能な評価コードを公開GitHubリポジトリで共有しています。我々のデータセットは、地理空間的コーディングタスクを高精度に解決できる新しいモデルの開発に貢献することを期待している。これらのモデルにより、地理空間アプリケーションに適したコーディングアシスタントの開発が可能になる。 Software development support tools have been studied for a long time, with recent approaches using Large Language Models (LLMs) for code generation. These models can generate Python code for data science and machine learning applications. LLMs are helpful for software engineers because they increase productivity in daily work. An LLM can also serve as a "mentor" for inexperienced software developers, and be a viable learning support. High-quality code generation with LLMs can also be beneficial in geospatial data science. However, this domain poses different challenges, and code generation LLMs are typically not evaluated on geospatial tasks. Here, we show how we constructed an evaluation benchmark for code generation models, based on a selection of geospatial tasks. We categorised geospatial tasks based on their complexity and required tools. Then, we created a dataset with tasks that test model capabilities in spatial reasoning, spatial data processing, and geospatial tools usage. The dataset consists of specific coding problems that were manually created for high quality. For every problem, we proposed a set of test scenarios that make it possible to automatically check the generated code for correctness. In addition, we tested a selection of existing code generation LLMs for code generation in the geospatial domain. We share our dataset and reproducible evaluation code on a public GitHub repository, arguing that this can serve as an evaluation benchmark for new LLMs in the future. Our dataset will hopefully contribute to the development new models capable of solving geospatial coding tasks with high accuracy. These models will enable the creation of coding assistants tailored for geospatial applications.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 拡散前処理による無監督ブラインド顔修復に向けて Towards Unsupervised Blind Face Restoration using Diffusion Prior ( http://arxiv.org/abs/2410.04618v1 ) ライセンス: Link先を確認	Tianshu Kuai, Sina Honari, Igor Gilitschenski, Alex Levinshtein,	(参考訳) ブラインド顔復元法は、特に教師付き学習を伴う大規模合成データセットで訓練された場合、顕著な性能を示した。これらのデータセットは、手作りの画像分解パイプラインで、低品質の顔イメージをシミュレートすることによって生成されることが多い。しかし、このような合成劣化を訓練したモデルは、目に見えない劣化の入力には対応できない。本稿では, 入力画像の集合のみを用いて, 未知の劣化, 真理の目標を含まないことで, クリーンでコンテキスト的に一貫した出力にマップする方法を学習する復元モデルを微調整する。我々は,入力画像の内容の一貫性を保ちながら,自然画像の分布から高品質な画像を生成するための事前学習拡散モデルを構築した。これらの生成された画像は、トレーニング済みの復元モデルを微調整するために擬似ターゲットとして使用される。テスト時に拡散モデルを採用する最近の多くのアプローチとは異なり、トレーニング中にのみ行うので、効率的な推論時間性能を維持することができる。広汎な実験により,提案手法は,入力内容との整合性を保ちつつ,事前学習したブラインドフェイス復元モデルの知覚品質を一貫して向上させることができることが示された。我々の最良のモデルは、合成と実世界の両方のデータセットにおける最先端の結果も達成します。 Blind face restoration methods have shown remarkable performance, particularly when trained on large-scale synthetic datasets with supervised learning. These datasets are often generated by simulating low-quality face images with a handcrafted image degradation pipeline. The models trained on such synthetic degradations, however, cannot deal with inputs of unseen degradations. In this paper, we address this issue by using only a set of input images, with unknown degradations and without ground truth targets, to fine-tune a restoration model that learns to map them to clean and contextually consistent outputs. We utilize a pre-trained diffusion model as a generative prior through which we generate high quality images from the natural image distribution while maintaining the input image content through consistency constraints. These generated images are then used as pseudo targets to fine-tune a pre-trained restoration model. Unlike many recent approaches that employ diffusion models at test time, we only do so during training and thus maintain an efficient inference-time performance. Extensive experiments show that the proposed approach can consistently improve the perceptual quality of pre-trained blind face restoration models while maintaining great consistency with the input contents. Our best model also achieves the state-of-the-art results on both synthetic and real-world datasets.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# OKAPI BM25とクロスエンコーダのアンサンブルを用いたポーランド語テキストのパス検索 Passage Retrieval of Polish Texts Using OKAPI BM25 and an Ensemble of Cross Encoders ( http://arxiv.org/abs/2410.04620v1 ) ライセンス: Link先を確認	Jakub Pokrywka,	(参考訳) Passage Retrievalは伝統的にTF-IDFやBM25のような語彙的手法に依存してきた。最近、一部のニューラルネットワークモデルはこれらの手法を性能で上回っている。しかし、これらのモデルは、大きな注釈付きデータセットの必要性や新しいドメインへの適応といった課題に直面している。本稿では,ポーランド語テキストを3つの領域(トリビア,合法,顧客支援)で検索することを含む,Poleval 2023 Task 3: Passage Retrieval Challengeに勝利したソリューションを提案する。しかし、トリビアドメインのみがトレーニングや開発データに使用された。この手法はOKAPI BM25アルゴリズムを用いて文書を検索し、公開の多言語Cross Encoders for Re rankをアンサンブルした。リランカモデルの微調整はパフォーマンスをわずかに向上させたが、トレーニングドメインのみに留まり、他のドメインでは悪化した。 Passage Retrieval has traditionally relied on lexical methods like TF-IDF and BM25. Recently, some neural network models have surpassed these methods in performance. However, these models face challenges, such as the need for large annotated datasets and adapting to new domains. This paper presents a winning solution to the Poleval 2023 Task 3: Passage Retrieval challenge, which involves retrieving passages of Polish texts in three domains: trivia, legal, and customer support. However, only the trivia domain was used for training and development data. The method used the OKAPI BM25 algorithm to retrieve documents and an ensemble of publicly available multilingual Cross Encoders for Reranking. Fine-tuning the reranker models slightly improved performance but only in the training domain, while it worsened in other domains.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 変圧器を用いたポーランド語文の句読解予測 Punctuation Prediction for Polish Texts using Transformers ( http://arxiv.org/abs/2410.04621v1 ) ライセンス: Link先を確認	Jakub Pokrywka,	(参考訳) 音声認識システムは典型的には句読点を欠いたテキストを出力する。しかし、文章の理解には句読点が不可欠である。この問題に対処するため,句読影予測モデルを開発した。本稿では, 71.44 重み付き F1 のポーランド語テキストに対する Punctuation Prediction for Polleval 2022 Task 1 の解について述べる。この方法は、競合データと外部データセットに微調整された1つのHerBERTモデルを利用する。 Speech recognition systems typically output text lacking punctuation. However, punctuation is crucial for written text comprehension. To tackle this problem, Punctuation Prediction models are developed. This paper describes a solution for Poleval 2022 Task 1: Punctuation Prediction for Polish Texts, which scores 71.44 Weighted F1. The method utilizes a single HerBERT model finetuned to the competition data and an external dataset.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# ディバイドとコンカーによる大規模言語モデル制御 Control Large Language Models via Divide and Conquer ( http://arxiv.org/abs/2410.04628v1 ) ライセンス: Link先を確認	Bingxuan Li, Yiwei Wang, Tao Meng, Kai-Wei Chang, Nanyun Peng,	(参考訳) 本稿では,Lexically Constrained Generation(LCG)に着目し,大規模言語モデル(LLM)のプロンプトベース制御による制御可能生成について検討する。我々は,レキシカル制約を満たすためのLLMの性能を,プロンプトベース制御により体系的に評価し,下流アプリケーションでの有効性を検証した。我々は、LLMは、プロンプトベース制御による語彙制約を一貫して満たす上で、重大な課題に直面していると結論付けた。 1) LLMが入力内の特定の位置に現れる制約を満たす傾向にある位置バイアス,(2) LLMの制御に最小限の影響を与えるデコードパラメータに対する応答性が低いこと,(3) 特定の制約(複合語など)の固有の複雑さに対処する上での苦労など,LCGにおけるLCMの3つの重要な制限を特定した。これらの課題に対処するため、白箱と黒箱のLCMに有効である除算・分数生成戦略を導入し、LCGタスクにおけるLCMの性能を向上させるとともに、最も困難なLCGタスクにおいて、90%以上の成功率の向上を示す。提案手法は,LCGにおけるLCMの性能に関する貴重な知見を即時制御で提供し,より高度でカスタマイズされたテキスト生成アプリケーションへの経路を提供する。 This paper investigates controllable generation for large language models (LLMs) with prompt-based control, focusing on Lexically Constrained Generation (LCG). We systematically evaluate the performance of LLMs on satisfying lexical constraints with prompt-based control, as well as their efficacy in downstream applications. We conclude that LLMs face significant challenges in consistently satisfying lexical constraints with prompt-based control. We identified three key limitations of LLMs for LCG, including (1) position bias, where LLMs tend to satisfy constraints that appear in specific positions within the input; (2) low responsiveness to decoding parameters, which render minimal impact on control of LLMs; and (3) struggle with handling the inherent complexity of certain constraints (e.g., compound words). To address these issues, we introduce a Divide and Conquer Generation strategy, effective for both white-box and black-box LLMs, to enhance LLMs performance in LCG tasks, which demonstrates over 90% improvement on success rate in the most challenging LCG task. Our analysis provides valuable insights into the performance of LLMs in LCG with prompt-based control, and our proposed strategy offers a pathway to more sophisticated and customized text generation applications.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# Unitary Closed Timelike Curves can Solve all of NP Unitary Closed Timelike Curves can Solve all of NP ( http://arxiv.org/abs/2410.04630v1 ) ライセンス: Link先を確認	Omri Shmueli,	(参考訳) 量子力学と一般相対性理論の交わりに生まれ、不定因果構造は時間の連続体において、いくつかの事象はそれらの間に固有の因果順序を持たないという考え方である。 Oreshkov, Costa and Brukner (Nature Communications, 2012)によって導入されたプロセス行列は、不定因果構造を持つ量子情報処理を定義する。 Araujo et al (Physical Review A, 2017) は、プロセス行列の計算複雑性を定義し、多項式時間プロセス行列計算が標準多項式時間量子計算と等価であることを示し、選択後の閉時間曲線 (CTCs) へのアクセスを弱めることで、$\textit{linear}$に制限されることを示した。 Araujoらは、効率的なプロセス行列計算のための複雑性クラスを$\mathbf{BQP}_{\ell CTC}$(これは自明に$\mathbf{BQP}$を含む)と定義し、$\mathbf{BQP}_{\ell CTC}$が$\mathbf{BQP}_{\ell CTC}$の外にある計算タスクを含むかどうかというオープンな疑問を提起した。この研究において、この開問題は広く信じられている硬さの仮定の下で解決し、$\mathbf{NP} \subseteq \mathbf{BQP}_{\ell CTC}$ を示す。私たちの解は、(Araujo et al , Quantum, 2017) によってより制限されたプロセス行列のサブセットに捕えられ、(1) は任意のプロセス行列よりも物理的であると推測され、(2) は$\textit{unitary}$ (特に線形) のポスト選択 CTC にアクセスする多項式時間量子計算と等価である。概念的には、これまでの主張では、非線形性はCTCが$\mathbf{NP}$を解くのを可能にするものである、という信念は偽であり、純粋プロセス行列が物理的かどうかを理解することの重要性を高める。 Born in the intersection between quantum mechanics and general relativity, indefinite causal structure is the idea that in the continuum of time, some sets of events do not have an inherent causal order between them. Process matrices, introduced by Oreshkov, Costa and Brukner (Nature Communications, 2012), define quantum information processing with indefinite causal structure -- a generalization of the operations allowed in standard quantum information processing, and to date, are the most studied such generalization. Araujo et al. (Physical Review A, 2017) defined the computational complexity of process matrices, and showed that polynomial-time process matrix computation is equivalent to standard polynomial-time quantum computation with access to a weakening of post-selection Closed Timelike Curves (CTCs), that are restricted to be $\textit{linear}$. Araujo et al. accordingly defined the complexity class for efficient process matrix computation as $\mathbf{BQP}_{\ell CTC}$ (which trivially contains $\mathbf{BQP}$), and posed the open question of whether $\mathbf{BQP}_{\ell CTC}$ contains computational tasks that are outside $\mathbf{BQP}$. In this work we solve this open question under a widely believed hardness assumption, by showing that $\mathbf{NP} \subseteq \mathbf{BQP}_{\ell CTC}$. Our solution is captured by an even more restricted subset of process matrices that are purifiable (Araujo et al., Quantum, 2017), which (1) is conjectured more likely to be physical than arbitrary process matrices, and (2) is equivalent to polynomial-time quantum computation with access to $\textit{unitary}$ (which are in particular linear) post-selection CTCs. Conceptually, our work shows that the previously held belief, that non-linearity is what enables CTCs to solve $\mathbf{NP}$, is false, and raises the importance of understanding whether purifiable process matrices are physical or not.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# DeepLTL: 複雑なTL仕様を効果的に満足する学習 DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications ( http://arxiv.org/abs/2410.04631v1 ) ライセンス: Link先を確認	Mathias Jackermeier, Alessandro Abate,	(参考訳) リニア時間論理(LTL)は、強化学習(RL)において、複雑で時間的に拡張されたタスクを特定するための強力なフォーマリズムとして最近採用されている。しかし、学習中に観察されない任意の仕様を効率的に満たす学習方針は、依然として難しい問題である。既存のアプローチはいくつかの欠点に悩まされており、それらはLTLの有限水平フラグメントにのみ適用でき、最適以下の解に制限され、安全制約を適切に扱えない。本研究では,これらの問題に対処するための新しい学習手法を提案する。提案手法は,LTL仕様のセマンティクスを明示的に表現したB\「内オートマティクス」の構造を利用して,所望の式を満たす真理代入の順序に規定されたポリシーを学習する。様々な離散領域および連続領域の実験により、我々のアプローチは、有限および無限水平仕様の範囲を満たすことができ、満足度確率と効率の両方の観点から既存の手法より優れていることを示した。 Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of B\"uchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# ドメイン適応に基づく音声感情認識のための言語間メタラーニング手法 A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition ( http://arxiv.org/abs/2410.04633v1 ) ライセンス: Link先を確認	David-Gabriel Ion, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Florin Pop, Mihaela-Claudia Cercel,	(参考訳) 最高のパフォーマンスの音声モデルは、彼らが目的とする言語で大量のデータに基づいて訓練される。しかし、ほとんどの言語はスパースデータを持ち、トレーニングモデルを困難にしている。このデータ不足は、音声の感情認識においてさらに顕著である。本研究は、限られたデータ、特に音声感情認識のためのモデルの性能について考察する。メタラーニングは、数発の学習を改善することに特化したものだ。その結果,メタラーニング技術は,音声の感情認識,アクセント認識,人物識別に応用された。そこで本研究では,多段階メタ学習法に対する一連の改良を提案する。メタ学習アルゴリズムの計算コストが高いため、より小さなモデルに焦点を当てた他の研究とは異なり、我々はより実践的なアプローチを取る。トレーニング済みの大きなバックボーンとプロトタイプネットワークを組み込んで,本手法をより実現し,適用可能にする。私たちの最も注目すべき貢献は、メタテスト中の微調整技術の改善です。この結果は、他のいくつかの研究からの漸進的な改善とともに、4方向5ショット学習の文脈ではトレーニングや検証の分割に含まれないギリシャ語およびルーマニア語の感情認識データセットに対して、83.78%と56.30%の精度スコアを達成するのに役立った。 Best-performing speech models are trained on large amounts of data in the language they are meant to work for. However, most languages have sparse data, making training models challenging. This shortage of data is even more prevalent in speech emotion recognition. Our work explores the model's performance in limited data, specifically for speech emotion recognition. Meta-learning specializes in improving the few-shot learning. As a result, we employ meta-learning techniques on speech emotion recognition tasks, accent recognition, and person identification. To this end, we propose a series of improvements over the multistage meta-learning method. Unlike other works focusing on smaller models due to the high computational cost of meta-learning algorithms, we take a more practical approach. We incorporate a large pre-trained backbone and a prototypical network, making our methods more feasible and applicable. Our most notable contribution is an improved fine-tuning technique during meta-testing that significantly boosts the performance on out-of-distribution datasets. This result, together with incremental improvements from several other works, helped us achieve accuracy scores of 83.78% and 56.30% for Greek and Romanian speech emotion recognition datasets not included in the training or validation splits in the context of 4-way 5-shot learning.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 何が欲しいのか? テキストと画像のモデルにおけるコンセプトアソシエーションの調査 Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models ( http://arxiv.org/abs/2410.04634v1 ) ライセンス: Link先を確認	Salma Abdel Magid, Weiwei Pan, Simon Warchol, Grace Guo, Junsik Kim, Mahia Rahman, Hanspeter Pfister,	(参考訳) テキスト・ツー・イメージ(T2I)モデルは、実生活に影響を及ぼすアプリケーションにますます使われている。そのため、望ましいタスクに適した画像を生成するために、これらのモデルを監査する必要性が高まっている。しかし,プロンプトと生成コンテンツとの関係を人間に理解可能な方法で体系的に検査することは依然として困難である。そこで本稿では,解釈可能な概念とメトリクスを用いて,視覚言語モデルの条件分布を特徴付けるフレームワークである \emph{Concept2Concept} を提案する。このキャラクタリゼーションにより、モデルやプロンプトデータセットの監査にフレームワークを使用することができます。本研究では,ユーザ定義分布や経験的実世界分布など,プロンプトの条件分布に関するいくつかのケーススタディについて述べる。最後に、非技術的エンドユーザーによる使用を容易にするオープンソースのインタラクティブ可視化ツールであるConcept2Conceptを実装した。警告: 本論文では, CSAM や NSFW などの有害物質について論じる。 Text-to-image (T2I) models are increasingly used in impactful real-life applications. As such, there is a growing need to audit these models to ensure that they generate desirable, task-appropriate images. However, systematically inspecting the associations between prompts and generated content in a human-understandable way remains challenging. To address this, we propose \emph{Concept2Concept}, a framework where we characterize conditional distributions of vision language models using interpretable concepts and metrics that can be defined in terms of these concepts. This characterization allows us to use our framework to audit models and prompt-datasets. To demonstrate, we investigate several case studies of conditional distributions of prompts, such as user defined distributions or empirical, real world distributions. Lastly, we implement Concept2Concept as an open-source interactive visualization tool facilitating use by non-technical end-users. Warning: This paper contains discussions of harmful content, including CSAM and NSFW material, which may be disturbing to some readers.	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 医療マイクロ波ラジオメータ(MWR)乳がん検出のための多段階自己コントラスト学習 Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection ( http://arxiv.org/abs/2410.04636v1 ) ライセンス: Link先を確認	Christoforos Galazis, Huiyi Wu, Igor Goryanin,	(参考訳) 乳がんの検出・モニタリング技術の強化が医療の主目的であり、イノベーティブなイメージング技術や診断アプローチの必要性が増している。本研究は,マイクロ波ラジオメトリー(MWR)乳がん検出の応用に適した,新しい多層自己造影モデルを提案する。本手法は, 局所MWR (L-MWR), 局所MWR (R-MWR), およびGlobal-MWR (G-MWR) の3つの異なるモデルを含む。これらのモデルは、各分析レベルで生成された自己コントラストデータを活用して検出能力を向上するジョイント-MWR(J-MWR)ネットワークを介して結合的に統合される。女性患者4,932例のデータセットを用いて,提案モデルの有効性を実証した。特に、J-MWRモデルはマシューズ相関係数 0.74$\pm$ 0.018 を達成し、既存のMWRニューラルネットワークやコントラスト法を上回り、自分自身を区別する。これらの結果は,MWRに基づく乳がん検出プロセスの診断精度と一般化性の向上において,自己コントラスト学習技術の有意義な可能性を浮き彫りにした。このような進歩は、さらなる調査と臨床の努力を約束する。ソースコードは、https://github.com/cgalaz01/self_contrastive_mwrで入手できる。 The pursuit of enhanced breast cancer detection and monitoring techniques is a paramount healthcare objective, driving the need for innovative imaging technologies and diagnostic approaches. This study introduces a novel multi-tiered self-contrastive model tailored for the application of microwave radiometry (MWR) breast cancer detection. Our approach encompasses three distinct models: Local-MWR (L-MWR), Regional-MWR (R-MWR), and Global-MWR (G-MWR), each engineered to analyze varying sub-regional comparisons within the breasts. These models are cohesively integrated through the Joint-MWR (J-MWR) network, which leverages the self-contrastive data generated at each analytical level to enhance detection capabilities. Employing a dataset comprising 4,932 cases of female patients, our research showcases the effectiveness of our proposed models. Notably, the J-MWR model distinguishes itself by achieving a Matthews correlation coefficient of 0.74 $\pm$ 0.018, surpassing existing MWR neural networks and contrastive methods. These results highlight the significant potential of self-contrastive learning techniques in improving both the diagnostic accuracy and generalizability of MWR-based breast cancer detection processes. Such advancements hold considerable promise for further investigative and clinical endeavors. The source code is available at: https://github.com/cgalaz01/self_contrastive_mwr	翻訳日:2024-11-02 06:26:32 公開日:2024-10-06
# 良性オーバーフィッティングによる確率弱相関一般化 Provable Weak-to-Strong Generalization via Benign Overfitting ( http://arxiv.org/abs/2410.04638v1 ) ライセンス: Link先を確認	David X. Wu, Anant Sahai,	(参考訳) 機械学習における古典的な教師学生モデルは、強い教師が弱い生徒を監督し、生徒の能力を向上させることを実証している。その代わりに、弱い教師が不完全な擬似ラベルを持つ強い生徒を監督する逆の状況を考える。このパラダイムはバーンズらによって最近紹介され、'emph{weak-to-strong generalization} と呼ばれた。弱教師の擬似ラベルが確率的推測のように漸近的に漸近的に現れるガウス共変体を用いたスタイリングオーバーパラメータ化スパイク共分散モデルにおいて、二分法と多ラベル分類の弱い対強一般化を理論的に検討する。これらの仮定に基づき, 弱監督後の強い学生の一般化の漸近位相を, 1) 一般化の成功と(2) ランダムな推算の2つの相を確実に同定する。我々の手法は最終的には弱いクラスから強いクラスに拡張されるべきである。そのために、関係のあるガウスの最大値に対して、より低い尾の不等式を証明し、これは独立な関心を持つかもしれない。マルチラベル設定を理解することで,ロジットの使用価値が向上する。 The classic teacher-student model in machine learning posits that a strong teacher supervises a weak student to improve the student's capabilities. We instead consider the inverted situation, where a weak teacher supervises a strong student with imperfect pseudolabels. This paradigm was recently brought forth by Burns et al.'23 and termed \emph{weak-to-strong generalization}. We theoretically investigate weak-to-strong generalization for binary and multilabel classification in a stylized overparameterized spiked covariance model with Gaussian covariates where the weak teacher's pseudolabels are asymptotically like random guessing. Under these assumptions, we provably identify two asymptotic phases of the strong student's generalization after weak supervision: (1) successful generalization and (2) random guessing. Our techniques should eventually extend to weak-to-strong multiclass classification. Towards doing so, we prove a tight lower tail inequality for the maximum of correlated Gaussians, which may be of independent interest. Understanding the multilabel setting reinforces the value of using logits for weak supervision when they are available.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# Radial Basis Operator Networks Radial Basis Operator Networks ( http://arxiv.org/abs/2410.04639v1 ) ライセンス: Link先を確認	Jason Kurz, Sean Oughton, Shitao Liu,	(参考訳) 作用素ネットワークは、関数空間のような無限次元空間間の写像を提供する非線形作用素を近似するように設計されている。これらのネットワークは、機械学習においてますます重要な役割を担い、その最も注目すべき貢献は科学コンピューティングの分野である。その重要性は、科学的な応用でしばしば遭遇するタイプのデータを扱う能力に起因している。例えば、気候モデリングや流体力学では、入力データは典型的には離散化された連続体(温度分布や速度場など)から構成される。複雑な入力を受け付けるように調整した場合に、時間領域と周波数領域の両方で演算子を学習できる最初の演算子ネットワークとして、重要な進歩を示す放射基底演算子ネットワーク(RBON)を導入する。 RBONは、小さな単一の隠蔽層構造にもかかわらず、いくつかのベンチマークケースでは、1ドル未満のOOD(in-of-distriion data)とout-of-distriion data)に対して、L^2$の相対的テスト誤差が小さい。さらにRBONは、トレーニングデータとは全く異なる関数クラスからOODデータに小さなエラーを保持する。 Operator networks are designed to approximate nonlinear operators, which provide mappings between infinite-dimensional spaces such as function spaces. These networks are playing an increasingly important role in machine learning, with their most notable contributions in the field of scientific computing. Their significance stems from their ability to handle the type of data often encountered in scientific applications. For instance, in climate modeling or fluid dynamics, input data typically consists of discretized continuous fields (like temperature distributions or velocity fields). We introduce the radial basis operator network (RBON), which represents a significant advancement as the first operator network capable of learning an operator in both the time domain and frequency domain when adjusted to accept complex-valued inputs. Despite the small, single hidden-layer structure, the RBON boasts small $L^2$ relative test error for both in- and out-of-distribution data (OOD) of less than $1\times 10^{-7}$ in some benchmark cases. Moreover, the RBON maintains small error on OOD data from entirely different function classes from the training data.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# ホログラムRG流れにおける相互情報と相関測度 Mutual information and correlation measures in holographic RG flows ( http://arxiv.org/abs/2410.04645v1 ) ライセンス: Link先を確認	Iftekher S. Chowdhury, Binay Prakash Akhouri, Shah Haque, Eric Howard,	(参考訳) 本稿では, 相転移に伴うホログラムRG流の相互情報, 絡み合いの負性, 多粒子相関の挙動について検討する。相互情報は、サブシステム間の全相関のUV有限測度を提供する一方で、絡み合いの負性度と多部相関は、量子構造、特に臨界点に近い点についてより詳細な洞察を与える。数値シミュレーションにより、相互情報は比較的スムーズなままであるが、絡み合いの負性度と多部相関は相転移付近で急激な変化を示す。これらの結果は、強い結合量子系における臨界現象のシグナル伝達において、多部相関が支配的な役割を果たすという仮説を支持している。 This paper investigates the behavior of mutual information, entanglement negativity, and multipartite correlations in holographic RG flows, particularly during phase transitions. Mutual information provides a UV-finite measure of total correlations between subsystems, while entanglement negativity and multipartite correlations offer finer insights into quantum structures, especially near critical points. Through numerical simulations, we show that while mutual information remains relatively smooth, both entanglement negativity and multipartite correlations exhibit sharp changes near phase transitions. These results support the hypothesis that multipartite correlations play a dominant role in signaling critical phenomena in strongly coupled quantum systems.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# Mode-GS:ロバスト・グラウンド・ビュー・シーンレンダリングのための単眼の奥行きガイド付き3Dガウス・スプレイティング Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering ( http://arxiv.org/abs/2410.04646v1 ) ライセンス: Link先を確認	Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon,	(参考訳) 地上ロボット軌道データセットのための新しいビューレンダリングアルゴリズムであるMode-GSを提案する。提案手法は,既存の3次元ガウススプラッティングアルゴリズムの限界を克服する目的で,アンカー付きガウススプラッターを用いている。従来のニューラルレンダリング手法では、シーンの複雑さとマルチビューの観察が不十分なため、深刻なスプラットの漂流に悩まされ、地上ロボットデータセットの真の幾何学上のスプラットの修正に失敗する可能性がある。本手法は, 単眼深度から画素アラインアンカーを統合し, 残留型ガウスデコーダを用いてこれらのアンカーの周囲にガウススプレートを生成する。単分子深度固有のスケールのあいまいさに対処するために、ビュー毎の深さスケールでアンカーをパラメータ化し、オンラインスケールキャリブレーションにスケール一貫性のある深さ損失を用いる。提案手法は,PSNR,SSIM,LPIPSの計測値に基づいて,自由軌道パターンを持つ地上でのレンダリング性能を向上し,R3LIVE odometry データセットと Tanks and Temples データセット上で最先端のレンダリング性能を実現する。 We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# AdaptDiff:網膜血管分節に対する弱条件セマンティック拡散によるクロスモーダルドメイン適応 AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation ( http://arxiv.org/abs/2410.04648v1 ) ライセンス: Link先を確認	Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz,	(参考訳) 深層学習は医用画像のセグメンテーションにおいて顕著な性能を示した。しかし、その約束に反して、深層学習は、固有のデータ分散シフトやドメイン適応をガイドする手動アノテーションの欠如によって、目に見えない領域に効果的に移行できないため、実際に多くの課題を抱えている。この問題に対処するため,眼底撮影(FP)で訓練された網膜血管分割ネットワークを,手動のラベルを使わずに未確認のモダリティ(例えば OCT-A)に対して満足な結果が得られる,AdaptDiff という非教師なし領域適応(UDA)手法を提案する。対象とするすべてのドメインに対して、まずソースドメインでトレーニングされたセグメンテーションモデルを採用して、擬似ラベルを作成します。これらの擬似ラベルを用いて,対象領域分布を表す条件付き意味拡散確率モデルを訓練する。実験により,低品質な擬似ラベルであっても,拡散モデルは条件付きセマンティック情報をキャプチャできることがわかった。その後、ソースドメインからバイナリ血管マスクを用いてターゲットドメインをサンプリングし、ペア化されたデータ、すなわちバイナリ血管マップ上に条件付きされたターゲットドメイン合成画像を取得する。最後に、合成ペアデータを用いて、事前訓練されたセグメンテーションネットワークを微調整し、ドメインギャップを緩和する。本稿では,AdaptDiffが3つの異なるモードで利用可能な7つのデータセットに対して有効であることを示す。その結果,全データセットのセグメンテーション性能は大幅に向上した。私たちのコードはhttps://github.com/DeweiHu/AdaptDiff.comで公開されています。 Deep learning has shown remarkable performance in medical image segmentation. However, despite its promise, deep learning has many challenges in practice due to its inability to effectively transition to unseen domains, caused by the inherent data distribution shift and the lack of manual annotations to guide domain adaptation. To tackle this problem, we present an unsupervised domain adaptation (UDA) method named AdaptDiff that enables a retinal vessel segmentation network trained on fundus photography (FP) to produce satisfactory results on unseen modalities (e.g., OCT-A) without any manual labels. For all our target domains, we first adopt a segmentation model trained on the source domain to create pseudo-labels. With these pseudo-labels, we train a conditional semantic diffusion probabilistic model to represent the target domain distribution. Experimentally, we show that even with low quality pseudo-labels, the diffusion model can still capture the conditional semantic information. Subsequently, we sample on the target domain with binary vessel masks from the source domain to get paired data, i.e., target domain synthetic images conditioned on the binary vessel map. Finally, we fine-tune the pre-trained segmentation network using the synthetic paired data to mitigate the domain gap. We assess the effectiveness of AdaptDiff on seven publicly available datasets across three distinct modalities. Our results demonstrate a significant improvement in segmentation performance across all unseen datasets. Our code is publicly available at https://github.com/DeweiHu/AdaptDiff.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# 空間認識型AIのためのマルチモーダル3次元融合とその場学習 Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI ( http://arxiv.org/abs/2410.04652v1 ) ライセンス: Link先を確認	Chengyuan Xu, Radha Kumaran, Noah Stier, Kangyou Yu, Tobias Höllerer,	(参考訳) 拡張現実における仮想世界と物理世界のシームレスな統合は、物理的な環境を意味的に「理解する」システムから恩恵を受ける。 AR研究は、コンテキスト認識の可能性に注目し、様々なオブジェクトレベルのインタラクションに3D環境におけるセマンティクスを活用する新しい能力を実証してきた。一方、コンピュータビジョンコミュニティは、自律的なタスクに対する環境認識を強化するために、ニューラルネットワークによる理解を飛躍させた。本研究では,幾何学的表現と意味的知識と言語的知識を一体化して,物理オブジェクトを含むユーザガイド型機械学習を実現するマルチモーダル3Dオブジェクト表現を提案する。私たちはまず、CLIP視覚言語機能を環境とオブジェクトモデルに融合させることで、ARに言語的理解をもたらす高速なマルチモーダル3D再構築パイプラインを提示する。次に,「in-situ」機械学習を提案する。これはマルチモーダル表現と連動して,空間的・言語的に意味のある方法で,物理空間やオブジェクトと対話するための新しいツールやインターフェースを提供する。 Magic Leap 2上の2つの現実世界のARアプリケーションを通して提案システムの有用性を実証する。 a) 自然言語と物理環境における空間探索ロ経時的に物の変化を追跡するインテリジェントな在庫システムまた、空間的に認識されたAIのさらなる探索と研究を促進するために、完全な実装とデモデータを(https://github.com/cy-xu/spatially_aware_AI)で利用可能にしています。 Seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment. AR research has long focused on the potential of context awareness, demonstrating novel capabilities that leverage the semantics in the 3D environment for various object-level interactions. Meanwhile, the computer vision community has made leaps in neural vision-language understanding to enhance environment perception for autonomous tasks. In this work, we introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation, enabling user-guided machine learning involving physical objects. We first present a fast multimodal 3D reconstruction pipeline that brings linguistic understanding to AR by fusing CLIP vision-language features into the environment and object models. We then propose "in-situ" machine learning, which, in conjunction with the multimodal representation, enables new tools and interfaces for users to interact with physical spaces and objects in a spatially and linguistically meaningful manner. We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time. We also make our full implementation and demo data available at (https://github.com/cy-xu/spatially_aware_AI) to encourage further exploration and research in spatially aware AI.	翻訳日:2024-11-02 02:47:36 公開日:2024-10-06
# Taylor Unswift氏:Taylorの拡張による大規模言語モデルのためのセキュアなウェイトリリース Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion ( http://arxiv.org/abs/2410.05331v1 ) ライセンス: Link先を確認	Guanchu Wang, Yu-Neng Chuang, Ruixiang Tang, Shaochen Zhong, Jiayi Yuan, Hongye Jin, Zirui Liu, Vipin Chaudhary, Shuai Xu, James Caverlee, Xia Hu,	(参考訳) リリースされた大きな言語モデル(LLM)のセキュリティを確保することは、既存のメカニズムが所有権を侵害するか、データプライバシの懸念を生じさせるため、大きなジレンマとなる。このジレンマに対処するために、TaylorMLPを導入し、LLMの所有権を保護し、それらの乱用を防ぐ。具体的には、TaylorMLP は LLM の重みを Taylor-Series のパラメータに変換することで LLM の所有権を保っている。オリジナルのウェイトをリリースするのではなく、開発者はTaylor-Seriesパラメータをユーザにリリースすることで、LCMのセキュリティを確保することができる。さらに、TaylorMLPは、生成速度を調整することにより、LCMの悪用を防止することができる。テイラー系列の項を増やすことにより、保護されたLLMに対して低速トークン生成を誘導することができる。この意図的な遅延は、LLM開発者がモデルの大規模無許可使用を防止するのに役立つ。 5つのデータセットと3つのLLMアーキテクチャの実証実験により、TaylorMLPは遅延を4倍以上増加させ、トークンが元のLLMと正確に一致することを示した。その後の防御実験では、TaylorMLPが下流データセットに基づいた重み値の再構築を効果的に防いでいることが確認された。 Ensuring the security of released large language models (LLMs) poses a significant dilemma, as existing mechanisms either compromise ownership rights or raise data privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. Specifically, TaylorMLP preserves the ownership of LLMs by transforming the weights of LLMs into parameters of Taylor-series. Instead of releasing the original weights, developers can release the Taylor-series parameters with users, thereby ensuring the security of LLMs. Moreover, TaylorMLP can prevent abuse of LLMs by adjusting the generation speed. It can induce low-speed token generation for the protected LLMs by increasing the terms in the Taylor-series. This intentional delay helps LLM developers prevent potential large-scale unauthorized uses of their models. Empirical experiments across five datasets and three LLM architectures demonstrate that TaylorMLP induces over 4x increase in latency, producing the tokens precisely matched with original LLMs. Subsequent defensive experiments further confirm that TaylorMLP effectively prevents users from reconstructing the weight values based on downstream datasets.	翻訳日:2024-11-01 19:27:19 公開日:2024-10-06
# VPI-Mlogs: ペトロシクス応用のためのWebベースの機械学習ソリューション VPI-Mlogs: A web-based machine learning solution for applications in petrophysics ( http://arxiv.org/abs/2410.05332v1 ) ライセンス: Link先を確認	Anh Tuan Nguyen,	(参考訳) 機械学習は、データサイエンス分野の重要な部分である。石油物理学では、機械学習アルゴリズムと応用が広くアプローチされている。この文脈において、ベトナム石油研究所(VPI)は、ログ予測の欠如、破壊ゾーン、破壊密度予測など、いくつかの効果的な予測モデルを調査し、展開してきた。当社のソリューションのひとつとして、VPI-MLogsは、データ前処理、探索データ分析、可視化、モデル実行を統合する、Webベースのデプロイメントプラットフォームです。最も人気のあるデータ分析言語であるPythonを使って、このアプローチは、ペトロフィジカルログセクションを扱う強力なツールを提供する。このソリューションは、一般的な知識と石油物理学の洞察のギャップを狭めるのに役立つ。この記事では、石油物理データを把握するための多くのソリューションを統合するWebベースのアプリケーションに焦点を当てる。 Machine learning is an important part of the data science field. In petrophysics, machine learning algorithms and applications have been widely approached. In this context, Vietnam Petroleum Institute (VPI) has researched and deployed several effective prediction models, namely missing log prediction, fracture zone and fracture density forecast, etc. As one of our solutions, VPI-MLogs is a web-based deployment platform which integrates data preprocessing, exploratory data analysis, visualisation and model execution. Using the most popular data analysis programming language, Python, this approach gives users a powerful tool to deal with the petrophysical logs section. The solution helps to narrow the gap between common knowledge and petrophysics insights. This article will focus on the web-based application which integrates many solutions to grasp petrophysical data.	翻訳日:2024-11-01 19:27:19 公開日:2024-10-06
# 医療インフォマティクスのためのグローバルサイバーセキュリティ標準化フレームワーク A Global Cybersecurity Standardization Framework for Healthcare Informatics ( http://arxiv.org/abs/2410.05333v1 ) ライセンス: Link先を確認	Kishu Gupta, Vinaytosh Mishra, Aaisha Makkar,	(参考訳) ヘルスケアは、ポスト新型コロナウイルス(COVID-19)の世界でデジタル化が進んでいるのを目撃している。物の医療用インターネットやウェアラブルデバイスといった技術は、どこからでもクラウド上で利用できる大量のデータを生成している。このデータは、診断、予後、さらには病気の治療のための高度な人工知能技術を用いて分析することができる。この進歩には、保護された健康情報(PHI)を保護し、保護する大きなリスクが伴う。 PHIを維持するための一般的な規則は包括的でも実装も容易でもない。調査はまず、プライバシとセキュリティに不可欠な20のアクティビティを特定し、次にこれらを5つの同種カテゴリに分類する。 $\complement_1$ (Policy and Compliance Management), $\complement_2$ (Employee Training and Awareness), $\complement_3$ (Data Protection and Privacy Control), $\complement_4$ (Monitoring and Response), $\complement_5$ (Technology and Infrastructure Security)。このフレームワークは、Delphi Methodを使用して、アクティビティ、分類基準、優先順位付けを識別した。分類は, 雑音を伴うアプリケーションの密度に基づく空間クラスタリング(DBSCAN)に基づいており, 理想解(TOPSIS)と類似性による選好の順序付け手法を用いて優先順位付けを行う。その結果、$\complement_3$アクティビティは実装において第一に優先され、$\complement_1$と$\complement_2$アクティビティが続くと結論付けている。最後に、$\complement_4$と$\complement_5$を実装する必要がある。セキュリティとプライバシに関連する特定されたクラスタ化された医療活動の優先順位付けは、医療政策立案者や医療情報学の専門家にとって有用である。 Healthcare has witnessed an increased digitalization in the post-COVID world. Technologies such as the medical internet of things and wearable devices are generating a plethora of data available on the cloud anytime from anywhere. This data can be analyzed using advanced artificial intelligence techniques for diagnosis, prognosis, or even treatment of disease. This advancement comes with a major risk to protecting and securing protected health information (PHI). The prevailing regulations for preserving PHI are neither comprehensive nor easy to implement. The study first identifies twenty activities crucial for privacy and security, then categorizes them into five homogeneous categories namely: $\complement_1$ (Policy and Compliance Management), $\complement_2$ (Employee Training and Awareness), $\complement_3$ (Data Protection and Privacy Control), $\complement_4$ (Monitoring and Response), and $\complement_5$ (Technology and Infrastructure Security) and prioritizes these categories to provide a framework for the implementation of privacy and security in a wise manner. The framework utilized the Delphi Method to identify activities, criteria for categorization, and prioritization. Categorization is based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and prioritization is performed using a Technique for Order of Preference by Similarity to the Ideal Solution (TOPSIS). The outcomes conclude that $\complement_3$ activities should be given first preference in implementation and followed by $\complement_1$ and $\complement_2$ activities. Finally, $\complement_4$ and $\complement_5$ should be implemented. The prioritized view of identified clustered healthcare activities related to security and privacy, are useful for healthcare policymakers and healthcare informatics professionals.	翻訳日:2024-11-01 19:27:19 公開日:2024-10-06
# TA3: マシンラーニングモデルにおける敵攻撃に対するテスト TA3: Testing Against Adversarial Attacks on Machine Learning Models ( http://arxiv.org/abs/2410.05334v1 ) ライセンス: Link先を確認	Yuanzhe Jin, Min Chen,	(参考訳) 敵攻撃は、機械学習(ML)モデルを多くのアプリケーションに展開する上で大きな脅威である。このような攻撃に対してMLモデルをテストすることは、MLモデルを評価し改善するための重要なステップになりつつある。本稿では,TA3(Testing Against Adversarial Attacks)のワークフローを支援する対話型システムの設計と開発について報告する。特に、TA3では、Human-in-the-loop (HITL) は人間の操縦による攻撃シミュレーションと可視化による攻撃影響評価を可能にする。 TA3の現行バージョンは、One Pixel Attack Methodに基づく敵攻撃に対する決定ツリーモデルのテストに重点を置いているが、MLテストにおけるHITLの重要性と、他の種類のMLモデルや他の種類の敵攻撃に対するMLテストワークフローへのHITLの潜在的な適用を実証している。 Adversarial attacks are major threats to the deployment of machine learning (ML) models in many applications. Testing ML models against such attacks is becoming an essential step for evaluating and improving ML models. In this paper, we report the design and development of an interactive system for aiding the workflow of Testing Against Adversarial Attacks (TA3). In particular, with TA3, human-in-the-loop (HITL) enables human-steered attack simulation and visualization-assisted attack impact evaluation. While the current version of TA3 focuses on testing decision tree models against adversarial attacks based on the One Pixel Attack Method, it demonstrates the importance of HITL in ML testing and the potential application of HITL to the ML testing workflows for other types of ML models and other types of adversarial attacks.	翻訳日:2024-11-01 19:27:19 公開日:2024-10-06
# GreenLight-Gym:温室作物生産管理のための強化学習ベンチマーク環境 GreenLight-Gym: A Reinforcement Learning Benchmark Environment for Greenhouse Crop Production Control ( http://arxiv.org/abs/2410.05336v1 ) ライセンス: Link先を確認	Bart van Laatum, Eldert J. van Henten, Sjoerd Boersma,	(参考訳) 温室作物生産システムの制御は、作物、屋内および屋外の気候、経済の間の不確実かつ非線形なダイナミクスのために複雑な作業である。熟練農家の減少は、自律的な温室制御システムの開発を必要としている。強化学習(Reinforcement Learning, RL)は、温室管理を自動化するための制御ポリシーを学ぶための有望なアプローチである。 RLは、経済的な報酬関数によって誘導されながら、温室のモデルとの相互作用を通じて制御ポリシーを最適化する。しかし、実世界のシステムへの適用は、モデルと実世界の力学の相違により制限されている。さらに、RLコントローラは、特にモデルが作物の成長に対する制約違反の悪影響を適切に捉えていない場合において、主目的を最適化しながら、状態制約を維持するのに苦労する可能性がある。また、新しい状態への一般化は、例えば、目に見えない気象軌道によるものであり、RLに基づく温室効果ガスの制御において過小評価されている。この研究は3つの重要な貢献を通じてこれらの課題に対処する。まず,現在最先端の温室モデルであるGreenLight上でRLアルゴリズムをトレーニングし,評価するために設計された,最初のオープンソース環境であるGreenLight-Gymを紹介する。 GreenLight-Gymは、コミュニティがRLベースの制御方法論をベンチマークできるようにする。第二に、州の境界を強制するために、乗法または加法ペナルティを用いる2つの報酬形成アプローチを比較する。加法的なペナルティはより安定したトレーニングを達成し、州の制約に順応し、乗法的なペナルティは極端に高い利益をもたらす。最後に,不随意トレーニングと気象データによるRL性能の評価を行い,予測不可能な条件への一般化を実証した。我々の環境と実験スクリプトはオープンソースであり、学習に基づく温室制御に関する革新的な研究を促進する。 Controlling greenhouse crop production systems is a complex task due to uncertain and non-linear dynamics between crops, indoor and outdoor climate, and economics. The declining number of skilled growers necessitates the development of autonomous greenhouse control systems. Reinforcement Learning (RL) is a promising approach that can learn a control policy to automate greenhouse management. RL optimises a control policy through interactions with a model of the greenhouse while guided by an economic-based reward function. However, its application to real-world systems is limited due to discrepancies between models and real-world dynamics. Moreover, RL controllers may struggle to maintain state constraints while optimising the primary objective, especially when models inadequately capture the adverse effects of constraint violations on crop growth. Also, the generalisation to novel states, for example, due to unseen weather trajectories, is underexplored in RL-based greenhouse control. This work addresses these challenges through three key contributions. First, we present GreenLight-Gym, the first open-source environment designed for training and evaluating RL algorithms on the state-of-the-art greenhouse model GreenLight. GreenLight-Gym enables the community to benchmark RL-based control methodologies. Second, we compare two reward-shaping approaches, using either a multiplicative or additive penalty, to enforce state boundaries. The additive penalty achieves more stable training while better adhering to state constraints, while the multiplicative penalty yields marginally higher profits. Finally, we evaluate RL performance on a disjoint training and testing weather dataset, demonstrating improved generalisation to unseen conditions. Our environment and experiment scripts are open-sourced, facilitating innovative research on learning-based greenhouse control.	翻訳日:2024-11-01 19:17:28 公開日:2024-10-06
# モバイルエッジとクラウド上の分散推論 - 早期排他的クラスタリングアプローチ Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach ( http://arxiv.org/abs/2410.05338v1 ) ライセンス: Link先を確認	Divya Jyoti Bajpai, Manjesh Kumar Hanawal,	(参考訳) 近年のディープニューラルネットワーク(DNN)の進歩は、様々な領域で顕著な性能を示している。しかし、その大きなサイズは、モバイル、エッジ、IoTプラットフォームといったリソース制約のあるデバイスにデプロイする上での課題である。これを解決するために、小さなDNN(最初は少数のレイヤ)をモバイルに、大きなバージョンをエッジに、完全なバージョンをクラウドにデプロイする分散推論設定を使用することができる。複雑さの低いサンプル(容易)は、モバイル上で推測され、エッジでは適度な複雑性(medium)、クラウドでは高い複雑性(hard)を持つ。各サンプルの複雑さは事前に分かっていないため、分散推論では、DNNの十分な層によって処理されるように、どのように複雑さを決定するかという疑問が生じる。我々は、DNNにおける推論遅延を最小限に抑えるために、Early Exit(EE)戦略を利用するDIMEEという新しいアプローチを開発した。 DIMEEは、モバイルからエッジ/クラウドへのオフロードコストを考慮して、精度の向上を目指している。各種NLPタスクを含むGLUEデータセットに対する実験的検証により,提案手法は,クラウド上での推論を行う場合と比較して,最小の精度低下(0.3%)を維持しつつ,推論コスト(>43%)を著しく低減することが示された。 Recent advances in Deep Neural Networks (DNNs) have demonstrated outstanding performance across various domains. However, their large size is a challenge for deployment on resource-constrained devices such as mobile, edge, and IoT platforms. To overcome this, a distributed inference setup can be used where a small-sized DNN (initial few layers) can be deployed on mobile, a bigger version on the edge, and the full-fledged, on the cloud. A sample that has low complexity (easy) could be then inferred on mobile, that has moderate complexity (medium) on edge, and higher complexity (hard) on the cloud. As the complexity of each sample is not known beforehand, the following question arises in distributed inference: how to decide complexity so that it is processed by enough layers of DNNs. We develop a novel approach named DIMEE that utilizes Early Exit (EE) strategies developed to minimize inference latency in DNNs. DIMEE aims to improve the accuracy, taking into account the offloading cost from mobile to edge/cloud. Experimental validation on GLUE datasets, encompassing various NLP tasks, shows that our method significantly reduces the inference cost (> 43%) while maintaining a minimal drop in accuracy (< 0.3%) compared to the case where all the inference is made in cloud.	翻訳日:2024-11-01 19:17:28 公開日:2024-10-06
# LLMの意思決定論理と人間の認知の整合性:法的LLMを事例として Alignment Between the Decision-Making Logic of LLMs and Human Cognition: A Case Study on Legal LLMs ( http://arxiv.org/abs/2410.09083v1 ) ライセンス: Link先を確認	Lu Chen, Yuxuan Huang, Yixing Li, Yaohui Jin, Shuai Zhao, Zilong Zheng, Quanshi Zhang,	(参考訳) 本稿では,Lumge Language Models (LLM) の意思決定ロジックと人間の認知の整合性を評価する手法を提案する。言語生成結果に関する従来の評価とは違って,LLMの詳細な意思決定ロジックの正確さは,その正確さの裏側で評価することを提案する。この目的のために、LLMによって符号化された相互作用を原始的な決定論理として定量化する。 LLMの詳細な意思決定ロジックを評価するために,一連のメトリクスを設計する。実験により、言語生成結果が正しいように見える場合でも、内部推論ロジックのかなりの部分が顕著な問題を含んでいることが示された。 This paper presents a method to evaluate the alignment between the decision-making logic of Large Language Models (LLMs) and human cognition in a case study on legal LLMs. Unlike traditional evaluations on language generation results, we propose to evaluate the correctness of the detailed decision-making logic of an LLM behind its seemingly correct outputs, which represents the core challenge for an LLM to earn human trust. To this end, we quantify the interactions encoded by the LLM as primitive decision-making logic, because recent theoretical achievements have proven several mathematical guarantees of the faithfulness of the interaction-based explanation. We design a set of metrics to evaluate the detailed decision-making logic of LLMs. Experiments show that even when the language generation results appear correct, a significant portion of the internal inference logic contains notable issues.	翻訳日:2024-10-30 16:48:15 公開日:2024-10-06
# 大規模言語モデルを用いたロボットシステム問題診断 Diagnosing Robotics Systems Issues with Large Language Models ( http://arxiv.org/abs/2410.09084v1 ) ライセンス: Link先を確認	Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Müller,	(参考訳) 産業アプリケーションで報告された問題の迅速な解決は、経済的影響を最小限に抑えるために不可欠である。しかし、必要なデータ分析によって、基礎となる根の診断は、専門家にとっても困難で時間を要するタスクを引き起こす。対照的に、大きな言語モデル(LLM)は大量のデータを分析するのに優れている。実際、AI-Opsにおける以前の作業は、ITシステムを分析する上での有効性を示している。ここでは、この研究を、ロボットシステムの難解で、ほとんど探索されていない領域に拡張する。この目的のために、2500以上の報告された問題を含む、ロボット工学のプロプライエタリなシステム診断ベンチマークであるSYSDIAGBENCHを作成しました。我々はSYSDIAGBENCHを用いて,LLMの性能を根本原因分析に適用し,モデルサイズと適応手法の幅を考慮して検討する。以上の結果から,QLoRAの微調整により,GPT-4の診断精度が向上し,費用対効果が著しく向上することが示唆された。 LLM-as-a-judgeの結果を人間の専門家による研究で検証し,基準ラベルと同様の承認評価が得られることを発見した。 Quickly resolving issues reported in industrial applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness in analyzing IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, a proprietary system diagnostics benchmark for robotics, containing over 2500 reported issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that QLoRA finetuning can be sufficient to let a 7B-parameter model outperform GPT-4 in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves similar approval ratings as our reference labels.	翻訳日:2024-10-30 16:48:15 公開日:2024-10-06
# UAV通信のセキュア化 - 認証と統合 Securing UAV Communication: Authentication and Integrity ( http://arxiv.org/abs/2410.09085v1 ) ライセンス: Link先を確認	Meriem Ouadah, Fatiha Merazka,	(参考訳) 近年の技術進歩により、民間の任務から軍事活動に至るまで、無人航空網(UAV)が様々な分野に統合されている。この文脈では、データの盗難と操作を防ぐため、セキュリティ、正確に認証を保証することが不可欠である。 Man-in-the-Middle攻撃はネットワークの整合性を損なうだけでなく、元のデータを脅かし、盗難や変更につながる可能性がある。本研究では,セキュアでない通信路上でのUAVデータ交換をセキュアにするための認証手法を提案する。我々のソリューションは、Diffie-Hellman(DH)キー交換とHashベースのメッセージ認証コード(HMAC)をROS通信チャネル内で組み合わせて、交換されたUAVデータを認証する。送信時間を測定し,鍵改ざんをシミュレーションし,4096ビット未満のDHキーサイズに対して許容性能を示した。どちらのドローンもキーの改ざんを検知し、UAV通信を保護する方法の有効性を確認しました。しかし、資源制約のある環境におけるスケーラビリティの課題は、さらなる研究を保証している。 Recent technological advancements have seen the integration of unmanned aerial networks (UAVs) into various sectors, from civilian missions to military operations. In this context, ensuring security, precisely authentication, is essential to prevent data theft and manipulation. A Man-in-the-Middle attack not only compromises network integrity but also threatens the original data, potentially leading to theft or alteration. In this work, we proposed an authentication method to secure UAV data exchange over an insecure communication channel. Our solution combines Diffie-Hellman (DH) key exchange and Hash-based Message Authentication Code (HMAC) within ROS communication channels to authenticate exchanged UAV data. We evaluated our method by measuring transmission time and simulating key tampering, finding acceptable performance for DH key sizes below 4096 bits but longer times for larger sizes due to increased complexity. Both drones successfully detected tampered keys, affirming our method's efficacy in protecting UAV communication. However, scalability challenges in resource-constrained environments warrant further research.	翻訳日:2024-10-30 16:48:15 公開日:2024-10-06

Title

Authors

Abstract

論文公表日・翻訳日

# 低リソース言語におけるモデルマージの可能性の解き放つ

Unlocking the Potential of Model Merging for Low-Resource Languages ( http://arxiv.org/abs/2407.03994v3 )

ライセンス: Link先を確認

Mingxu Tao, Chen Zhang, Quzhe Huang, Tianyao Ma, Songfang Huang, Dongyan Zhao, Yansong Feng,

(参考訳) 大規模言語モデル(LLM)を新しい言語に適応させるには、通常、継続事前訓練(CT)と、教師付き微調整(SFT)が含まれる。しかし、このCT-then-SFTアプローチは、低リソース言語のコンテキストにおいて限られたデータを扱うため、言語モデリングとタスク解決能力のバランスが取れない。そこで我々は,低リソース言語に代わるモデルマージを提案する。我々は、SFTデータを対象言語に含まない低リソース言語のためのタスク解決LLMを開発するために、モデルマージを使用する。 Llama-2-7Bをベースとした実験により, タスク解決能力の低い低リソース言語では, モデルマージがLLMを効果的に実現し, 極めて少ないシナリオではCT-then-SFTより優れていることが示された。モデルマージにおける性能飽和をより多くのトレーニングトークンで観測し、さらにマージプロセスを分析し、モデルのマージアルゴリズムにスラック変数を導入し、重要なパラメータの損失を軽減し、性能を向上させる。モデルマージは、データ不足とデータ効率の向上に苦しむ、より多くの人間の言語に恩恵をもたらすことを願っています。

Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency.

翻訳日:2024-11-08 23:57:53 公開日:2024-10-06

# VoxAct-B:Voxel-based Acting and Stabilizing Policy for bimanual Manipulation

VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation ( http://arxiv.org/abs/2407.04152v2 )

ライセンス: Link先を確認

I-Chun Arthur Liu, Sicheng He, Daniel Seita, Gaurav Sukhatme,

(参考訳) 双対操作は多くのロボティクス応用において重要である。シングルアーム操作とは対照的に、高次元のアクション空間のため、双方向操作タスクは困難である。先行研究は、この問題に対処するために大量のデータと原始的なアクションを利用するが、サンプルの非効率性と様々なタスクにわたる限定的な一般化に悩まされる可能性がある。この目的のために,視覚言語モデル(VLM)を利用した言語条件付きボクセルベース手法であるVoxAct-Bを提案する。我々はこのボクセルグリッドをバイマニュアル操作ポリシーに提供し、動作と安定化の動作を学ぶ。このアプローチは、ボクセルからのより効率的なポリシー学習を可能にし、異なるタスクに一般化することができる。シミュレーションにおいて、VoxAct-Bは、細粒度バイマニュアル操作タスクにおいて、強いベースラインを上回ります。さらに、現実世界の$\texttt{Open Drawer}$と$\texttt{Open Jar}$タスクで2つのUR5を使ってVoxAct-Bを実証する。コード、データ、ビデオはhttps://voxact-b.github.io.comで公開されている。

Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks. To this end, we propose VoxAct-B, a language-conditioned, voxel-based method that leverages Vision Language Models (VLMs) to prioritize key regions within the scene and reconstruct a voxel grid. We provide this voxel grid to our bimanual manipulation policy to learn acting and stabilizing actions. This approach enables more efficient policy learning from voxels and is generalizable to different tasks. In simulation, we show that VoxAct-B outperforms strong baselines on fine-grained bimanual manipulation tasks. Furthermore, we demonstrate VoxAct-B on real-world $\texttt{Open Drawer}$ and $\texttt{Open Jar}$ tasks using two UR5s. Code, data, and videos are available at https://voxact-b.github.io.

翻訳日:2024-11-08 23:57:53 公開日:2024-10-06

# AWT:Augmentation, Weighting, Transportationによるビジョンランゲージモデルの転送

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation ( http://arxiv.org/abs/2407.04603v2 )

ライセンス: Link先を確認

Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang,

(参考訳) 事前学習された視覚言語モデル(VLM)は、様々な視覚分類タスクにおいて印象的な結果を示している。しかし、新しいクラスに関する情報が限られているため、新しい概念を理解するためにそれらを適用する際に、その可能性を完全に解き放たないことが多い。この制限に対処するため、新しい適応フレームワークであるAWT(Augment, Weight, then Transport)を導入する。 AWTは3つの重要な構成要素から構成される: 多様な視覚的視点を持つ入力の増強、画像変換と言語モデルによるクラス記述の強化、予測エントロピーに基づく入力の動的重み付け、視覚言語空間における意味的相関のマイニングに最適な輸送を利用する。 AWTは、様々なVLMにシームレスに統合することができ、追加のトレーニングなしでゼロショット機能を強化し、統合マルチモーダルアダプタモジュールを通じて数ショットの学習を容易にする。我々は、ゼロショット画像分類、ゼロショットビデオアクション認識、アウト・オブ・ディストリビューションの一般化など、AWTを複数の困難なシナリオで検証する。 AWTは、各設定における最先端メソッドを一貫して上回る。さらに、我々の広範な研究は、異なるVLM、アーキテクチャ、スケールにわたるAWTの有効性と適応性をさらに実証している。

Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key components: augmenting inputs with diverse visual perspectives and enriched class descriptions through image transformations and language models; dynamically weighting inputs based on the prediction entropy; and employing optimal transport to mine semantic correlations in the vision-language space. AWT can be seamlessly integrated into various VLMs, enhancing their zero-shot capabilities without additional training and facilitating few-shot learning through an integrated multimodal adapter module. We verify AWT in multiple challenging scenarios, including zero-shot and few-shot image classification, zero-shot video action recognition, and out-of-distribution generalization. AWT consistently outperforms the state-of-the-art methods in each setting. In addition, our extensive studies further demonstrate AWT's effectiveness and adaptability across different VLMs, architectures, and scales.

翻訳日:2024-11-08 23:46:45 公開日:2024-10-06

# リストグローバル安定性を用いたGMMの非依存的個人密度推定

Agnostic Private Density Estimation for GMMs via List Global Stability ( http://arxiv.org/abs/2407.04783v2 )

ライセンス: Link先を確認

Mohammad Afzali, Hassan Ashtiani, Christopher Liaw,

(参考訳) 制約のない高次元ガウス多様体の混合に対する個人密度推定の問題を考える。この問題のサンプル複雑性に関する最初の上限を証明した。従来,高次元GMMの個人学習性は,実現可能な設定 [Afzali et al , 2024] でのみ知られていた。この結果を証明するために,私的な教師あり学習の文脈で導入された$\textit{list global stability}$ [Ghazi et al , 2021b,a]という概念を利用する。この定義の無関係な変種を定義し、その存在が無関係な私的密度推定に十分であることを示す。そして、GMMのためのグローバルに安定な学習者リストを構築する。

We consider the problem of private density estimation for mixtures of unrestricted high dimensional Gaussians in the agnostic setting. We prove the first upper bound on the sample complexity of this problem. Previously, private learnability of high dimensional GMMs was only known in the realizable setting [Afzali et al., 2024]. To prove our result, we exploit the notion of $\textit{list global stability}$ [Ghazi et al., 2021b,a] that was originally introduced in the context of private supervised learning. We define an agnostic variant of this definition, showing that its existence is sufficient for agnostic private density estimation. We then construct an agnostic list globally stable learner for GMMs.

翻訳日:2024-11-08 23:35:45 公開日:2024-10-06

# RotRNN: 長いシーケンスをローテーションでモデル化する

RotRNN: Modelling Long Sequences with Rotations ( http://arxiv.org/abs/2407.07239v2 )

ライセンス: Link先を確認

Kai Biegun, Rares Dolga, Jake Cunningham, David Barber,

(参考訳) ステートスペースモデル(SSM)やリニアリカレントユニット(LRU)のような線形リカレントニューラルネットワークは、最近、ロングシーケンスモデリングベンチマークで最先端のパフォーマンスを示している。彼らの成功にもかかわらず、彼らの経験的業績はよく理解されておらず、特に複雑な初期化と正規化のスキームなど、多くの欠点が伴っている。本研究では、回転行列の便利な性質を利用する線形リカレントモデルであるRotRNNを提案することにより、これらの問題に対処する。本稿では,RotRNNが頑健な正規化手順を備えたシンプルで効率的なモデルを提供し,その理論的導出に忠実な実践的実装であることを示す。 RotRNNは、いくつかのロングシーケンスモデリングデータセット上で、最先端の線形リカレントモデルに対する競合性能も達成している。

Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves competitive performance to state-of-the-art linear recurrent models on several long sequence modelling datasets.

翻訳日:2024-11-08 22:51:19 公開日:2024-10-06

# NativQA: LLMのための多言語文化的適応型自然言語クエリ

NativQA: Multilingual Culturally-Aligned Natural Query for LLMs ( http://arxiv.org/abs/2407.09823v2 )

ライセンス: Link先を確認

Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, Firoj Alam,

(参考訳) 自然質問回答(QA)データセットは、大規模言語モデル(LLM)の能力を評価する上で重要な役割を果たす。開発されている多くのQAデータセットにも拘わらず、独自の言語でネイティブユーザによって生成された地域固有のデータセットは、注目すべきに欠如している。このギャップは、地域や文化的特異性に対するLLMの効果的なベンチマークを妨げている。さらに、細調整されたモデルの開発も制限される。本研究では,言語に依存しないスケーラブルなフレームワークであるNativQAを提案し,LLMの評価とチューニングのために,文化的かつ地域的に整合したQAデータセットをネイティブ言語でシームレスに構築する。提案手法の有効性を,18のトピックをカバーする9つの領域の母語話者からの質問に基づいて,多言語対応の自然QAデータセットである \mnqa を7言語で,64k の注釈付き QA ペアで設計し,提案手法の有効性を実証した。オープンソースのLCMとMultiNativQAデータセットをベンチマークする。また,低リソースおよび方言に富んだ言語を対象とした微調整データ構築におけるフレームワークの有効性を示す。私たちはNativQAフレームワークとMultiNativQAデータセットをコミュニティ向けに公開しました(https://nativqa.gitlab.io.)。

Natural Question Answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs), ensuring their effectiveness in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own languages. This gap hinders the effective benchmarking of LLMs for regional and cultural specificities. Furthermore, it also limits the development of fine-tuned models. In this study, we propose a scalable, language-independent framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages, for LLM evaluation and tuning. We demonstrate the efficacy of the proposed framework by designing a multilingual natural QA dataset, \mnqa, consisting of ~64k manually annotated QA pairs in seven languages, ranging from high to extremely low resource, based on queries from native speakers from 9 regions covering 18 topics. We benchmark open- and closed-source LLMs with the MultiNativQA dataset. We also showcase the framework efficacy in constructing fine-tuning data especially for low-resource and dialectally-rich languages. We made both the framework NativQA and MultiNativQA dataset publicly available for the community (https://nativqa.gitlab.io).

翻訳日:2024-11-08 21:54:45 公開日:2024-10-06

# 多インスタンス部分ラベル学習における不均衡のキャラクタリゼーションと緩和について

On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning ( http://arxiv.org/abs/2407.10000v2 )

ライセンス: Link先を確認

Kaifu Wang, Efthymia Tsamoura, Dan Roth,

(参考訳) *Multi-Instance partial Label Learning*(MI-PLL)は、*partial label learning*、*latent structure learning*、*neurosymbolic learning*を含む弱教師付き学習環境である。 MI-PLL では教師付き学習とは異なり、訓練時の分類器への入力は $\mathbf{x}$ のタプルである。同時に、監督信号は、$\mathbf{x}$の(隠された)ゴールドラベル上の関数$\sigma$によって生成される。本研究は,これまでのMI-PLLの文脈では研究されていない問題,すなわち,異なるクラス(クラス固有のリスク*)のインスタンスを分類する際に発生するエラーの大きな違いを特徴付け,緩和する問題に,複数のコントリビューションを行う。理論の観点からは、最小の仮定をしながら、MI-PLLのクラス固有のリスク境界を導出する。我々の理論は、$\sigma$が学習の不均衡に大きな影響を及ぼすというユニークな現象を明らかにしている。この結果は、データ不均衡のプリズムの下での不均衡を学ぶことのみを研究する教師付きおよび弱教師付き学習に関する以前の研究と対照的である。実用面では,MI-PLLデータのみを用いて隠れラベルの限界を推定する手法を提案する。次に,隠れラベルの限界を制約として扱うことにより,トレーニング時とテスト時の不均衡を軽減するアルゴリズムを導入する。ニューロシンボリック学習とロングテール学習の強いベースラインを用いた手法の有効性を実証し,最大14\%の性能向上を示唆した。

*Multi-Instance Partial Label Learning* (MI-PLL) is a weakly-supervised learning setting encompassing *partial label learning*, *latent structural learning*, and *neurosymbolic learning*. Unlike supervised learning, in MI-PLL, the inputs to the classifiers at training-time are tuples of instances $\mathbf{x}$. At the same time, the supervision signal is generated by a function $\sigma$ over the (hidden) gold labels of $\mathbf{x}$. In this work, we make multiple contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL: that of characterizing and mitigating *learning imbalances*, i.e., major differences in the errors occurring when classifying instances of different classes (aka *class-specific risks*). In terms of theory, we derive class-specific risk bounds for MI-PLL, while making minimal assumptions. Our theory reveals a unique phenomenon: that $\sigma$ can greatly impact learning imbalances. This result is in sharp contrast with previous research on supervised and weakly-supervised learning, which only studies learning imbalances under the prism of data imbalances. On the practical side, we introduce a technique for estimating the marginal of the hidden labels using only MI-PLL data. Then, we introduce algorithms that mitigate imbalances at training- and testing-time, by treating the marginal of the hidden labels as a constraint. We demonstrate the effectiveness of our techniques using strong baselines from neurosymbolic and long-tail learning, suggesting performance improvements of up to 14\%.

翻訳日:2024-11-08 21:43:45 公開日:2024-10-06

# 大規模言語モデルにおける知識メカニズム:調査と展望

Knowledge Mechanisms in Large Language Models: A Survey and Perspective ( http://arxiv.org/abs/2407.15017v3 )

ライセンス: Link先を確認

Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang,

(参考訳) 大規模言語モデル(LLM)における知識メカニズムの理解は、信頼できるAGIへ進む上で不可欠である。本稿では,知識利用と進化を含む新しい分類法から知識メカニズムの解析をレビューする。知識利用は記憶、理解、応用、創造のメカニズムに根ざす。知識進化は、個人およびグループLLM内の知識の動的進行に焦点を当てている。さらに, LLMが学んだ知識, パラメトリック知識の脆弱性の理由, 対処が難しい暗黒知識(仮説)についても論じる。この研究がLLMにおける知識の理解を助け、将来の研究に洞察を与えてくれることを願っています。

Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research.

翻訳日:2024-11-08 19:27:32 公開日:2024-10-06

# Inverted Activations: ニューラルネットワークトレーニングにおけるメモリフットプリントの削減

Inverted Activations: Reducing Memory Footprint in Neural Network Training ( http://arxiv.org/abs/2407.15545v2 )

ライセンス: Link先を確認

Georgii Novikov, Ivan Oseledets,

(参考訳) データとモデルサイズの増加によるニューラルネットワークのスケーリングは、より効率的なディープラーニングアルゴリズムの開発を必要とする。ニューラルネットワークトレーニングにおける重要な課題は、アクティベーションテンソルに関連するメモリフットプリント、特に、後方パスの入力テンソル全体を伝統的に保存するポイントワイド非線形層である。本稿では, ポイントワイド非線形層におけるアクティベーションテンソルの取扱いの修正を提案する。我々の方法は、フォワードパス中に入力テンソルの代わりに出力テンソルを節約することである。後続の層は典型的には入力テンソルを節約するので、このアプローチは2つの層ではなく1つの層間のテンソルだけを格納することで必要な総メモリを削減する。この最適化は、GPT、BERT、Mistral、Llamaといったトランスフォーマーベースのアーキテクチャにとって特に有益である。このアプローチを実現するために,後方通過時の非線形性の逆関数を利用する。逆はほとんどの非線形性に対して解析的に計算できないので、より単純な関数を用いて正確な近似を構築する。実験により,本手法はトレーニング精度や計算性能に影響を与えることなく,メモリ使用量を大幅に削減することを示した。我々の実装は、PyTorchフレームワークの標準非線形層をドロップインで置き換えることで、アーキテクチャの変更を必要とせずに、容易に採用できるようにする。

The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with activation tensors, particularly in pointwise nonlinearity layers that traditionally save the entire input tensor for the backward pass, leading to substantial memory consumption. In this paper, we propose a modification to the handling of activation tensors in pointwise nonlinearity layers. Our method involves saving the output tensor instead of the input tensor during the forward pass. Since the subsequent layer typically also saves its input tensor, this approach reduces the total memory required by storing only one tensor between layers instead of two. This optimization is especially beneficial for transformer-based architectures like GPT, BERT, Mistral, and Llama. To enable this approach, we utilize the inverse function of the nonlinearity during the backward pass. As the inverse cannot be computed analytically for most nonlinearities, we construct accurate approximations using simpler functions. Experimental results demonstrate that our method significantly reduces memory usage without affecting training accuracy or computational performance. Our implementation is provided as a drop-in replacement for standard nonlinearity layers in the PyTorch framework, facilitating easy adoption without requiring architectural modifications.

翻訳日:2024-11-08 15:56:37 公開日:2024-10-06

# Counter Turing Test (CT^2$): HindiのAI生成テキスト検出を調査する - Hindi AI Detectability Index (ADI_{hi}$)に基づくLLMのランク付け

Counter Turing Test ($CT^2$): Investigating AI-Generated Text Detection for Hindi -- Ranking LLMs based on Hindi AI Detectability Index ($ADI_{hi}$) ( http://arxiv.org/abs/2407.15694v2 )

ライセンス: Link先を確認

Ishan Kavathekar, Anku Rani, Ashmit Chamoli, Ponnurangam Kumaraguru, Amit Sheth, Amitava Das,

(参考訳) LLM(Large Language Models)の普及と多言語LLMに関する認識は、AI生成テキストの誤用に関連する潜在的なリスクと反感を懸念し、警戒を増す必要がある。これらのモデルは、主に英語のために訓練されているが、Web全体をカバーする広大なデータセットに対する広範なトレーニングは、他の多くの言語でうまく機能する能力を備えている。 AI生成テキスト検出(AGTD)は、すでに研究で注目を集めているトピックとして現れており、いくつかの初期手法が提案されている。本稿では,Hindi言語におけるAGTDの検討について報告する。私たちの主な貢献は4つあります。一ヒンディー語テキスト作成の習熟度を評価するために、26 LLMを検査すること。二ヒンディー語(AG_{hi}$)データセットにAI生成ニュース記事を導入すること。 iii)最近提案された5つのAGTD(ConDA, J-Guard, RADAR, RAIDAR, Intrinsic Dimension Estimation)の有効性を評価した。 iv) Hindi AI Detectability Index(ADI_{hi}$)を提案した。コードとデータセットはhttps://github.com/ishank31/Counter_Turing_Testで公開されている。

The widespread adoption of Large Language Models (LLMs) and awareness around multilingual LLMs have raised concerns regarding the potential risks and repercussions linked to the misapplication of AI-generated text, necessitating increased vigilance. While these models are primarily trained for English, their extensive training on vast datasets covering almost the entire web, equips them with capabilities to perform well in numerous other languages. AI-Generated Text Detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. In this paper, we report our investigation on AGTD for an indic language Hindi. Our major contributions are in four folds: i) examined 26 LLMs to evaluate their proficiency in generating Hindi text, ii) introducing the AI-generated news article in Hindi ($AG_{hi}$) dataset, iii) evaluated the effectiveness of five recently proposed AGTD techniques: ConDA, J-Guard, RADAR, RAIDAR and Intrinsic Dimension Estimation for detecting AI-generated Hindi text, iv) proposed Hindi AI Detectability Index ($ADI_{hi}$) which shows a spectrum to understand the evolving landscape of eloquence of AI-generated text in Hindi. The code and dataset is available at https://github.com/ishank31/Counter_Turing_Test

翻訳日:2024-11-08 15:45:25 公開日:2024-10-06

# 音声認識におけるテキスト予測可能性の役割の定量化

Quantifying the Role of Textual Predictability in Automatic Speech Recognition ( http://arxiv.org/abs/2407.16537v2 )

ライセンス: Link先を確認

Sean Robertson, Gerald Penn, Ewan Dunbar,

(参考訳) 音声認識研究における長年の疑問は、エラーを音響をモデル化するモデルの能力と、高次文脈(語彙、形態学、構文、意味論)を活用する能力にどのように当てはめるかである。我々は,テキスト予測可能性の関数として誤り率をモデル化し,認識者に対するテキスト予測可能性の影響を計測する1つの数,$k$を得る新しい手法を検証する。本稿では,Wav2Vec 2.0 ベースのモデルが,明示的な言語モデルを使用しないにもかかわらず,ハイブリッド ASR モデルよりもテキストコンテキストをより強く活用できることを実証するために用いるとともに,アフリカ系アメリカ人英語における標準 ASR システムの性能の低下を示す最近の結果に光を当てるために使用する。これらは主に音響-音響-音響モデリングの失敗を表す。本稿では,ASRの診断と改善において,このアプローチがいかに簡単に利用できるかを示す。

A long-standing question in automatic speech recognition research is how to attribute errors to the ability of a model to model the acoustics, versus its ability to leverage higher-order context (lexicon, morphology, syntax, semantics). We validate a novel approach which models error rates as a function of relative textual predictability, and yields a single number, $k$, which measures the effect of textual predictability on the recognizer. We use this method to demonstrate that a Wav2Vec 2.0-based model makes greater stronger use of textual context than a hybrid ASR model, in spite of not using an explicit language model, and also use it to shed light on recent results demonstrating poor performance of standard ASR systems on African-American English. We demonstrate that these mostly represent failures of acoustic--phonetic modelling. We show how this approach can be used straightforwardly in diagnosing and improving ASR.

翻訳日:2024-11-08 15:34:26 公開日:2024-10-06

# ニューラル言語モデルによる言語習得における臨界期間の影響の検討

Investigating Critical Period Effects in Language Acquisition through Neural Language Models ( http://arxiv.org/abs/2407.19325v2 )

ライセンス: Link先を確認

Ionut Constantinescu, Tiago Pimentel, Ryan Cotterell, Alex Warstadt,

(参考訳) 第二言語 (L2) の習得は幼少期以降に難しくなり、この時代以降(以前ではないが)第1言語 (L1) への露出を緩和することは、通常、L1 の習熟度を著しく損なうことはない。これらのCP効果が自然に決定された脳の成熟によるものなのか、または経験によって自然に誘発される神経接続の安定化であるのかは不明である。本研究では、言語モデル(LM)を用いて、これらの現象が人間特有のものであるか、あるいはより広範な言語学習者によって共有されているかをテストする。また,L2の曝露時期が遅れた場合に,自然成熟期と直接類似しないLMがCP効果を示すことが確認された。本結果は,CP効果は統計的学習の必然的な結果であり,CP効果の自然メカニズムと矛盾するものである。我々は, 可塑性の成熟度低下をシミュレートするために, トレーニングを通じてレギュレータ部分ウェイを導入することにより, CPをリバースエンジニアリングできることを示す。以上の結果から,L1学習自体がCPを誘導するには不十分である可能性が示唆され,言語モデルをより認知的確固たるものにするためには,さらなるエンジニアリングが必要である。

Humans appear to have a critical period (CP) for language acquisition: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to substantial loss of L1 proficiency. It is unknown whether these CP effects result from innately determined brain maturation or as a stabilization of neural connections naturally induced by experience. In this study, we use language models (LMs) to test the extent to which these phenomena are peculiar to humans, or shared by a broader class of language learners. We vary the age of exposure by training LMs on language pairs in various experimental conditions, and find that LMs, which lack any direct analog to innate maturational stages, do not show CP effects when the age of exposure of L2 is delayed. Our results contradict the claim that CP effects are an inevitable result of statistical learning, and they are consistent with an innate mechanism for CP effects. We show that we can reverse-engineer the CP by introducing a regularizer partway through training to simulate a maturational decrease in plasticity. All in all, our results suggest that L1 learning on its own may not be enough to induce a CP, and additional engineering is necessary to make language models more cognitively plausible.

翻訳日:2024-11-08 14:38:53 公開日:2024-10-06

# 市販のCNNとViTを併用した音声認識のためのもうひとつの驚くべきベースライン

Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition ( http://arxiv.org/abs/2407.19472v2 )

ライセンス: Link先を確認

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun,

(参考訳) 本稿では,ImageNet Large Scale Visual Recognition Challengeのために開発された事前学習型アーキテクチャを,近視認識に適用する。これらのアーキテクチャは、設計されたもの以外の様々なコンピュータビジョンタスクにおいて大きな成功を収めた。この研究は、既成の畳み込みニューラルネットワーク(CNN)を用いた以前の研究に基づいており、最近提案されたビジョントランスフォーマー(ViT)を含むように拡張している。汎用オブジェクト分類の訓練を受けているにもかかわらず、CNNとViTの中間層の特徴は、近視画像に基づいて個人を認識するのに適した方法である。また,CNN と ViT が相補的であることも実証した。さらに,これらの事前学習モデルのごく一部で精度が向上し,より少ないパラメータで,移動体などの資源制限環境に適したモデルが得られることを示す。この効率性は、従来の手作りの機能も追加すれば向上する。

We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition. These architectures have demonstrated significant success in various computer vision tasks beyond the ones for which they were designed. This work builds on our previous study using off-the-shelf Convolutional Neural Network (CNN) and extends it to include the more recently proposed Vision Transformers (ViT). Despite being trained for generic object classification, middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images. We also demonstrate that CNNs and ViTs are highly complementary since their combination results in boosted accuracy. In addition, we show that a small portion of these pre-trained models can achieve good accuracy, resulting in thinner models with fewer parameters, suitable for resource-limited environments such as mobiles. This efficiency improves if traditional handcrafted features are added as well.

翻訳日:2024-11-08 14:27:29 公開日:2024-10-06

# EEGMamba:EEGマルチタスク分類の専門家の混在を考慮した双方向状態空間モデル

EEGMamba: Bidirectional State Space Model with Mixture of Experts for EEG Multi-task Classification ( http://arxiv.org/abs/2407.20254v2 )

ライセンス: Link先を確認

Yiyu Gui, MingZhi Chen, Yuqi Su, Guibo Luo, Yuchao Yang,

(参考訳) 近年、深層学習の発展に伴い、脳波分類網(EEG)は一定の進歩を遂げている。トランスフォーマーベースのモデルは、脳波信号の長期的な依存関係を捉えるのによく機能する。しかし、その二次計算の複雑さは、かなりの計算上の問題を引き起こす。さらに、ほとんどのEEG分類モデルは単一タスクにのみ適しており、特に信号長やチャネル数の変化に直面した場合、様々なタスクの一般化に苦慮している。本稿では,脳波アプリケーションのためのマルチタスク学習を真に実装した初のユニバーサル脳波分類ネットワークであるEEGMambaを紹介する。 EEGMambaは、Spatio-Temporal-Adaptive (ST-Adaptive)モジュール、双方向のMamba、Mixture of Experts (MoE)をシームレスに統合したフレームワークに統合する。提案するST-Adaptiveモジュールは,空間適応的畳み込みによって異なる長さとチャネル数を持つ脳波信号に対して統合された特徴抽出を行い,時間適応性を実現するためにクラストークンを組み込む。さらに,脳波信号に特に適する双方向のマンバを設計し,特徴抽出,高精度のバランス,高速推論速度,長期脳波信号処理における効率的なメモリ使用量について検討した。複数のタスクにまたがる脳波データの処理を強化するため、タスク認識型MOEをユニバーサルエキスパートに導入し、異なるタスクから脳波データの違いと共通点の両方を効果的に把握する。本研究では,8つの公用脳波データセットを用いてモデルの評価を行い,その評価結果から,発作検出,感情認識,睡眠ステージ分類,運動画像の4種類のタスクにおいて,その優れた性能を実証した。コードはまもなくリリースされる予定だ。

In recent years, with the development of deep learning, electroencephalogram (EEG) classification networks have achieved certain progress. Transformer-based models can perform well in capturing long-term dependencies in EEG signals. However, their quadratic computational complexity poses a substantial computational challenge. Moreover, most EEG classification models are only suitable for single tasks and struggle with generalization across different tasks, particularly when faced with variations in signal length and channel count. In this paper, we introduce EEGMamba, the first universal EEG classification network to truly implement multi-task learning for EEG applications. EEGMamba seamlessly integrates the Spatio-Temporal-Adaptive (ST-Adaptive) module, bidirectional Mamba, and Mixture of Experts (MoE) into a unified framework. The proposed ST-Adaptive module performs unified feature extraction on EEG signals of different lengths and channel counts through spatial-adaptive convolution and incorporates a class token to achieve temporal-adaptability. Moreover, we design a bidirectional Mamba particularly suitable for EEG signals for further feature extraction, balancing high accuracy, fast inference speed, and efficient memory-usage in processing long EEG signals. To enhance the processing of EEG data across multiple tasks, we introduce task-aware MoE with a universal expert, effectively capturing both differences and commonalities among EEG data from different tasks. We evaluate our model on eight publicly available EEG datasets, and the experimental results demonstrate its superior performance in four types of tasks: seizure detection, emotion recognition, sleep stage classification, and motor imagery. The code is set to be released soon.

翻訳日:2024-11-08 14:05:01 公開日:2024-10-06

# グラフニューラルネットワークを用いた最適かつ効率的なテキスト偽造物

Optimal and efficient text counterfactuals using Graph Neural Networks ( http://arxiv.org/abs/2408.01969v2 )

ライセンス: Link先を確認

Dimitris Lymperopoulos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,

(参考訳) NLPモデルは意思決定プロセスにますます不可欠なものとなり、説明可能性や解釈可能性の必要性が最重要になっている。そこで本研究では,モデル予測を変化させる反事実的介入と呼ばれる意味論的に編集された入力を生成し,モデルに対する反事実的説明の形式を提供するフレームワークを提案する。我々は2つのNLPタスク – バイナリ感情分類とトピック分類 – でフレームワークをテストし、生成した編集がコントラストがあり、流動性があり、最小限であることを示した。

As NLP models become increasingly integral to decision-making processes, the need for explainability and interpretability has become paramount. In this work, we propose a framework that achieves the aforementioned by generating semantically edited inputs, known as counterfactual interventions, which change the model prediction, thus providing a form of counterfactual explanations for the model. We test our framework on two NLP tasks - binary sentiment classification and topic classification - and show that the generated edits are contrastive, fluent and minimal, while the whole process remains significantly faster that other state-of-the-art counterfactual editors.

翻訳日:2024-11-08 13:07:08 公開日:2024-10-06

# Diff-PIC:拡散モデルを用いた粒子内核融合シミュレーション

Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models ( http://arxiv.org/abs/2408.02693v3 )

ライセンス: Link先を確認

Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng,

(参考訳) AIの急速な発展は、持続可能なエネルギーの必要性の押し付けを強調している。核融合は究極的な解決策と見なされるが、ほぼ1世紀近くにわたって集中的な研究の中心であり、投資は数十億ドルに達した。近年の慣性凝縮核融合の進展は核融合研究に大きな注目を集めており、レーザー-プラズマ相互作用(LPI)は核融合の安定性と効率を確保するために重要である。しかし、核融合点火時のLPIの複雑さは分析的アプローチを非現実的なものにしており、非常に計算に要求されるParticle-in-Cell (PIC) シミュレーションに頼ってデータを生成し、融合研究の進展に重大なボトルネックをもたらす。 Diff-PICは、条件付き拡散モデルを利用して、高忠実度科学的なLPIデータを生成するために、PICシミュレーションの計算効率を向上する新しいフレームワークである。本研究では,PICシミュレーションによって得られた物理パターンを,(1)物理パラメータとそれに対応する結果との複雑な関係を効果的に捉えるために,物理インフォームド方式で,拡散モデルに抽出する。 2) 高忠実度, 物理的妥当性を維持しつつ, 効率を一層向上させるため, 修正流法を用いて, モデルの1ステップ条件拡散モデルに変換する。実験の結果、Diff-PICは100ピコ秒のシミュレーションで従来のPICと比較して16,200$\times$スピードアップを達成し、他の2つのSOTAデータ生成手法と比較してMAE / RMSE / FIDの59.21% / 57.15% / 39.46%の減少率を示した。

The rapid development of AI highlights the pressing need for sustainable energy, a critical global challenge for decades. Nuclear fusion, generally seen as an ultimate solution, has been the focus of intensive research for nearly a century, with investments reaching hundreds of billions of dollars. Recent advancements in Inertial Confinement Fusion have drawn significant attention to fusion research, in which Laser-Plasma Interaction (LPI) is critical for ensuring fusion stability and efficiency. However, the complexity of LPI upon fusion ignition makes analytical approaches impractical, leaving researchers depending on extremely computation-demanding Particle-in-Cell (PIC) simulations to generate data, presenting a significant bottleneck to advancing fusion research. In response, this work introduces Diff-PIC, a novel framework that leverages conditional diffusion models as a computationally efficient alternative to PIC simulations for generating high-fidelity scientific LPI data. In this work, physical patterns captured by PIC simulations are distilled into diffusion models associated with two tailored enhancements: (1) To effectively capture the complex relationships between physical parameters and corresponding outcomes, the parameters are encoded in a physically-informed manner. (2) To further enhance efficiency while maintaining high fidelity and physical validity, the rectified flow technique is employed to transform our model into a one-step conditional diffusion model. Experimental results show that Diff-PIC achieves 16,200$\times$ speedup compared to traditional PIC on a 100 picosecond simulation, with an average reduction in MAE / RMSE / FID of 59.21% / 57.15% / 39.46% with respect to two other SOTA data generation approaches.

翻訳日:2024-11-08 12:55:50 公開日:2024-10-06

# グラフ残差法による分子特性予測法

Graph Residual based Method for Molecular Property Prediction ( http://arxiv.org/abs/2408.03342v2 )

ライセンス: Link先を確認

Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar,

(参考訳) 不動産予測のための機械学習駆動の手法は、大きな関心を集めてきた。しかし、重要なアプリケーションの一般化能力、正確性、推論時間を改善するために、多くの作業が続けられている。従来の機械学習モデルは、しばしば容易に利用できない分子から抽出された特徴に基づいて特性を予測する。本研究では,新しいDeep Learning法であるエッジ条件付き残留グラフニューラルネットワーク(ECRGNN)を適用し,分子のグラフ構造を直接予測する。 SMILES (Simplified Molecular Input Line Entry System) の分子の表現は入力データ形式として使用されており、さらにトレーニングデータを構成するグラフデータベースに変換されている。この写本は、GRUベースの新しい方法論であるECRGNNの詳細な記述を強調し、使用済みの入力をマッピングする。回帰特性と分類効力の両方を強調して強調する。変分オートエンコーダ(VAE)の詳細な記述とマルチクラスマルチラベル特性予測に使用されるエンドツーエンド学習法も提案した。結果は、標準ベンチマークデータセットや、新たに開発されたデータセットと比較されている。これまで使用されてきたパフォーマンス指標はすべて明確に定義されており、その理由が選択されている。

Machine learning-driven methods for property prediction have been of deep interest. However, much work remains to be done to improve the generalization ability, accuracy, and inference time for critical applications. The traditional machine learning models predict properties based on the features extracted from the molecules, which are often not easily available. In this work, a novel Deep Learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), has been applied, allowing us to predict properties directly only the Graph-based structures of the molecules. SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used in the present study as input data format, which has been further converted into a graph database, which constitutes the training data. This manuscript highlights a detailed description of the novel GRU-based methodology, ECRGNN, to map the inputs that have been used. Emphasis is placed on highlighting both the regressive property and the classification efficacy of the same. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method used for multi-class multi-label property prediction has been provided as well. The results have been compared with standard benchmark datasets as well as some newly developed datasets. All performance metrics that have been used have been clearly defined, and their reason for choice.

翻訳日:2024-11-08 12:44:50 公開日:2024-10-06

# 測定デバイス非依存量子鍵分布における強度相関

Intensity correlations in measurement-device-independent quantum key distribution ( http://arxiv.org/abs/2408.08011v3 )

ライセンス: Link先を確認

Junxuan Liu, Tianyi Xing, Ruiyin Liu, Zihao Chen, Hao Tan, Anqi Huang,

(参考訳) 測定デバイス非依存量子鍵分布(MDI QKD)システムにおける量子状態準備中の不完全な変調による強度相関は、そのセキュリティ性能を損なう。したがって、MDI QKDシステムの実用セキュリティに対する強度相関の影響を評価することが重要である。本研究では,MDI QKDシステムのキーレートを,強度相関の下で定量的に解析する理論モデルを提案する。さらに,この理論モデルを実測強度相関を用いたMDI QKDシステムに適用することにより,本モデルの下で鍵を効率よく生成することが困難であることを示す。また、秘密鍵を生成するために強度相関の境界条件についても検討する。本研究は,MDI QKDプロトコルに対する強度相関のセキュリティ解析を拡張し,MDI QKDシステムの実用的セキュリティを評価する方法論を提供する。

The intensity correlations due to imperfect modulation during the quantum-state preparation in a measurement-device-independent quantum key distribution (MDI QKD) system compromise its security performance. Therefore, it is crucial to assess the impact of intensity correlations on the practical security of MDI QKD systems. In this work, we propose a theoretical model that quantitatively analyzes the secure key rate of MDI QKD systems under intensity correlations. Furthermore, we apply the theoretical model to a practical MDI QKD system with measured intensity correlations, which shows that the system struggles to generate keys efficiently under this model. We also explore the boundary conditions of intensity correlations to generate secret keys. This study extends the security analysis of intensity correlations to MDI QKD protocols, providing a methodology to evaluate the practical security of MDI QKD systems.

翻訳日:2024-11-08 07:29:14 公開日:2024-10-06

# Cybench: セキュリティ能力と言語モデルのリスクを評価するフレームワーク

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models ( http://arxiv.org/abs/2408.08926v2 )

ライセンス: Link先を確認

Andy K. Zhang, Neil Perry, Riya Dulepet, Joey Ji, Justin W. Lin, Eliot Jones, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, Daniel E. Ho, Percy Liang,

(参考訳) 脆弱性を自律的に識別し、エクスプロイトを実行するサイバーセキュリティのための言語モデル(LM)エージェントは、現実世界に影響を及ぼす可能性がある。政策立案者、モデル提供者、AIおよびサイバーセキュリティコミュニティの他の研究者は、サイバーリスクを軽減し、侵入テストの機会を調べるためにそのようなエージェントの能力を定量化することに興味を持っている。そこで,サイバーセキュリティタスクの特定と,それらのタスクに対するエージェント評価のためのフレームワークであるCybenchを紹介する。 4つの異なるCTFコンペティションから、40のプロフェッショナルレベルのCapture the Flag(CTF)タスクが含まれています。各タスクは独自の記述、スターターファイルを含み、エージェントがbashコマンドを実行して出力を観察できる環境で初期化される。多くのタスクは既存のLMエージェントの能力を超えており、タスクごとにサブタスクを導入し、タスクを中間ステップに分解してより詳細な評価を行う。 GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, Llama 3.1 405B Instruct。サブタスクのガイダンスなしでは、Claude 3.5 Sonnet、GPT-4o、OpenAI o1-preview、Claude 3 Opusを活用するエージェントは、人間のチームが解くのに最大11分かかった完全なタスクをうまく解決した。対照的に、最も難しいタスクは、解決に24時間54分を要した。すべてのコードとデータはhttps://cybench.github.ioで公開されている。

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description, starter files, and is initialized in an environment where an agent can execute bash commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks for each task, which break down a task into intermediary steps for a more detailed evaluation. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 8 models: GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. Without subtask guidance, agents leveraging Claude 3.5 Sonnet, GPT-4o, OpenAI o1-preview, and Claude 3 Opus successfully solved complete tasks that took human teams up to 11 minutes to solve. In comparison, the most difficult task took human teams 24 hours and 54 minutes to solve. All code and data are publicly available at https://cybench.github.io

翻訳日:2024-11-08 07:07:05 公開日:2024-10-06

# 多粒子連続可変非ガウスエンタングルメント構造のメトロロジカルキャラクタリゼーション

Metrological Characterization of Multipartite Continuous-Variable non-Gaussian Entanglement Structure ( http://arxiv.org/abs/2408.12554v2 )

ライセンス: Link先を確認

Mingsheng Tian, Xiaoting Gao, Boxuan Jing, Feng-Xiao Sun, Matteo Fadel, Qiongyi He,

(参考訳) マルチパーティ・エンタングルメントは量子情報処理に不可欠な資源であるが、連続変数系におけるエンタングルメント構造の特徴付けは、特にマルチモード非ガウス的シナリオにおいて難しいままである。本研究では,連続変数状態における多部交絡構造を検出する手法を提案する。量子フィッシャー情報を活用することにより,多モード非ガウス状態における量子相関を捉えることが可能な演算子を同定する体系的手法を提案する。ランダムに生成した多モード量子状態に対して,本手法の有効性を実証し,絡み付き検出において高い成功率を達成する。さらに,本手法は,アクセス可能な演算子の集合を拡張することで,損失に対する堅牢性を向上する。この研究は、様々な連続変数系における絡み合い構造を特徴づけるための一般的なフレームワークを提供し、多くの実験的な応用を可能にする。

Multipartite entanglement is an essential resource for quantum information tasks, but characterizing entanglement structures in continuous variable systems remains challenging, especially in multimode non-Gaussian scenarios. In this work, we introduce a method for detecting multipartite entanglement structures in continuous variable states. By leveraging the quantum Fisher information, we propose a systematic approach to identify feasible operators that capture quantum correlations in multimode non-Gaussian states. We demonstrate the effectiveness of our method on over $10^5$ randomly generated multimode-entangled quantum states, achieving a high success rate in entanglement detection. Additionally, our method exhibits enhanced robustness against losses by expanding the set of accessible operators. This work provides a general framework for characterizing entanglement structures in diverse continuous variable systems, enabling a number of experimentally relevant applications.

翻訳日:2024-11-08 05:37:29 公開日:2024-10-06

# S4D:ガウスと3次元制御点を用いた4次元実世界再構成

S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points ( http://arxiv.org/abs/2408.13036v2 )

ライセンス: Link先を確認

Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang,

(参考訳) ガウシアンを用いた動的シーン再構築が近年注目されている。主流のアプローチは典型的には、大域的な変形場を用いて、標準空間の3Dシーンをワープする。しかし、暗黙の神経場の固有の低周波の性質は、しばしば複素運動の非効率な表現につながる。さらに、その構造的な剛性は、様々な解像度と持続時間を持つシーンへの適応を妨げる可能性がある。これらの課題に対処するために,離散的な3次元制御点を用いた4次元実世界の再構成をストリーミングする手法を提案する。この方法は局所光を物理的にモデル化し、運動デカップリング座標系を確立する。従来のグラフィックスと学習可能なパイプラインを効果的にマージすることにより、堅牢で効率的なローカルな6自由度(6-DoF)モーション表現を提供する。さらに,ガウスの制御点とガウスの制御点を統合する一般化されたフレームワークを開発した。最初の3D再構成から始まり、我々のワークフローはストリーミング4D再構成を4つの独立したサブモジュールに分解する。実験により,提案手法は,Neu3DVおよびCMU-Panopticデータセットの既存の4Dガウススプラッティング技術より優れていることが示された。特に、私たちの3Dコントロールポイントの最適化は、100回、NVIDIA 4070 GPUで1フレームあたりわずか2秒で達成できます。

Dynamic scene reconstruction using Gaussians has recently attracted increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in canonical space. However, the inherent low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To address these challenges, we introduce a novel approach for streaming 4D real-world reconstruction utilizing discrete 3D control points. This method physically models local rays and establishes a motion-decoupling coordinate system. By effectively merging traditional graphics with learnable pipelines, it provides a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that integrates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D reconstruction into four independent submodules: 3D segmentation, 3D control point generation, object-wise motion manipulation, and residual compensation. Experimental results demonstrate that our method outperforms existing state-of-the-art 4D Gaussian splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Notably, the optimization of our 3D control points is achievable in 100 iterations and within just 2 seconds per frame on a single NVIDIA 4070 GPU.

翻訳日:2024-11-08 05:26:28 公開日:2024-10-06

# SONICS: Synthetic or Not -- Identifying Counterfeit Songs

SONICS: Synthetic Or Not -- Identifying Counterfeit Songs ( http://arxiv.org/abs/2408.14080v3 )

ライセンス: Link先を確認

Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah,

(参考訳) 最近のAI生成楽曲の急増は、エキサイティングな可能性と挑戦を示している。これらの発明は、音楽の創造を民主化する一方で、芸術的整合性を守り、人間の音楽芸術を保護するために、人間の構成した歌と合成歌を区別する能力も必要である。フェイクソング検出における既存の研究とデータセットは、ボーカルがAIによって生成されるが、楽器音楽は実際の歌から供給される、歌声のディープフェイク検出(SVDD)のみに焦点を当てている。しかし、これらのアプローチは、すべてのコンポーネント(声、音楽、歌詞、スタイル)がAIによって生成されるような、現代のエンドツーエンドの人工歌を検出するには不十分である。さらに、既存のデータセットには、音楽歌詞の多様性、長いデュレーション曲、オープンアクセスのフェイクソングが欠けている。これらのギャップに対処するため,Sano や Udio などの人気プラットフォームから,97k曲 (4,751時間) 以上と49k曲以上の合成歌からなる,エンドツーエンドの合成歌検出(SSD)のための新しいデータセット SONICS を紹介した。さらに,既存の手法で完全に見落とされ,歌唱における時間的長期依存性を効果的に検出するためにモデル化することの重要性を強調した。長距離パターンを利用するために、従来のCNNやTransformerベースのモデルよりも時間とメモリ効率を大幅に向上させる新しいアーキテクチャであるSpecTTTraを導入する。特に、長いオーディオサンプルでは、私たちの最高のパフォーマンスの亜種は、ViTのスコアを8%上回り、スピードは38%、メモリ使用量は26%減った。さらに,ConvNeXtと比較してF1スコアが1%向上し,速度が20%向上し,メモリ使用量が67%減少した。モデルファミリーの他のバリエーションは、競争力のあるパフォーマンスで、より優れたスピードとメモリ効率を提供する。

The recent surge in AI-generated songs presents exciting possibilities and challenges. While these inventions democratize music creation, they also necessitate the ability to distinguish between human-composed and synthetic songs to safeguard artistic integrity and protect human musical artistry. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, these approaches are inadequate for detecting contemporary end-to-end artificial songs where all components (vocals, music, lyrics, and style) could be AI-generated. Additionally, existing datasets lack music-lyrics diversity, long-duration songs, and open-access fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs (4,751 hours) with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect entirely overlooked in existing methods. To utilize long-range patterns, we introduce SpecTTTra, a novel architecture that significantly improves time and memory efficiency over conventional CNN and Transformer-based models. In particular, for long audio samples, our top-performing variant outperforms ViT by 8% F1 score while being 38% faster and using 26% less memory. Additionally, in comparison with ConvNeXt, our model achieves 1% gain in F1 score with 20% boost in speed and 67% reduction in memory usage. Other variants of our model family provide even better speed and memory efficiency with competitive performance.

翻訳日:2024-11-08 05:04:12 公開日:2024-10-06

# 簡易型安全連続学習機

Simplex-enabled Safe Continual Learning Machine ( http://arxiv.org/abs/2409.05898v2 )

ライセンス: Link先を確認

Hongpeng Cao, Yanbing Mao, Yihao Cai, Lui Sha, Marco Caccamo,

(参考訳) 本稿では, 安全クリティカルな自律システムを対象とした, シンプルで安全な連続学習システムSeC-Learning Machineを提案する。 SeC学習マシンはSimplexロジック(「複雑さを制御するためのシンプルさ」)と物理制御された深層強化学習(Phy-DRL)に基づいて構築されている。これにより、HP(ハイパフォーマンス)、HA(ハイアシュアランス)、コーディネータを構成する。具体的には、HP-Studentは事前訓練された高性能だが完全に検証されていないPhy-DRLで、実際の工場で学び続け、アクションポリシーを安全に調整している。これとは対照的に、HA-Teacherはミッション再現型、物理モデルベース、そして検証された設計である。 HA-Teacherには2つのミッションがある。 Coordinatorは、HP-StudentとHA-Teacherのインタラクションとスイッチをトリガーする。対話的な3つのコンポーネントで動く機械学習マシンSeC 一生涯の安全を確保すること(すなわち、HP-Studentの成功又は収束にかかわらず、継続学習段階における安全を保証すること。) ii)Sim2Realのギャップに対処し、三実の植物の未知を許容することを学ぶこと。カートポールシステムと実四足歩行ロボットの実験は、Sim2Realギャップに対処するアプローチを備えた最先端の安全なDRLフレームワーク上に構築された連続学習と比較して、SeC学習マシンの際立った特徴を実証している。

This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, ``using simplicity to control complexity'') and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Coordinator. Specifically, the HP-Student is a pre-trained high-performance but not fully verified Phy-DRL, continuing to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complementary, HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.

翻訳日:2024-11-07 22:27:40 公開日:2024-10-06

# ネイティブ対非ネイティブ言語プロンプト:比較分析

Native vs Non-Native Language Prompting: A Comparative Analysis ( http://arxiv.org/abs/2409.07054v2 )

ライセンス: Link先を確認

Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, Firoj Alam,

(参考訳) 大規模言語モデル(LLM)は、標準自然言語処理(NLP)タスクなど、さまざまな分野において顕著な能力を示している。 LLMから知識を引き出すために、プロンプトは自然言語命令からなる重要な役割を果たす。ほとんどのオープンソースでクローズドなLCMは、テキスト、画像、オーディオ、ビデオなどのデジタルコンテンツというラベル付きおよびラベルなしのリソースで訓練されている。したがって、これらのモデルは高リソースの言語に対してより良い知識を持っているが、低リソースの言語では苦労している。プロンプトは能力を理解する上で重要な役割を果たすため、プロンプトに使われる言語は依然として重要な研究課題である。この領域では重要な研究がなされているが、まだ限られており、中級言語から低級言語への探索は少ない。本研究では、12のアラビアデータセット(9.7Kデータポイント)に関連する11の異なるNLPタスクにおける異なるプロンプト戦略(ネイティブ対非ネイティブ)について検討する。合計で3つのLSM、12のデータセット、および3つのプロンプト戦略を含む197の実験を行った。以上の結果から,非ネイティブプロンプトは平均して最善であり,その後に混合プロンプトとネイティブプロンプトが続くことが示唆された。

Large language models (LLMs) have shown remarkable abilities in different fields, including standard Natural Language Processing (NLP) tasks. To elicit knowledge from LLMs, prompts play a key role, consisting of natural language instructions. Most open and closed source LLMs are trained on available labeled and unlabeled resources--digital content such as text, images, audio, and videos. Hence, these models have better knowledge for high-resourced languages but struggle with low-resourced languages. Since prompts play a crucial role in understanding their capabilities, the language used for prompts remains an important research question. Although there has been significant research in this area, it is still limited, and less has been explored for medium to low-resourced languages. In this study, we investigate different prompting strategies (native vs. non-native) on 11 different NLP tasks associated with 12 different Arabic datasets (9.7K data points). In total, we conducted 197 experiments involving 3 LLMs, 12 datasets, and 3 prompting strategies. Our findings suggest that, on average, the non-native prompt performs the best, followed by mixed and native prompts.

翻訳日:2024-11-07 21:53:46 公開日:2024-10-06

# Propaganda to Hate:マルチエージェントLDMを用いたアラビアミームのマルチモーダル分析

Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs ( http://arxiv.org/abs/2409.07246v2 )

ライセンス: Link先を確認

Firoj Alam, Md. Rafiul Biswas, Uzair Shah, Wajdi Zaghouani, Georgios Mikros,

(参考訳) 過去10年間、ソーシャルメディアプラットフォームは情報発信と消費に使われてきた。コンテンツの大部分は市民ジャーナリズムと大衆の認知を促進するために投稿されるが、一部のコンテンツは誤解を招くユーザーへ投稿される。テキスト、画像、ビデオなどの様々なコンテンツタイプの中で、ミーム(画像上のテキストオーバーレイド)は特に一般的であり、プロパガンダ、憎悪、ユーモアの強力な乗り物として機能する。現在の文献では、ミーム内の個々の内容を検出する努力がなされている。しかし、それらの交叉の研究は非常に限られている。本研究では,マルチエージェントLPMを用いた手法を用いて,ミームにおけるプロパガンダと憎悪の交点を探索する。我々は、粗い、きめ細かい憎悪ラベルでプロパガンダ的なミームデータセットを拡張した。我々の発見は、ミームにプロパガンダと憎悪の関連があることを示唆している。今後の研究のベースラインとなるための詳細な実験結果を提供する。実験的なリソースをコミュニティに公開します(https://github.com/firojalam/propaganda-and-hateful-memes)。

In the past decade, social media platforms have been used for information dissemination and consumption. While a major portion of the content is posted to promote citizen journalism and public awareness, some content is posted to mislead users. Among different content types such as text, images, and videos, memes (text overlaid on images) are particularly prevalent and can serve as powerful vehicles for propaganda, hate, and humor. In the current literature, there have been efforts to individually detect such content in memes. However, the study of their intersection is very limited. In this study, we explore the intersection between propaganda and hate in memes using a multi-agent LLM-based approach. We extend the propagandistic meme dataset with coarse and fine-grained hate labels. Our finding suggests that there is an association between propaganda and hate in memes. We provide detailed experimental results that can serve as a baseline for future studies. We will make the experimental resources publicly available to the community (https://github.com/firojalam/propaganda-and-hateful-memes).

翻訳日:2024-11-07 21:53:46 公開日:2024-10-06

# 自然言語推論における説明を用いた敵対的ロバスト性の向上

Enhancing adversarial robustness in Natural Language Inference using explanations ( http://arxiv.org/abs/2409.07423v2 )

ライセンス: Link先を確認

Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,

(参考訳) 最先端のTransformerベースのモデルの急増は、間違いなくNLPモデルのパフォーマンスの限界を押し上げ、様々なタスクに優れています。我々は,自然言語推論 (NLI) の課題に注目を当てた。なぜなら,よく適合したデータセットで訓練されたモデルは,敵対的攻撃の影響を受けやすいため,微妙な入力介入によってモデルを誤解させることができるからだ。本研究は, 前提仮説入力ではなく, 説明文の分類器を微調整することによって, 説明自由ベースラインと比較して, 種々の敵攻撃下での堅牢性を実現することによる, 広範囲な実験を通じて, モデルに依存しない防衛戦略としての自然言語説明の利用を検証するものである。また、生成した説明のセマンティックな妥当性をテストするための標準的な戦略が存在しないため、広範に使われている言語生成指標と人間の知覚との相関について検討し、それらが堅牢なNLIモデルへのプロキシとして機能するようにした。我々の手法は資源効率が良く再現可能であり、計算量に大きな制限はない。

The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular well-suited datasets are susceptible to adversarial attacks, allowing subtle input interventions to mislead the model. In this work, we validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation: only by fine-tuning a classifier on the explanation rather than premise-hypothesis inputs, robustness under various adversarial attacks is achieved in comparison to explanation-free baselines. Moreover, since there is no standard strategy of testing the semantic validity of the generated explanations, we research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models. Our approach is resource-efficient and reproducible without significant computational limitations.

翻訳日:2024-11-07 21:53:46 公開日:2024-10-06

# Faetarベンチマーク: 非常にアンダーソースな言語における音声認識

The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language ( http://arxiv.org/abs/2409.08103v2 )

ライセンス: Link先を確認

Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar,

(参考訳) 低リソース音声認識への現在のアプローチの限界を押し上げるために設計されたベンチマークコーパスであるFaetar Automatic Speech Recognition Benchmarkを導入する。フェタールは、主にイタリアで話されるフランコ・プロヴェン・c{c} の変種であり、標準的な正書法を持たず、ベンチマークに含まれるもの以外のテキストや音声のリソースはほとんどなく、他のフランコ・プロヴェン・c{c} の形式とは全く異なる。コーパスはフィールド録音に由来するが、ほとんどはノイズがあり、5時間しか一致した書き起こしがなく、強制的なアライメントは可変品質である。コーパスには、さらに20時間分の未収録のスピーチが含まれている。本稿では,現在最先端の多言語音声基礎モデルの音声誤り率30.4%のベースライン結果について報告する。

We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Proven\c{c}al variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms of Franco-Proven\c{c}al. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions, and for which forced alignment is of variable quality. The corpus contains an additional 20 hrs of unlabelled speech. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%, using a pipeline that continues pre-training on the foundation model using the unlabelled set.

翻訳日:2024-11-07 21:31:36 公開日:2024-10-06

# Famba-V:クロス層トーケン融合による高速ビジョンマンバ

Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion ( http://arxiv.org/abs/2409.09808v2 )

ライセンス: Link先を確認

Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang,

(参考訳) MambaとVision Mamba(Vim)モデルは、Transformerアーキテクチャに基づくメソッドの代替としての可能性を示している。この研究は、Vimモデルのトレーニング効率を高めるための層間トークン融合技術であるFast Mamba for Vision (Famba-V)を導入している。 Famba-Vの鍵となる考え方は、既存の作業が提案するすべてのレイヤに対してトークン融合を均一に適用するのではなく、異なるVim層にまたがって類似したトークンを識別し、融合することである。 CIFAR-100におけるFamba-Vの性能評価を行った。この結果から,Famba-Vはトレーニング中のトレーニング時間とピークメモリ使用量の両方を削減することで,Vimモデルのトレーニング効率を向上させることができることがわかった。さらに、提案したクロスレイヤー戦略により、Famba-Vはより優れた精度と効率のトレードオフを提供できる。これらの結果はいずれも、Famba-V を Vim モデルの有望な効率向上技術として実証している。

Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suit of cross-layer strategies instead of simply applying token fusion uniformly across all the layers that existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V is able to enhance the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. These results all together demonstrate Famba-V as a promising efficiency enhancement technique for Vim models.

翻訳日:2024-11-07 20:46:36 公開日:2024-10-06

# Famba-V:クロス層トーケン融合による高速ビジョンマンバ

Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion ( http://arxiv.org/abs/2409.09808v3 )

ライセンス: Link先を確認

Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang,

翻訳日:2024-11-07 20:46:36 公開日:2024-10-06

# 量子物理学のどの特徴は基本的に量子ではなく、不決定性によるものなのか?

Which features of quantum physics are not fundamentally quantum but are due to indeterminism? ( http://arxiv.org/abs/2409.10601v2 )

ライセンス: Link先を確認

Flavio Del Santo, Nicolas Gisin,

(参考訳) 量子とは何か? 我々は、測度問題、ウィグナーの友人パラドックスとその提案された解、単一粒子非局所性、および非閉化など、ほとんどの特徴、問題、パラドックスは、古典物理学を根本的非決定論的と解釈するならば、量子物理学に帰属するとされる古典的な類似性を持っていると論じる。量子物理学を真に特徴付けるものは、$\hbar$、すなわち非互換な観測可能量を含む現象のみに起因する。

What is fundamentally quantum? We argue that most of the features, problems, and paradoxes -- such as the measurement problem, the Wigner's friend paradox and its proposed solutions, single particle nonlocality, and no-cloning -- allegedly attributed to quantum physics have a clear classical analogue if one is to interpret classical physics as fundamentally indeterministic. What really characterizes quantum physics boils down only to phenomena that involve $\hbar$, i.e., incompatible observables.

翻訳日:2024-11-07 20:24:11 公開日:2024-10-06

# 先進的脅威属性の包括的調査--分類学,方法,課題,オープンリサーチ問題

A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems ( http://arxiv.org/abs/2409.11415v2 )

ライセンス: Link先を確認

Nanda Rani, Bikash Saha, Sandeep Kumar Shukla,

(参考訳) Advanced Persistent Threat (APT) アトリビューションはサイバーセキュリティにおいて重要な課題であり、高度なサイバー攻撃の背後にある犯人を正確に識別するプロセスを示している。防衛機構を大幅に強化し、戦略的な対応を通知することができる。人工知能(AI)と機械学習(ML)技術の普及に伴い、研究者たちは、サイバー脅威を責任あるアクターにリンクする自動化ソリューションの開発に注力し、従来の手作業の手法から遠ざかっている。自動帰属に関する以前の文献では、自動帰属プロセスに役立つ自動化された方法と関連するアーティファクトの体系的なレビューが欠けている。これらのギャップに対処し、脅威属性の現在の状況についてコンテキストを提供するため、自動化APT属性の総合的な調査を行う。この調査は、分散したアーティファクトの理解から始まり、貢献に役立つアーティファクトの包括的分類を提供する。我々は、利用可能な属性データセットと現在の自動化APT属性の分類を包括的にレビューし、提示する。さらに,現状の文献手法について批判的なコメントを出し,自動帰属の課題を議論し,オープンな研究課題へ向けた。この調査は、現在のギャップと課題に対処するため、今後のAPT貢献研究の機会を明らかにします。この調査は、現在の実践における強みと限界を特定することによって、自動化され、信頼性があり、実行可能なAPT帰属法における将来の研究と開発の基礎を提供する。

Advanced Persistent Threat (APT) attribution is a critical challenge in cybersecurity and implies the process of accurately identifying the perpetrators behind sophisticated cyber attacks. It can significantly enhance defense mechanisms and inform strategic responses. With the growing prominence of artificial intelligence (AI) and machine learning (ML) techniques, researchers are increasingly focused on developing automated solutions to link cyber threats to responsible actors, moving away from traditional manual methods. Previous literature on automated threat attribution lacks a systematic review of automated methods and relevant artifacts that can aid in the attribution process. To address these gaps and provide context on the current state of threat attribution, we present a comprehensive survey of automated APT attribution. The presented survey starts with understanding the dispersed artifacts and provides a comprehensive taxonomy of the artifacts that aid in attribution. We comprehensively review and present the classification of the available attribution datasets and current automated APT attribution methods. Further, we raise critical comments on current literature methods, discuss challenges in automated attribution, and direct toward open research problems. This survey reveals significant opportunities for future research in APT attribution to address current gaps and challenges. By identifying strengths and limitations in current practices, this survey provides a foundation for future research and development in automated, reliable, and actionable APT attribution methods.

翻訳日:2024-11-07 20:01:55 公開日:2024-10-06

Nanda Rani, Bikash Saha, Sandeep Kumar Shukla,

翻訳日:2024-11-07 20:01:55 公開日:2024-10-06

# EL素子の発注が加工性能に及ぼす影響

The Impact of Element Ordering on LM Agent Performance ( http://arxiv.org/abs/2409.12089v3 )

ライセンス: Link先を確認

Wayne Chi, Ameet Talwalkar, Chris Donahue,

(参考訳) Webやデスクトップなどの仮想環境をナビゲートできる言語モデルエージェントへの関心が高まっている。このような環境をナビゲートするために、エージェントは、様々な要素(例えば、ボタン、テキスト、画像)に関する情報から恩恵を受ける。特にグラフィカルな表現(ピクセル)のみを提供する環境では、どの要素属性がエージェントのパフォーマンスに最も大きな影響を与えるのかは不明だ。ここでは,言語モデルに要素が提示される順序付けが驚くほど影響を受けており,Webページ内のランダム化要素の順序付けはエージェントの状態表現からすべての可視テキストを削除することで,エージェントのパフォーマンスを両立させる。ウェブページは要素の階層的な順序付けを提供するが、ピクセルから直接要素を解析する際にそのような順序付けは存在しない。さらに、タスクがより困難になり、モデルがより洗練されるにつれて、我々の実験は注文の影響が増加することを示唆している。効果的な注文を見つけることは簡単ではない。ウェブおよびデスクトップ環境における各種要素順序付け手法の影響について検討する。我々は, 画素のみの環境において, 次元の減少が実効的な順序付けをもたらすことを見出した。 UI要素の検出モデルをトレーニングして、ピクセルから要素を抽出し、その結果をエージェントベンチマーク(OmniACT)に適用します。本手法は,従来の最先端技術と比較して平均2倍以上のタスクを完了させる。

There has been a surge of interest in language model agents that can navigate virtual environments such as the web or desktop. To navigate such environments, agents benefit from information on the various elements (e.g., buttons, text, or images) present. It remains unclear which element attributes have the greatest impact on agent performance, especially in environments that only provide a graphical representation (i.e., pixels). Here we find that the ordering in which elements are presented to the language model is surprisingly impactful--randomizing element ordering in a webpage degrades agent performance comparably to removing all visible text from an agent's state representation. While a webpage provides a hierarchical ordering of elements, there is no such ordering when parsing elements directly from pixels. Moreover, as tasks become more challenging and models more sophisticated, our experiments suggest that the impact of ordering increases. Finding an effective ordering is non-trivial. We investigate the impact of various element ordering methods in web and desktop environments. We find that dimensionality reduction provides a viable ordering for pixel-only environments. We train a UI element detection model to derive elements from pixels and apply our findings to an agent benchmark--OmniACT--where we only have access to pixels. Our method completes more than two times as many tasks on average relative to the previous state-of-the-art.

翻訳日:2024-11-07 19:26:16 公開日:2024-10-06

# 平均流路距離における量子チャネル試験

Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v1 )

ライセンス: Link先を確認

Hugo Aaronson, Gregory Rosenthal, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,

(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離では$\Omega(\sqrt{d_{\mathrm{in}} / \varepsilon})$クエリを必要とする。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。

We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}} / \varepsilon})$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.

翻訳日:2024-11-07 14:19:13 公開日:2024-10-06

# 平均流路距離における量子チャネル試験

Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v2 )

ライセンス: Link先を確認

Hugo Aaronson, Gregory Rosenthal, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,

(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離において$\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$クエリを必要とすることを示す。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。

We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.

翻訳日:2024-11-07 14:19:13 公開日:2024-10-06

# 平均流路距離における量子チャネル試験

Quantum Channel Testing in Average-Case Distance ( http://arxiv.org/abs/2409.12566v3 )

ライセンス: Link先を確認

Gregory Rosenthal, Hugo Aaronson, Sathyawageeswar Subramanian, Animesh Datta, Tom Gur,

(参考訳) 量子チャネルの試験特性の複雑さについて検討する。まず、任意のチャネルに対して、$\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}}} \times d_{\mathrm{out}}}$ダイヤモンドノルム距離において$\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$クエリを必要とすることを示す。これは、ダイヤモンドノルムによって誘導される距離の最悪のケースの性質に起因する。この制限やその他の理論的および実践的な応用によって、ダイヤモンド標準の平均ケースアナログを導入し、これを平均ケース模倣ダイヤモンド標準(ACID)と呼ぶ。アンシラ、コヒーレンス、適応性のない最も弱いアルゴリズムモデルでは、シリコーン距離の特定の種類のチャネルに対する同一性をテストすることは、チャネルの次元に依存しない複雑さで行うことができるが、他の種類のチャネルでは、複雑さは入力次元と出力次元の両方に依存する。以前の研究に基づいて、固定チャネルに対する同一性は、$\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ACID距離のクエリと$\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$このモデルにおけるダイヤモンド距離のクエリで検証できることを示す。最後に, チャネルトモグラフィーの複雑度と酸距離との密接な関係を証明した。

We study the complexity of testing properties of quantum channels. First, we show that testing identity to any channel $\mathcal N: \mathbb C^{d_{\mathrm{in}} \times d_{\mathrm{in}}} \to \mathbb C^{d_{\mathrm{out}} \times d_{\mathrm{out}}}$ in diamond norm distance requires $\Omega(\sqrt{d_{\mathrm{in}}} / \varepsilon)$ queries, even in the strongest algorithmic model that admits ancillae, coherence, and adaptivity. This is due to the worst-case nature of the distance induced by the diamond norm. Motivated by this limitation and other theoretical and practical applications, we introduce an average-case analogue of the diamond norm, which we call the average-case imitation diamond (ACID) norm. In the weakest algorithmic model without ancillae, coherence, or adaptivity, we prove that testing identity to certain types of channels in ACID distance can be done with complexity independent of the dimensions of the channel, while for other types of channels the complexity depends on both the input and output dimensions. Building on previous work, we also show that identity to any fixed channel can be tested with $\tilde O(d_{\mathrm{in}} d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in ACID distance and $\tilde O(d_{\mathrm{in}}^2 d_{\mathrm{out}}^{3/2} / \varepsilon^2)$ queries in diamond distance in this model. Finally, we prove tight bounds on the complexity of channel tomography in ACID distance.

翻訳日:2024-11-07 14:19:13 公開日:2024-10-06

# MaPPER: 表現理解の参照に有効なマルチモーダル事前誘導パラメータチューニング

MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension ( http://arxiv.org/abs/2409.13609v1 )

ライセンス: Link先を確認

Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin,

(参考訳) Referring Expression Comprehension (REC) は、自然言語を介して局所的な視覚領域を接地することを目的としており、マルチモーダルアライメントに大きく依存するタスクである。既存のほとんどの方法は、強力な事前訓練されたモデルを使用して、完全な微調整によって視覚的/言語的な知識を伝達する。しかし、バックボーン全体の完全な微調整は、事前学習に埋め込まれた豊富な事前知識を損なうだけでなく、計算コストも著しく低下させる。近年,パラメータ効率のよい移動学習法(PETL)が出現し,その課題を効果的かつ効率的に解決することを目指している。これらのPETL法をRECタスクに直接適用するのは不適切である。そこで本研究では,マルチモーダル事前誘導パラメーター効率チューニング(MaPPER)の新たなフレームワークを提案する。具体的には、MaPPERは、アライメントされた事前でガイドされる動的プリエントアダプタと、より正確なローカルセマンティクスを抽出して視覚的知覚を改善するローカルコンボリューションアダプタから構成される。さらに、事前ガイド付きテキストモジュールは、相互モーダルアライメントを容易にするために、事前の利用をさらに促進するために提案されている。 3つの広く使用されているベンチマーク実験の結果、MaPPERは11.41%の調整可能なバックボーンパラメータを持つ完全微調整や他のPETL法と比較して、最も精度が高いことが示された。

Referring Expression Comprehension (REC), which aims to ground a local visual region via natural language, is a task that heavily relies on multimodal alignment. Most existing methods utilize powerful pre-trained models to transfer visual/linguistic knowledge by full fine-tuning. However, full fine-tuning the entire backbone not only breaks the rich prior knowledge embedded in the pre-training, but also incurs significant computational costs. Motivated by the recent emergence of Parameter-Efficient Transfer Learning (PETL) methods, we aim to solve the REC task in an effective and efficient manner. Directly applying these PETL methods to the REC task is inappropriate, as they lack the specific-domain abilities for precise local visual perception and visual-language alignment. Therefore, we propose a novel framework of Multimodal Prior-guided Parameter Efficient Tuning, namely MaPPER. Specifically, MaPPER comprises Dynamic Prior Adapters guided by a aligned prior, and Local Convolution Adapters to extract precise local semantics for better visual perception. Moreover, the Prior-Guided Text module is proposed to further utilize the prior for facilitating the cross-modal alignment. Experimental results on three widely-used benchmarks demonstrate that MaPPER achieves the best accuracy compared to the full fine-tuning and other PETL methods with only 1.41% tunable backbone parameters.

翻訳日:2024-11-07 06:19:44 公開日:2024-10-06

Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin,

(参考訳) Referring Expression Comprehension (REC) は、自然言語を介して局所的な視覚領域を接地することを目的としており、マルチモーダルアライメントに大きく依存するタスクである。既存のほとんどの方法は、強力な事前訓練されたモデルを使用して、完全な微調整によって視覚的/言語的な知識を伝達する。しかし、バックボーン全体の完全な微調整は、事前学習に埋め込まれた豊富な事前知識を損なうだけでなく、計算コストも著しく低下させる。近年,パラメータ効率のよい移動学習法(PETL)が出現し,その課題を効果的かつ効率的に解決することを目指している。これらのPETL法をRECタスクに直接適用するのは不適切である。そこで本研究では,マルチモーダル事前誘導パラメーター効率チューニング(MaPPER)の新たなフレームワークを提案する。具体的には、MaPPERは、アライメントされた事前でガイドされる動的プリエントアダプタと、より正確なローカルセマンティクスを抽出して視覚的知覚を改善するローカルコンボリューションアダプタから構成される。さらに、事前ガイド付きテキストモジュールは、相互モーダルアライメントを容易にするために、事前の利用をさらに促進するために提案されている。 3つの広く使用されているベンチマーク実験の結果、MaPPERは11.41%の調整可能なバックボーンパラメータを持つ完全微調整や他のPETL法と比較して、最も精度が高いことが示された。私たちのコードはhttps://github.com/liuting20/MaPPERで利用可能です。

Referring Expression Comprehension (REC), which aims to ground a local visual region via natural language, is a task that heavily relies on multimodal alignment. Most existing methods utilize powerful pre-trained models to transfer visual/linguistic knowledge by full fine-tuning. However, full fine-tuning the entire backbone not only breaks the rich prior knowledge embedded in the pre-training, but also incurs significant computational costs. Motivated by the recent emergence of Parameter-Efficient Transfer Learning (PETL) methods, we aim to solve the REC task in an effective and efficient manner. Directly applying these PETL methods to the REC task is inappropriate, as they lack the specific-domain abilities for precise local visual perception and visual-language alignment. Therefore, we propose a novel framework of Multimodal Prior-guided Parameter Efficient Tuning, namely MaPPER. Specifically, MaPPER comprises Dynamic Prior Adapters guided by an aligned prior, and Local Convolution Adapters to extract precise local semantics for better visual perception. Moreover, the Prior-Guided Text module is proposed to further utilize the prior for facilitating the cross-modal alignment. Experimental results on three widely-used benchmarks demonstrate that MaPPER achieves the best accuracy compared to the full fine-tuning and other PETL methods with only 1.41% tunable backbone parameters. Our code is available at https://github.com/liuting20/MaPPER.

翻訳日:2024-11-07 06:19:44 公開日:2024-10-06

# Obliviate:パラメータ効率のよい微調整パラダイムにおけるタスク非依存のバックドアの中立化

Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm ( http://arxiv.org/abs/2409.14119v1 )

ライセンス: Link先を確認

Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,

(参考訳) パラメータ効率のよい微調整(PEFT)は,大規模言語モデルにおいて重要な訓練戦略となっている。しかし、トレーニング可能なパラメータが少ないため、タスクに依存しないバックドアのようなセキュリティリスクが生じる。幅広いタスクに深刻な影響を与えるにもかかわらず、PEFTのコンテキスト内でタスク非依存のバックドアを効果的に対処する実用的な防御ソリューションは存在しない。本研究では,PEFT統合型バックドアディフェンスであるObliviateを紹介する。我々は,PEFT層内の良性ニューロンを増幅し,トリガートークンの影響を罰する2つの手法を開発した。本手法は,3つのPEFTアーキテクチャを対象とした評価により,最先端のタスク非依存バックドア(83.6%$\downarrow$)の攻撃成功率を大幅に低減できることを示す。さらに,タスク固有のバックドアとアダプティブアタックに対する堅牢な防御能力を示す。ソースコードはhttps://github.com/obliviateARR/Obliviateで取得できる。

Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, there is no practical defense solution available that effectively counters task-agnostic backdoors within the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques aimed at amplifying benign neurons within PEFT layers and penalizing the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method can significantly reduce the attack success rate of the state-of-the-art task-agnostic backdoors (83.6%$\downarrow$). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code will be obtained at https://github.com/obliviateARR/Obliviate.

翻訳日:2024-11-07 03:33:25 公開日:2024-10-06

Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,

翻訳日:2024-11-07 03:33:25 公開日:2024-10-06

Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin,

翻訳日:2024-11-07 03:33:25 公開日:2024-10-06

# KISS-Matcher: 高速でロバストなクラウド登録が再検討

KISS-Matcher: Fast and Robust Point Cloud Registration Revisited ( http://arxiv.org/abs/2409.15615v2 )

ライセンス: Link先を確認

Hyungtae Lim, Daebeom Kim, Gunhee Shin, Jingnan Shi, Ignacio Vizzo, Hyun Myung, Jaesik Park, Luca Carlone,

(参考訳) グローバルポイントクラウド登録システムはあらゆる面で大きく進歩しているが、多くの研究は特徴抽出、グラフ理論プルーニング、ポーズソルバといった特定のコンポーネントに焦点を当てている。本稿では,この登録問題を総合的に考察し,ポイントクラウド登録のためのオープンソースで汎用的なC++ライブラリである「textit{KISS-Matcher}」を開発する。 KISS-Matcherは、古典的なファストポイント特徴ヒストグラム(FPFH)を改善する新しい特徴検出器 \textit{Faster-PFH} を組み合わせる。さらに、$k$-core-based graph-theoretic pruningを採用して、外れ値対応を拒否する時間の複雑さを低減する。最後に、これらのモジュールを完全で、ユーザフレンドリで、使用可能なパイプラインに統合する。広範な実験によって検証されたように、KISS-Matcherはスケーラビリティと広範囲な適用性に優れており、精度を維持しながら最先端のアウトリア・ロバスト登録パイプラインに比べて大幅に高速化されている。私たちのコードは、 \href{https://github.com/MIT-SPARK/KISS-Matcher}{\texttt{https://github.com/MIT-SPARK/KISS-Matcher}}で利用可能です。

While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers. In this paper, we take a holistic view on the registration problem and develop an open-source and versatile C++ library for point cloud registration, called \textit{KISS-Matcher}. KISS-Matcher combines a novel feature detector, \textit{Faster-PFH}, that improves over the classical fast point feature histogram (FPFH). Moreover, it adopts a $k$-core-based graph-theoretic pruning to reduce the time complexity of rejecting outlier correspondences. Finally, it combines these modules in a complete, user-friendly, and ready-to-use pipeline. As verified by extensive experiments, KISS-Matcher has superior scalability and broad applicability, achieving a substantial speed-up compared to state-of-the-art outlier-robust registration pipelines while preserving accuracy. Our code will be available at \href{https://github.com/MIT-SPARK/KISS-Matcher}{\texttt{https://github.com/MIT-SPARK/KISS-Matcher}}.

翻訳日:2024-11-06 19:32:29 公開日:2024-10-06

# AI安全のためのマシン・アンラーニングの敵対的展望

An Adversarial Perspective on Machine Unlearning for AI Safety ( http://arxiv.org/abs/2409.18025v2 )

ライセンス: Link先を確認

Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando,

(参考訳) 大きな言語モデルは、有害な知識に関する質問を拒否するために微調整されているが、これらの保護はしばしばバイパスされる。アンラーニング手法は、モデルから有害な能力を完全に取り除き、敵に近づかないようにすることを目的としている。この研究は、非学習と従来の訓練後の安全性の基本的な相違に敵対的な観点から挑戦する。既存のjailbreakメソッドは、これまで未学習に対して効果がないと報告されていたが、慎重に適用した場合に成功できることを実証する。さらに、最も未学習と思われる能力を回復する様々な適応手法を開発した。例えば、アクティベーション空間における10の非関連例の微調整や特定の方向の除去は、最先端の未学習手法であるRMUで編集されたモデルに対して最も有害な能力を回復できることを示す。我々の研究は、現在の未学習アプローチの堅牢性に挑戦し、安全性トレーニングよりも彼らの優位性に疑問を投げかけている。

Large language models are finetuned to refuse questions about hazardous knowledge, but these protections can often be bypassed. Unlearning methods aim at completely removing hazardous capabilities from models and make them inaccessible to adversaries. This work challenges the fundamental differences between unlearning and traditional safety post-training from an adversarial perspective. We demonstrate that existing jailbreak methods, previously reported as ineffective against unlearning, can be successful when applied carefully. Furthermore, we develop a variety of adaptive methods that recover most supposedly unlearned capabilities. For instance, we show that finetuning on 10 unrelated examples or removing specific directions in the activation space can recover most hazardous capabilities for models edited with RMU, a state-of-the-art unlearning method. Our findings challenge the robustness of current unlearning approaches and question their advantages over safety training.

翻訳日:2024-11-06 15:51:02 公開日:2024-10-06

# 因果推論エンジンとしての深部自己回帰モデル

Using Deep Autoregressive Models as Causal Inference Engines ( http://arxiv.org/abs/2409.18581v2 )

ライセンス: Link先を確認

Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho,

(参考訳) 既存の因果推論(CI)モデルは、主に低次元の共同設立者とシングルトンアクションを扱うことに限られている。本稿では,現代アプリケーションに共通する複雑な共同創設者とシーケンシャルアクションを処理可能な自己回帰型(AR)CIフレームワークを提案する。このことは、基礎となる因果線図からトークンの列に変換することによって達成される。このアプローチは、任意のDAGから生成されたデータによるトレーニングを可能にするだけでなく、既存のCI機能を拡張して、.em single}モデルを使用していくつかの統計量の推定を可能にする。介入確率を直接予測し、推論を簡素化し、結果予測精度を向上することができる。我々は,CIに適応したARモデルは,迷路をナビゲートしたり,チェスのエンドゲームを行ったり,あるキーワードが紙の受容率に与える影響を評価するなど,様々な複雑な応用において効率的かつ効果的であることが実証された。

Existing causal inference (CI) models are limited to primarily handling low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions common in modern applications. We accomplish this by {\em sequencification}, transforming data from an underlying causal diagram into a sequence of tokens. This approach not only enables training with data generated from any DAG but also extends existing CI capabilities to accommodate estimating several statistical quantities using a {\em single} model. We can directly predict interventional probabilities, simplifying inference and enhancing outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.

翻訳日:2024-11-06 05:42:34 公開日:2024-10-06

# ワープ合成によるニューラル製品重要度サンプリング

Neural Product Importance Sampling via Warp Composition ( http://arxiv.org/abs/2409.18974v1 )

ライセンス: Link先を確認

Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev,

(参考訳) 現代のフォトリアリスティックレンダリングのヒンジにおいて高効率を達成するには、各ピクセルで推定される照明積分を近似したモンテカルロサンプリング分布を用いる。サンプルは通常、単純な分布の集合から生成され、それぞれがインテグレードの異なる因子をターゲットにしており、複数の重要なサンプリングによって結合される。結果として生じる混合分布は、すべての因子の実際の生成物から遠く離れており、直接照明推定においても準最適分散をもたらす。本稿では, 環境照明や材料用語の積である試料照明製品積分を効率よく重要にするために, 正規化フローを用いた学習に基づく手法を提案する。サンプルはエミッタテールワープでフローヘッドワープを構成する。小型のコンディショナルヘッドワープはニューラルスプラインフローで表現され、大型のアンコンディショナルテールは環境マップ毎に離散化され、その評価は瞬時に行われる。コンディショニングが低次元であれば、ヘッドワープを識別してより優れた性能が得られる。複雑な幾何学, 材料, 照明などを含む様々な応用において, 先行手法による分散の低減を実証する。

Achieving high efficiency in modern photorealistic rendering hinges on using Monte Carlo sampling distributions that closely approximate the illumination integral estimated for every pixel. Samples are typically generated from a set of simple distributions, each targeting a different factor in the integrand, which are combined via multiple importance sampling. The resulting mixture distribution can be far from the actual product of all factors, leading to sub-optimal variance even for direct-illumination estimation. We present a learning-based method that uses normalizing flows to efficiently importance sample illumination product integrals, e.g., the product of environment lighting and material terms. Our sampler composes a flow head warp with an emitter tail warp. The small conditional head warp is represented by a neural spline flow, while the large unconditional tail is discretized per environment map and its evaluation is instant. If the conditioning is low-dimensional, the head warp can be also discretized to achieve even better performance. We demonstrate variance reduction over prior methods on a range of applications comprising complex geometry, materials and illumination.

翻訳日:2024-11-06 05:22:52 公開日:2024-10-06

# ワープ合成によるニューラル製品重要度サンプリング

Neural Product Importance Sampling via Warp Composition ( http://arxiv.org/abs/2409.18974v2 )

ライセンス: Link先を確認

Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev,

翻訳日:2024-11-06 05:22:52 公開日:2024-10-06

# オンライン直接選好最適化におけるサンプリングの役割

The Crucial Role of Samplers in Online Direct Preference Optimization ( http://arxiv.org/abs/2409.19605v1 )

ライセンス: Link先を確認

Ruizhe Shi, Runlong Zhou, Simon S. Du,

(参考訳) DPO(Direct Preference Optimization)は、言語モデルアライメントのための安定的でスケーラブルで効率的なソリューションとして登場した。経験的な成功にもかかわらず、$\textit{optimization}$プロパティ、特に、その収束率に対するサンプルの影響は未定のままである。本稿では,DPO の $\textit{convergence rate}$ の厳密な分析を行い,厳密な勾配設定の下で異なるサンプリング戦略を用いて,一様サンプリングが $\textit{linear}$ 収束を達成し,提案するオンラインサンプリングは $\textit{quadratic}$ 収束を達成した。さらに、後続分布と$\textit{logit mix}$を組み込むことにより、サンプルを実用的な設定に適応させ、従来のアプローチよりも大幅に改善したことを示す。 Safe-RLHFデータセットでは,バニラDPOよりも4.5ドル%,オンポラDPOより3.0ドル%,Iterative-PromptではバニラDPO,オンポラDPO,Hybrid GSHFよりも4.2ドル%向上した。我々の結果は、DPOの理論的立場に関する洞察を提供するだけでなく、将来的なアルゴリズム設計の道を開いた。

Direct Preference Optimization (DPO) has emerged as a stable, scalable, and efficient solution for language model alignment. Despite its empirical success, the $\textit{optimization}$ properties, particularly the impact of samplers on its convergence rates, remain underexplored. In this paper, we provide a rigorous analysis of DPO's $\textit{convergence rates}$ with different sampling strategies under the exact gradient setting, revealing a surprising separation: uniform sampling achieves $\textit{linear}$ convergence, while our proposed online sampler achieves $\textit{quadratic}$ convergence. We further adapt the sampler to practical settings by incorporating posterior distributions and $\textit{logit mixing}$, demonstrating significant improvements over previous approaches. On Safe-RLHF dataset, our method exhibits a $4.5$% improvement over vanilla DPO and a $3.0$% improvement over on-policy DPO; on Iterative-Prompt, our approach outperforms vanilla DPO, on-policy DPO, and Hybrid GSHF by over $4.2$%. Our results not only offer insights into the theoretical standing of DPO but also pave the way for potential algorithm designs in the future.

翻訳日:2024-11-05 22:18:46 公開日:2024-10-06

# オンライン直接選好最適化におけるサンプリングの役割

The Crucial Role of Samplers in Online Direct Preference Optimization ( http://arxiv.org/abs/2409.19605v2 )

ライセンス: Link先を確認

Ruizhe Shi, Runlong Zhou, Simon S. Du,

翻訳日:2024-11-05 22:18:46 公開日:2024-10-06

# ラディアタパインブランチ検出と距離測定のためのドローンステレオビジョン:ディープラーニングとYOLOの統合を活用して

Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration ( http://arxiv.org/abs/2410.00503v1 )

ライセンス: Link先を確認

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green,

(参考訳) 本研究は,木の枝の空間的位置を正確に検出・測定する,刈り取り工具とステレオビジョンカメラを備えたドローンの開発に焦点をあてる。分岐セグメンテーションにはYOLOを用い, モノクラーとステレオの2つの深度推定手法について検討した。 SGBMと比較して、ディープラーニング技術はより洗練され正確な深度マップを生成する。深部ニューラルネットワークを用いた微調整処理を最適深度値の近似に応用した。この手法は正確な分岐検出と距離測定を容易にし、刈り取り作業の自動化における重要な課題に対処する。その結果、農業分野におけるイノベーションの推進と自動化の促進を深層学習がもたらす可能性について、精度と効率の両面で顕著な進歩が示された。

This research focuses on the development of a drone equipped with pruning tools and a stereo vision camera to accurately detect and measure the spatial positions of tree branches. YOLO is employed for branch segmentation, while two depth estimation approaches, monocular and stereo, are investigated. In comparison to SGBM, deep learning techniques produce more refined and accurate depth maps. In the absence of ground-truth data, a fine-tuning process using deep neural networks is applied to approximate optimal depth values. This methodology facilitates precise branch detection and distance measurement, addressing critical challenges in the automation of pruning operations. The results demonstrate notable advancements in both accuracy and efficiency, underscoring the potential of deep learning to drive innovation and enhance automation in the agricultural sector.

翻訳日:2024-11-05 05:16:55 公開日:2024-10-06

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green,

翻訳日:2024-11-05 05:16:55 公開日:2024-10-06

# 機械学習によるIoTセキュリティ向上のための侵入検知

Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security ( http://arxiv.org/abs/2410.01016v1 )

ライセンス: Link先を確認

Mona Esmaeili, Morteza Rahimi, Matin Khajavi, Dorsa Farahmand, Hadi Jabbari Saray,

(参考訳) IoT(Internet of Things)に対する攻撃は、デバイス、アプリケーション、インタラクションのネットワーク化と統合化が進むにつれて増加している。 IoTネットワークをターゲットにしたサイバー攻撃の増加は、プライバシ、セキュリティ、機能、重要なシステムの可用性に重大な脆弱性と脅威をもたらし、運用上の障害、財務的損失、ID盗難、データ漏洩につながる。 IoTデバイスを効率的にセキュアにするためには、侵入システムのリアルタイム検出が不可欠だ。本稿では、IoTセキュリティのための機械学習による侵入検出戦略に関する最新の研究について、リアルタイム応答性、検出精度、アルゴリズム効率に集中して検討する。主要な研究は、よく知られたすべての学術データベースからレビューされ、既存のアプローチのための分類学が提供された。このレビューでは、既存の研究ギャップを強調し、現在のIoTセキュリティフレームワークの限界を概説し、将来の研究の方向性と開発に実用的な洞察を提供する。

Attacks against the Internet of Things (IoT) are rising as devices, applications, and interactions become more networked and integrated. The increase in cyber-attacks that target IoT networks poses a huge vulnerability and threat to the privacy, security, functionality, and availability of critical systems, which leads to operational disruptions, financial losses, identity thefts, and data breaches. To efficiently secure IoT devices, real-time detection of intrusion systems is critical, especially those using machine learning to identify threats and mitigate risks and vulnerabilities. This paper investigates the latest research on machine learning-based intrusion detection strategies for IoT security, concentrating on real-time responsiveness, detection accuracy, and algorithm efficiency. Key studies were reviewed from all well-known academic databases, and a taxonomy was provided for the existing approaches. This review also highlights existing research gaps and outlines the limitations of current IoT security frameworks to offer practical insights for future research directions and developments.

翻訳日:2024-11-04 23:40:11 公開日:2024-10-06

# 機械学習によるIoTセキュリティ向上のための侵入検知

Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security ( http://arxiv.org/abs/2410.01016v2 )

ライセンス: Link先を確認

Mona Esmaeili, Morteza Rahimi, Hadise Pishdast, Dorsa Farahmandazad, Matin Khajavi, Hadi Jabbari Saray,

(参考訳) IoT(Internet of Things)に対する攻撃は、デバイス、アプリケーション、インタラクションのネットワーク化と統合化が進むにつれて増加している。 IoTネットワークをターゲットにしたサイバー攻撃の増加は、プライバシ、セキュリティ、機能、重要なシステムの可用性に重大な脆弱性と脅威をもたらし、運用上の障害、財務的損失、ID盗難、データ漏洩につながる。 IoTデバイスを効率的にセキュアにするためには、侵入システムのリアルタイム検出が不可欠だ。本稿では、IoTセキュリティのための機械学習による侵入検出戦略に関する最新の研究について、リアルタイム応答性、検出精度、アルゴリズム効率に集中して検討する。主要な研究は、よく知られたすべての学術データベースからレビューされ、既存のアプローチのための分類学が提供された。このレビューでは、既存の研究ギャップを強調し、現在のIoTセキュリティフレームワークの限界を概説し、将来の研究の方向性と開発に実用的な洞察を提供する。

Attacks against the Internet of Things (IoT) are rising as devices, applications, and interactions become more networked and integrated. The increase in cyber-attacks that target IoT networks poses a considerable vulnerability and threat to the privacy, security, functionality, and availability of critical systems, which leads to operational disruptions, financial losses, identity thefts, and data breaches. To efficiently secure IoT devices, real-time detection of intrusion systems is critical, especially those using machine learning to identify threats and mitigate risks and vulnerabilities. This paper investigates the latest research on machine learning-based intrusion detection strategies for IoT security, concentrating on real-time responsiveness, detection accuracy, and algorithm efficiency. Key studies were reviewed from all well-known academic databases, and a taxonomy was provided for the existing approaches. This review also highlights existing research gaps and outlines the limitations of current IoT security frameworks to offer practical insights for future research directions and developments.

翻訳日:2024-11-04 23:40:11 公開日:2024-10-06

# GaussianBlock: プリミティブとガウシアンによるパートアウェアな構成と編集可能な3Dシーンの構築

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians ( http://arxiv.org/abs/2410.01535v1 )

ライセンス: Link先を確認

Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao,

(参考訳) 近年, ニューラルレージアン場とガウススプラッティングの発展に伴い, 3次元再構成技術は極めて高い忠実性を実現している。しかし、これらの手法によって学習される潜在表現は非常に絡み合っており、解釈可能性に欠ける。本稿では,GussianBlockと呼ばれる新しい部分認識型合成再構成手法を提案する。これは意味的一貫性と非絡み合いの表現を可能にし,高い忠実さを同時に維持しつつ,ビルディングブロックに類似した正確な物理的編集を可能にする。我々のGaussianBlockは、フレキシブルな動作性と編集性で知られるプリミティブと、再現性に優れた3D Gaussianの両方の利点を生かしたハイブリッド表現を導入しています。具体的には,2次元のセマンティックプリミティブから誘導される新たな注意誘導中心的損失を,動的分裂と融合戦略によって補うことによって,意味的コヒーレントなプリミティブを実現する。さらに, プリミティブとハイブリダイゼーションした3次元ガウスアンを用いて, 構造的詳細を洗練し, 忠実度を高める。さらに、この2つの接続を強化し維持するために、バインディング継承戦略が採用されている。再構成されたシーンは、様々なベンチマークで絡み合っていて、構成的でコンパクトで、シームレスで、直接的で、正確な編集が可能で、高品質を維持しています。

Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.

翻訳日:2024-11-04 17:14:45 公開日:2024-10-06

Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao,

翻訳日:2024-11-04 17:14:45 公開日:2024-10-06

# 心臓MRIの総合的評価のためのビジョン基礎モデルに向けて

Towards a vision foundation model for comprehensive assessment of Cardiac MRI ( http://arxiv.org/abs/2410.01665v1 )

ライセンス: Link先を確認

Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert,

(参考訳) 心臓磁気共鳴イメージング(CMR)は、非侵襲的心臓アセスメントのゴールドスタンダードと考えられており、多種多様な画像処理タスクを必要とする多種多様な複雑なモダリティである。ディープラーニングの進歩により、これらのタスクのための最先端(SoTA)モデルの開発が可能になった。しかし、モデルトレーニングは、特にあまり一般的でない画像シーケンスにおいて、データとラベルの不足のために困難である。さらに、各モデルは特定のタスクに対してトレーニングされることが多く、関連するタスクの間には関連性がない。本研究では,3600万枚のCMR画像に対して,自己教師付きで訓練したCMR評価のための視覚基礎モデルを提案する。次に、分類、セグメント化、ランドマークの局在化、病理診断など、CMRワークフローに典型的な9つの臨床的タスクの教師付き方法でモデルを微調整する。すべてのタスクにおいて、ラベル付きデータセットサイズの範囲で、精度と堅牢性が改善されたことを実証する。また,画像解析の課題として,ラベル付きサンプルの少なさによる数ショット学習の改善も示した。我々は,ほとんどの臨床作業において,SoTAに匹敵するアウト・オブ・ボックス性能を実現する。提案手法は,注記データが少ない場合でも,画像解析タスクのための深層学習ベースのソリューションの開発を加速する可能性があり,CMR評価のための資源効率,統一的なフレームワークを提供する。

Cardiac magnetic resonance imaging (CMR), considered the gold standard for noninvasive cardiac assessment, is a diverse and complex modality requiring a wide variety of image processing tasks for comprehensive assessment of cardiac morphology and function. Advances in deep learning have enabled the development of state-of-the-art (SoTA) models for these tasks. However, model training is challenging due to data and label scarcity, especially in the less common imaging sequences. Moreover, each model is often trained for a specific task, with no connection between related tasks. In this work, we introduce a vision foundation model trained for CMR assessment, that is trained in a self-supervised fashion on 36 million CMR images. We then finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow, across classification, segmentation, landmark localization, and pathology detection. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes. We also demonstrate improved few-shot learning with fewer labeled samples, a common challenge in medical image analyses. We achieve an out-of-box performance comparable to SoTA for most clinical tasks. The proposed method thus presents a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even with few annotated data available.

翻訳日:2024-11-04 16:13:24 公開日:2024-10-06

# 心臓MRIの総合的評価のためのビジョン基礎モデルに向けて

Towards a vision foundation model for comprehensive assessment of Cardiac MRI ( http://arxiv.org/abs/2410.01665v2 )

ライセンス: Link先を確認

Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert,

翻訳日:2024-11-04 16:13:24 公開日:2024-10-06

# FARM: 小分子の関数型グループ認識表現

FARM: Functional Group-Aware Representations for Small Molecules ( http://arxiv.org/abs/2410.02082v1 )

ライセンス: Link先を確認

Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji,

(参考訳) SMILES,自然言語,分子グラフのギャップを埋める新しい基礎モデルであるFARM(Functional Group-Aware Representations for Small Molecules)を紹介する。 FARMの鍵となる革新は、関数型グループ認識トークン化であり、関数型グループ情報を表現に直接組み込む。この戦略的なトークン化粒度の減少は、機能的特性の主要な要因(すなわち、官能基)と意図的に相互作用し、化学言語に対するモデルの理解を高め、化学レキシコンを拡張し、SMILESと自然言語をより効果的にブリッジし、最終的に分子特性を予測する能力を向上させる。 FARMはまた、原子レベルの特徴を捉えるためにマスク付き言語モデリングを使用することと、分子トポロジ全体を符号化するためにグラフニューラルネットワークを使用することである。対照的な学習を活用することで、FARMはこれらの2つの表現のビューを統一された分子埋め込みに整列させる。 MoleculeNetデータセット上でFARMを厳格に評価し、12タスク中10タスクで最先端のパフォーマンスを実現しています。これらの結果は、FARMが分子表現学習を改善する可能性を浮き彫りにし、医薬品発見や薬学研究に有望な応用が期待できる。

We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which incorporates functional group information directly into the representations. This strategic reduction in tokenization granularity in a way that is intentionally interfaced with key drivers of functional properties (i.e., functional groups) enhances the model's understanding of chemical language, expands the chemical lexicon, more effectively bridging SMILES and natural language, and ultimately advances the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. By leveraging contrastive learning, FARM aligns these two views of representations into a unified molecular embedding. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.

翻訳日:2024-11-04 09:05:40 公開日:2024-10-06

# FARM: 小分子の関数型グループ認識表現

FARM: Functional Group-Aware Representations for Small Molecules ( http://arxiv.org/abs/2410.02082v2 )

ライセンス: Link先を確認

Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji,

(参考訳) SMILES,自然言語,分子グラフのギャップを埋める新しい基礎モデルであるFARM(Functional Group-Aware Representations for Small Molecules)を紹介する。 FARMの鍵となる革新は、関数型グループ認識トークン化であり、関数型グループ情報を表現に直接組み込む。このトークン化の粒度の戦略的削減は、故意に機能的特性(すなわち、機能的群)のキードライバと一致し、モデルの化学言語に対する理解を深める。化学レキシコンを拡大することにより、FARMはSMILESと自然言語をより効果的に橋渡しし、最終的にモデルの能力を高めて分子特性を予測する。 FARMはまた、原子レベルの特徴を捉えるためにマスク付き言語モデリングを使用することと、分子トポロジ全体を符号化するためにグラフニューラルネットワークを使用することである。対照的な学習を活用することで、FARMはこれらの2つの表現のビューを統一された分子埋め込みに整列させる。 MoleculeNetデータセット上でFARMを厳格に評価し、12タスク中10タスクで最先端のパフォーマンスを実現しています。これらの結果は、FARMが分子表現学習を改善する可能性を浮き彫りにし、医薬品発見や薬学研究に有望な応用が期待できる。

We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which directly incorporates functional group information into the representations. This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language. By expanding the chemical lexicon, FARM more effectively bridges SMILES and natural language, ultimately advancing the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. By leveraging contrastive learning, FARM aligns these two views of representations into a unified molecular embedding. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.

翻訳日:2024-11-04 09:05:40 公開日:2024-10-06

# 分散システムのモデル誘導ファジィリング

Model-guided Fuzzing of Distributed Systems ( http://arxiv.org/abs/2410.02307v1 )

ライセンス: Link先を確認

Ege Berkay Gulcan, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Srinidhi Nagendra,

(参考訳) 本稿では,分散システム実装のためのカバレッジ誘導テストアルゴリズムを提案する。私たちの主な革新は、カバレッジを定義するために使用されるシステムの抽象的な形式モデルを使用することです。このような抽象モデルはプロトコル設計と検証の初期段階でしばしば開発されるが、テスト時にはあまり使われない。モデルカバレッジを用いたランダムなテスト生成の導出は,実装状態空間における興味深い点をカバーするのに有効であることを示す。我々は,TLA+で記述された分散システム実装と抽象モデルのためのファジィザを実装した。提案アルゴリズムは,スケジューラのカバレッジと突然変異の異なる概念によって導かれるランダム探索と同様に,純粋にランダムな探索よりも優れたカバレッジを示す。特に、Etcd-raftやRedisRaftのような分散コンセンサスプロトコルの実装において、常に高いカバレッジを示し、バグを高速に検出する。さらに, モデル誘導ファズリングでのみ検出できるバグが13件発見されている。

We present a coverage-guided testing algorithm for distributed systems implementations. Our main innovation is the use of an abstract formal model of the system that is used to define coverage. Such abstract models are frequently developed in early phases of protocol design and verification but are infrequently used at testing time. We show that guiding random test generation using model coverage can be effective in covering interesting points in the implementation state space. We have implemented a fuzzer for distributed system implementations and abstract models written in TLA+. Our algorithm shows better coverage over purely random exploration as well as random exploration guided by different notions of scheduler coverage and mutation. In particular, we show consistently higher coverage and detect bugs faster on implementations of distributed consensus protocols such as those in Etcd-raft and RedisRaft. Moreover, we discovered 13 previously unknown bugs in their implementations, four of which could only be detected by model-guided fuzzing.

翻訳日:2024-11-04 04:00:02 公開日:2024-10-06

# 分散システムのモデル誘導ファジィリング

Model-guided Fuzzing of Distributed Systems ( http://arxiv.org/abs/2410.02307v2 )

ライセンス: Link先を確認

Ege Berkay Gulcan, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Srinidhi Nagendra,

翻訳日:2024-11-04 04:00:02 公開日:2024-10-06

# LoGra-Med: 医用ビジョンランゲージモデルのためのLong Context Multi-Graphアライメント

LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model ( http://arxiv.org/abs/2410.02615v1 )

ライセンス: Link先を確認

Duy M. H. Nguyen, Nghiem T. Diep, Trung Q. Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zhou, Daniel Sonntag, Mathias Niepert,

(参考訳) LLaVA-MedやBioMedGPTのような最先端の医療マルチモーダルな大規模言語モデル(med-MLLM)は、事前トレーニングで命令追跡データを活用する。しかしながら、これらのモデルは、主に自己回帰学習の目的に依存しながら、パフォーマンスを向上させるために、モデルサイズとデータボリュームのスケーリングに重点を置いています。驚くべきことに、このような学習スキームが視覚と言語モダリティの整合性の弱さを招き、これらのモデルを広範囲な事前学習データセットに非常に依存させることは、医療領域において、高品質な命令追跡インスタンスをキュレートする費用と時間のかかる性質のため、大きな課題である。画像のモダリティ、会話に基づく記述、拡張キャプションの3重相関を強制する新しいマルチグラフアライメントアルゴリズムであるLoGra-Medでこの問題に対処する。これにより、モデルが文脈的意味を捉え、言語的多様性を扱い、視覚とテキスト間の相互関連を構築するのに役立つ。提案手法をスケールするために,ブラックボックス勾配推定を用いた効率的なエンドツーエンド学習方式を設計し,LLaMa 7Bの学習を高速化した。以上の結果から,LoGra-Medは医療用VQAの600K画像テキスト対に対してLAVA-Medと一致し,その10%でトレーニングした場合に有意に優れていた。例えば、VQA-RADでは、LLAVA-Medを20.13%上回り、100%事前トレーニングスコア(72.64%に対して72.52%)とほぼ一致している。また,視覚チャットボットにおけるBiomedGPTや,VQAを用いたゼロショット画像分類におけるRadFMといったSOTA手法を超越し,マルチグラフアライメントの有効性を強調した。

State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performance while mainly relying on the autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignment between vision and language modalities, making these models highly reliant on extensive pre-training datasets - a significant challenge in medical domains due to the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMa 7B training. Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data. For example, on VQA-RAD, we exceed LLAVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods like BiomedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.

翻訳日:2024-11-04 02:12:23 公開日:2024-10-06

Duy M. H. Nguyen, Nghiem T. Diep, Trung Q. Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zhou, Daniel Sonntag, Mathias Niepert,

翻訳日:2024-11-04 02:02:21 公開日:2024-10-06

# 生成モデルの説得性の測定と改善

Measuring and Improving Persuasiveness of Generative Models ( http://arxiv.org/abs/2410.02653v1 )

ライセンス: Link先を確認

Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy,

(参考訳) LLMは、人間(例えばマーケティング)が消費するコンテンツを生成するワークフローや、人間(例えばチャットボット)と直接対話するワークフローで、ますます使われている。確実な説得力のあるメッセージを生成することができるシステムの開発は、社会にとっての機会と課題の両方を提示する。一方、こうした制度は、薬物依存に対処するなど、広告や社会的善悪などの領域に積極的に影響を及ぼす可能性があり、また、誤った情報を広め、政治的意見を形成するために誤用される可能性もある。 LLMが社会に与える影響を明らかにするためには,その説得力を計測し,評価するシステムを開発する必要がある。このモチベーションを生かしたPersuasionBenchとPersuasionArenaは,生成モデルの説得能力を自動的に測定するタスクのバッテリを含む,最初の大規模ベンチマークとアリーナである。我々は,LLMが言語パターンをどのように理解し,より説得力のある言語を生成するのに役立つかを検討する。以上の結果から, LLMの説得性はモデルサイズと正の相関がみられたが, より小型のモデルでは, より大きなモデルよりも高い説得性が得られることが示唆された。特に、合成および自然なデータセットを使用したターゲットトレーニングは、より小さなモデルの説得能力を著しく向上させ、スケール依存の仮定に挑戦する。我々の発見は、モデル開発者と政策立案者の両方にとって重要な意味を持つ。例えば、EU AI ActとカリフォルニアのSB-1047は、浮動小数点演算の数に基づいてAIモデルを規制することを目的としていますが、このような単純なメトリクスだけでは、AIの社会的影響の全範囲を捉えられません。私たちはコミュニティに、AI駆動の説得とその社会的意味についての理解を深めるために、https://bit.ly/measure-peruasionで入手可能なPersuasionArenaとPersuasionBenchを探求し、貢献するよう呼びかけます。

LLMs are increasingly being used in workflows involving generating content to be consumed by humans (e.g., marketing) and also in directly interacting with humans (e.g., through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce PersuasionBench and PersuasionArena, the first large-scale benchmark and arena containing a battery of tasks to measure the persuasion ability of generative models automatically. We investigate to what extent LLMs know and leverage linguistic patterns that can help them generate more persuasive language. Our findings indicate that the persuasiveness of LLMs correlates positively with model size, but smaller models can also be made to have a higher persuasiveness than much larger models. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that simple metrics like this alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at https://bit.ly/measure-persuasion, to advance our understanding of AI-driven persuasion and its societal implications.

翻訳日:2024-11-04 01:52:35 公開日:2024-10-06

# 大規模言語モデルの説得力の測定と改善

Measuring and Improving Persuasiveness of Large Language Models ( http://arxiv.org/abs/2410.02653v2 )

ライセンス: Link先を確認

Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy,

翻訳日:2024-11-04 01:52:35 公開日:2024-10-06

# 確実性の校正表現

Calibrating Expressions of Certainty ( http://arxiv.org/abs/2410.04315v1 )

ライセンス: Link先を確認

Peiqi Wang, Barbara D. Lam, Yingcheng Liu, Ameneh Asgari-Targhi, Rameswar Panda, William M. Wells, Tina Kapur, Polina Golland,

(参考訳) 本稿では,「マヨベ」や「マヨベ」といった言語表現のキャリブレーションに新たなアプローチを提案する。各特定のフレーズに1つのスコアを割り当てる以前の作業とは異なり、我々は不確実性を単純度上の分布としてモデル化し、それらのセマンティクスをより正確にキャプチャする。この新たな確実性の表現に対応するため、既存の誤校正対策を一般化し、新しいポストホック校正法を導入する。これらのツールを活用することで、人間(例えば放射線学者)と計算モデル(例えば言語モデル)の両方の校正を分析し、校正を改善するための解釈可能な提案を提供する。

We present a novel approach to calibrating linguistic expressions of certainty, e.g., "Maybe" and "Likely". Unlike prior work that assigns a single score to each certainty phrase, we model uncertainty as distributions over the simplex to capture their semantics more accurately. To accommodate this new representation of certainty, we generalize existing measures of miscalibration and introduce a novel post-hoc calibration method. Leveraging these tools, we analyze the calibration of both humans (e.g., radiologists) and computational models (e.g., language models) and provide interpretable suggestions to improve their calibration.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# 気候と環境の正義のための都市コンピューティング:2つの研究イニシアティブから

Urban Computing for Climate and Environmental Justice: Early Perspectives From Two Research Initiatives ( http://arxiv.org/abs/2410.04318v1 )

ライセンス: Link先を確認

Carolina Veiga, Ashish Sharma, Daniel de Oliveira, Marcos Lage, Fabio Miranda,

(参考訳) 気候変動の影響は、洪水や熱波などの極端な気象現象が、低所得層や低所得層に大きく影響しているため、世界中の都市社会における既存の脆弱性や格差を増している。これらの課題に対処するには、コンピュータ科学、工学、気候科学、公衆衛生など、複数の分野にまたがる専門知識を統合する新しいアプローチが必要である。都市コンピューティングは、複数のソースからのデータを統合して意思決定をサポートし、気象パターン、インフラの弱点、人口の脆弱性に関する実用的な洞察を提供することによって、これらの取り組みにおいて重要な役割を果たす。しかし、技術進歩を活用する能力は、グローバル・サウスとグローバル・ノースの間で大きく異なる。本稿では,米国シカゴとブラジルのニテロイに複数年にわたる多学際プロジェクトを実施し,これらの多様な状況下での都市コンピューティングの可能性と限界を明らかにする。筆者らの経験を反映して、都市環境における気候関連リスクの理解と緩和を容易にする視覚分析ツールの基本的要件と既存のギャップについて考察する。

The impacts of climate change are intensifying existing vulnerabilities and disparities within urban communities around the globe, as extreme weather events, including floods and heatwaves, are becoming more frequent and severe, disproportionately affecting low-income and underrepresented groups. Tackling these increasing challenges requires novel approaches that integrate expertise across multiple domains, including computer science, engineering, climate science, and public health. Urban computing can play a pivotal role in these efforts by integrating data from multiple sources to support decision-making and provide actionable insights into weather patterns, infrastructure weaknesses, and population vulnerabilities. However, the capacity to leverage technological advancements varies significantly between the Global South and Global North. In this paper, we present two multiyear, multidisciplinary projects situated in Chicago, USA and Niter\'oi, Brazil, highlighting the opportunities and limitations of urban computing in these diverse contexts. Reflecting on our experiences, we then discuss the essential requirements, as well as existing gaps, for visual analytics tools that facilitate the understanding and mitigation of climate-related risks in urban environments.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# CAVにおける協調データ融合のためのチャネル・アウェア・スループットの最大化

Channel-Aware Throughput Maximization for Cooperative Data Fusion in CAV ( http://arxiv.org/abs/2410.04320v1 )

ライセンス: Link先を確認

Haonan An, Zhengru Fang, Yuang Zhang, Senkang Hu, Xianhao Chen, Guowen Xu, Yuguang Fang,

(参考訳) 接続型および自律型車両(CAV)は、認識範囲の拡大と知覚範囲の増大により、大きな注目を集めている。盲点や障害物などの課題に対処するため、CAVは周囲の車両からのセンサデータを収集するために車両間通信(V2V)を採用している。しかし、協調的な知覚は、達成可能なネットワークスループットとチャネル品質の制限によって制約されることが多い。本稿では,適応データ圧縮に自己教師付きオートエンコーダを活用することで,CAVデータ融合を容易にするチャネル対応スループット最大化手法を提案する。この問題を混合整数プログラミング(MIP)モデルとして定式化し、与えられたリンク条件下で最適なデータレートと圧縮比の解を導出するために2つのサブプロブレムに分解する。オートエンコーダは、決定された圧縮比でビットレートを最小にするために訓練され、さらにスペクトルリソース消費を減らすために微調整戦略が用いられる。 OpenCOOD プラットフォーム上での実験的な評価により,提案アルゴリズムの有効性が示され,ネットワークスループットが 20.19 % 向上し,平均精度 (AP@IoU) が 9.38 % 向上した。

Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle (V2V) communications to aggregate sensory data from surrounding vehicles. However, cooperative perception is often constrained by the limitations of achievable network throughput and channel quality. In this paper, we propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression. We formulate the problem as a mixed integer programming (MIP) model, which we decompose into two sub-problems to derive optimal data rate and compression ratio solutions under given link conditions. An autoencoder is then trained to minimize bitrate with the determined compression ratio, and a fine-tuning strategy is employed to further reduce spectrum resource consumption. Experimental evaluation on the OpenCOOD platform demonstrates the effectiveness of our proposed algorithm, showing more than 20.19\% improvement in network throughput and a 9.38\% increase in average precision (AP@IoU) compared to state-of-the-art methods, with an optimal latency of 19.99 ms.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# RLExplorerによる深層強化学習プログラムのデバッグに向けて

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer ( http://arxiv.org/abs/2410.04322v1 )

ライセンス: Link先を確認

Rached Bouchoucha, Ahmed Haj Yahmed, Darshan Patil, Janarthanan Rajendran, Amin Nikanjam, Sarath Chandar, Foutse Khomh,

(参考訳) 深層強化学習(DRL)は、ロボット工学、コンピュータゲーム、レコメンデーションシステムなど様々な分野で成功している。しかし、他のソフトウェアシステムと同様に、DRLベースのソフトウェアシステムは、デバッグと診断に固有の課題を生じさせるフォールトに影響を受けやすい。これらの障害はしばしば、明示的な失敗やエラーメッセージなしで予期しない振る舞いを生じさせ、デバッグが難しく、時間がかかります。したがって、DRLシステムの監視と診断の自動化は、開発者の負担を軽減するために重要である。本稿では,DRLベースのソフトウェアシステムにおける最初の故障診断手法であるRLExplorerを提案する。 RLExplorerは自動的にトレーニングトレースを監視し、DRL学習ダイナミクスの特性に基づいて診断ルーチンを実行し、DRL固有の障害の発生を検出する。そして、これらの診断の結果を、理論的概念、推奨プラクティス、そして特定された障害に対する潜在的な解決策をカバーする警告として記録する。我々はRLExplorerを評価するために2つの評価を行った。 Stack Overflowの障害DRLサンプルを初めて評価したところ,83%の症例において,本手法が実際の障害を効果的に診断できることが判明した。 RLExplorerを15名のDRL専門家/開発者で評価したところ,(1)RLExplorerは手動デバッグの3.6倍の欠陥を識別でき,(2)RLExplorerは容易にDRLアプリケーションに統合できることがわかった。

Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# プロンプト型連続学習における階層型分類の活用

Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning ( http://arxiv.org/abs/2410.04327v1 )

ライセンス: Link先を確認

Quyen Tran, Minh Le, Tuan Truong, Dinh Phung, Linh Ngo, Thien Nguyen, Nhat Ho, Trung Le,

(参考訳) 人間の学習行動からインスピレーションを得たこの研究は、連続的に出現するクラスデータ間の関係を利用して、Promptベースの連続学習モデルにおける破滅的な忘れを緩和する新しいアプローチを提案する。深層学習モデルの学習において,情報の整理・接続という人間の習慣を適用することが効果的な戦略として有効であることがわかった。具体的には、拡大するラベルセットに基づいて階層木構造を構築することで、データに対する新たな洞察を得ることができ、類似したクラスのグループを特定することは、容易に混乱を引き起こす可能性がある。さらに、私たちは、最適なトランスポートベースのアプローチを通じて、オリジナルの事前訓練されたモデルの振る舞いを探索することで、クラス間の隠れた接続を深く掘り下げる。これらの知見から,モデルがより挑戦的な知識領域に集中し,全体的な性能を向上させるための新たな正規化損失関数を提案する。実験により,本手法は様々なベンチマークにおいて,最も頑健な最先端モデルに対して有意な優位性を示した。

Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigate catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class data. We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. Specifically, by building a hierarchical tree structure based on the expanding set of labels, we gain fresh insights into the data, identifying groups of similar classes could easily cause confusion. Additionally, we delve deeper into the hidden connections between classes by exploring the original pretrained model's behavior through an optimal transport-based approach. From these insights, we propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas, thereby enhancing overall performance. Experimentally, our method demonstrated significant superiority over the most robust state-of-the-art models on various benchmarks.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# OD-Stega: 最適化分布を用いたLDMによるニア・インパーセプティブル・ステガノグラフィー

OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions ( http://arxiv.org/abs/2410.04328v1 )

ライセンス: Link先を確認

Yu-Shin Huang, Peter Just, Krishna Narayanan, Chao Tian,

(参考訳) 本研究では,Large Language Model (LLM) が算術符号デコーダを駆動してステゴテキストを生成する場合の非被覆ステガノグラフィーについて考察する。効率的な方法は、秘密のメッセージビットをできるだけ少数の言語トークンに埋め込む必要がある。個々のトークンレベルでは、選択された確率分布とLLMが与える元の分布とのKL分散の制約を条件として、次のトークン生成の置換確率分布のエントロピーを最大化することが数学的に等価であることを示す。最適化問題に対して、効率的に計算できる閉形式解が提供される。重要な実務上の問題もいくつか取り組まれている。 1) しばしば見過ごされるトークン化ミスマッチ問題は、単純なプロンプト選択アプローチで解決される。 2)最適化分布と語彙トランケーション手法の組み合わせを考察し,その有効性について考察する。 3)最適化された分布と他のシーケンスレベルの選択ヒューリスティックを組み合わせることで,効率と信頼性をさらに向上させる。

We consider coverless steganography where a Large Language Model (LLM) drives an arithmetic coding decoder to generate stego-texts. An efficient method should embed secret message bits in as few language tokens as possible, while still keeping the stego-text natural and fluent. We show that on the individual token level, this problem is mathematically equivalent to maximizing the entropy of a replacement probability distribution of the next token generation, subject to a constraint on the KL divergence between the chosen probability distribution and the original distribution given by the LLM. A closed-form solution is provided for the optimization problem, which can be computed efficiently. Several important practical issues are also tackled: 1) An often-overlooked tokenization mismatch issue is resolved with a simple prompt selection approach, 2) The combination of the optimized distribution and the vocabulary truncation technique is considered, and 3) The combination of the optimized distribution with other sequence-level selection heuristics to further enhance the efficiency and reliability is studied.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# N$-Partite系における最強量子非局所性

Strongest quantum nonlocality in $N$-partite systems ( http://arxiv.org/abs/2410.04331v1 )

ライセンス: Link先を確認

Mengying Hu, Ting Gao, Fengli Yan,

(参考訳) 直交状態の集合は、自明な直交保存正の作用素値測度(POVM)のみをサブシステムの分割ごとに行うことができれば、最強の量子非局所性を持つ。この概念は、Halder $et~alによって提唱された強い量子非局所性に由来する。 $[Phy]。レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・レヴ・ $\textbf{122}$, 040403 (2019)] は、局所的不明瞭性に基づく非局所性の強い表現であり、量子情報隠蔽におけるより効率的な応用を見出す。しかし、直交保存局所測定(OPLM)の自明さを示すことは容易ではない。本稿では,ある条件下での$N$-partiteシステムにおいて,自明なOPLMに対して十分かつ必要な条件を示す。提案した条件を用いて、システム $(\mathbb{C}^{3})^{\otimes N}$ において最強非局所性を持つ集合の最小サイズを導出する。とPhys。 A $\textbf{109}$, 022220 (2024)] は、この値を達成する。最強の非局所性を持つ州を対象とする建設研究は、アプリケーションにおける資源消費の低減に寄与することが知られている。さらに、システム $(\mathbb{C}^{d})^{\otimes N}~(d\geq4)$ において最強非局所真絡集合を構築する。その結果, 最強非局所性についての理解を深めることができた。

A set of orthogonal states possesses the strongest quantum nonlocality if only a trivial orthogonality-preserving positive operator-valued measure (POVM) can be performed for each bipartition of the subsystems. This concept originated from the strong quantum nonlocality proposed by Halder $et~al.$ [Phy. Rev. Lett. $\textbf{122}$, 040403 (2019)], which is a stronger manifestation of nonlocality based on locally indistinguishability and finds more efficient applications in quantum information hiding. However, demonstrating the triviality of orthogonality-preserving local measurements (OPLMs) is not straightforward. In this paper, we present a sufficient and necessary condition for trivial OPLMs in $N$-partite systems under certain conditions. By using our proposed condition, we deduce the minimum size of set with the strongest nonlocality in system $(\mathbb{C}^{3})^{\otimes N}$, where the genuinely entangled sets constructed in Ref. [Phys. Rev. A $\textbf{109}$, 022220 (2024)] achieve this value. As it is known that studying construction involving fewer states with strongest nonlocality contribute to reducing resource consumption in applications. Furthermore, we construct strongest nonlocal genuinely entangled sets in system $(\mathbb{C}^{d})^{\otimes N}~(d\geq4)$, which have a smaller size than the existing strongest nonlocal genuinely entangled sets as $N$ increases. Consequently, our results contribute to a better understanding of strongest nonlocality.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# グラディエントルーティング:ニューラルネットワークにおける計算のローカライズのためのマスキンググラディエント

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks ( http://arxiv.org/abs/2410.04332v1 )

ライセンス: Link先を確認

Alex Cloud, Jacob Goldman-Wetzler, Evžen Wybitul, Joseph Miller, Alexander Matt Turner,

(参考訳) ニューラルネットワークは、内部メカニズムに関係なく、主に入力と出力に基づいて訓練される。これらの無視されたメカニズムは、安全に重要な特性を決定づける。透明性; 透明性; 透明性二機密情報又は有害な能力の欠如三訓練分布を超えた目標の信頼性の高い一般化。この欠点に対処するために、ニューラルネットワークの特定の部分領域に機能を分離する訓練手法である勾配ルーティングを導入する。勾配ルーティングは、バックプロパゲーション中の勾配にデータ依存の重み付きマスクを適用する。これらのマスクは、どのパラメータがどのデータポイントによって更新されるかを設定するために、ユーザによって提供される。本研究では,(1)解釈可能な方法で分割された表現の学習,(2)事前指定したネットワークサブリージョンのアブレーションによる堅牢なアンラーニングの実現,(3)異なる動作に責任を持つモジュールをローカライズすることで,強化学習者のスケーラブルな監視を実現すること,を示す。全体として、勾配ルーティングは、制限されたアドホックなデータサブセットに適用しても、機能をローカライズする。私たちは、高品質なデータが不足している、挑戦的な現実世界のアプリケーションに対して、このアプローチが約束されていると結論付けます。

Neural networks are trained primarily based on their inputs and outputs, without regard for their internal mechanisms. These neglected mechanisms determine properties that are critical for safety, like (i) transparency; (ii) the absence of sensitive information or harmful capabilities; and (iii) reliable generalization of goals beyond the training distribution. To address this shortcoming, we introduce gradient routing, a training method that isolates capabilities to specific subregions of a neural network. Gradient routing applies data-dependent, weighted masks to gradients during backpropagation. These masks are supplied by the user in order to configure which parameters are updated by which data points. We show that gradient routing can be used to (1) learn representations which are partitioned in an interpretable way; (2) enable robust unlearning via ablation of a pre-specified network subregion; and (3) achieve scalable oversight of a reinforcement learner by localizing modules responsible for different behaviors. Throughout, we find that gradient routing localizes capabilities even when applied to a limited, ad-hoc subset of the data. We conclude that the approach holds promise for challenging, real-world applications where quality data are scarce.

翻訳日:2024-11-02 08:30:03 公開日:2024-10-06

# 対称性破壊ダイナミクスのためのランダム非エルミートハミルトンフレームワーク

Random non-Hermitian Hamiltonian framework for symmetry breaking dynamics ( http://arxiv.org/abs/2410.04333v1 )

ライセンス: Link先を確認

Pei Wang,

(参考訳) ヒルベルト空間における量子状態の一般確率非線形ダイナミクスをモデル化するために、非エルミート的ハミルトニアンをランダムに提案する。本手法は, 線形方程式の線形性に基礎を置き, 線形系解法の適用性を確保する。さらに、統計対称性を容易に組み込むという利点があり、これは確率過程への明示対称性の一般化である。提案手法の有用性を実証するために,初期対称性保存状態からランダムに分布し,対称性を破る最終状態へと進化する実時間力学を記述する。我々のモデルは、不規則状態から秩序状態への遷移過程の量子的枠組みとして機能し、そこでは対称性が自発的に壊れる。

We propose random non-Hermitian Hamiltonians to model the generic stochastic nonlinear dynamics of a quantum state in Hilbert space. Our approach features an underlying linearity in the dynamical equations, ensuring the applicability of techniques used for solving linear systems. Additionally, it offers the advantage of easily incorporating statistical symmetry, a generalization of explicit symmetry to stochastic processes. To demonstrate the utility of our approach, we apply it to describe real-time dynamics, starting from an initial symmetry-preserving state and evolving into a randomly distributed, symmetry-breaking final state. Our model serves as a quantum framework for the transition process, from disordered states to ordered ones, where symmetry is spontaneously broken.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# マイクロサービス環境におけるインシデントライフサイクルのためのAIアシスタント:システマティック文献レビュー

AI Assistants for Incident Lifecycle in a Microservice Environment: A Systematic Literature Review ( http://arxiv.org/abs/2410.04334v1 )

ライセンス: Link先を確認

Dahlia Ziqi Zhou, Marios Fokaefs,

(参考訳) マイクロサービス環境のインシデントは、複雑さと分散した性質のために、コストがかかり、回復が難しい場合がある。人工知能(AI)の最近の進歩は、インシデント管理を改善するための有望なソリューションを提供する。本稿では、インシデントライフサイクルの異なるフェーズをサポートするように設計されたAIアシスタントに関する基礎研究を体系的にレビューする。これはAIの成功した応用を強調し、現在の研究のギャップを特定し、AIによるインシデント管理を強化する将来の機会を提案する。これらの研究を検討することで、AIツールの有効性と、インシデント回復における継続的な課題に対処する可能性についての洞察を提供することが目的である。

Incidents in microservice environments can be costly and challenging to recover from due to their complexity and distributed nature. Recent advancements in artificial intelligence (AI) offer promising solutions for improving incident management. This paper systematically reviews primary studies on AI assistants designed to support different phases of the incident lifecycle. It highlights successful applications of AI, identifies gaps in current research, and suggests future opportunities for enhancing incident management through AI. By examining these studies, the paper aims to provide insights into the effectiveness of AI tools and their potential to address ongoing challenges in incident recovery.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# ReTok: 大規模言語モデルにおける表現効率を高めるために、トークンライザをリプレースする

ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model ( http://arxiv.org/abs/2410.04335v1 )

ライセンス: Link先を確認

Shuhao Gu, Mengdi Zhao, Bowen Zhang, Liangdong Wang, Jijie Li, Guang Liu,

(参考訳) Tokenizerは大規模言語モデル(LLM)に不可欠なコンポーネントであり、高い圧縮率のトークン化器はモデルの表現と処理効率を向上させることができる。しかし、トークン化器は全てのシナリオにおいて高い圧縮速度を保証することができず、平均入力および出力長の増加はモデルのトレーニングと推論コストを増大させる。したがって、モデルの性能を維持しながら、最小限のコストでモデルの効率を改善する方法を見つけることが重要である。本研究では, LLMのトークン化機能を置き換えることで, モデル表現と処理効率を向上させる手法を提案する。モデルの入力層と出力層のパラメータを元のモデルのパラメータに置き換えて再起動し、他のパラメータを固定しながらこれらのパラメータをトレーニングする。我々は,異なるLLM実験を行い,その結果から,トークン化器を置き換えたモデルの性能を維持できるとともに,長文の復号速度を大幅に向上できることを示した。

Tokenizer is an essential component for large language models (LLMs), and a tokenizer with a high compression rate can improve the model's representation and processing efficiency. However, the tokenizer cannot ensure high compression rate in all scenarios, and an increase in the average input and output lengths will increases the training and inference costs of the model. Therefore, it is crucial to find ways to improve the model's efficiency with minimal cost while maintaining the model's performance. In this work, we propose a method to improve model representation and processing efficiency by replacing the tokenizers of LLMs. We propose replacing and reinitializing the parameters of the model's input and output layers with the parameters of the original model, and training these parameters while keeping other parameters fixed. We conducted experiments on different LLMs, and the results show that our method can maintain the performance of the model after replacing the tokenizer, while significantly improving the decoding speed for long texts.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 工学用Si-Qubit MOSFET:極低温での静電量子積分による相場モデリング手法

Engineering Si-Qubit MOSFETs: A Phase-Field Modeling Approach Integrating Quantum-Electrostatics at Cryogenic Temperatures ( http://arxiv.org/abs/2410.04339v1 )

ライセンス: Link先を確認

Nilesh Pandey, Dipanjan Basu, Yogesh Singh Chauhan, Leonard F. Register, Sanjay K. Banerjee,

(参考訳) 本研究は、Si系量子ビットMOSFETを解析し、静電気と量子力学的効果を統合するために、高度な位相場モデリングを用いる。我々は、シュロディンガー方程式解のフルウェーブ処理と、極低温におけるポアソン方程式を併用した包括的モデリング手法を採用する。本分析では, 界面トラップが量子ドット(QD)障壁高さに与える影響を考察し, トンネルによる結合に影響を及ぼす。より広いトラップ分布は量子ドットの分離につながる。さらに、プランジャ/バリアゲート長が増加するにつれて、伝送および反射係数の振動が増加し、QD間の結合が減少する。プランジャ,バリアゲート次元,スペーサ構成,ギャップ酸化物長を最適化することにより,量子井戸深さの制御を強化し,不要な波動関数のリークを最小限に抑える。モデリングアルゴリズムは実験データに対しても検証され,クーロン遮断によるId Vgsの発振を低温下で正確に捉えることができる。

This study employs advanced phase-field modeling to investigate Si-based qubit MOSFETs, integrating electrostatics and quantum mechanical effects. We adopt a comprehensive modeling approach, utilizing full-wave treatment of the Schrodinger equation solutions, coupled with the Poisson equation at cryogenic temperatures. Our analysis explores the influence of interface traps on quantum dot (QD) barrier heights, affecting coupling due to tunneling. A wider trap distribution leads to the decoupling of quantum dots. Furthermore, the oscillations in the transmission and reflection coefficients increase as the plunger/barrier gate length increases, reducing the coupling between the QDs. By optimizing plunger and barrier gate dimensions, spacer configurations, and gap oxide lengths, we enhance control over quantum well depths and minimize unwanted wave function leakage. The modeling algorithm is also validated against the experimental data and can accurately capture the oscillations in the Id Vgs caused by the Coulomb blockade at cryogenic temperature

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 周波数領域におけるネットワークの高速化

Accelerating Inference of Networks in the Frequency Domain ( http://arxiv.org/abs/2410.04342v1 )

ライセンス: Link先を確認

Chenqiu Zhao, Guanfang Dong, Anup Basu,

(参考訳) 周波数領域において、ネットワークのパラメータが極めて少ない精度で大幅に低減できることが示されている。しかし、周波数変換のコストを考えると、計算複雑性は著しく低下しない。本研究では,周波数パラメータが疎いネットワークを高速化するために,周波数領域におけるネットワーク推論を提案する。特に、空間領域におけるネットワーク推論に双対な周波数推論連鎖を提案する。非線形層を扱うために、周波数データに直接非線形演算を適用し、効果的に動作するように妥協する。周波数推論チェーンと非線形層に対する戦略によって実現され、提案手法は周波数領域の全推論を完了させる。全ての層に対して余分な周波数変換や逆変換を必要とする従来の手法とは異なり、提案手法はネットワークの始点と終点に1度だけ周波数変換と逆変換を必要とする。最先端手法との比較により,提案手法は高速比(100倍以上)の場合,精度を著しく向上することが示された。ソースコードは \url{https://github.com/guanfangdong/FreqNet-Infer} で公開されている。

It has been demonstrated that networks' parameters can be significantly reduced in the frequency domain with a very small decrease in accuracy. However, given the cost of frequency transforms, the computational complexity is not significantly decreased. In this work, we propose performing network inference in the frequency domain to speed up networks whose frequency parameters are sparse. In particular, we propose a frequency inference chain that is dual to the network inference in the spatial domain. In order to handle the non-linear layers, we make a compromise to apply non-linear operations on frequency data directly, which works effectively. Enabled by the frequency inference chain and the strategy for non-linear layers, the proposed approach completes the entire inference in the frequency domain. Unlike previous approaches which require extra frequency or inverse transforms for all layers, the proposed approach only needs the frequency transform and its inverse once at the beginning and once at the end of a network. Comparisons with state-of-the-art methods demonstrate that the proposed approach significantly improves accuracy in the case of a high speedup ratio (over 100x). The source code is available at \url{https://github.com/guanfangdong/FreqNet-Infer}.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 長期検索拡張生成のための推論スケーリング

Inference Scaling for Long-Context Retrieval Augmented Generation ( http://arxiv.org/abs/2410.04343v1 )

ライセンス: Link先を確認

Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky,

(参考訳) 推論計算のスケーリングにより、様々な設定にまたがるLong-context Large Language Model (LLM)の可能性が解き放たれた。知識集約的なタスクでは、より多くの外部知識を組み込むために計算量が増加することがしばしばある。しかし、そのような知識を効果的に活用しなければ、文脈を拡大するだけでは必ずしも性能が向上するとは限らない。本研究では,検索拡張生成(RAG)における推論スケーリングについて検討し,単に知識量を増やす以上の戦略を探求する。インコンテキスト学習と反復的プロンプトという,2つの推論スケーリング戦略に注目します。これらの戦略は、テスト時間計算(例えば、検索した文書や生成ステップを増やすことで)をスケールするためのさらなる柔軟性を提供する。 1) RAG のパフォーマンスは、最適に設定された場合の推論計算のスケーリングからどのような恩恵を受けますか? 2) RAG 性能と推論パラメータの関係をモデル化することにより,与えられた予算に対する最適テスト時間計算割当を予測できるのか? 観測の結果,推定計算の増大は最適に割り当てた場合,RAGの性能がほぼ線形に向上することを示し,RAGの推論スケーリング法則として記述した。これに基づいて、異なる推論構成におけるRAG性能を推定する計算割当モデルをさらに発展させる。このモデルは、様々な計算制約の下で最適な推論パラメータを予測し、実験結果と密接に一致させる。これらの最適構成を適用することで、長文LLMのスケーリング推論計算が標準RAGと比較してベンチマークデータセットで最大58.9%向上することを示す。

The scaling of inference computation has unlocked the potential of long-context large language models (LLMs) across diverse settings. For knowledge-intensive tasks, the increased compute is often allocated to incorporate more external knowledge. However, without effectively utilizing such knowledge, solely expanding context does not always enhance performance. In this work, we investigate inference scaling for retrieval augmented generation (RAG), exploring strategies beyond simply increasing the quantity of knowledge. We focus on two inference scaling strategies: in-context learning and iterative prompting. These strategies provide additional flexibility to scale test-time computation (e.g., by increasing retrieved documents or generation steps), thereby enhancing LLMs' ability to effectively acquire and utilize contextual information. We address two key questions: (1) How does RAG performance benefit from the scaling of inference computation when optimally configured? (2) Can we predict the optimal test-time compute allocation for a given budget by modeling the relationship between RAG performance and inference parameters? Our observations reveal that increasing inference computation leads to nearly linear gains in RAG performance when optimally allocated, a relationship we describe as the inference scaling laws for RAG. Building on this, we further develop the computation allocation model to estimate RAG performance across different inference configurations. The model predicts optimal inference parameters under various computation constraints, which align closely with the experimental results. By applying these optimal configurations, we demonstrate that scaling inference compute on long-context LLMs achieves up to 58.9% gains on benchmark datasets compared to standard RAG.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# DeepONet for Solving PDE: Generalization Analysis in Sobolev Training

DeepONet for Solving PDEs: Generalization Analysis in Sobolev Training ( http://arxiv.org/abs/2410.04344v1 )

ライセンス: Link先を確認

Yahong Yang,

(参考訳) 本稿では,演算子学習,特にDeepONetの偏微分方程式(PDE)への応用について検討する。各PDEに対して別々のニューラルネットワークのトレーニングを必要とする関数学習方法とは異なり、オペレータ学習は再トレーニングすることなく、異なるPDEをまたいだ一般化を行う。本稿では,ソボレフトレーニングにおけるDeepONetの性能に着目し,ディープブランチとトランクネットワークの近似能力とソボレフノルムの一般化誤差の2つの重要な問題に対処する。我々の発見は、ディープブランチネットワークが大きなパフォーマンス上のメリットを提供するのに対して、トランクネットワークは最もシンプルであることを示している。また、符号化部に微分情報を加えない標準サンプリング法は、一般化解析に基づくソボレフ訓練における一般化誤差を最小限に抑えるのに十分である。本稿では,幅広い物理インフォームド機械学習モデルと応用のための誤差推定を提供することにより,理論的ギャップを埋める。

In this paper, we investigate the application of operator learning, specifically DeepONet, to solve partial differential equations (PDEs). Unlike function learning methods that require training separate neural networks for each PDE, operator learning generalizes across different PDEs without retraining. We focus on the performance of DeepONet in Sobolev training, addressing two key questions: the approximation ability of deep branch and trunk networks, and the generalization error in Sobolev norms. Our findings highlight that deep branch networks offer significant performance benefits, while trunk networks are best kept simple. Moreover, standard sampling methods without adding derivative information in the encoding part are sufficient for minimizing generalization error in Sobolev training, based on generalization analysis. This paper fills a theoretical gap by providing error estimations for a wide range of physics-informed machine learning models and applications.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# MVP-Bench: 大規模視覚言語モデルは、人間のように多段階の視覚知覚を実行できるか?

MVP-Bench: Can Large Vision--Language Models Conduct Multi-level Visual Perception Like Humans? ( http://arxiv.org/abs/2410.04345v1 )

ライセンス: Link先を確認

Guanzhen Li, Yuxi Xie, Min-Yen Kan,

(参考訳) 人間は、低レベルの物体認識や行動理解のような高レベルの意味解釈を含む、複数のレベルで視覚的知覚を行う。低レベルの細部における微妙な違いは、高レベルの知覚に大きな変化をもたらす可能性がある。例えば、銃を持った人が持っていた買い物袋を代用することは、暴力行為を示唆し、犯罪行為や暴力行為を暗示する。様々なマルチモーダルタスクの大幅な進歩にもかかわらず、LVLM(Large Visual-Language Models)はそのようなマルチレベル視覚知覚を行う能力について未解明のままである。 LVLMの低レベルと高レベルの両方の視覚知覚を体系的に評価する最初の視覚言語ベンチマークであるMVP-Benchを導入する。本研究では,自然画像と合成画像にMVP-Benchを構築し,操作したコンテンツがモデル知覚に与える影響について検討する。 MVP-Benchを用いて、10個のオープンソースと2個のクローズドソースのLVLMの視覚的認識を診断し、高いレベルの認識タスクが既存のLVLMに大きく挑戦していることを示す。最先端の GPT-4o は,低レベルのシナリオでは 754 % に対して,Yes/No の質問では 56 % の精度しか達成していない。さらに、自然画像と操作画像のパフォーマンスギャップは、現在のLVLMが人間のように合成画像の視覚的意味を理解できないことを示している。私たちのデータとコードはhttps://github.com/GuanzhenLi/MVP-Bench.comで公開されています。

Humans perform visual perception at multiple levels, including low-level object recognition and high-level semantic interpretation such as behavior understanding. Subtle differences in low-level details can lead to substantial changes in high-level perception. For example, substituting the shopping bag held by a person with a gun suggests violent behavior, implying criminal or violent activity. Despite significant advancements in various multimodal tasks, Large Visual-Language Models (LVLMs) remain unexplored in their capabilities to conduct such multi-level visual perceptions. To investigate the perception gap between LVLMs and humans, we introduce MVP-Bench, the first visual-language benchmark systematically evaluating both low- and high-level visual perception of LVLMs. We construct MVP-Bench across natural and synthetic images to investigate how manipulated content influences model perception. Using MVP-Bench, we diagnose the visual perception of 10 open-source and 2 closed-source LVLMs, showing that high-level perception tasks significantly challenge existing LVLMs. The state-of-the-art GPT-4o only achieves an accuracy of $56\%$ on Yes/No questions, compared with $74\%$ in low-level scenarios. Furthermore, the performance gap between natural and manipulated images indicates that current LVLMs do not generalize in understanding the visual semantics of synthetic images as humans do. Our data and code are publicly available at https://github.com/GuanzhenLi/MVP-Bench.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 正規選好最適化:NDCGによる人選好の調整

Ordinal Preference Optimization: Aligning Human Preferences via NDCG ( http://arxiv.org/abs/2410.04346v1 )

ライセンス: Link先を確認

Yang Zhao, Yixin Wang, Mingzhang Yin,

(参考訳) 多様な人間の好みを持つ大規模言語モデル(LLM)の調整は、モデルの振る舞いを制御し、生成品質を向上させるための重要な技術である。 Reinforcement Learning from Human Feedback (RLHF)、Direct Preference Optimization (DPO)、およびそれらの変種はペア比較により言語モデルを最適化する。しかし、複数のレスポンスが利用できる場合、これらのアプローチは報酬モデルや人間からのフィードバックによって与えられるランキングの広範な情報を活用するには至らない。そこで本研究では,正規化比較累積ゲイン (NDCG) を用いた正規化選好最適化 (OPO) という新しいリストワイズ手法を提案する。我々は、NDCGを異なる代理損失で近似することで、エンドツーエンドの選好最適化アルゴリズムを開発する。このアプローチは,情報検索におけるランキングモデルとアライメント問題の関連性を構築する。順序付き報酬に割り当てられたマルチレスポンスデータセットの調整において、OPOは、評価セットとAlpacaEvalのような一般的なベンチマークにおいて、既存のペアワイズおよびリストワイズアプローチよりも優れています。さらに, 陰性サンプルのプールの増加は, 自明な負の悪影響を低減し, モデル性能を向上させることを実証した。

Aligning Large Language Models (LLMs) with diverse human preferences is a pivotal technique for controlling model behaviors and enhancing generation quality. Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and their variants optimize language models by pairwise comparisons. However, when multiple responses are available, these approaches fall short of leveraging the extensive information in the ranking given by the reward models or human feedback. In this work, we propose a novel listwise approach named Ordinal Preference Optimization (OPO), which employs the Normalized Discounted Cumulative Gain (NDCG), a widely-used ranking metric, to better utilize relative proximity within ordinal multiple responses. We develop an end-to-end preference optimization algorithm by approximating NDCG with a differentiable surrogate loss. This approach builds a connection between ranking models in information retrieval and the alignment problem. In aligning multi-response datasets assigned with ordinal rewards, OPO outperforms existing pairwise and listwise approaches on evaluation sets and general benchmarks like AlpacaEval. Moreover, we demonstrate that increasing the pool of negative samples can enhance model performance by reducing the adverse effects of trivial negatives.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 大規模言語モデルを用いた予測モデル拡張のための潜在的特徴マイニング

Latent Feature Mining for Predictive Model Enhancement with Large Language Models ( http://arxiv.org/abs/2410.04347v1 )

ライセンス: Link先を確認

Bingxuan Li, Pengyi Shi, Amy Ward,

(参考訳) 予測モデリングは、データ可用性と品質の制限による課題に直面することが多い。特に、収集された特徴が結果と弱い相関関係にあり、追加の特徴収集が倫理的または実践的な困難によって制約される領域において。従来の機械学習(ML)モデルは、観測されていないが重要な要素を組み込むのに苦労している。本研究では,テキストからテキストへの命題論理的推論として潜在特徴抽出を定式化するための効果的な手法を提案する。 FLAME(Faithful Latent Feature Mining for Predictive Model Enhancement)は,大規模言語モデル(LLM)を利用して,潜在機能を備えた観測機能を強化し,下流タスクにおけるMLモデルの予測能力を向上するフレームワークである。このフレームワークは、各領域に固有のコンテキスト情報を組み込んで、類似したデータ可用性課題に直面した領域への効果的な転送を保証するように設計されており、ドメイン固有の適応を必要とする様々なドメインにまたがって一般化可能である。我々は,(1)刑事司法制度,(2)患者プライバシの懸念と医療データの複雑さが包括的特徴収集を制限する医療分野を特徴とする領域,という2つのケーススタディを用いて,枠組みを検証した。以上の結果から,推定潜時特徴は地上の真理ラベルとよく一致し,下流の分類器を著しく強化することがわかった。

Predictive modeling often faces challenges due to limited data availability and quality, especially in domains where collected features are weakly correlated with outcomes and where additional feature collection is constrained by ethical or practical difficulties. Traditional machine learning (ML) models struggle to incorporate unobserved yet critical factors. In this work, we introduce an effective approach to formulate latent feature mining as text-to-text propositional logical reasoning. We propose FLAME (Faithful Latent Feature Mining for Predictive Model Enhancement), a framework that leverages large language models (LLMs) to augment observed features with latent features and enhance the predictive power of ML models in downstream tasks. Our framework is generalizable across various domains with necessary domain-specific adaptation, as it is designed to incorporate contextual information unique to each area, ensuring effective transfer to different areas facing similar data availability challenges. We validate our framework with two case studies: (1) the criminal justice system, a domain characterized by limited and ethically challenging data collection; (2) the healthcare domain, where patient privacy concerns and the complexity of medical data limit comprehensive feature collection. Our results show that inferred latent features align well with ground truth labels and significantly enhance the downstream classifier.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# TIS-DPO:推定重み付き直接選好最適化のためのトークンレベルの重要度サンプリング

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights ( http://arxiv.org/abs/2410.04350v1 )

ライセンス: Link先を確認

Aiwei Liu, Haoping Bai, Zhiyun Lu, Yanchao Sun, Xiang Kong, Simon Wang, Jiulong Shan, Albin Madappally Jose, Xiaojiang Liu, Lijie Wen, Philip S. Yu, Meng Cao,

(参考訳) 直接選好最適化(DPO)は、その単純さと有効性から、Large Language Models(LLM)の選好アライメントに広く採用されている。しかし、DPOは、全応答が単一アームとして扱われるバンディット問題として導出され、トークン間の重要性の違いを無視し、最適化効率に影響を及ぼし、最適な結果を得るのが難しくなる。本研究では, DPO の最適データは, トークンの重要度に差がないため, 勝ち負けにおける各トークンに対して等しく期待される報酬を持つことを示す。しかし、この最適データセットは実際には利用できないため、重要サンプリングのために元のデータセットを用いて、偏りのない最適化を実現することを提案する。そこで本稿では,TIS-DPO と呼ばれるトークン単位の重要度サンプリング DPO の目的について提案する。従来の研究から着想を得て,一対の対照的なLLMからの予測確率の差を用いて,トークンの重要度を推定した。提案手法は,(1) 元のLDMをコントラスト的プロンプトで導くこと,(2) 勝敗応答を用いて2つの別々のLDMを訓練すること,(3) 勝敗応答を用いて前後DPOトレーニングを行うこと,の3つである。実験により、TIS-DPOは、無害性、無益性アライメントおよび要約タスクにおいて、様々なベースライン手法を著しく上回っていることが示された。また、推定重量を可視化し、キートークンの位置を識別する能力を示す。

Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# Androidマルウェア検出の強化:ChatGPTが意思決定中心タスクに及ぼす影響

Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Task ( http://arxiv.org/abs/2410.04352v1 )

ライセンス: Link先を確認

Yao Li, Sen Fang, Tao Zhang, Haipeng Cai,

(参考訳) ChatGPTのような大規模言語モデルの台頭により、非決定モデルが様々なタスクに適用されている。さらにChatGPTは、Androidのマルウェア検出における従来の意思決定中心のタスクに注意を向けている。研究者によって提案された効果的な検出方法にもかかわらず、それらは低い解釈可能性の問題に直面している。具体的には、これらのメソッドは、良心的または悪意的なアプリケーション分類に優れ、悪意のある振る舞いを検出できるが、意思決定に関する詳細な説明は提供できないことが多い。この課題は、既存の検出スキームの信頼性に関する懸念を高め、複雑なデータを理解する真の能力に疑問を投げかける。本研究では,非決定モデルChatGPTがAndroidマルウェア検出における従来の意思決定中心タスクに与える影響について検討する。 Drebin、XMAL、MaMaDroidの3つの最先端ソリューションを選択し、公開データセットに関する一連の実験を行い、包括的な比較と分析を行う。この結果から,これらの決定駆動型ソリューションは,基盤となるデータを真に理解するのではなく,データセット内の統計的パターンに依存していることが示唆された。対照的に、ChatGPTは非決定モデルであり、包括的な分析レポートの提供に優れ、解釈可能性を大幅に向上させる。さらに、経験豊富な開発者を対象に調査を実施します。この結果は、ChatGPTの詳細な洞察を提供し、課題の効率性と理解を高めることによって、開発者のChatGPTに対する好みを強調している。一方、これらの研究と分析は深い洞察を与え、開発者はAndroidのマルウェア検出に新たな視点を与え、非決定的な視点から検出結果の信頼性を高める。

With the rise of large language models, such as ChatGPT, non-decisional models have been applied to various tasks. Moreover, ChatGPT has drawn attention to the traditional decision-centric task of Android malware detection. Despite effective detection methods proposed by scholars, they face low interpretability issues. Specifically, while these methods excel in classifying applications as benign or malicious and can detect malicious behavior, they often fail to provide detailed explanations for the decisions they make. This challenge raises concerns about the reliability of existing detection schemes and questions their true ability to understand complex data. In this study, we investigate the influence of the non-decisional model, ChatGPT, on the traditional decision-centric task of Android malware detection. We choose three state-of-the-art solutions, Drebin, XMAL, and MaMaDroid, conduct a series of experiments on publicly available datasets, and carry out a comprehensive comparison and analysis. Our findings indicate that these decision-driven solutions primarily rely on statistical patterns within datasets to make decisions, rather than genuinely understanding the underlying data. In contrast, ChatGPT, as a non-decisional model, excels in providing comprehensive analysis reports, substantially enhancing interpretability. Furthermore, we conduct surveys among experienced developers. The result highlights developers' preference for ChatGPT, as it offers in-depth insights and enhances efficiency and understanding of challenges. Meanwhile, these studies and analyses offer profound insights, presenting developers with a novel perspective on Android malware detection--enhancing the reliability of detection results from a non-decisional perspective.

翻訳日:2024-11-02 08:20:17 公開日:2024-10-06

# 超量子状態の幾何学と絡み合い

Geometry and Entanglement of Super-Qubit Quantum States ( http://arxiv.org/abs/2410.04361v1 )

ライセンス: Link先を確認

Oktay K. Pashaev, Aygul Kocak,

(参考訳) 我々は、零点と1つの超粒子状態の重畳によって決定される超量子状態を導入し、超ブロック球面上の点で表すことができる。 1つの量子ビットの場合とは対照的に、1つの超粒子状態は、別の超ブロック球に等しい拡張された複素平原の点によって特徴づけられる。幾何学的には、超量子状態は2つの単位球面、または2つのブロッホ球面の直積で表される。超量子状態に作用する変位演算子を用いて、超コヒーレント状態を構築し、超消滅作用素の固有状態となり、2つの超ブロック球の変位パラメータと立体射影の3つの複素数で特徴づける。状態はフェルミオンボソン絡み合っており、状態の共起は2つのブロッホ球に対応する2つの共起子の積である。球面上の点状態から垂直軸(点状態を通る水平面における円の半径)までの距離として、共起の幾何学的意味を示す。そして、北極状態と南極状態との崩壊確率は、状態の垂直座標から極の対応する点への半距離に等しい。補体フェルミオン数演算子に対しては、転置された超消滅作用素の固有状態として、フリップされた超量子状態と対応する超コヒーレント状態を得る。複素平原におけるフィボナッチ振動円の無限集合と、2つのフィボナッチ数の比として不確実性を持つ量子状態の対応する集合と、無限の極限がゴールデンラティオの不確実性となるような極限とが導かれる。

We introduce the super-qubit quantum state, determined by superposition of the zero and the one super-particle states, which can be represented by points on the super-Bloch sphere. In contrast to the one qubit case, the one super-particle state is characterized by points in extended complex plain, equivalent to another super-Bloch sphere. Then, geometrically, the super-qubit quantum state is represented by two unit spheres, or the direct product of two Bloch spheres. By using the displacement operator, acting on the super-qubit state as the reference state, we construct the super-coherent states, becoming eigenstates of the super-annihilation operator, and characterized by three complex numbers, the displacement parameter and stereographic projections of two super-Bloch spheres. The states are fermion-boson entangled, and the concurrence of states is the product of two concurrences, corresponding to two Bloch spheres. We show geometrical meaning of concurrence as distance from point-state on the sphere to vertical axes - the radius of circle at horizontal plane through the point-state. Then, probabilities of collapse to the north pole state and to the south pole state are equal to half-distances from vertical coordinate of the state to corresponding points at the poles. For complimentary fermion number operator, we get the flipped super-qubit state and corresponding super-coherent state, as eigenstate of transposed super-annihilation operator. The infinite set of Fibonacci oscillating circles in complex plain, and corresponding set of quantum states with uncertainty relations as the ratio of two Fibonacci numbers, and in the limit at infinity becoming the Golden Ratio uncertainty, is derived.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# RespDiff: PPG信号からの呼吸波形推定のためのマルチスケールRNN拡散モデル

RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals ( http://arxiv.org/abs/2410.04366v1 )

ライセンス: Link先を確認

Yuyang Miao, Zehua Chen, Chang Li, Danilo Mandic,

(参考訳) 呼吸率(RR)は、しばしば不都合なシナリオ下で監視される重要な健康指標であり、継続的なモニタリングの実用性を制限する。 Photoplethysmography(PPG)センサーは、ますますウェアラブルデバイスに統合され、ポータブルな方法でRRを継続的に推定する機会を提供する。本稿では,PSG信号からの呼吸波形推定のためのエンドツーエンドマルチスケールRNN拡散モデルであるRespDiffを提案する。 RespDiffは手作りの機能や低品質信号セグメントの排除を必要としないため、現実のシナリオに適している。モデルはマルチスケールエンコーダを使用し、異なる解像度で特徴を抽出し、双方向RNNを使用してPSG信号を処理し、呼吸波形を抽出する。さらに、モデルをさらに最適化するためにスペクトル損失項が導入された。 BIDMCデータセットで実施された実験では、RespDiffはRR推定の1.18bpmの平均絶対誤差(MAE)を達成し、他のものは1.66bpmから2.15bpmの範囲で達成し、実際の応用における堅牢で正確な呼吸モニタリングの可能性を示している。

Respiratory rate (RR) is a critical health indicator often monitored under inconvenient scenarios, limiting its practicality for continuous monitoring. Photoplethysmography (PPG) sensors, increasingly integrated into wearable devices, offer a chance to continuously estimate RR in a portable manner. In this paper, we propose RespDiff, an end-to-end multi-scale RNN diffusion model for respiratory waveform estimation from PPG signals. RespDiff does not require hand-crafted features or the exclusion of low-quality signal segments, making it suitable for real-world scenarios. The model employs multi-scale encoders, to extract features at different resolutions, and a bidirectional RNN to process PPG signals and extract respiratory waveform. Additionally, a spectral loss term is introduced to optimize the model further. Experiments conducted on the BIDMC dataset demonstrate that RespDiff outperforms notable previous works, achieving a mean absolute error (MAE) of 1.18 bpm for RR estimation while others range from 1.66 to 2.15 bpm, showing its potential for robust and accurate respiratory monitoring in real-world applications.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# ランダムトランスのアルゴリズム機能

Algorithmic Capabilities of Random Transformers ( http://arxiv.org/abs/2410.04368v1 )

ライセンス: Link先を確認

Ziqian Zhong, Jacob Andreas,

(参考訳) トレーニングされたトランスモデルは、算術や連想的リコールのようなタスクの解釈可能なプロシージャを実装することが知られているが、これらのプロシージャを実装する回路がトレーニング中にどのように発生するかはほとんど分かっていない。モデルに提供される監視信号にどの程度依存するか、トレーニング開始時のモデルにすでに存在する振る舞いにどの程度寄与するか? そこで本研究では,組込み層のみを最適化したランダム初期化変換器を用いて,データから学習可能な入出力マッピングが,ランダム初期化モデルによって既に実装されている(符号化方式の選択まで)関数であることを示す。これらのランダムトランスフォーマーは、モジュラー演算、インウェイト、コンテキスト内連想リコール、十進加算、括弧バランス、さらには自然言語テキスト生成のいくつかの側面を含む、幅広い意味あるアルゴリズムタスクを実行できる。以上の結果から,これらのモデルが訓練される前であっても,トランスフォーマ(かつ適切な構造化された入力を通じてアクセス可能な)にアルゴリズム能力が存在することが示唆された。コードはhttps://github.com/fjzzq2002/random_transformersで入手できる。

Trained transformer models have been found to implement interpretable procedures for tasks like arithmetic and associative recall, but little is understood about how the circuits that implement these procedures originate during training. To what extent do they depend on the supervisory signal provided to models, and to what extent are they attributable to behavior already present in models at the beginning of training? To investigate these questions, we investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized, so that the only input--output mappings learnable from data are those already implemented (up to a choice of encoding scheme) by the randomly initialized model. We find that these random transformers can perform a wide range of meaningful algorithmic tasks, including modular arithmetic, in-weights and in-context associative recall, decimal addition, parenthesis balancing, and even some aspects of natural language text generation. Our results indicate that some algorithmic capabilities are present in transformers (and accessible via appropriately structured inputs) even before these models are trained. Code is available at https://github.com/fjzzq2002/random_transformers.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# DiffusionFake: ガイド付き安定拡散によるディープフェイク検出における一般化の促進

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion ( http://arxiv.org/abs/2410.04372v1 )

ライセンス: Link先を確認

Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji,

(参考訳) Deepfakeテクノロジーの急速な進歩により、顔交換は非常に現実的になり、偽造された顔コンテンツの悪意ある使用に対する懸念が高まっている。既存の方法は、顔操作の多様な性質のため、目に見えない領域に一般化するのに苦労することが多い。本稿では、生成過程を再検討し、普遍原理を同定する: ディープフェイク画像は、本質的に、ソースとターゲットの同一性の両方の情報を含んでいるが、真の顔は、一貫した同一性を維持している。この知見に基づいて,顔偽造の生成過程を逆転させて検出モデルの一般化を促進する新しいプラグ・アンド・プレイフレームワークであるDiffusionFakeを紹介した。 DiffusionFakeは、検出モデルによって抽出された特徴を凍結したトレーニング済みの安定拡散モデルに注入し、対応するターゲットとソースイメージを再構築する。このガイド付き再構成プロセスは、検出ネットワークを制約して、ソースとターゲットに関する特徴を捕捉し、再構成を容易にし、その結果、目に見えない偽造に対してより回復力のある、リッチで非絡み合った表現を学習する。大規模な実験により、DiffusionFakeは推論中に追加パラメータを導入することなく、様々な検出器アーキテクチャのドメイン間一般化を大幅に改善することが示された。私たちのコードはhttps://github.com/skJack/DiffusionFake.gitで利用可能です。

The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. In this paper, we revisit the generation process and identify a universal principle: Deepfake images inherently contain information from both source and target identities, while genuine faces maintain a consistent identity. Building upon this insight, we introduce DiffusionFake, a novel plug-and-play framework that reverses the generative process of face forgeries to enhance the generalization of detection models. DiffusionFake achieves this by injecting the features extracted by the detection model into a frozen pre-trained Stable Diffusion model, compelling it to reconstruct the corresponding target and source images. This guided reconstruction process constrains the detection network to capture the source and target related features to facilitate the reconstruction, thereby learning rich and disentangled representations that are more resilient to unseen forgeries. Extensive experiments demonstrate that DiffusionFake significantly improves cross-domain generalization of various detector architectures without introducing additional parameters during inference. Our Codes are available in https://github.com/skJack/DiffusionFake.git.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# 学習効率の良い生成可能な純状態の計算複雑性

Computational Complexity of Learning Efficiently Generatable Pure States ( http://arxiv.org/abs/2410.04373v1 )

ライセンス: Link先を確認

Taiga Hiroka, Min-Hsiu Hsieh,

(参考訳) 様々な学習モデルにおける学習効率のよい古典的プログラムの計算複雑性を理解することは、古典的学習理論において基礎的で重要な問題である。本研究では,Kearnsらによって導入された分散学習の量子一般化として見ることができる量子状態学習の計算複雑性について検討する。 Chung と Lin [TQC21] と B\u{a}descu と O$'$Donnell [STOC21] による以前の研究は、量子状態学習のサンプルの複雑さを研究し、未知の量子状態が効率的に生成可能であれば多項式コピーが十分であることを示した。しかし、アルゴリズムは非効率であり、この学習問題の計算複雑性は未解決のままである。本研究では、状態が効率的に生成可能であることを約束する量子状態学習の計算複雑性について検討する。未知の量子状態が純粋状態であることを約束し、効率的に生成可能であるなら、量子多項式時間アルゴリズム$A$と言語$L \in PP$が存在して、$A^L$はその古典的な記述を学べることを示す。また、学習量子状態の硬さと量子暗号の関連性も観察する。純粋な状態出力を持つ一方通行状態生成器の存在は、学習純状態の平均ケース硬度と等価であることを示す。さらに、EFIの存在は、混合状態の学習における平均的なケース硬さを意味することを示す。

Understanding the computational complexity of learning efficient classical programs in various learning models has been a fundamental and important question in classical computational learning theory. In this work, we study the computational complexity of quantum state learning, which can be seen as a quantum generalization of distributional learning introduced by Kearns et.al [STOC94]. Previous works by Chung and Lin [TQC21], and B\u{a}descu and O$'$Donnell [STOC21] study the sample complexity of the quantum state learning and show that polynomial copies are sufficient if unknown quantum states are promised efficiently generatable. However, their algorithms are inefficient, and the computational complexity of this learning problem remains unresolved. In this work, we study the computational complexity of quantum state learning when the states are promised to be efficiently generatable. We show that if unknown quantum states are promised to be pure states and efficiently generateable, then there exists a quantum polynomial time algorithm $A$ and a language $L \in PP$ such that $A^L$ can learn its classical description. We also observe the connection between the hardness of learning quantum states and quantum cryptography. We show that the existence of one-way state generators with pure state outputs is equivalent to the average-case hardness of learning pure states. Additionally, we show that the existence of EFI implies the average-case hardness of learning mixed states.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# 人間と敵対するテキストの関連性について

Suspiciousness of Adversarial Texts to Human ( http://arxiv.org/abs/2410.04377v1 )

ライセンス: Link先を確認

Shakila Mahjabin Tonni, Pedro Faustini, Mark Dras,

(参考訳) 敵対的な例は、画像ドメインとテキストドメインの両方にわたるディープニューラルネットワーク(DNN)に対して、微妙に変化した入力によってモデルパフォーマンスを低下させることを意図して、大きな課題となっている。しかし、敵対的テキストは、意味的類似性やテキスト内容の離散的な性質が要求されるため、敵対的画像とは異なっている。この研究は、人間の不審感という概念を掘り下げるものであり、画像に基づく敵の例に見られる非受容性に対する伝統的な焦点とは異なる品質である。敵対的変化が人間の目と区別できないように意図されている画像とは異なり、テキストの敵対的内容は、NLPシステムやバイパスフィルターを欺くことを目的としている場合でも、人間の読者にとって見つからない、あるいは目立たないままでいなければならない。本研究では、個人が敵対的文章をどのように知覚するかを分析することによって、人間の不審性の研究を拡大する。筆者らは,4つの広く使用されている対人攻撃法によって構築された,敵文の不審性に関する人間の評価に関する新たなデータセットを収集,公開し,機械による変化を検出する人間の能力との相関性を評価する。さらに,疑わしいテキスト生成における疑わしさを軽減するために,疑わしさを定量化し,今後の研究のベースラインを確立するための回帰モデルを構築した。また、回帰器が生成した疑わしいスコアが、コンピュータ生成と見なされる可能性が低いテキストを生成するために、逆生成方法にどのように組み込まれるかを示す。人間の不審な注釈付きデータとコードを利用できるようにします。

Adversarial examples pose a significant challenge to deep neural networks (DNNs) across both image and text domains, with the intent to degrade model performance through meticulously altered inputs. Adversarial texts, however, are distinct from adversarial images due to their requirement for semantic similarity and the discrete nature of the textual contents. This study delves into the concept of human suspiciousness, a quality distinct from the traditional focus on imperceptibility found in image-based adversarial examples. Unlike images, where adversarial changes are meant to be indistinguishable to the human eye, textual adversarial content must often remain undetected or non-suspicious to human readers, even when the text's purpose is to deceive NLP systems or bypass filters. In this research, we expand the study of human suspiciousness by analyzing how individuals perceive adversarial texts. We gather and publish a novel dataset of Likert-scale human evaluations on the suspiciousness of adversarial sentences, crafted by four widely used adversarial attack methods and assess their correlation with the human ability to detect machine-generated alterations. Additionally, we develop a regression-based model to quantify suspiciousness and establish a baseline for future research in reducing the suspiciousness in adversarial text generation. We also demonstrate how the regressor-generated suspicious scores can be incorporated into adversarial generation methods to produce texts that are less likely to be perceived as computer-generated. We make our human suspiciousness annotated data and our code available.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# 半古典的アプローチにおける対のゆらぎの包含:ジョセフソン効果の研究の場合

Inclusion of pairing fluctuations in a semiclassical approach: The case of study of the Josephson effect ( http://arxiv.org/abs/2410.04382v1 )

ライセンス: Link先を確認

Verdiana Piselli, Leonardo Pisani, Giancarlo Calvanese Strinati,

(参考訳) 半古典的手法の最近の改良を概観し、非自明な空間幾何学の存在下での不均一な局所ギャップパラメータを記述し、同時に平均場を超えたペアリング変動を考慮した。この手法を用いて、超低温フェルミガスを用いた最近の実験に関する幅広い物理条件に対するジョセフソン効果を記述する。

Recent refinements on a semiclassical approach are reviewed, aiming at describing the inhomogeneous local gap parameter in the presence of non-trivial spatial geometries and at taking into account at the same time pairing fluctuations beyond mean field. The method is applied to describe the Josephson effect over the wide range of physical conditions related to recent experiments on this topic performed with ultra-cold Fermi gases.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# BrainCodec:認知脳状態の復号のためのニューラルfMRIコーデック

BrainCodec: Neural fMRI codec for the decoding of cognitive brain states ( http://arxiv.org/abs/2410.04383v1 )

ライセンス: Link先を確認

Yuto Nishimura, Masataka Sawayama, Ayumu Yamashita, Hideki Nakayama, Kaoru Amano,

(参考訳) 近年、ディープラーニングにおけるビッグデータの活用は、fMRIデータを用いたメンタルステートデコーディングなどのアプリケーションで確認されたように、大幅なパフォーマンス向上につながっている。しかし、fMRIデータセットの規模は比較的小さく、fMRIデータにおける低信号対雑音比(SNR)の固有の問題は、これらの課題をさらに悪化させる。これを解決するために、fMRIデータの前処理ステップとして圧縮技術を適用する。ニューラルオーディオコーデックに触発された新しいfMRIコーデックであるBrainCodecを提案する。我々は、ブレインコーデックの精神状態復号における圧縮能力を評価し、従来の方法よりもさらに改善したことを示す。さらに、BrainCodecを用いて得られた潜伏表現を分析し、タスクと静止状態fMRIの類似点と相違点を解明し、BrainCodecの解釈可能性を強調した。また,BrainCodecを用いたfMRI再構成により,高いSNRを達成し,脳活動の可視性を高めることが実証された。我々の研究は、BrainCodecが従来の方法よりも性能を高めるだけでなく、ニューロサイエンスに新たな分析可能性をもたらすことを示している。私たちのコード、データセット、モデルウェイトはhttps://github.com/amano-k-lab/BrainCodec.comで公開されています。

Recently, leveraging big data in deep learning has led to significant performance improvements, as confirmed in applications like mental state decoding using fMRI data. However, fMRI datasets remain relatively small in scale, and the inherent issue of low signal-to-noise ratios (SNR) in fMRI data further exacerbates these challenges. To address this, we apply compression techniques as a preprocessing step for fMRI data. We propose BrainCodec, a novel fMRI codec inspired by the neural audio codec. We evaluated BrainCodec's compression capability in mental state decoding, demonstrating further improvements over previous methods. Furthermore, we analyzed the latent representations obtained through BrainCodec, elucidating the similarities and differences between task and resting state fMRI, highlighting the interpretability of BrainCodec. Additionally, we demonstrated that fMRI reconstructions using BrainCodec can enhance the visibility of brain activity by achieving higher SNR, suggesting its potential as a novel denoising method. Our study shows that BrainCodec not only enhances performance over previous methods but also offers new analytical possibilities for neuroscience. Our codes, dataset, and model weights are available at https://github.com/amano-k-lab/BrainCodec.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# データ分散評価

Data Distribution Valuation ( http://arxiv.org/abs/2410.04386v1 )

ライセンス: Link先を確認

Xinyi Xu, Shuaiqi Wang, Chuan-Sheng Foo, Bryan Kian Hsiang Low, Giulia Fanti,

(参考訳) データアベイラビリティ(Data valuation)は、データマーケットプレースにおける価格などのアプリケーションのデータ価値を定量的に評価するテクニックのクラスである。既存のデータバリュエーションメソッドは、離散データセットの値を定義します。しかし、多くのユースケースでは、ユーザはデータセットの値だけでなく、データセットがサンプリングされた分布の値にも興味を持っています。例えば、異なるベンダーからデータを購入するかどうかを評価しようとする買い手について考えてみましょう。購入者は、各ベンダーの小さなプレビューサンプルのみを観察して、購入者および購入者に最も有用なベンダーのデータ配布を決定することができる。中心的な疑問は、サンプルからのデータ分散値を比較するにはどうすればよいか、ということです。本研究では, ベンダー間のデータ不均一性を特徴付けるHuber の評価手法として, サンプルからのデータ分布を比較するための理論的に原理化された, 行動可能なポリシーを実現するための, MMD に基づく評価手法を提案する。実世界の複数のデータセット(例えば、ネットワーク侵入検出、クレジットカード不正検出)や下流アプリケーション(分類、回帰)において、本手法はサンプル効率が高く、複数の既存ベースラインに対して有意義なデータ分布を特定するのに有効であることを示す。

Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces. Existing data valuation methods define a value for a discrete dataset. However, in many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled. For example, consider a buyer trying to evaluate whether to purchase data from different vendors. The buyer may observe (and compare) only a small preview sample from each vendor, to decide which vendor's data distribution is most useful to the buyer and purchase. The core question is how should we compare the values of data distributions from their samples? Under a Huber characterization of the data heterogeneity across vendors, we propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies for comparing data distributions from samples. We empirically demonstrate that our method is sample-efficient and effective in identifying valuable data distributions against several existing baselines, on multiple real-world datasets (e.g., network intrusion detection, credit card fraud detection) and downstream applications (classification, regression).

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# WISE: ビジネス・プロセスのメトリクスをドメイン・知識で解き放つ

WISE: Unraveling Business Process Metrics with Domain Knowledge ( http://arxiv.org/abs/2410.04387v1 )

ライセンス: Link先を確認

Urszula Jessen, Dirk Fahland,

(参考訳) 複雑な産業プロセスの異常は、しばしば、イベントデータの高い変動性と複雑さによって隠蔽され、プロセスマイニングによるその識別と解釈を妨げる。この問題に対処するために、ドメイン知識、プロセスマイニング、機械学習の統合を通じてビジネスプロセスメトリクスを分析する新しい方法であるWISE(Weighted Insights for Evaluating Effective)を紹介する。この方法論は、ビジネス目標を定義し、アクティビティレベルで重み付けされた制約のあるプロセスノルムを確立することを含み、ドメインの専門家やプロセスアナリストからのインプットを取り入れます。個々のプロセスインスタンスはこれらの制約に基づいてスコアされ、スコアはプロセスのゴールに影響を与える特徴を特定するために正規化されます。 BPIC 2019データセットと実際の産業状況を用いた評価は、WISEがビジネスプロセス分析の自動化を強化し、望ましいプロセスフローからの逸脱を効果的に検出することを示している。 LLMは分析をサポートするが、ドメインの専門家が加わったことにより、発見の正確さと妥当性が保証される。

Anomalies in complex industrial processes are often obscured by high variability and complexity of event data, which hinders their identification and interpretation using process mining. To address this problem, we introduce WISE (Weighted Insights for Evaluating Efficiency), a novel method for analyzing business process metrics through the integration of domain knowledge, process mining, and machine learning. The methodology involves defining business goals and establishing Process Norms with weighted constraints at the activity level, incorporating input from domain experts and process analysts. Individual process instances are scored based on these constraints, and the scores are normalized to identify features impacting process goals. Evaluation using the BPIC 2019 dataset and real industrial contexts demonstrates that WISE enhances automation in business process analysis and effectively detects deviations from desired process flows. While LLMs support the analysis, the inclusion of domain experts ensures the accuracy and relevance of the findings.

翻訳日:2024-11-02 08:10:32 公開日:2024-10-06

# モンテカルロ予測最大化を用いた音響的空間キャプチャーの近似最大推定

Approximate Maximum Likelihood Inference for Acoustic Spatial Capture-Recapture with Unknown Identities, Using Monte Carlo Expectation Maximization ( http://arxiv.org/abs/2410.04390v1 )

ライセンス: Link先を確認

Yuheng Wang, Juan Ye, Weiye Li, David L. Borchers,

(参考訳) 音響空間キャプチャー(ASCR)サーベイは、動物の密度を推定したり、呼び出し密度を推定するのに有効な方法である。しかし、ASCR分析に必要なキャプチャ履歴を構築することは困難であり、異なる検出器でのどの検出が、どの呼び出しが自明なタスクであるかを認識することは難しい。異なる距離からの呼び出しは検知器に到達するのに異なる時間を要するため、呼び出しが検出される順序は、その呼び出しが実行される順序と必ずしも同じではなく、どの検出が同じ呼び出しであるかがわからなければ、どのくらいの異なる呼び出しが検出されるかはわからない。本稿では,モンテカルロ予測最大化(MCEM)推定法を提案する。この文脈でMCEM法を実装するために、予測ステップで完全データ確率モデルから潜伏変数をサンプリングし、最大化ステップで半完全データ確率または条件付き確率を使用する。パラメトリックブートストラップを用いて信頼区間を求める。本手法をカスカエル調査に適用すると, 専門家が作成した呼取履歴データを用いて得られた推定値の15%以内を推定し, 後者と異なり, この信頼区間は呼出同一性に関する不確実性を含む。シミュレーションでは、バイアス(6%)が低く、カバー確率が95%に近いことが示されている。

Acoustic spatial capture-recapture (ASCR) surveys with an array of synchronized acoustic detectors can be an effective way of estimating animal density or call density. However, constructing the capture histories required for ASCR analysis is challenging, as recognizing which detections at different detectors are of which calls is not a trivial task. Because calls from different distances take different times to arrive at detectors, the order in which calls are detected is not necessarily the same as the order in which they are made, and without knowing which detections are of the same call, we do not know how many different calls are detected. We propose a Monte Carlo expectation-maximization (MCEM) estimation method to resolve this unknown call identity problem. To implement the MCEM method in this context, we sample the latent variables from a complete-data likelihood model in the expectation step and use a semi-complete-data likelihood or conditional likelihood in the maximization step. We use a parametric bootstrap to obtain confidence intervals. When we apply our method to a survey of moss frogs, it gives an estimate within 15% of the estimate obtained using data with call capture histories constructed by experts, and unlike this latter estimate, our confidence interval incorporates the uncertainty about call identities. Simulations show it to have a low bias (6%) and coverage probabilities close to the nominal 95% value.

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# Recursively Subdivided Tetrahedra を用いた変形性NeRF

Deformable NeRF using Recursively Subdivided Tetrahedra ( http://arxiv.org/abs/2410.04402v1 )

ライセンス: Link先を確認

Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang,

(参考訳) ニューラル放射場(NeRF)は、新しいビュー合成において有望であるが、その暗黙的な表現は、オブジェクト操作に対する明示的な制御を制限する。既存の研究では、変形を可能にするための明示的な幾何学的プロキシの統合が提案されている。しかし、これらの手法は2つの大きな課題に直面している: 第一に、時間がかかり、計算的に要求される四面体化プロセス; 第二に、複雑な構造や細い構造を扱うことは、過度に、貯蔵集約的な四面体メッシュか、変形能力を損なう品質の悪いもののいずれかにつながる。これらの課題に対処するために,四面体メッシュのマニピュラビリティと特徴格子表現の高品質なレンダリング機能とをシームレスに統合するDeformRFを提案する。各物体に対する不整形四面体と四面体化を避けるため, 2段階の訓練戦略を提案する。ほぼ規則な四面体格子から始めると、このモデルは最初、物体を囲むキーテトラヘドラを保持し、その後、2段目においてより微細な粒度メッシュを用いてオブジェクトの詳細を洗練する。また,高分解能メッシュを暗黙的に生成するために,再帰的に分割するテトラヘドラの概念も提示する。これにより、第1のトレーニング段階で発生する粗い四面体メッシュの保存のみを必要としながら、マルチレゾリューション符号化が可能となる。合成データと実撮データの両方でDeformRFを総合的に評価する。定量的および定性的な結果は,新しいビュー合成および変形タスクにおける本手法の有効性を示すものである。プロジェクトページ:https://ustc3dv.github.io/DeformRF/

While neural radiance fields (NeRF) have shown promise in novel view synthesis, their implicit representation limits explicit control over object manipulation. Existing research has proposed the integration of explicit geometric proxies to enable deformation. However, these methods face two primary challenges: firstly, the time-consuming and computationally demanding tetrahedralization process; and secondly, handling complex or thin structures often leads to either excessive, storage-intensive tetrahedral meshes or poor-quality ones that impair deformation capabilities. To address these challenges, we propose DeformRF, a method that seamlessly integrates the manipulability of tetrahedral meshes with the high-quality rendering capabilities of feature grid representations. To avoid ill-shaped tetrahedra and tetrahedralization for each object, we propose a two-stage training strategy. Starting with an almost-regular tetrahedral grid, our model initially retains key tetrahedra surrounding the object and subsequently refines object details using finer-granularity mesh in the second stage. We also present the concept of recursively subdivided tetrahedra to create higher-resolution meshes implicitly. This enables multi-resolution encoding while only necessitating the storage of the coarse tetrahedral mesh generated in the first training stage. We conduct a comprehensive evaluation of our DeformRF on both synthetic and real-captured datasets. Both quantitative and qualitative results demonstrate the effectiveness of our method for novel view synthesis and deformation tasks. Project page: https://ustc3dv.github.io/DeformRF/

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# CiMaTe: メインテキストを効果的に活用するCitation Count予測

CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text ( http://arxiv.org/abs/2410.04404v1 )

ライセンス: Link先を確認

Jun Hirako, Ryohei Sasano, Koichi Takeda,

(参考訳) 今後,論文の引用数を予測することは,論文数が増え続ける中で,興味深い論文を見つける上でますます重要である。論文の本文は引用数予測において重要な要素であるが,本文は典型的に非常に長いため,機械学習モデルでは処理が困難である。本稿では,論文の断面構造を明示的に把握し,主文を利用したBERTに基づく引用数予測モデルCiMaTeを提案する。計算言語学および生物学領域の論文による実験を通じて、スピアマンのランク相関係数(計算言語学領域の5.1点、生物学領域の1.8点)において、CiMaTeの有効性を実証した。

Prediction of the future citation counts of papers is increasingly important to find interesting papers among an ever-growing number of papers. Although a paper's main text is an important factor for citation count prediction, it is difficult to handle in machine learning models because the main text is typically very long; thus previous studies have not fully explored how to leverage it. In this paper, we propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure. Through experiments with papers from computational linguistics and biology domains, we demonstrate the CiMaTe's effectiveness, outperforming the previous methods in Spearman's rank correlation coefficient; 5.1 points in the computational linguistics domain and 1.8 points in the biology domain.

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# Lens: 大規模言語モデルの多言語拡張を再考する

Lens: Rethinking Multilingual Enhancement for Large Language Models ( http://arxiv.org/abs/2410.04407v1 )

ライセンス: Link先を確認

Weixiang Zhao, Yulin Hu, Jiahe Guo, Xingyu Sui, Tongtong Wu, Yang Deng, Yanyan Zhao, Bing Qin, Wanxiang Che, Ting Liu,

(参考訳) 多様な言語背景を持つユーザ向けの大規模言語モデル(LLM)の世界的な需要が高まっているにもかかわらず、最先端のLLMのほとんどは英語中心のままである。これにより、言語間でのパフォーマンスギャップが生じ、非英語話者の高度なAIサービスへのアクセスが制限される。現在の多言語機能向上手法は、多言語命令チューニングや連続的事前学習といったデータ駆動型後学習技術に大きく依存している。しかし、これらのアプローチは、高品質な多言語データセットの不足や、多言語機能の制限された拡張など、重大な課題に直面している。彼らはしばしば標的外問題や中央言語能力の破滅的な忘れ込みに悩まされる。この目的のために、Lensは、内部言語表現空間を活用することで、LLMの多言語機能を強化するための新しいアプローチである。特にLensは、LLMの上位層から言語に依存しない、言語固有のサブ空間内の隠された表現を操作することで動作する。中央言語をピボットとして使用すると、ターゲット言語は言語に依存しない部分空間内でそれに近い位置に描画されるため、十分に確立されたセマンティック表現を継承することができる。一方、言語固有の部分空間では、ターゲット言語と中央言語の表現が切り離され、ターゲット言語自体が明確に表現される。 1つの英語中心のLLMと2つの多言語LLMの広範な実験により、Lensはバックボーンモデルの本来の中央言語能力を犠牲にすることなく、多言語のパフォーマンスを効果的に向上し、既存の訓練後のアプローチと比べて計算資源をはるかに少なくして優れた結果が得られることを示した。

Despite the growing global demand for large language models (LLMs) that serve users from diverse linguistic backgrounds, most cutting-edge LLMs remain predominantly English-centric. This creates a performance gap across languages, restricting access to advanced AI services for non-English speakers. Current methods to enhance multilingual capabilities largely rely on data-driven post-training techniques, such as multilingual instruction tuning or continual pre-training. However, these approaches encounter significant challenges, including the scarcity of high-quality multilingual datasets and the limited enhancement of multilingual capabilities. They often suffer from off-target issues and catastrophic forgetting of central language abilities. To this end, we propose Lens, a novel approach to enhance multilingual capabilities of LLMs by leveraging their internal language representation spaces. Specially, Lens operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. Using the central language as a pivot, the target language is drawn closer to it within the language-agnostic subspace, allowing it to inherit well-established semantic representations. Meanwhile, in the language-specific subspace, the representations of the target and central languages are pushed apart, enabling the target language to express itself distinctly. Extensive experiments on one English-centric and two multilingual LLMs demonstrate that Lens effectively improves multilingual performance without sacrificing the original central language capabilities of the backbone model, achieving superior results with much fewer computational resources compared to existing post-training approaches.

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# 低次グラフ上での最大カットのための量子近似最適化アルゴリズム

Quantum Approximate Optimization Algorithms for Maxmimum Cut on Low-Girth Graphs ( http://arxiv.org/abs/2410.04409v1 )

ライセンス: Link先を確認

Tongyang Li, Yuexin Su, Ziyi Yang, Shengyu Zhang,

(参考訳) グラフ上の最大カット(MaxCut)は古典的なNPハード問題である。量子コンピューティングにおいて、Farhi、Gutmann、GoldstoneはMaxCutの問題を解決するためにQuantum Approximate Optimization Algorithm (QAOA)を提案した。カット分数(全エッジの出力カットのエッジの分数)に対する保証は、主に長い周期しか持たないグラフに対して研究された。一方、低木グラフは理論計算機科学においてユビキタスであり、拡張グラフは理論上およびそれ以上に広く応用された優れた例である。本稿では、加法積グラフとして知られるMohantyとO'Donnellによって提案された拡張グラフの集合上で、MaxCutにQAOAを適用する。さらに,多角QAOA (ma-QAOA) を用いて,加算積グラフのグラフ構造をよりよく活用する。理論的には、そのようなグラフの期待切断率を計算するための反復公式を導出する。一方,古典的局所アルゴリズムとQAOAを一定深度で比較するため,数値実験を行った。以上の結果から,QAOAはいくつかの付加積グラフで0.3%から5.2%,ma-QAOAは0.6%から2.5%でこの優位性を高めた。特に,ma-QAOAはよく知られた古典的アルゴリズムよりも優れているが,QAOAはそうではない。さらに、我々は実験をタイリンググリッドグラフのような平面グラフに拡張し、QAOAが有利であることを示す。

Maximum cut (MaxCut) on graphs is a classic NP-hard problem. In quantum computing, Farhi, Gutmann, and Goldstone proposed the Quantum Approximate Optimization Algorithm (QAOA) for solving the MaxCut problem. Its guarantee on cut fraction (the fraction of edges in the output cut over all edges) was mainly studied for high-girth graphs, i.e., graphs with only long cycles. On the other hand, low-girth graphs are ubiquitous in theoretical computer science, including expander graphs being outstanding examples with wide applications in theory and beyond. In this paper, we apply QAOA to MaxCut on a set of expander graphs proposed by Mohanty and O'Donnell known as additive product graphs. Additionally, we apply multi-angle QAOA (ma-QAOA) to better utilize the graph structure of additive product graphs in ansatz design. In theory, we derive an iterative formula to calculate the expected cut fraction of such graphs. On the other hand, we conduct numerical experiments to compare between best-known classical local algorithms and QAOA with constant depth. Our results demonstrate that QAOA outperforms the best-known classical algorithms by 0.3% to 5.2% on several additive product graphs, while ma-QAOA further enhances this advantage by an additional 0.6% to 2.5%. In particular, we observe cases that ma-QAOA exhibits superiority over best-known classical algorithms but QAOA does not. Furthermore, we extend our experiments to planar graphs such as tiling grid graphs, where QAOA also demonstrates an advantage.

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# Blocks Architecture (BloArk): Wikipediaの改訂履歴のための効率的で費用効果があり、インクリメンタルなデータセットアーキテクチャ

Blocks Architecture (BloArk): Efficient, Cost-Effective, and Incremental Dataset Architecture for Wikipedia Revision History ( http://arxiv.org/abs/2410.04410v1 )

ライセンス: Link先を確認

Lingxi Li, Zonghai Yao, Sunjae Kwon, Hong Yu,

(参考訳) ウィキペディア(ウィキペディア)は、自然言語処理(NLP)アプリケーションにおいて最も広く使われ、一般に公開されているリソースの1つである。 Wikipedia Revision History (WikiRevHist) は、ウィキページが最初に修正されてから編集された順序を示している。最も最新のWikiはトレーニングソースとして広く使われているが、WikiRevHistはNLPアプリケーションにとって貴重なリソースでもある。しかし、WikiRevHistの処理には十分なコンピュータリソースを必要とせず、さらなるカスタマイズや、他人の作業への適応に余分な時間を費やすことなく、不十分なツールがある。そこで我々はBlocks Architecture (BloArk) を報告した。BloArkは、実行時間、計算リソースの要求、WikiRevHistデータセットの処理における繰り返し処理を減らし、効率を重視したデータ処理アーキテクチャである。 BloArkは、ブロック、セグメント、倉庫の3つの部分で構成されている。それに加えて,コアデータ処理パイプライン – builder と modifier も構築しています。 BloArkビルダーは、オリジナルのWikiRevHistデータセットをXML構文からJSON行(JSONL)フォーマットに変換し、並列性とストレージ効率を改善する。 BloArk修飾器は、既存のデータベースの利用を改善し、他人の作業を再利用するコストを削減するために、以前製造された倉庫をインクリメンタルに改造する。最終的にBloArkは、Wikipediaのリビジョン履歴の処理と、下流のNLPユースケースのための既存のデータセットの漸進的な修正の両方で簡単にスケールアップできる。ソースコード、ドキュメンテーション、サンプルの使用例はオンラインで公開されており、GPL-2.0ライセンス下でオープンソース化されている。

Wikipedia (Wiki) is one of the most widely used and publicly available resources for natural language processing (NLP) applications. Wikipedia Revision History (WikiRevHist) shows the order in which edits were made to any Wiki page since its first modification. While the most up-to-date Wiki has been widely used as a training source, WikiRevHist can also be valuable resources for NLP applications. However, there are insufficient tools available to process WikiRevHist without having substantial computing resources, making additional customization, and spending extra time adapting others' works. Therefore, we report Blocks Architecture (BloArk), an efficiency-focused data processing architecture that reduces running time, computing resource requirements, and repeated works in processing WikiRevHist dataset. BloArk consists of three parts in its infrastructure: blocks, segments, and warehouses. On top of that, we build the core data processing pipeline: builder and modifier. The BloArk builder transforms the original WikiRevHist dataset from XML syntax into JSON Lines (JSONL) format for improving the concurrent and storage efficiency. The BloArk modifier takes previously-built warehouses to operate incremental modifications for improving the utilization of existing databases and reducing the cost of reusing others' works. In the end, BloArk can scale up easily in both processing Wikipedia Revision History and incrementally modifying existing dataset for downstream NLP use cases. The source code, documentations, and example usages are publicly available online and open-sourced under GPL-2.0 license.

翻訳日:2024-11-02 08:00:46 公開日:2024-10-06

# DAdEE: 早期PLMにおける教師なしドメイン適応

DAdEE: Unsupervised Domain Adaptation in Early Exit PLMs ( http://arxiv.org/abs/2410.04424v1 )

ライセンス: Link先を確認

Divya Jyoti Bajpai, Manjesh Kumar Hanawal,

(参考訳) 事前訓練された言語モデル(PLM)は、自己スーパービジョンを用いて様々なタスクにわたって高い精度と一般化能力を示すが、その大きなサイズは高い推論遅延をもたらす。 Early Exit(EE)戦略は、中間層に取り付けられた分類器からサンプルを退避させることで問題に対処するが、出口分類器はドメインの変更に敏感であるため、それらをうまく一般化しない。これを解決するために,知識蒸留を用いた多段階適応を用いた非教師付き領域適応型EEフレームワーク(DADEE)を提案する。 DADEEは、各レイヤでのGANベースの逆順応を利用してドメイン不変表現を実現し、ソースとターゲットドメイン間のすべてのレイヤ間のドメインギャップを減らします。取り付けられた出口は推論をスピードアップするだけでなく、破滅的な忘れ込みとモード崩壊を減らすことでドメイン適応を向上させるため、現実世界のシナリオにより適している。感情分析やエンテーメント分類、自然言語推論といったタスクの実験では、DADEEは早期終了法だけでなく、ドメインシフトシナリオ下での様々なドメイン適応法よりも一貫して優れていることが示されている。匿名のソースコードはhttps://github.com/Div290/DAdEEで入手できる。

Pre-trained Language Models (PLMs) exhibit good accuracy and generalization ability across various tasks using self-supervision, but their large size results in high inference latency. Early Exit (EE) strategies handle the issue by allowing the samples to exit from classifiers attached to the intermediary layers, but they do not generalize well, as exit classifiers can be sensitive to domain changes. To address this, we propose Unsupervised Domain Adaptation in EE framework (DADEE) that employs multi-level adaptation using knowledge distillation. DADEE utilizes GAN-based adversarial adaptation at each layer to achieve domain-invariant representations, reducing the domain gap between the source and target domain across all layers. The attached exits not only speed up inference but also enhance domain adaptation by reducing catastrophic forgetting and mode collapse, making it more suitable for real-world scenarios. Experiments on tasks such as sentiment analysis, entailment classification, and natural language inference demonstrate that DADEE consistently outperforms not only early exit methods but also various domain adaptation methods under domain shift scenarios. The anonymized source code is available at https://github.com/Div290/DAdEE.

翻訳日:2024-11-02 07:51:01 公開日:2024-10-06

# CoVLM:半教師付きマルチモーダルフェイクニュース検出のためのビジョンランゲージモデルからの合意の活用

CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection ( http://arxiv.org/abs/2410.04426v1 )

ライセンス: Link先を確認

Devank, Jayateja Kalla, Soma Biswas,

(参考訳) 本研究では,実画像と誤った字幕を組み合わせて偽ニュースを生成する,文脈外誤情報検出の現実的課題に対処する。このタスクの既存のアプローチは、大量のラベル付きデータの可用性を前提としています。対照的に、ラベル付き画像テキストペアの大規模なコーパスの取得はより容易であるため、本研究では、ラベル付き画像テキストペアの限られた数とラベル付き画像テキストペアの大規模なコーパスにアクセス可能な半教師付きプロトコルを提案する。さらに、偽ニュースの発生は実際のニュースよりもはるかに少ないため、データセットは極めて不均衡であり、タスクをさらに難しくする傾向にある。そこで本研究では,ラベル付きデータから得られた閾値を用いて,ラベル付きペアに対してロバストな擬似ラベルを生成する新しいフレームワークであるConsensus from Vision-Language Models (CoVLM)を提案する。このアプローチは、自信のある擬似ラベルを選択するためのモデルの正しいしきい値を自動的に決定できる。課題のある条件に対するベンチマークデータセットの実験結果と、最先端のアプローチとの比較により、我々のフレームワークの有効性が示されている。

In this work, we address the real-world, challenging task of out-of-context misinformation detection, where a real image is paired with an incorrect caption for creating fake news. Existing approaches for this task assume the availability of large amounts of labeled data, which is often impractical in real-world, since it requires extensive manual intervention and domain expertise. In contrast, since obtaining a large corpus of unlabeled image-text pairs is much easier, here, we propose a semi-supervised protocol, where the model has access to a limited number of labeled image-text pairs and a large corpus of unlabeled pairs. Additionally, the occurrence of fake news being much lesser compared to the real ones, the datasets tend to be highly imbalanced, thus making the task even more challenging. Towards this goal, we propose a novel framework, Consensus from Vision-Language Models (CoVLM), which generates robust pseudo-labels for unlabeled pairs using thresholds derived from the labeled data. This approach can automatically determine the right threshold parameters of the model for selecting the confident pseudo-labels. Experimental results on benchmark datasets across challenging conditions and comparisons with state-of-the-art approaches demonstrate the effectiveness of our framework.

翻訳日:2024-11-02 07:51:01 公開日:2024-10-06

# CAPEEN:早期退院と知識蒸留による画像キャプション

CAPEEN: Image Captioning with Early Exits and Knowledge Distillation ( http://arxiv.org/abs/2410.04433v1 )

ライセンス: Link先を確認

Divya Jyoti Bajpai, Manjesh Kumar Hanawal,

(参考訳) ディープニューラルネットワーク(DNN)は、視覚要素を認識し、画像キャプションタスクで記述的なテキストを生成することで大きな進歩を遂げている。しかし、その性能改善は計算負荷の増加と推論遅延によるものである。 Early Exit(EE)戦略は効率を高めるために使用できるが、その適応は正確な予測のために様々なレベルの意味情報を必要とするため、画像キャプションにおける課題を示す。そこで我々は,知識蒸留を用いたEE戦略の性能向上のためにCAPEENを導入する。予測信頼度がトレーニングデータから得られた予め定義された値を超えると、CAPEENの推論は中間層で完了する。トレーニングサンプルから目標分布をドリフトできる実世界の展開を考慮し,Multiarmed banditsフレームワークを用いて,フライ時のしきい値に適応する改良型A-CAPEENを提案する。 MS COCOとFlickr30kデータセットの実験では、CAPEENは最終層と比較して競争性能を維持しながら1.77倍のスピードアップを示し、A-CAPEENは歪みに対して堅牢性を提供する。ソースコードはhttps://github.com/Div290/CapEENで入手できる。

Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes from increased computational burden and inference latency. Early Exit (EE) strategies can be used to enhance their efficiency, but their adaptation presents challenges in image captioning as it requires varying levels of semantic information for accurate predictions. To overcome this, we introduce CAPEEN to improve the performance of EE strategies using knowledge distillation. Inference in CAPEEN is completed at intermediary layers if prediction confidence exceeds a predefined value learned from the training data. To account for real-world deployments, where target distributions could drift from that of training samples, we introduce a variant A-CAPEEN to adapt the thresholds on the fly using Multiarmed bandits framework. Experiments on the MS COCO and Flickr30k datasets show that CAPEEN gains speedup of 1.77x while maintaining competitive performance compared to the final layer, and A-CAPEEN additionally offers robustness against distortions. The source code is available at https://github.com/Div290/CapEEN

翻訳日:2024-11-02 07:51:01 公開日:2024-10-06

# UNetの数学的説明

A Mathematical Explanation of UNet ( http://arxiv.org/abs/2410.04434v1 )

ライセンス: Link先を確認

Xue-Cheng Tai, Hao Liu, Raymond H. Chan, Lingfeng Li,

(参考訳) UNetアーキテクチャはイメージセグメンテーションを変換した。 UNetの汎用性と精度は広く採用され、画像の機械学習問題に大きく依存している。本稿では,UNetの簡潔な数学的説明を行う。 UNetの各コンポーネントの意味と機能について説明する。 UNetが制御問題を解決していることを示します。マルチグリッド法を用いて制御変数を分解する。次に、演算子分割技術を用いて、そのアーキテクチャがUNetアーキテクチャを正確に回復する問題の解決を行う。この結果から,UNetは制御問題に対する一段階演算子分割アルゴリズムであることがわかった。

The UNet architecture has transformed image segmentation. UNet's versatility and accuracy have driven its widespread adoption, significantly advancing fields reliant on machine learning problems with images. In this work, we give a clear and concise mathematical explanation of UNet. We explain what is the meaning and function of each of the components of UNet. We will show that UNet is solving a control problem. We decompose the control variables using multigrid methods. Then, operator-splitting techniques is used to solve the problem, whose architecture exactly recovers the UNet architecture. Our result shows that UNet is a one-step operator-splitting algorithm for the control problem.

翻訳日:2024-11-02 07:51:01 公開日:2024-10-06

# QKAN:Quantum Kolmogorov-Arnold Networks

QKAN: Quantum Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2410.04435v1 )

ライセンス: Link先を確認

Petr Ivashkov, Po-Wei Huang, Kelvin Koor, Lirandë Pira, Patrick Rebentrost,

(参考訳) 量子ハードウェアにおける学習モデルの可能性は、依然としてオープンな疑問である。しかし、量子機械学習の分野は、これらのモデルが量子実装をどのように活用できるかを永続的に探求している。近年、コルモゴロフ・アルノルド表現定理の構成構造に触発されて、コルモゴロフ・アルノルドネットワーク(KAN)と呼ばれる新しいニューラルネットワークアーキテクチャが出現している。本研究ではQKANと呼ばれる量子バージョンを設計する。我々のQKANは、量子特異値変換を含む強力な量子線型代数ツールを利用して、ネットワークの端にパラメータ化活性化関数を適用する。 QKANはブロックエンコーディングに基づいており、本質的に直接量子入力に適している。さらに,その漸近的複雑性を分析し,単一層からエンドツーエンドのニューラルネットワークアーキテクチャへ再帰的に構築する。 QKANのゲート複雑性は、入力と重みのためのブロックエンコーディングを構築するコストと線形にスケールし、高次元入力を持つタスクに広く適用可能であることを示唆している。 QKANは、パラメータ化された量子回路と確立された量子サブルーチンを組み合わせることで、トレーニング可能な量子機械学習モデルとして機能する。最後に,QKANアーキテクチャ構築に基づく多変量状態準備戦略を提案する。

The potential of learning models in quantum hardware remains an open question. Yet, the field of quantum machine learning persistently explores how these models can take advantage of quantum implementations. Recently, a new neural network architecture, called Kolmogorov-Arnold Networks (KAN), has emerged, inspired by the compositional structure of the Kolmogorov-Arnold representation theorem. In this work, we design a quantum version of KAN called QKAN. Our QKAN exploits powerful quantum linear algebra tools, including quantum singular value transformation, to apply parameterized activation functions on the edges of the network. QKAN is based on block-encodings, making it inherently suitable for direct quantum input. Furthermore, we analyze its asymptotic complexity, building recursively from a single layer to an end-to-end neural architecture. The gate complexity of QKAN scales linearly with the cost of constructing block-encodings for input and weights, suggesting broad applicability in tasks with high-dimensional input. QKAN serves as a trainable quantum machine learning model by combining parameterized quantum circuits with established quantum subroutines. Lastly, we propose a multivariate state preparation strategy based on the construction of the QKAN architecture.

翻訳日:2024-11-02 07:51:01 公開日:2024-10-06

# 入力粒度制御とグリフ認識学習による視覚テキスト生成のためのバックボーンモデル

Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training ( http://arxiv.org/abs/2410.04439v1 )

ライセンス: Link先を確認

Wenbo Li, Guohao Li, Zhibin Lan, Xue Xu, Wanru Zhuang, Jiachen Liu, Xinyan Xiao, Jinsong Su,

(参考訳) 拡散に基づくテキスト・ツー・イメージモデルでは、多様性と美学において顕著な成果が示されているが、可視的な視覚的テキストで画像を生成するのに苦労している。既存のバックボーンモデルには、ミススペル、テキスト生成の失敗、中国語テキストのサポートの欠如といった制限があるが、その開発は有望な可能性を示している。本稿では,英語と中国語の視覚テキスト生成にバックボーンモデルを活用するための一連の手法を提案する。まず、Byte Pair Encoding(BPE)トークン化とクロスアテンションモジュールの学習不足により、バックボーンモデルの性能が制限されることを明らかにする予備的研究を行った。そこで我々は,(1)より適切なテキスト表現を提供するために,混合粒度入力戦略を設計し,(2)3つのグリフ認識学習損失を伴って従来の訓練目標を強化することを提案する。実験により,本手法は,基本的な画像生成品質を維持しつつ,意味的,美的,正確な視覚的テキスト画像を生成するために,バックボーンモデルを効果的に活用できることが実証された。

Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with legible visual texts. Existing backbone models have limitations such as misspelling, failing to generate texts, and lack of support for Chinese text, but their development shows promising potential. In this paper, we propose a series of methods, aiming to empower backbone models to generate visual texts in English and Chinese. We first conduct a preliminary study revealing that Byte Pair Encoding (BPE) tokenization and the insufficient learning of cross-attention modules restrict the performance of the backbone models. Based on these observations, we make the following improvements: (1) We design a mixed granularity input strategy to provide more suitable text representations; (2) We propose to augment the conventional training objective with three glyph-aware training losses, which enhance the learning of cross-attention modules and encourage the model to focus on visual texts. Through experiments, we demonstrate that our methods can effectively empower backbone models to generate semantic relevant, aesthetically appealing, and accurate visual text images, while maintaining their fundamental image generation quality.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# 視覚変換器を用いた金属表面欠陥の自動検出

Automated Detection of Defects on Metal Surfaces using Vision Transformers ( http://arxiv.org/abs/2410.04440v1 )

ライセンス: Link先を確認

Toqa Alaa, Mostafa Kotb, Arwa Zakaria, Mariam Diab, Walid Gomaa,

(参考訳) 金属製造は、しばしば欠陥製品を生産し、運用上の問題を引き起こす。従来の手動検査は時間を要するため、自動的な解決策が必要である。本研究では、深層学習技術を用いて、視覚変換器(ViT)を用いた金属表面欠陥検出モデルを開発した。提案モデルは,特徴抽出のためのViTを用いた欠陥の分類と局所化に焦点を当てている。アーキテクチャは、分類とローカライゼーションの2つのパスに分かれる。このモデルは,局所化過程において,平均角誤差(MSE)と平均絶対誤差(MAE)を極力低く保ちながら,高い分類精度にアプローチする必要がある。実験結果から, 自動欠陥検出, 運転効率の向上, 金属製造における誤差の低減に有効であることが示唆された。

Metal manufacturing often results in the production of defective products, leading to operational challenges. Since traditional manual inspection is time-consuming and resource-intensive, automatic solutions are needed. The study utilizes deep learning techniques to develop a model for detecting metal surface defects using Vision Transformers (ViTs). The proposed model focuses on the classification and localization of defects using a ViT for feature extraction. The architecture branches into two paths: classification and localization. The model must approach high classification accuracy while keeping the Mean Square Error (MSE) and Mean Absolute Error (MAE) as low as possible in the localization process. Experimental results show that it can be utilized in the process of automated defects detection, improve operational efficiency, and reduce errors in metal manufacturing.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# 未知領域の最適化:セファロメトリックランドマーク検出のためのドメインアライメント

Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection ( http://arxiv.org/abs/2410.04445v1 )

ライセンス: Link先を確認

Julian Wyatt, Irina Voiculescu,

(参考訳) ケパロメトリランドマーク検出(Cephalometric Landmark Detection)は、脳波計測のための重要な領域を特定するプロセスである。それぞれのランドマークは、臨床医によってラベル付けされた単一のGTポイントである。機械学習モデルは、ヒートマップで表されるランドマークの確率軌跡を予測する。この研究は、2024年のCL-Detection MICCAI Challengeのために、局所的な顔抽出モジュールとX線アーチファクト拡張手順によるドメインアライメント戦略を提案する。この課題は、我々の手法の結果をMREの1.186mm、オンライン検証のリーダーボードの2mm SDRの82.04%でベストと位置づけている。コードはhttps://github.com/Julian-Wyatt/OptimisingfortheUnknownで公開されている。

Cephalometric Landmark Detection is the process of identifying key areas for cephalometry. Each landmark is a single GT point labelled by a clinician. A machine learning model predicts the probability locus of a landmark represented by a heatmap. This work, for the 2024 CL-Detection MICCAI Challenge, proposes a domain alignment strategy with a regional facial extraction module and an X-ray artefact augmentation procedure. The challenge ranks our method's results as the best in MRE of 1.186mm and third in the 2mm SDR of 82.04% on the online validation leaderboard. The code is available at https://github.com/Julian-Wyatt/OptimisingfortheUnknown.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# 注意のシフト: 安全でないコンテンツからAIを操る

Attention Shift: Steering AI Away from Unsafe Content ( http://arxiv.org/abs/2410.04447v1 )

ライセンス: Link先を確認

Shivank Garg, Manyana Tiwari,

(参考訳) 本研究は, 最先端の生成モデルにおける安全でない, 有害なコンテンツの生成について検討し, それらの世代を制限する方法に着目した。提案手法は,非安全概念を推論中に追加のトレーニングを伴わずに取り除くことを目的とした,新たなトレーニングフリーアプローチである。我々は,従来のアブレーション法と比較し,質的,定量的な測定値を用いて,直接的および敵対的ジェイルブレイクプロンプトの性能評価を行った。本研究は,観察結果の潜在的な理由を仮説化し,コンテンツ制限の限界と広範な影響について議論する。

This study investigates the generation of unsafe or harmful content in state-of-the-art generative models, focusing on methods for restricting such generations. We introduce a novel training-free approach using attention reweighing to remove unsafe concepts without additional training during inference. We compare our method against existing ablation methods, evaluating the performance on both, direct and adversarial jailbreak prompts, using qualitative and quantitative metrics. We hypothesize potential reasons for the observed results and discuss the limitations and broader implications of content restriction.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# Video Summarization Techniques: A Comprehensive Reviews

Video Summarization Techniques: A Comprehensive Review ( http://arxiv.org/abs/2410.04449v1 )

ライセンス: Link先を確認

Toqa Alaa, Ahmad Mongy, Assem Bakr, Mariam Diab, Walid Gomaa,

(参考訳) ソーシャルメディア、教育、エンターテイメント、監視など、さまざまな産業におけるビデオコンテンツの急速な拡大は、ビデオ要約を重要な研究分野にしている。現在の研究は、抽象的戦略と抽出的戦略の両方を強調する、ビデオ要約のための様々なアプローチと手法を探求する調査である。抽出要約のプロセスは、ソースビデオからキーフレームやセグメントを識別し、ショット境界認識やクラスタリングなどの手法を利用する。一方、抽象的な要約は、深層ニューラルネットワークや自然言語処理、強化学習、注意機構、生成的敵ネットワーク、マルチモーダル学習といった機械学習モデルを用いて、ビデオから不可欠なコンテンツを取得することによって、新たなコンテンツを生成する。また、この2つの方法論を取り入れたアプローチや、実世界の実装で遭遇した利用と難しさについても論じる。論文では、これらのテクニックのベンチマークに使われるデータセットについても取り上げている。本稿では,映像要約研究の現状と今後の方向性について,最先端の知識を提供する。

The rapid expansion of video content across a variety of industries, including social media, education, entertainment, and surveillance, has made video summarization an essential field of study. The current work is a survey that explores the various approaches and methods created for video summarizing, emphasizing both abstractive and extractive strategies. The process of extractive summarization involves the identification of key frames or segments from the source video, utilizing methods such as shot boundary recognition, and clustering. On the other hand, abstractive summarization creates new content by getting the essential content from the video, using machine learning models like deep neural networks and natural language processing, reinforcement learning, attention mechanisms, generative adversarial networks, and multi-modal learning. We also include approaches that incorporate the two methodologies, along with discussing the uses and difficulties encountered in real-world implementations. The paper also covers the datasets used to benchmark these techniques. This review attempts to provide a state-of-the-art thorough knowledge of the current state and future directions of video summarization research.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# MindScope: マルチエージェントシステムによる大規模言語モデルにおける認知バイアスの探索

MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems ( http://arxiv.org/abs/2410.04452v1 )

ライセンス: Link先を確認

Zhentao Xie, Jiabao Zhao, Yilei Wang, Jinxin Shi, Yanhong Bai, Xingjiao Wu, Liang He,

(参考訳) 大きな言語モデル(LLM)における認知バイアスを検出することは、これらのモデル内の既存の認知バイアスを調査することを目的とした魅力的なタスクである。言語モデルにおける認知バイアスを検出する現在の方法は、一般的に不完全な検出能力と、検出可能なバイアスの種類が制限された範囲に悩まされている。この問題に対処するため、静的および動的要素を区別して統合する'MindScope'データセットを導入しました。静的成分は、72の認知バイアスカテゴリにまたがる5,170のオープンエンド質問からなる。動的コンポーネントはルールベースのマルチエージェント通信フレームワークを利用して、マルチラウンド対話を生成する。このフレームワークは柔軟性があり、LSMを含む様々な心理的実験に容易に適応できる。さらに,検索・拡張生成(RAG),競争的議論,強化学習に基づく意思決定モジュールを組み込んだ多エージェント検出手法を提案する。有効性を実証し、GPT-4と比較して検出精度を最大35.10%向上させることが示されている。コードと付録はhttps://github.com/2279072142/MindScope.comで入手できる。

Detecting cognitive biases in large language models (LLMs) is a fascinating task that aims to probe the existing cognitive biases within these models. Current methods for detecting cognitive biases in language models generally suffer from incomplete detection capabilities and a restricted range of detectable bias types. To address this issue, we introduced the 'MindScope' dataset, which distinctively integrates static and dynamic elements. The static component comprises 5,170 open-ended questions spanning 72 cognitive bias categories. The dynamic component leverages a rule-based, multi-agent communication framework to facilitate the generation of multi-round dialogues. This framework is flexible and readily adaptable for various psychological experiments involving LLMs. In addition, we introduce a multi-agent detection method applicable to a wide range of detection tasks, which integrates Retrieval-Augmented Generation (RAG), competitive debate, and a reinforcement learning-based decision module. Demonstrating substantial effectiveness, this method has shown to improve detection accuracy by as much as 35.10% compared to GPT-4. Codes and appendix are available at https://github.com/2279072142/MindScope.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# CopyLens: LLM出力に対する著作権付きサブデータセットのコントリビューションを動的にフラグする

CopyLens: Dynamically Flagging Copyrighted Sub-Dataset Contributions to LLM Outputs ( http://arxiv.org/abs/2410.04454v1 )

ライセンス: Link先を確認

Qichao Ma, Rui-Jie Zhu, Peiye Liu, Renye Yan, Fahong Zhang, Ling Liang, Meng Li, Zhaofei Yu, Zongwei Wang, Yimao Cai, Tiejun Huang,

(参考訳) 大きな言語モデル(LLM)は、その知識の吸収とテキスト生成能力によって普及している。同時に、データセットの事前トレーニングに関する著作権問題も、特に生成に特定のスタイルが含まれている場合、深刻な問題となっている。それまでの方法は、同一の著作権のある出力の防衛に焦点を当てたり、計算負荷のある個々のトークンによる解釈可能性を見出したりしていた。しかし、それらのギャップは存在し、データセットのコントリビューションがLLM出力にどのように影響するかの直接的な評価が欠けている。モデルプロバイダがデータ保持者の著作権保護を保証すると、より成熟したLCMコミュニティが確立される。これらの制限に対処するために、著作権付きデータセットがLLM応答にどのように影響するかを分析するための新しいフレームワークであるCopyLensを紹介します。まず、埋め込み空間における事前学習データのユニーク性に基づいて、トークン表現は著作権のあるテキストに対して最初に融合され、続いて軽量のLSTMベースのネットワークでデータセットのコントリビューションを分析する。このような先行して、対照的な学習に基づく非コピーライトOOD検出器が設計されている。我々のフレームワークは動的に異なる状況に直面することができ、現在の著作権検出方法のギャップを埋めることができます。実験の結果、CopyLensは提案したベースラインよりも効率と精度を15.2%向上し、エンジニアリング手法より58.7%、OOD検出ベースラインより0.21AUC向上した。

Large Language Models (LLMs) have become pervasive due to their knowledge absorption and text-generation capabilities. Concurrently, the copyright issue for pretraining datasets has been a pressing concern, particularly when generation includes specific styles. Previous methods either focus on the defense of identical copyrighted outputs or find interpretability by individual tokens with computational burdens. However, the gap between them exists, where direct assessments of how dataset contributions impact LLM outputs are missing. Once the model providers ensure copyright protection for data holders, a more mature LLM community can be established. To address these limitations, we introduce CopyLens, a new framework to analyze how copyrighted datasets may influence LLM responses. Specifically, a two-stage approach is employed: First, based on the uniqueness of pretraining data in the embedding space, token representations are initially fused for potential copyrighted texts, followed by a lightweight LSTM-based network to analyze dataset contributions. With such a prior, a contrastive-learning-based non-copyright OOD detector is designed. Our framework can dynamically face different situations and bridge the gap between current copyright detection methods. Experiments show that CopyLens improves efficiency and accuracy by 15.2% over our proposed baseline, 58.7% over prompt engineering methods, and 0.21 AUC over OOD detection baselines.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# SWEb:スカンジナビア語のための大規模なWebデータセット

SWEb: A Large Web Dataset for the Scandinavian Languages ( http://arxiv.org/abs/2410.04456v1 )

ライセンス: Link先を確認

Tobias Norlund, Tim Isbister, Amaru Cuba Gyllensten, Paul Dos Santos, Danila Petrelli, Ariel Ekgren, Magnus Sahlgren,

(参考訳) 本稿では,スカンジナビア語における最大規模の事前学習データセットであるスカンジナビア語WEb(SWEb)について述べる。本論文では,収集と処理のパイプラインを詳述し,ルールベースのアプローチと比較して,複雑性を著しく低減する新しいモデルベースのテキスト抽出器を提案する。また、スウェーデンの言語モデルを評価するための新しいクローゼスタイルのベンチマークを導入し、このテストを用いて、SWEbデータでトレーニングされたモデルとFinalWebでトレーニングされたモデルを比較し、競合する結果と比較した。すべてのデータ、モデル、コードはオープンに共有されます。

This paper presents the hitherto largest pretraining dataset for the Scandinavian languages: the Scandinavian WEb (SWEb), comprising over one trillion tokens. The paper details the collection and processing pipeline, and introduces a novel model-based text extractor that significantly reduces complexity in comparison with rule-based approaches. We also introduce a new cloze-style benchmark for evaluating language models in Swedish, and use this test to compare models trained on the SWEb data to models trained on FineWeb, with competitive results. All data, models and code are shared openly.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# 重力適応ゾーンキャリブレーションのためのアテンションベースアルゴリズム

An Attention-Based Algorithm for Gravity Adaptation Zone Calibration ( http://arxiv.org/abs/2410.04457v1 )

ライセンス: Link先を確認

Chen Yu,

(参考訳) 重力適応ゾーンの正確な校正は、水中航法、地球物理探査、海洋工学などの分野において非常に重要である。これらの領域における重力場データの適用が拡大するにつれ、重力場の複雑な特性を捉え、多次元データ間の複雑な相互関係に対処するために、単一特徴に基づく従来の校正手法が不十分になりつつある。本稿では,重力適応領域キャリブレーションのためのアテンション強化アルゴリズムを提案する。注意機構を導入することにより,多次元重力場の特徴を適応的に融合させ,特徴量を動的に割り当てることにより,従来の特徴選択法に固有の多重線型性や冗長性の問題を効果的に解決し,キャリブレーション精度とロバスト性を大幅に向上させ,さらに1万以上のサンプリングポイントを持つ大規模重力場データセットを構築し,データ空間の解像度を高めるためにクリリング補間を行い,モデルトレーニングと評価のための信頼性の高いデータ基盤を提供する。従来の機械学習モデル(SVM, GBDT, RFなど)の定性的および定量的な実験を行った結果,提案アルゴリズムはこれらのモデル間で性能を著しく改善し,従来の特徴選択法よりも優れていることが示された。本稿では,重力適応領域キャリブレーションのための新しい解法を提案する。コードは \href{this link} {https://github.com/hulnifox/RF-ATTN} で公開されている。

Accurate calibration of gravity adaptation zones is of great significance in fields such as underwater navigation, geophysical exploration, and marine engineering. With the increasing application of gravity field data in these areas, traditional calibration methods based on single features are becoming inadequate for capturing the complex characteristics of gravity fields and addressing the intricate interrelationships among multidimensional data. This paper proposes an attention-enhanced algorithm for gravity adaptation zone calibration. By introducing an attention mechanism, the algorithm adaptively fuses multidimensional gravity field features and dynamically assigns feature weights, effectively solving the problems of multicollinearity and redundancy inherent in traditional feature selection methods, significantly improving calibration accuracy and robustness.In addition, a large-scale gravity field dataset with over 10,000 sampling points was constructed, and Kriging interpolation was used to enhance the spatial resolution of the data, providing a reliable data foundation for model training and evaluation. We conducted both qualitative and quantitative experiments on several classical machine learning models (such as SVM, GBDT, and RF), and the results demonstrate that the proposed algorithm significantly improves performance across these models, outperforming other traditional feature selection methods. The method proposed in this paper provides a new solution for gravity adaptation zone calibration, showing strong generalization ability and potential for application in complex environments. The code is available at \href{this link} {https://github.com/hulnifox/RF-ATTN}.

翻訳日:2024-11-02 07:25:54 公開日:2024-10-06

# U-netによる脳脊髄液分布の予測と心室逆流グレーディング

U-net based prediction of cerebrospinal fluid distribution and ventricular reflux grading ( http://arxiv.org/abs/2410.04460v1 )

ライセンス: Link先を確認

Melanie Rieff, Fabian Holzberger, Oksana Lapina, Geir Ringstad, Lars Magnus Valnes, Bogna Warsza, Kent-Andre Mardal, Per Kristian Eide, Barbara Wohlmuth,

(参考訳) これまでの研究では、脳脊髄液(CSF)が脳の廃棄物浄化過程において重要な役割を担い、変化した流れパターンが中枢神経系の様々な疾患と関連していることが示されている。本研究では,ガドリニウム系CSF造影剤(tracer)の脳内分布を予測するための深層学習の可能性について検討した。このため,T1強調MRI(MRI)スキャンを経皮的投与前後に複数回施行した。本稿では,24時間後にピーク時の画素単位の信号増加を予測するために,U-netを用いた教師付き学習モデルを提案する。その性能は、トレーニング中に提供される異なるトレーサ分布ステージに基づいて評価される。以上の結果から, 初回2時間後の画像データから, トレーサーフローの予測値が, 追加の後期スキャンに匹敵するものであることが示唆された。さらに, 神経放射線医が提供した心室逆流グレーディングと, 医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用医用車用リフレックスグレーディングを比較検討した結果, 良好な一致が得られた。 CSFフロー予測のための深層学習法の可能性を示し,臨床解析にMRIスキャンが有用であり,臨床効率,患者の幸福感,医療費の低減に寄与する可能性が示唆された。

Previous work shows evidence that cerebrospinal fluid (CSF) plays a crucial role in brain waste clearance processes, and that altered flow patterns are associated with various diseases of the central nervous system. In this study, we investigate the potential of deep learning to predict the distribution in human brain of a gadolinium-based CSF contrast agent (tracer) administered intrathecal. For this, T1-weighted magnetic resonance imaging (MRI) scans taken at multiple time points before and after intrathecal injection were utilized. We propose a U-net-based supervised learning model to predict pixel-wise signal increases at their peak after 24 hours. Its performance is evaluated based on different tracer distribution stages provided during training, including predictions from baseline scans taken before injection. Our findings indicate that using imaging data from just the first two hours post-injection for training yields tracer flow predictions comparable to those trained with additional later-stage scans. The model was further validated by comparing ventricular reflux gradings provided by neuroradiologists, and inter-rater grading among medical experts and the model showed excellent agreement. Our results demonstrate the potential of deep learning-based methods for CSF flow prediction, suggesting that fewer MRI scans could be sufficient for clinical analysis, which might significantly improve clinical efficiency, patient well-being, and lower healthcare costs.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 生物配列設計における外部強化学習の改善

Improved Off-policy Reinforcement Learning in Biological Sequence Design ( http://arxiv.org/abs/2410.04461v1 )

ライセンス: Link先を確認

Hyeonah Kim, Minsu Kim, Taeyoung Yun, Sanghyeok Choi, Emmanuel Bengio, Alex Hernández-García, Jinkyoo Park,

(参考訳) 生物配列を望ましい性質で設計することは、組合せ的に広大な探索空間と、それぞれの候補配列を評価するコストが高いため、大きな課題である。これらの課題に対処するため、GFlowNetsのような強化学習(RL)手法では、プロキシモデルを用いて迅速な報酬評価を行い、アノテートされたデータをポリシートレーニングに利用する。これらの手法は、多種多様な新しいシーケンスを生成する上で有望であるが、膨大な検索空間に対する限られたトレーニングデータはしばしば、配布外入力のプロキシの誤特定につながる。我々は,GFlowNetsを訓練し,プロキシの誤特定に対するロバスト性を改善するための,新しいオフライン検索手法である$\delta$-Conservative Searchを紹介した。キーとなる考え方は、パラメータ$\delta$によって制御される保守性を組み込んで、検索を信頼できるリージョンに制限することである。具体的には、パラメータ$\delta$のベルヌーイ分布でランダムにトークンをマスキングし、GFlowNetポリシを使用してマスキングトークンをデノイズすることで、高スコアのオフラインシーケンスにノイズを注入する。さらに$\delta$は、各データポイントに対するプロキシモデルの不確実性に基づいて適応的に調整される。これにより、プロキシの不確実性の反映が保守性のレベルを決定することができる。実験結果から,DNA,RNA,タンパク質,ペプチドなど多種多様なタスクにまたがるハイスコア配列の発見において,既存の機械学習手法よりも一貫して優れており,特に大規模シナリオにおいてその性能が向上することが示唆された。

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of proxy for out-of-distribution inputs. We introduce $\delta$-Conservative Search, a novel off-policy search method for training GFlowNets designed to improve robustness against proxy misspecification. The key idea is to incorporate conservativeness, controlled by parameter $\delta$, to constrain the search to reliable regions. Specifically, we inject noise into high-score offline sequences by randomly masking tokens with a Bernoulli distribution of parameter $\delta$ and then denoise masked tokens using the GFlowNet policy. Additionally, $\delta$ is adaptively adjusted based on the uncertainty of the proxy model for each data point. This enables the reflection of proxy uncertainty to determine the level of conservativeness. Experimental results demonstrate that our method consistently outperforms existing machine learning methods in discovering high-score sequences across diverse tasks-including DNA, RNA, protein, and peptide design-especially in large-scale scenarios.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# テンソルトレイン点雲圧縮と効率の良い近似近傍探索

Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search ( http://arxiv.org/abs/2410.04462v1 )

ライセンス: Link先を確認

Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets,

(参考訳) 大規模ベクトルデータベースにおける最寄りの探索は、さまざまな機械学習アプリケーションに不可欠である。本稿では, 点群を効率的に表現し, 近接探索を高速に行うために, テンソルトレイン(TT)低ランクテンソル分解を用いた新しい手法を提案する。 Sliced Wassersteinのような密度推定損失を利用してTT分解を訓練し、ロバストポイントクラウド圧縮を実現する確率論的解釈を提案する。 TT点雲内の固有階層構造を明らかにすることにより, 近接探索を効率的に行うことができる。本稿では,方法論に関する詳細な知見を提供し,既存の手法と包括的な比較を行う。本稿では, オフ・オブ・ディストリビューション (OOD) 検出問題や, ANN (Nest-Nighbor) 探索タスクなど, 様々なシナリオで有効性を示す。

Nearest-neighbor search in large vector databases is crucial for various machine learning applications. This paper introduces a novel method using tensor-train (TT) low-rank tensor decomposition to efficiently represent point clouds and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and utilize density estimation losses like Sliced Wasserstein to train TT decompositions, resulting in robust point cloud compression. We reveal an inherent hierarchical structure within TT point clouds, facilitating efficient approximate nearest-neighbor searches. In our paper, we provide detailed insights into the methodology and conduct comprehensive comparisons with existing methods. We demonstrate its effectiveness in various scenarios, including out-of-distribution (OOD) detection problems and approximate nearest-neighbor (ANN) search tasks.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# Wrong-of-Thought:マルチパースペクティブ検証と誤り情報の統合型推論フレームワーク

Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ( http://arxiv.org/abs/2410.04463v1 )

ライセンス: Link先を確認

Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin,

(参考訳) CoT(Chain-of-Thought)はLarge Language Models(LLM)の性能向上に欠かせない技術となり、研究者の注目を集めている。 1つのアプローチストリームは、所望の品質の推論出力を継続的に検証し、精査することにより、LCMの反復的な拡張に焦点を当てている。その印象的な結果にもかかわらず、このパラダイムは2つの重要な問題に直面している。(1) 単純な検証方法: 現在のパラダイムは単一の検証方法にのみ依存している。 2) 誤った情報無視: 従来のパラダイムは,論理パスを毎回スクラッチから洗練させ,推論中に誤った情報を直接無視する。これらの課題に対処するため,(1)マルチパースペクティブ検証(Multi-Perspective Verification):推論プロセスと結果の精度向上のためのマルチパースペクティブ検証(Multi-Perspective Verification):(2)誤情報利用(Wrong Information utilization):誤った情報を利用してLCMを警告し,同じミスを犯す可能性を低減する。 8つの一般的なデータセットと5つのLLMの実験は、WoTが以前のベースラインをすべて越えていることを示している。さらに、WoTは難しい計算タスクにおいて強力な能力を示す。

Chain-of-Thought (CoT) has become a vital technique for enhancing the performance of Large Language Models (LLMs), attracting increasing attention from researchers. One stream of approaches focuses on the iterative enhancement of LLMs by continuously verifying and refining their reasoning outputs for desired quality. Despite its impressive results, this paradigm faces two critical issues: (1) Simple verification methods: The current paradigm relies solely on a single verification method. (2) Wrong Information Ignorance: Traditional paradigms directly ignore wrong information during reasoning and refine the logic paths from scratch each time. To address these challenges, we propose Wrong-of-Thought (WoT), which includes two core modules: (1) Multi-Perspective Verification: A multi-perspective verification method for accurately refining the reasoning process and result, and (2) Wrong Information Utilization: Utilizing wrong information to alert LLMs and reduce the probability of LLMs making same mistakes. Experiments on 8 popular datasets and 5 LLMs demonstrate that WoT surpasses all previous baselines. In addition, WoT exhibits powerful capabilities in difficult computation tasks.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 大規模言語モデルにおける文脈内学習推論回路の再検討

Revisiting In-context Learning Inference Circuit in Large Language Models ( http://arxiv.org/abs/2410.04468v1 )

ライセンス: Link先を確認

Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue,

(参考訳) In-context Learning (ICL) は、言語モデル (LM) 上で、内部メカニズムを探索せずに学習する、新たな数発学習パラダイムである。 ICLの内部処理を記述する研究はすでに存在するが、大きな言語モデルにおけるすべての推論現象を捉えるのに苦労している。そこで本研究では、推論力学をモデル化し、ICLの観測現象を説明するための包括的な回路を提案する。 1) 要約: LMはすべての入力テキスト(デモとクエリ)を、ICLタスクを解くのに十分な情報を持つ隠された状態の線形表現にエンコードする。 2)Semantics Merge: LMは、デモのエンコードされた表現と対応するラベルトークンをマージして、ラベルとデモの合同表現を生成する。 (3)Feature Retrieval and Copy: LMはタスクサブスペース上のクエリ表現に似た共同表現を検索し、検索した表現をクエリにコピーする。次に、言語モデルヘッドは、これらのコピーされたラベル表現をある程度キャプチャし、予測されたラベルにデコードする。提案した推論回路は、ICLプロセス中に観測された多くの現象を捕捉し、ICL推論プロセスの包括的で実用的な説明となる。さらに,提案ステップの無効化によるアブレーション解析はICLの性能を著しく損なうものであり,提案回路が支配機構であることを示唆している。さらに,提案回路と並行してICLタスクを解くバイパス機構を確認し,リストアップする。

In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored. There are already existing works describing the inner processing of ICL, while they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Summarize: LMs encode every input text (demonstrations and queries) into linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. The proposed inference circuit successfully captured many phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis by disabling the proposed steps seriously damages the ICL performance, suggesting the proposed inference circuit is a dominating mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 非エルミート準周期格子における創発的マトリオシュカ人形様点ギャップ

Emergent Matryoshka doll-like point gap in a non-Hermitian quasiperiodic lattice ( http://arxiv.org/abs/2410.04469v1 )

ライセンス: Link先を確認

Yi-Qi Zheng, Shan-Zhong Li, Zhi Li,

(参考訳) 幾何級数変調された非エルミート準周期格子モデルを提案し、その局所化と位相的性質について検討する。その結果, 幾何級数の累積項の増加に伴い, 高い巻数を持つ複数のモビリティエッジと非エルミート点ギャップがシステム内で引き起こされることが示唆された。系の点ギャップスペクトルは、複雑な平面にマトリオシカ人形のような構造を持ち、高い巻数となる。さらに、無限項の和の極限ケースを分析する。その結果, 総和項が極限にプッシュされると, モビリティエッジは1つのモビリティエッジとして結合することがわかった。一方、対応する点ギャップは、巻数1に等しいリングにマージされる。アビラの大域的理論を通じて、無限和の極限におけるモビリティエッジの解析的表現を与え、モビリティエッジとポイントギャップがマージされ、実際に1と等しい巻数となることを再確認する。

We propose a geometric series modulated non-Hermitian quasiperiodic lattice model, and explore its localization and topological properties. The results show that with the ever-increasing summation terms of the geometric series, multiple mobility edges and non-Hermitian point gaps with high winding number can be induced in the system. The point gap spectrum of the system has a Matryoshka doll-like structure in the complex plane, resulting in a high winding number. In addition, we analyze the limit case of summation of infinite terms. The results show that the mobility edges merge together as only one mobility edge when summation terms are pushed to the limit. Meanwhile, the corresponding point gaps are merged into a ring with winding number equal to one. Through Avila's global theory, we give an analytical expression for mobility edges in the limit of infinite summation, reconfirming that mobility edges and point gaps do merge and will result in a winding number that is indeed equal to one.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 音声要約表現を用いた構成可能な多言語ASR

Configurable Multilingual ASR with Speech Summary Representations ( http://arxiv.org/abs/2410.04478v1 )

ライセンス: Link先を確認

Harrison Zhu, Ivan Fung, Yingke Zhu, Lahiru Samarakoon,

(参考訳) 世界の人口の約半数は多言語であり、多言語 ASR (MASR) が不可欠である。複数のモノリンガルモデルをデプロイすることは、前もって基幹言語が不明な場合に困難である。これは、特定の言語を認識するために手動または自動でプロンプトできる、構成可能な多言語MASRモデルの研究努力を動機付けている。本稿では,構成性の向上を目的とした新しいアーキテクチャであるSession Vector (csvMASR) を用いた Configurable MASR モデルを提案する。提案手法では,音声ダイアリゼーションにおける対話的要約表現にインスパイアされた音声要約ベクトル表現を導入し,発話レベルにおける言語固有のコンポーネントからの出力を組み合わせる。また、コンフィグアビリティを高めるために補助的な言語分類損失も組み込んだ。 MLS(Multilingual Librispeech)データセットの7言語のデータを用いて、csvMASRは既存のMASRモデルより優れており、ベースラインと比較すると単語エラー率(WER)が10.33\%から9.95\%に低下する。さらに、csvMASRは言語分類とタスクのプロンプトにおいて優れたパフォーマンスを示している。

Approximately half of the world's population is multilingual, making multilingual ASR (MASR) essential. Deploying multiple monolingual models is challenging when the ground-truth language is unknown in advance. This motivates research efforts on configurable multilingual MASR models that can be prompted manually or adapted automatically to recognise specific languages. In this paper, we present the Configurable MASR model with Summary Vector (csvMASR), a novel architecture designed to enhance configurability. Our approach leverages adapters and introduces speech summary vector representations, inspired by conversational summary representations in speech diarization, to combine outputs from language-specific components at the utterance level. We also incorporate an auxiliary language classification loss to enhance configurability. Using data from 7 languages in the Multilingual Librispeech (MLS) dataset, csvMASR outperforms existing MASR models and reduces the word error rate (WER) from 10.33\% to 9.95\% when compared with the baseline. Additionally, csvMASR demonstrates superior performance in language classification and prompting tasks.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# SITCOM: 逆問題に対するステップワイドトリプル一貫性拡散サンプリング

SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems ( http://arxiv.org/abs/2410.04479v1 )

ライセンス: Link先を確認

Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang,

(参考訳) 拡散モデル(英: Diffusion Model、DM)は、トレーニングセット上で学習した分布からサンプリングできる生成モデルのクラスである。逆画像問題 (IPs) の解法に適用する場合、DMの逆サンプリングステップは、通常、画像空間における測定条件分布からおよそサンプルに修正される。しかしながら、これらの修正は特定の設定(測定ノイズの存在など)や非線形タスクには適さないかもしれない。これらの課題に対処するために、測定一貫性の拡散軌道を達成するための3つの条件を述べる。これらの条件に基づいて,従来の研究のように標準データ多様体測定の一貫性と前方拡散の一貫性を強制するだけでなく,各サンプリングステップにおける事前学習モデルの入力を最適化することにより拡散軌道を維持する後方拡散の整合性も備えた,新しい最適化に基づくサンプリング手法を提案する。これらの条件を暗黙的または明示的に強制することで、サンプルははるかに少ない逆ステップを必要とします。そこで我々はSITCOM(Step-wise Triple-Consistent Sampling)と呼ぶ。従来の最先端のベースライン法と比較して,5つの線形および3つの非線形画像復元タスクにわたる広範囲な実験により,SITCOMが標準画像類似度測定の点で競争力や優れた結果を得ると同時に,検討対象のすべてのタスクに対して実行時間を大幅に短縮することを示した。

Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse imaging problems (IPs), the reverse sampling steps of DMs are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such as in the presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier sampling steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces the standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions, either implicitly or explicitly, our sampler requires significantly fewer reverse steps. Therefore, we refer to our accelerated method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to existing state-of-the-art baseline methods, under different levels of measurement noise, our extensive experiments across five linear and three non-linear image restoration tasks demonstrate that SITCOM achieves competitive or superior results in terms of standard image similarity metrics while requiring a significantly reduced run-time across all considered tasks.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# ニューロシンボリックプログラム合成とタスク生成による抽象推論問題の解法

Learning to Solve Abstract Reasoning Problems with Neurosymbolic Program Synthesis and Task Generation ( http://arxiv.org/abs/2410.04480v1 )

ライセンス: Link先を確認

Jakub Bednarek, Krzysztof Krawiec,

(参考訳) 抽象的思考とアナロジーによる推論は、新しい条件に迅速に適応し、それらを分解して新たに遭遇した問題に対処し、包括的に問題を解決するために知識を合成するために必要なものである。本稿では,ニューラルプログラム合成に基づく抽象問題の解法であるTransCoderについて述べる。 TransCoderの中核は、機能エンジニアリングと抽象推論を容易にするために設計された、型付きドメイン固有言語である。トレーニングでは、タスクの解決に失敗したプログラムを使用して、新しいタスクを生成し、それらを合成データセットにまとめます。この方法で生成された各合成タスクは、既知の関連するプログラム(解法)を持ち、モデルが教師付きモードでトレーニングされる。ソリューションは透過的なプログラム形式で表現され、検査と検証が可能である。本稿では, TransCoder のパフォーマンスを Abstract Reasoning Corpus データセットを用いて実証する。

The ability to think abstractly and reason by analogy is a prerequisite to rapidly adapt to new conditions, tackle newly encountered problems by decomposing them, and synthesize knowledge to solve problems comprehensively. We present TransCoder, a method for solving abstract problems based on neural program synthesis, and conduct a comprehensive analysis of decisions made by the generative module of the proposed architecture. At the core of TransCoder is a typed domain-specific language, designed to facilitate feature engineering and abstract reasoning. In training, we use the programs that failed to solve tasks to generate new tasks and gather them in a synthetic dataset. As each synthetic task created in this way has a known associated program (solution), the model is trained on them in supervised mode. Solutions are represented in a transparent programmatic form, which can be inspected and verified. We demonstrate TransCoder's performance using the Abstract Reasoning Corpus dataset, for which our framework generates tens of thousands of synthetic problems with corresponding solutions and facilitates systematic progress in learning.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 眼球運動による読解理解の微粒化予測

Fine-Grained Prediction of Reading Comprehension from Eye Movements ( http://arxiv.org/abs/2410.04484v1 )

ライセンス: Link先を確認

Omer Shubi, Yoav Meiri, Cfir Avraham Hadar, Yevgeni Berzak,

(参考訳) 目の動きから人間の読みの理解を評価することは可能か? 本研究は,読解の行動分析を目的としたテキスト素材上での大規模視線追跡データを用いて,この長年にわたる課題に対処する。本研究は, 視線運動からの読み理解を, 通路上の1つの質問のレベルで予測する, きめ細かな, ほとんど適応していないタスクに焦点を当てる。 3つの新しいマルチモーダル言語モデルと,文献から得られた先行モデルのバッテリを用いて,この課題に取り組む。本研究では,新しいテキスト項目,新しい参加者,および両者の組み合わせを,通常の読解と情報検索という2つの異なる読解方式で一般化する能力を評価する。評価の結果,目の動きは,視力の把握に有用な信号を含んでいることが示唆された。コードとデータは公開されます。

Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data over textual materials that are geared towards behavioral analyses of reading comprehension. We focus on a fine-grained and largely unaddressed task of predicting reading comprehension from eye movements at the level of a single question over a passage. We tackle this task using three new multimodal language models, as well as a battery of prior models from the literature. We evaluate the models' ability to generalize to new textual items, new participants, and the combination of both, in two different reading regimes, ordinary reading and information seeking. The evaluations suggest that although the task is highly challenging, eye movements contain useful signals for fine-grained prediction of reading comprehension. Code and data will be made publicly available.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# SWEベンチにおける会話型テストスイートによるプログラム修復の可能性を探る

Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench ( http://arxiv.org/abs/2410.04485v1 )

ライセンス: Link先を確認

Anton Cheshkov, Pavel Zadorozhny, Rodion Levichev, Evgeny Maslov, Ronaldo Franco Jaldin,

(参考訳) プロジェクトレベルでのプログラムの自動修復は、人間の活動の様々な分野では見られていない。 SWE-Benchチャレンジが提示されて以来、多くのソリューションが見られた。パッチ生成はプログラム修復の一部であり,テストスイートに基づく会話型パッチ生成の有効性が証明されている。しかし、SWE-Benchでは、会話パッチ生成の可能性はまだ具体的には評価されていない。本研究では,SWE-Bench問題に対する会話パッチ生成の有効性を評価するための実験結果について報告する。実験によると、LLaMA 3.1 70Bに基づく単純な会話パイプラインは47\%のケースで有効なパッチを生成することができ、これはSWE-Benchのプログラム修復の最先端に匹敵する。

Automatic program repair at project level may open yet to be seen opportunities in various fields of human activity. Since the SWE-Bench challenge was presented, we have seen numerous of solutions. Patch generation is a part of program repair, and test suite-based conversational patch generation has proven its effectiveness. However, the potential of conversational patch generation has not yet specifically estimated on SWE-Bench. This study reports experimental results aimed at evaluating the individual effectiveness of conversational patch generation on problems from SWE-Bench. The experiments show that a simple conversational pipeline based on LLaMA 3.1 70B can generate valid patches in 47\% of cases, which is comparable to the state-of-the-art in program repair on SWE-Bench.

翻訳日:2024-11-02 07:16:09 公開日:2024-10-06

# 知識グラフ補完のためのプラガブル・コモンセンス強化フレームワーク

A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion ( http://arxiv.org/abs/2410.04488v1 )

ライセンス: Link先を確認

Guanglin Niu, Bo Li, Siling Feng,

(参考訳) 知識グラフ補完(KGC)タスクは、知識集約的な多くのアプリケーションのための知識グラフ(KG)において、行方不明な事実を推測することを目的としている。しかし、既存の埋め込みベースのKGCアプローチは、主に事実のトリプルに依存しており、一般的な感覚と矛盾する結果をもたらす可能性がある。さらに、明示的な共通感覚を生成することは、しばしばKGにとって実用的または費用がかかる。これらの課題に対処するため、我々は、KGCの事実と常識の両方を組み込んだプラグイン可能な共通感覚強化KGCフレームワークを提案する。このフレームワークは、実体概念の豊かさに基づいて異なるKGに適応し、実三重項から明示的または暗黙的な常識を自動的に生成する能力を有する。さらに、一般的な感覚誘導型負サンプリングと、リッチな実体概念を持つKGに対する粗大な推論手法を導入する。概念を持たないKGに対して、関係認識型概念埋め込み機構を含む二重スコアリング方式を提案する。重要なことは、我々のアプローチは、多くの知識グラフ埋め込み(KGE)モデルのためのプラグイン可能なモジュールとして統合することができ、共同で常識とファクトドリブンなトレーニングと推論を容易にすることである。実験により、我々のフレームワークは優れたスケーラビリティを示し、様々なKGCタスクで既存のモデルより優れています。

Knowledge graph completion (KGC) tasks aim to infer missing facts in a knowledge graph (KG) for many knowledge-intensive applications. However, existing embedding-based KGC approaches primarily rely on factual triples, potentially leading to outcomes inconsistent with common sense. Besides, generating explicit common sense is often impractical or costly for a KG. To address these challenges, we propose a pluggable common sense-enhanced KGC framework that incorporates both fact and common sense for KGC. This framework is adaptable to different KGs based on their entity concept richness and has the capability to automatically generate explicit or implicit common sense from factual triples. Furthermore, we introduce common sense-guided negative sampling and a coarse-to-fine inference approach for KGs with rich entity concepts. For KGs without concepts, we propose a dual scoring scheme involving a relation-aware concept embedding mechanism. Importantly, our approach can be integrated as a pluggable module for many knowledge graph embedding (KGE) models, facilitating joint common sense and fact-driven training and inference. The experiments illustrate that our framework exhibits good scalability and outperforms existing models across various KGC tasks.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# リニアセパビリティの端におけるグラッキング

Grokking at the Edge of Linear Separability ( http://arxiv.org/abs/2410.04489v1 )

ライセンス: Link先を確認

Alon Beck, Noam Levi, Yohai Bar-Sinai,

(参考訳) 単純化された環境での二項ロジスティック分類の一般化特性について検討し、「記憶」と「一般化」の解は常に厳密に定義でき、その力学においてグロキングの基盤となるメカニズムを経験的かつ解析的に解明する。定常ラベルを持つランダムな特徴モデル上でのロジスティック分類の漸近的長期ダイナミクスを解析し、遅延一般化と非単調なテスト損失の意味でグロキングを示すことを示す。線形分離性の頂点にあるトレーニングセットに分類を適用すると、Grokkingが増幅されることが分かる。完全一般化解は常に存在するが、ロジシック損失の暗黙の偏りが、トレーニングデータが原点から線形に分離可能であれば、モデルが過度に適合することを証明する。原点から分離できない訓練セットに対しては、モデルはいつでも完全に漸近的に一般化するが、過度な適合は訓練の初期段階で起こりうる。重要なことは、遷移の近傍、すなわち、原点からほぼ分離可能な訓練セットに対して、モデルは一般化する前に任意の時間に過度に適合する。モデル全体の重要な特徴を定量的に捉えた,牽引可能な1次元玩具モデルを調べることで,さらに洞察を得ることができる。最後に,本研究の共通点を最近の文献で強調し,グラッキングは一般に補間しきい値に近づき,物理系でしばしば見られる臨界現象を連想させることが示唆された。

We study the generalization properties of binary logistic classification in a simplified setting, for which a "memorizing" and "generalizing" solution can always be strictly defined, and elucidate empirically and analytically the mechanism underlying Grokking in its dynamics. We analyze the asymptotic long-time dynamics of logistic classification on a random feature model with a constant label and show that it exhibits Grokking, in the sense of delayed generalization and non-monotonic test loss. We find that Grokking is amplified when classification is applied to training sets which are on the verge of linear separability. Even though a perfect generalizing solution always exists, we prove the implicit bias of the logisitc loss will cause the model to overfit if the training data is linearly separable from the origin. For training sets that are not separable from the origin, the model will always generalize perfectly asymptotically, but overfitting may occur at early stages of training. Importantly, in the vicinity of the transition, that is, for training sets that are almost separable from the origin, the model may overfit for arbitrarily long times before generalizing. We gain more insights by examining a tractable one-dimensional toy model that quantitatively captures the key features of the full model. Finally, we highlight intriguing common properties of our findings with recent literature, suggesting that grokking generally occurs in proximity to the interpolation threshold, reminiscent of critical phenomena often observed in physical systems.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# ジャグリング顔モデルにおけるAI/MLサプライチェーンアタックの大規模エクスプロイト計測

A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models ( http://arxiv.org/abs/2410.04490v1 )

ライセンス: Link先を確認

Beatrice Casey, Joanna C. S. Santos, Mehdi Mirakhorli,

(参考訳) 機械学習(ML)技術の開発は、開発者が独自のモデルを開発し、デプロイする十分な機会をもたらしました。 Hugging Faceはオープンソースプラットフォームとして機能し、開発者はML開発をより協力的にするために、他のモデルを共有し、ダウンロードすることができる。モデルを共有するためには、まずシリアライズする必要がある。 Pythonのシリアライゼーションメソッドは、オブジェクトインジェクションに弱いため、安全ではないと考えられている。本稿では、Hugging Faceにおけるこれらの安全でないシリアライズ手法の広範性について検討し、その利用方法を通じて、安全でないシリアライズ手法を用いたモデルを活用、共有し、ML開発者のための安全でない環境を作成することを実証する。安全でないシリアライズ手法を用いて,Hugging Faceがリポジトリやファイルにフラグを付けることができるかを調査し,悪意のあるモデルを検出する手法を開発した。以上の結果から,Hugging Faceにはさまざまな脆弱性のあるモデルが存在することが示唆された。

The development of machine learning (ML) techniques has led to ample opportunities for developers to develop and deploy their own models. Hugging Face serves as an open source platform where developers can share and download other models in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face, and demonstrates through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# マルチモーダル感性分析のための知識誘導動的モダリティ注意融合フレームワーク

Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis ( http://arxiv.org/abs/2410.04491v1 )

ライセンス: Link先を確認

Xinyu Feng, Yuming Lin, Lihua He, You Li, Liang Chang, Ya Zhou,

(参考訳) マルチモーダルセンシング分析(MSA)は,マルチモーダルデータを用いてユーザの感情を推定する。従来の手法では、各モダリティの寄与を平等に扱うことや、各モダリティが支配的になる可能性のある状況を無視した相互作用を行うための支配的なモダリティとしてテキストを使用することに重点を置いていた。本稿では,マルチモーダル感情分析のための知識誘導動的モダリティ注意融合フレームワーク(KuDA)を提案する。 Kudaは感情知識を使用して、支配的なモダリティを動的に選択し、各モダリティの貢献を調整するモデルを導く。さらに、得られたマルチモーダル表現により、相関評価損失による支配的モダリティの寄与をさらに強調することができる。 4つのMSAベンチマークデータセットの大規模な実験は、KuDAが最先端のパフォーマンスを達成し、支配的なモダリティの異なるシナリオに適応できることを示している。

Multimodal Sentiment Analysis (MSA) utilizes multimodal data to infer the users' sentiment. Previous methods focus on equally treating the contribution of each modality or statically using text as the dominant modality to conduct interaction, which neglects the situation where each modality may become dominant. In this paper, we propose a Knowledge-Guided Dynamic Modality Attention Fusion Framework (KuDA) for multimodal sentiment analysis. KuDA uses sentiment knowledge to guide the model dynamically selecting the dominant modality and adjusting the contributions of each modality. In addition, with the obtained multimodal representation, the model can further highlight the contribution of dominant modality through the correlation evaluation loss. Extensive experiments on four MSA benchmark datasets indicate that KuDA achieves state-of-the-art performance and is able to adapt to different scenarios of dominant modality.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# 拡張的・意味的新しい視覚刺激に対するヒト脳反応の深層学習予測の一般化可能性解析

Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli ( http://arxiv.org/abs/2410.04497v1 )

ライセンス: Link先を確認

Valentyn Piskovskyi, Riccardo Chimisso, Sabrina Patania, Tom Foulsham, Giuseppe Vizzari, Dimitri Ognibene,

(参考訳) 本研究の目的は,視覚野活性化に対する画像強調技術の影響を探索するための枠組みとして,ニューラルネットワークを用いたアプローチの音質と有用性を検討することである。予備研究として、The Algonauts Project 2023 Challenge [16]に参加したトップ10の手法の中から選ばれた最先端の脳エンコーディングモデルを用意した。我々は、様々な画像強調技術が神経反応に与える影響について、有効な予測を行う能力について分析する。脳画像撮影にかかわる高コストによる実際のデータ取得が不可能であることを踏まえて,本研究は一連の実験を基礎にしている。具体的には,脳のエンコーダが,対象物(顔と言葉)に対する反応を,特定の領域に対する既知の影響で評価することにより,様々な拡張に対する脳反応を推定する能力について分析する。さらに,トレーニング中に見えない物体に対する反応の予測活性化について検討し,意味的アウト・オブ・ディストリビューション刺激の影響について検討した。提案するフレームワークを構成するモデルの一般化能力について,与えられたタスク,モデル駆動設計戦略,ARおよびVRアプリケーションに対して最適な視覚拡張フィルタの同定を期待できると思われる,関連性のある証拠を提供する。

The purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds up on a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, model-driven design strategies as well as for AR and VR applications.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# AdaMemento:強化学習のための適応記憶支援政策最適化

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning ( http://arxiv.org/abs/2410.04498v1 )

ライセンス: Link先を確認

Renye Yan, Yaozhong Gan, You Wu, Junliang Xing, Ling Liangn, Yeshang Zhu, Yimao Cai,

(参考訳) 強化学習(RL)のスパース報酬シナリオでは、メモリメカニズムは、人間のような過去の経験を反映して、ポリシー最適化に有望なショートカットを提供する。しかし、現在のメモリベースのRLメソッドは、単に高価値ポリシーを保存して再利用し、様々な過去の経験のより深い精錬とフィルタリングを欠いているため、メモリの能力を制限している。本稿では,適応型メモリ拡張RLフレームワークであるAdaMementoを提案する。過去のポジティブな経験を記憶する代わりに、実時間状態に基づいて既知のローカルな最適ポリシーを予測することを学ぶことによって、ポジティブな経験とネガティブな経験の両方を活用するメモリリフレクションモジュールを設計する。さらに,記憶に対する情報トラジェクトリを効果的に収集するために,類似状態のニュアンスを正確に識別して探索する,詳細な本質的なモチベーションパラダイムを導入する。過去の経験の活用と新しい政策の探索は、グローバルな最適化に近づくために、アンサンブル学習によって適応的に調整される。さらに,新たな本質的なモチベーションとアンサンブル機構の優位性を理論的に証明した。 59の定量的および可視化実験から,AdaMementoは,記憶における過去の経験を効果的に活用し,従来の手法よりも大幅に改善した,微妙な状態を識別できることを確認した。

In sparse reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences like humans. However, current memory-based RL methods simply store and reuse high-value policies, lacking a deeper refining and filtering of diverse past experiences and hence limiting the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of just memorizing positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known local optimal policies based on real-time states. To effectively gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm, where nuances in similar states can be precisely distinguished to guide exploration. The exploitation of past experiences and exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. From 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploiting past experiences in memory, achieving significant improvement over previous methods.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# 順応性を考慮したプレトレーニングバックボーンの調整

Adjusting Pretrained Backbones for Performativity ( http://arxiv.org/abs/2410.04499v1 )

ライセンス: Link先を確認

Berker Demirel, Lingjing Kong, Kun Zhang, Theofanis Karaletsos, Celestine Mendler-Dünner, Francesco Locatello,

(参考訳) ディープラーニングモデルの広範な展開により、彼らは様々な方法で環境に影響を与える。誘導された分散シフトは、デプロイされたモデルで予期せぬパフォーマンス劣化を引き起こす可能性がある。パフォーマンスを予想する既存の方法は、将来の成果を予測する際に、デプロイされたモデルに関する情報を特徴ベクトルに組み込むのが一般的である。魅力的な理論的性質を楽しみながら、予測タスクの入力次元を変更することは、しばしば実用的ではない。そこで本研究では,事前学習したバックボーンをモジュール方式で調整し,サンプル効率を向上し,既存のディープラーニング資産の再利用を可能にする手法を提案する。性能上のラベルシフトに注目して、重要なアイデアは、デプロイされるモデルの十分な統計量を得たバックボーンのロジットにベイズ最適ラベルシフト修正を実行するために、浅いアダプタモジュールをトレーニングすることである。そのため,本フレームワークは,動作性を管理するメカニズムから,入力固有の特徴埋め込みの構築を分離する。動的ベンチマークを応用として,視覚・言語タスクの逆サンプリングによるアプローチの評価を行った。再学習軌道に沿った損失を減らし、候補モデルの中から効果的に選択し、性能劣化を予測できることを示す。より広範に、私たちの研究は、ディープラーニングにおけるパフォーマンスに対処するための最初のベースラインを提供します。

With the widespread deployment of deep learning models, they influence their environment in various ways. The induced distribution shifts can lead to unexpected performance degradation in deployed models. Existing methods to anticipate performativity typically incorporate information about the deployed model into the feature vector when predicting future outcomes. While enjoying appealing theoretical properties, modifying the input dimension of the prediction task is often not practical. To address this, we propose a novel technique to adjust pretrained backbones for performativity in a modular way, achieving better sample efficiency and enabling the reuse of existing deep learning assets. Focusing on performative label shift, the key idea is to train a shallow adapter module to perform a Bayes-optimal label shift correction to the backbone's logits given a sufficient statistic of the model to be deployed. As such, our framework decouples the construction of input-specific feature embeddings from the mechanism governing performativity. Motivated by dynamic benchmarking as a use-case, we evaluate our approach under adversarial sampling, for vision and language tasks. We show how it leads to smaller loss along the retraining trajectory and enables us to effectively select among candidate models to anticipate performance degradations. More broadly, our work provides a first baseline for addressing performativity in deep learning.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# LRHP: 選好ペアによる人間の選好表現の学習

LRHP: Learning Representations for Human Preferences via Preference Pairs ( http://arxiv.org/abs/2410.04503v1 )

ライセンス: Link先を確認

Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu,

(参考訳) 人間の嗜好アライメントトレーニングを改善するため、現在の研究では、"preferred" または "dispreferred" とラベル付けされた選好ペアからなる多くの選好データセットを開発した。これらの選好ペアは典型的には、人間からのフィードバック(RLHF)からの強化学習において報酬信号として機能する報酬モデリングによって、人間の嗜好を単一の数値に符号化するために使用される。しかしながら、これらの人間の嗜好を数値として表現することは、これらの嗜好の分析を複雑にし、RLHF以外の幅広い応用を制限する。対照的に、本稿では、より豊かで構造化された人間の嗜好表現を構築することを目的とした嗜好表現学習タスクを導入する。我々はさらに、従来の報酬モデルを超えてこの課題に取り組むために、好みペア(LRHP)を介して、より一般化可能な、人間の嗜好の学習表現フレームワークを開発する。選好データ選択と選好マージン予測という2つの下流タスクにおける選好表現の有用性を検証する。表現における人間の好みに基づいて、両方のタスクにおいて強いパフォーマンスを達成し、ベースラインを著しく上回る。

To improve human-preference alignment training, current research has developed numerous preference datasets consisting of preference pairs labeled as "preferred" or "dispreferred". These preference pairs are typically used to encode human preferences into a single numerical value through reward modeling, which acts as a reward signal during reinforcement learning from human feedback (RLHF). However, representing these human preferences as a numerical value complicates the analysis of these preferences and restricts their broader applications other than RLHF. In contrast, in this work, we introduce a preference representation learning task that aims to construct a richer and more structured representation of human preferences. We further develop a more generalizable framework, Learning Representations for Human Preferences via preference pairs (namely LRHP), which extends beyond traditional reward modeling to tackle this task. We verify the utility of preference representations in two downstream tasks: preference data selection and preference margin prediction. Building upon the human preferences in representations, we achieve strong performance in both tasks, significantly outperforming baselines.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# 量子資源の曖昧な識別課題の修正のための境界

Bounds for Revised Unambiguous Discrimination Tasks of Quantum Resources ( http://arxiv.org/abs/2410.04504v1 )

ライセンス: Link先を確認

Xian Shi,

(参考訳) 量子状態の識別は、量子情報理論において意味のある基本的なタスクである。本書では,量子資源の明確な識別について検討する。まず, 漸近的・漸近的シナリオにおいて, 修正された曖昧な識別課題に対する成功確率の上限を示す。次に、タスクを量子状態から量子チャネルに一般化する。適応戦略の下でタスクの成功確率の上限を示す。さらに,この境界を効率的に計算できることを示す。最後に、古典的不明確な判別と比較すると、半定値な正作用素の集合上の量子化器を考えることにより、量子の利点を示す。

Quantum state discrimination is a fundamental task that is meaningful in quantum information theory. In this manuscript, we consider a revised unambiguous discrimination of quantum resources. First, we present an upper bound of the success probability for a revised unambiguous discrimination task in the unasymptotic and asymptotic scenarios. Next, we generalize the task from quantum states to quantum channels. We present an upper bound of the success probability for the task under the adaptive strategy. Furthermore, we show the bound can be computed efficiently. Finally, compared with the classical unambiguous discrimination, we show the advantage of the quantum by considering a quantifier on a set of semidefinite positive operators.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# 高利得パラメトリックダウンコンバージョンによる多光子絡み合った状態の空間シュミットモードの効率的な評価

Efficient characterization of spatial Schmidt modes of multiphoton entangled states produced from high-gain parametric down-conversion ( http://arxiv.org/abs/2410.04505v1 )

ライセンス: Link先を確認

Mahtab Amooei, Girish Kulkarni, Jeremy Upham, Robert W. Boyd,

(参考訳) 光の絡み合った状態の空間的相関を効率的に特徴づける能力は、量子イメージングのような多くの量子技術の応用にとって重要である。ここでは、高利得パラメトリックダウンコンバージョンから生じる光の空間シュミットモードと明るい多光子絡み合った状態のシュミットスペクトルの高効率な理論的、実験的特徴を示す。従来の研究とは対照的に、信号場の近似準均質性と等方性を利用して、実験的および理論的特徴付けにかかわる数値計算を劇的に削減する。実験データセットが256×256ピクセルの5000枚のシングルショット画像で構成されている場合,本手法は計算時間を2桁に短縮する。このスピードアップは、より大きな入力サイズに対してさらに劇的である。その結果、様々なポンプ振幅に対してシュミットモードとシュミットスペクトルを高速に特徴付けることができ、利得の増加とともにその変動を研究することができる。この結果から,シュミットモードの拡大とシュミットスペクトルの狭化が理論と実験の整合性の向上に寄与していることが明らかとなった。

The ability to efficiently characterize the spatial correlations of entangled states of light is critical for applications of many quantum technologies such as quantum imaging. Here, we demonstrate highly efficient theoretical and experimental characterization of the spatial Schmidt modes and the Schmidt spectrum of bright multiphoton entangled states of light produced from high-gain parametric down-conversion. In contrast to previous studies, we exploit the approximate quasihomogeneity and isotropy of the signal field and dramatically reduce the numerical computations involved in the experimental and theoretical characterization procedures. In our particular case where our experimental data sets consist of 5000 single-shot images of 256*256 pixels each, our method reduced the overall computation time by 2 orders of magnitude. This speed-up would be even more dramatic for larger input sizes. Consequently, we are able to rapidly characterize the Schmidt modes and Schmidt spectrum for a range of pump amplitudes and study their variation with increasing gain. Our results clearly reveal the broadening of the Schmidt modes and narrowing of the Schmidt spectrum for increasing gain with good agreement between theory and experiment.

翻訳日:2024-11-02 07:06:24 公開日:2024-10-06

# 言語に基づく意味理解の経路からの映像要約の実現

Realizing Video Summarization from the Path of Language-based Semantic Understanding ( http://arxiv.org/abs/2410.04511v1 )

ライセンス: Link先を確認

Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu,

(参考訳) 近年のビデオベースLarge Language Models (ビデオLLMs) の開発は,映像特徴と音声特徴をLarge Language Models (LLMs) と整合させることにより,映像要約の大幅な進歩を遂げている。これらのビデオLLMはそれぞれ独自の長所と短所を持っている。最近の多くの手法は、資源集約的なこれらのモデルの限界を克服するために、広範囲な微調整を必要としている。本研究では,あるビデオLLMの強みが他のビデオLLMの弱みを補うことを観察する。この知見を生かして、我々はMixture of Experts(MoE)パラダイムにインスパイアされた新しいビデオ要約フレームワークを提案する。提案手法は,複数のビデオLLMを統合し,包括的で一貫性のあるテキスト要約を生成する。視覚的およびオーディオ的コンテンツを効果的に組み合わせ、詳細な背景記述を提供し、キーフレームの識別に長けており、視覚情報にのみ依存する従来のコンピュータビジョンのアプローチよりも意味論的に意味のある検索を可能にする。さらに、結果の要約は、キーフレームの選択またはテキスト・ツー・イメージモデルの組み合わせによって、要約ビデオ生成のような下流タスクのパフォーマンスを向上させる。我々の言語駆動型アプローチは、従来の手法に代えて意味的に豊かな代替手段を提供し、より新しいビデオLLMを組み込む柔軟性を提供し、ビデオ要約タスクにおける適応性と性能を向上させる。

The recent development of Video-based Large Language Models (VideoLLMs), has significantly advanced video summarization by aligning video features and, in some cases, audio features with Large Language Models (LLMs). Each of these VideoLLMs possesses unique strengths and weaknesses. Many recent methods have required extensive fine-tuning to overcome the limitations of these models, which can be resource-intensive. In this work, we observe that the strengths of one VideoLLM can complement the weaknesses of another. Leveraging this insight, we propose a novel video summarization framework inspired by the Mixture of Experts (MoE) paradigm, which operates as an inference-time algorithm without requiring any form of fine-tuning. Our approach integrates multiple VideoLLMs to generate comprehensive and coherent textual summaries. It effectively combines visual and audio content, provides detailed background descriptions, and excels at identifying keyframes, which enables more semantically meaningful retrieval compared to traditional computer vision approaches that rely solely on visual information, all without the need for additional fine-tuning. Moreover, the resulting summaries enhance performance in downstream tasks such as summary video generation, either through keyframe selection or in combination with text-to-image models. Our language-driven approach offers a semantically rich alternative to conventional methods and provides flexibility to incorporate newer VideoLLMs, enhancing adaptability and performance in video summarization tasks.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# DAMRO:LVLMの注意機構の解明と幻覚の低減

DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination ( http://arxiv.org/abs/2410.04514v1 )

ライセンス: Link先を確認

Xuan Gong, Tianshi Ming, Xinpeng Wang, Zhihua Wei,

(参考訳) LVLM(Large Vision-Language Models)の成功にもかかわらず、彼らは必然的に幻覚に苦しんでいる。我々が知っているように、LVLMのビジュアルエンコーダとLarge Language Model (LLM)デコーダはトランスフォーマーベースであり、モデルが視覚情報を抽出し、注意機構を介してテキスト出力を生成することができる。画像トークン上のLLMデコーダの注意分布は視覚エンコーダと非常に一致しており、どちらの分布も画像中の参照対象よりも特定の背景トークンに注目する傾向にある。我々は、視覚エンコーダ自体に固有の欠陥があり、LCMが冗長な情報を過度に強調し、オブジェクト幻覚を生成することを誤解しているため、予期せぬ注意分布を考慮に入れている。この問題に対処するために、D$iveを$A$ttention $M$echanism of LVLM to $R$educe $O$bject Hallucination(英語版)に変換する新しいトレーニングフリー戦略であるDAMROを提案する。具体的には、ViTの分類トークン(CLS)を用いて、背景に散在する高アテンションな外れ値トークンをフィルタリングし、復号段階での影響を除去する。 LLaVA-1.5, LLaVA-NeXT, InstructBLIPなどのLVLMに対して, POPE, CHAIR, MME, GPT-4V Aided Evaluation などのベンチマークを用いて評価を行った。以上の結果から,本手法は,これらの異常トークンの影響を著しく低減し,LVLMの幻覚を効果的に緩和することを示した。私たちのメソッドのコードはまもなくリリースされます。

Despite the great success of Large Vision-Language Models (LVLMs), they inevitably suffer from hallucination. As we know, both the visual encoder and the Large Language Model (LLM) decoder in LVLMs are Transformer-based, allowing the model to extract visual information and generate text outputs via attention mechanisms. We find that the attention distribution of LLM decoder on image tokens is highly consistent with the visual encoder and both distributions tend to focus on particular background tokens rather than the referred objects in the image. We attribute to the unexpected attention distribution to an inherent flaw in the visual encoder itself, which misguides LLMs to over emphasize the redundant information and generate object hallucination. To address the issue, we propose DAMRO, a novel training-free strategy that $D$ive into $A$ttention $M$echanism of LVLM to $R$educe $O$bject Hallucination. Specifically, our approach employs classification token (CLS) of ViT to filter out high-attention outlier tokens scattered in the background and then eliminate their influence during decoding stage. We evaluate our method on LVLMs including LLaVA-1.5, LLaVA-NeXT and InstructBLIP, using various benchmarks such as POPE, CHAIR, MME and GPT-4V Aided Evaluation. The results demonstrate that our approach significantly reduces the impact of these outlier tokens, thus effectively alleviating the hallucination of LVLMs. The code of our method will be released soon.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# RevMUX: 効率的なLLMバッチ推論のための可逆アダプタによるデータ多重化

RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference ( http://arxiv.org/abs/2410.04519v1 )

ライセンス: Link先を確認

Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao,

(参考訳) 大きな言語モデル(LLM)は、自然言語処理(NLP)コミュニティに大きなブレークスルーをもたらしました。データ多重化は、複数の入力を1つの複合入力にマージすることでこの問題に対処し、共有フォワードパスによるより効率的な推論を可能にする。しかしながら、複合入力と個人を区別することは難しいため、従来の手法ではバックボーン全体をトレーニングする必要があるが、性能劣化に悩まされている。本稿では,パラメータ効率のよいデータ多重化フレームワークであるRevMUXについて紹介する。 4種類のLLMバックボーンと3種類のLLMバックボーンの大規模な実験により,良好な分類性能を維持しつつ,LLM推論効率を向上させるRevMUXの有効性が示された。

Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while leading the challenge of handling concurrent customer queries due to their high throughput demands. Data multiplexing addresses this by merging multiple inputs into a single composite input, allowing more efficient inference through a shared forward pass. However, as distinguishing individuals from a composite input is challenging, conventional methods typically require training the entire backbone, yet still suffer from performance degradation. In this paper, we introduce RevMUX, a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer, which can be reused by the demultiplexer to perform reverse operations and restore individual samples for classification. Extensive experiments on four datasets and three types of LLM backbones demonstrate the effectiveness of RevMUX for enhancing LLM inference efficiency while retaining a satisfactory classification performance.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# 動的ホック後ニューラルエンサンブラ

Dynamic Post-Hoc Neural Ensemblers ( http://arxiv.org/abs/2410.04520v1 )

ライセンス: Link先を確認

Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka,

(参考訳) アンサンブル法は、複数のベースラーナーを組み合わせることで、機械学習モデルの精度と堅牢性を高めることが知られている。しかし、グリーディやランダムアンサンブルのような標準的なアプローチは、アンサンブルメンバーのサンプル間で一定の重みを仮定するため、しばしば不足する。これにより、アンサンブル予測の集約時に表現性を制限することができ、性能を損なうことができる。本研究では,様々なモデル予測を適応的に活用するために,動的アンサンブルの重要性を強調し,ニューラルネットワークをアンサンブル手法として活用することを検討する。低多様性のアンサンブルを学習するリスクにより、トレーニング中にベースモデル予測をランダムにドロップすることでモデルを正規化することを提案する。このアプローチはアンサンブル内の多様性を低くし、オーバーフィッティングを減らし、一般化能力を向上させる。実験では, コンピュータビジョン, 自然言語処理, 表計算データにおいて, 強健なベースラインと比較して, 動的ニューラルアンサンブラが競争力を発揮することを示した。

Ensemble methods are known for enhancing the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembles often fall short, as they assume a constant weight across samples for the ensemble members. This can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we explore employing neural networks as ensemble methods, emphasizing the significance of dynamic ensembling to leverage diverse model predictions adaptively. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions during the training. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities. Our experiments showcase that the dynamic neural ensemblers yield competitive results compared to strong baselines in computer vision, natural language processing, and tabular data.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# MC-CoT: LLMとMLLMを統合したゼロショット医療VQAのためのモジュール協調CoTフレームワーク

MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration ( http://arxiv.org/abs/2410.04521v1 )

ライセンス: Link先を確認

Lai Wei, Wenkai Wang, Xiaoyu Shen, Yu Xie, Zhihao Fan, Xiaojin Zhang, Zhongyu Wei, Wei Chen,

(参考訳) 近年,Med-VQA(Med-VQA)タスクに対処するために,特定の医用画像データセットに基づいてMLLM(Multimodal large language model)が微調整されている。しかし、タスク固有の微調整の一般的なアプローチはコストが高く、ダウンストリームタスクごとに別々のモデルが必要であるため、ゼロショット能力の探索が制限される。本稿では,大規模な言語モデル(LLM)を活用することで,Med-VQAにおけるMLLMのゼロショット性能を向上させることを目的とした,モジュール型クロスモーダルコラボレーションChain-of-Thought(CoT)フレームワークであるMC-CoTを紹介する。 MC-CoTは、医学知識とタスク固有のガイダンスを統合することで推論と情報抽出を改善し、LSMは様々な複雑な医学推論チェーンを提供し、MLLMはLSMの指示に基づいて様々な医学画像の観察を行う。 SLAKE, VQA-RAD, PATH-VQAなどのデータセットを用いた実験により, MC-CoT はスタンドアロンのMLLM や様々なマルチモーダル CoT フレームワークをリコール率と精度で上回っていることがわかった。これらの知見は、複雑なゼロショットのMed-VQAタスクに、背景情報と詳細なガイダンスを組み込むことの重要性を強調している。

In recent advancements, multimodal large language models (MLLMs) have been fine-tuned on specific medical image datasets to address medical visual question answering (Med-VQA) tasks. However, this common approach of task-specific fine-tuning is costly and necessitates separate models for each downstream task, limiting the exploration of zero-shot capabilities. In this paper, we introduce MC-CoT, a modular cross-modal collaboration Chain-of-Thought (CoT) framework designed to enhance the zero-shot performance of MLLMs in Med-VQA by leveraging large language models (LLMs). MC-CoT improves reasoning and information extraction by integrating medical knowledge and task-specific guidance, where LLM provides various complex medical reasoning chains and MLLM provides various observations of medical images based on instructions of the LLM. Our experiments on datasets such as SLAKE, VQA-RAD, and PATH-VQA show that MC-CoT surpasses standalone MLLMs and various multimodality CoT frameworks in recall rate and accuracy. These findings highlight the importance of incorporating background information and detailed guidance in addressing complex zero-shot Med-VQA tasks.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# 航空・海上医療避難プラットフォーム調整のための半マルコフ計画

Semi-Markovian Planning to Coordinate Aerial and Maritime Medical Evacuation Platforms ( http://arxiv.org/abs/2410.04523v1 )

ライセンス: Link先を確認

Mahdi Al-Husseini, Kyle H. Wray, Mykel J. Kochenderfer,

(参考訳) 水中水上機を用いた2機間の患者の移動は、海洋環境における医療的避難範囲と柔軟性を増大させる。患者交換のための複数の水中船のいずれかの選択は、航空機の利用履歴と参加する水上船の位置と速度によって複雑である。選択問題は、固定地と移動船の交換点を含む作用空間を有するセミマルコフ決定過程としてモデル化される。ルート並列化によるモンテカルロ木探索は、最適な交換点を選択し、航空機の発送時間を決定するために用いられる。モデルパラメータは、ウォータークラフト交換点がインシデント応答時間を減少させる代表シナリオを特定するためにシミュレーションで変化する。その結果, 船舶交換点を用いた最適政策は, 船舶交換点を含まない最適政策と, グリーディ政策を35%, 40%向上させることがわかった。米国陸軍と共同で、ハワイのオアフ島の南で2機のHH-60M医療避難ヘリコプターと、進行中の陸軍物流支援船との間で、マニキンによる模擬患者輸送を実行することで、初めて水上交換地点を配備した。どちらのヘリコプターも我々の最適化された決定戦略に従って派遣された。

The transfer of patients between two aircraft using an underway watercraft increases medical evacuation reach and flexibility in maritime environments. The selection of any one of multiple underway watercraft for patient exchange is complicated by participating aircraft utilization history and a participating watercraft position and velocity. The selection problem is modeled as a semi-Markov decision process with an action space including both fixed land and moving watercraft exchange points. Monte Carlo tree search with root parallelization is used to select optimal exchange points and determine aircraft dispatch times. Model parameters are varied in simulation to identify representative scenarios where watercraft exchange points reduce incident response times. We find that an optimal policy with watercraft exchange points outperforms an optimal policy without watercraft exchange points and a greedy policy by 35% and 40%, respectively. In partnership with the United States Army, we deploy for the first time the watercraft exchange point by executing a mock patient transfer with a manikin between two HH-60M medical evacuation helicopters and an underway Army Logistic Support Vessel south of the Hawaiian island of Oahu. Both helicopters were dispatched in accordance with our optimized decision strategy.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# セキュアチューニングに向けて - 良質なインストラクションの微調整から生じるセキュリティリスクの軽減

Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning ( http://arxiv.org/abs/2410.04524v1 )

ライセンス: Link先を確認

Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Fenglei Fan, Ting Liu, Bing Qin,

(参考訳) インストラクションファインタニング(IFT)は、基礎となるLarge Language Models(LLM)をプロフェッショナルおよびプライベートな用途に応用するための重要な手法となっている。しかし、研究者は、IFTプロセスが完全に良性な命令(Benign IFT)を含む場合でも、IFT後のLLMのセキュリティが大幅に低下することを懸念している。我々の研究は、ベニグンIFTによるセキュリティリスクを軽減するための先駆的な取り組みである。具体的には,LLMの内部モジュールがセキュリティにどのように貢献するかを検討することを目的としたモジュールロバストネス解析を行う。本稿では,ML-LR(Modular Layer-wise Learning Rate)戦略と呼ばれる新しいIFT戦略を提案する。分析では,モジュールの堅牢性(例えば$Q$/$K$/$V$など)を測定するためのプロキシとして機能する,シンプルなセキュリティ機能分類器を実装した。モジュールの強靭性は,モジュールタイプや層深度によって定期的に変化し,明確なパターンを示すことがわかった。これらの知見を活用して、モジュールのロバストなサブセットを識別するプロキシ誘導探索アルゴリズムを Mods$_{Robust}$ と呼ぶ。 IFT中、ML-LR戦略はMods$_{Robust}$とその他のモジュールの差分学習率を採用している。本研究は,セキュリティ評価において,ML-LR戦略の適用により,良性IFT後のLSMの有害性の増加が著しく軽減されることを示す。特に,我々のML-LR戦略は Benign IFT に続く LLM のユーザビリティや専門性にはほとんど影響しない。さらに,ML-LR戦略の健全性と柔軟性を検証するため,包括的分析を行った。

Instruction Fine-Tuning (IFT) has become an essential method for adapting base Large Language Models (LLMs) into variants for professional and private use. However, researchers have raised concerns over a significant decrease in LLMs' security following IFT, even when the IFT process involves entirely benign instructions (termed Benign IFT). Our study represents a pioneering effort to mitigate the security risks arising from Benign IFT. Specifically, we conduct a Module Robustness Analysis, aiming to investigate how LLMs' internal modules contribute to their security. Based on our analysis, we propose a novel IFT strategy, called the Modular Layer-wise Learning Rate (ML-LR) strategy. In our analysis, we implement a simple security feature classifier that serves as a proxy to measure the robustness of modules (e.g. $Q$/$K$/$V$, etc.). Our findings reveal that the module robustness shows clear patterns, varying regularly with the module type and the layer depth. Leveraging these insights, we develop a proxy-guided search algorithm to identify a robust subset of modules, termed Mods$_{Robust}$. During IFT, the ML-LR strategy employs differentiated learning rates for Mods$_{Robust}$ and the rest modules. Our experimental results show that in security assessments, the application of our ML-LR strategy significantly mitigates the rise in harmfulness of LLMs following Benign IFT. Notably, our ML-LR strategy has little impact on the usability or expertise of LLMs following Benign IFT. Furthermore, we have conducted comprehensive analyses to verify the soundness and flexibility of our ML-LR strategy.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# 周りを見回して見る:相対角によるOOD検出

Look Around and Find Out: OOD Detection with Relative Angles ( http://arxiv.org/abs/2410.04525v1 )

ライセンス: Link先を確認

Berker Demirel, Marco Fumero, Francesco Locatello,

(参考訳) 現実世界のアプリケーションにデプロイされるディープラーニングシステムは、その分散(ID)とは異なるデータに遭遇することが多い。信頼できるシステムは、理想的には、このアウト・オブ・ディストリビューション(OOD)設定での意思決定を控えるべきです。既存の最先端の手法は、主にk番目の隣人や決定境界までの距離といった特徴距離に焦点を当てている。本研究では, 分布内構造に対して計算されるOOD検出のための新しい角度に基づく計量法を提案する。特徴表現と決定境界の間の角度は,分布内特徴の平均から見て,IDとOODデータ間の効果的な識別要因となることを示す。提案手法は, CIFAR-10 と ImageNet ベンチマークの最先端性能を実現し, FPR95 を 0.88% と 7.74% 削減した。我々のスコア関数は既存の特徴空間正規化技術と互換性があり、性能が向上する。さらに、そのスケール不変性により、単純なスコア和によるOOD検出のためのモデルのアンサンブルを作成することができる。

Deep learning systems deployed in real-world applications often encounter data that is different from their in-distribution (ID). A reliable system should ideally abstain from making decisions in this out-of-distribution (OOD) setting. Existing state-of-the-art methods primarily focus on feature distances, such as k-th nearest neighbors and distances to decision boundaries, either overlooking or ineffectively using in-distribution statistics. In this work, we propose a novel angle-based metric for OOD detection that is computed relative to the in-distribution structure. We demonstrate that the angles between feature representations and decision boundaries, viewed from the mean of in-distribution features, serve as an effective discriminative factor between ID and OOD data. Our method achieves state-of-the-art performance on CIFAR-10 and ImageNet benchmarks, reducing FPR95 by 0.88% and 7.74% respectively. Our score function is compatible with existing feature space regularization techniques, enhancing performance. Additionally, its scale-invariance property enables creating an ensemble of models for OOD detection via simple score summation.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# Casablanca:多方言アラビア語音声認識のデータとモデル

Casablanca: Data and Models for Multidialectal Arabic Speech Recognition ( http://arxiv.org/abs/2410.04527v1 )

ライセンス: Link先を確認

Bashar Talafha, Karima Kadaoui, Samar Mohamed Magdy, Mariem Habiboullah, Chafei Mohamed Chafei, Ahmed Oumar El-Shangiti, Hiba Zayed, Mohamedou cheikh tourad, Rahaf Alhamouri, Rwaa Assi, Aisha Alraeesi, Hour Mohamed, Fakhraddin Alwajih, Abdelrahman Mohamed, Abdellah El Mekki, El Moatez Billah Nagoudi, Benelhadj Djelloul Mama Saadia, Hamzah A. Alsayadi, Walid Al-Dhabyani, Sara Shatnawi, Yasir Ech-Chammakhy, Amal Makouar, Yousra Berrachedi, Mustafa Jarrar, Shady Shehata, Ismail Berrada, Muhammad Abdul-Mageed,

(参考訳) 近年の音声処理の進歩にもかかわらず、世界の言語や方言の大部分は明らかになっていない。この状況は、既に広範囲の技術的分断を妨げ、技術的・社会経済的包摂を妨げているだけである。この課題は主に、多様な音声システムを強化するデータセットがないためである。本稿では,多方言のアラビア語データセットを収集・転写する大規模コミュニティ主導の取り組みであるCasablancaを提示することにより,アラビア語方言のこの障害を軽減することを目的とする。このデータセットには、アルジェリア語、エジプト語、エミラティ語、ヨルダン語、モーリタニア語、モロッコ語、パレスチナ語、イエメン語の8つの方言が含まれ、転写、性別、方言、コードスイッチングのアノテーションが含まれている。私たちはまた、カサブランカを活用できる強力なベースラインを多数開発しています。 Casablanca のプロジェクトページは www.dlnlp.ai/speech/casablanca にある。

In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclusion. This challenge is largely due to the absence of datasets that can empower diverse speech systems. In this paper, we seek to mitigate this obstacle for a number of Arabic dialects by presenting Casablanca, a large-scale community-driven effort to collect and transcribe a multi-dialectal Arabic dataset. The dataset covers eight dialects: Algerian, Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni, and includes annotations for transcription, gender, dialect, and code-switching. We also develop a number of strong baselines exploiting Casablanca. The project page for Casablanca is accessible at: www.dlnlp.ai/speech/casablanca.

翻訳日:2024-11-02 06:56:10 公開日:2024-10-06

# 3次元シーン理解のための知覚的事前認識によるPlace Panoptic Radiance Field Segmentation

In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding ( http://arxiv.org/abs/2410.04529v1 )

ライセンス: Link先を確認

Shenghao Li,

(参考訳) 正確な3Dシーン表現とパノプティクス理解は、仮想現実、ロボティクス、自律運転などのアプリケーションに不可欠である。しかし、正確な2D-to-3Dマッピング、境界あいまいさやスケールの変化といった複雑なシーン特性の扱い、パノピックな擬似ラベルのノイズ軽減など、既存の手法では課題が続いている。本稿では,2次元のセマンティクスとインスタンス認識を含む線形代入問題として,ニューラルラディアンス領域におけるパノプティクス理解を再構成する,知覚優先の3次元シーン表現とパノプティカル理解手法を提案する。事前学習された2次元パノプティックセグメンテーションモデルからの知覚情報を事前指導として組み込むことにより、ニューラル放射場における外観、幾何学、およびパノプティック理解の学習過程を同期させる。縮小符号化されたカスケードグリッドを再パラメータ化ドメイン蒸留フレームワーク内に拡張することにより,屋内および屋外のシーン間の一般化を促進するために,暗黙のシーン表現と理解モデルを開発した。このモデルは複雑なシーン特性を効果的に管理し、3D一貫性のあるシーン表現と様々なシーンに対するパノラマ理解結果を生成する。合成シーンや実世界のシーンを含む難易度条件下での実験およびアブレーション研究は、3次元シーン表現の強化とパノプティックセグメンテーションの精度向上における提案手法の有効性を実証する。

Accurate 3D scene representation and panoptic understanding are essential for applications such as virtual reality, robotics, and autonomous driving. However, challenges persist with existing methods, including precise 2D-to-3D mapping, handling complex scene characteristics like boundary ambiguity and varying scales, and mitigating noise in panoptic pseudo-labels. This paper introduces a novel perceptual-prior-guided 3D scene representation and panoptic understanding method, which reformulates panoptic understanding within neural radiance fields as a linear assignment problem involving 2D semantics and instance recognition. Perceptual information from pre-trained 2D panoptic segmentation models is incorporated as prior guidance, thereby synchronizing the learning processes of appearance, geometry, and panoptic understanding within neural radiance fields. An implicit scene representation and understanding model is developed to enhance generalization across indoor and outdoor scenes by extending the scale-encoded cascaded grids within a reparameterized domain distillation framework. This model effectively manages complex scene attributes and generates 3D-consistent scene representations and panoptic understanding outcomes for various scenes. Experiments and ablation studies under challenging conditions, including synthetic and real-world scenes, demonstrate the proposed method's effectiveness in enhancing 3D scene representation and panoptic segmentation accuracy.