Fugu-MT: arXiv Paper Translations

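This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.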

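The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).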

Papers with a publication date of 2024-09-29.

Title	Authors	Abstract	Publication date / Translation date
# Enhancing Stance Classification on Social Media Using Quantified Moral Foundations ( http://arxiv.org/abs/2310.09848v2 ) License: see linked page	Hong Zhang, Prasanta Bhattacharya, Wei Gao, Liang Ze Wong, Brandon Siyuan Loh, Joseph J. P. Simons, Jisun An	This study enhances stance detection on social media by incorporating deeper psychological attributes, specifically individuals' moral foundations. These theoretically-derived dimensions aim to provide a comprehensive profile of an individual's moral concerns which, in recent work, has been linked to behaviour in a range of domains, including society, politics, health, and the environment. In this paper, we investigate how moral foundation dimensions can contribute to predicting an individual's stance on a given target. Specifically, we incorporate moral foundation features extracted from text, along with message semantic features, to classify stances at both message- and user-levels using both traditional machine learning models and large language models. Our preliminary results suggest that encoding moral foundations can enhance the performance of stance detection tasks and help illuminate the associations between specific moral foundations and online stances on target topics. The results highlight the importance of considering deeper psychological attributes in stance analysis and underscore the role of moral foundations in guiding online social behavior.	Translated: 2024-11-09 10:01:09 Published: 2024-09-29
# RSTAR4D: Rotational Streak Artifact Reduction in 4D CBCT using a Separable 4D CNN ( http://arxiv.org/abs/2403.16361v3 ) License: see linked page	Ziheng Deng, Hua Chen, Yongzheng Zhou, Haibo Hu, Zhiyong Xu, Jiayuan Sun, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao	Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, the cone-beam projections become much sparser and the reconstructed 4D CBCT images will be covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ 2D network models as backbones, neglecting the intrinsic structural priors within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images. We find that streak artifacts exhibit a unique rotational motion along with the patient's respiration, distinguishable from diaphragm-driven respiratory motion in the spatiotemporal domain. Therefore, we propose a novel 4D neural network model, RSTAR4D-Net, designed to address Rotational STreak Artifact Reduction by integrating the spatial and temporal information within 4D CBCT images. Specifically, we overcome the computational and training difficulties of a 4D neural network. The specially designed model adopts an efficient implementation of 4D convolutions to reduce computational costs and thus can process the whole 4D image in one pass. Additionally, a Tetris training strategy pertinent to the separable 4D convolutions is proposed to effectively train the model using limited 4D training samples. Extensive experiments substantiate the effectiveness of our proposed method, and the RSTAR4D-Net shows superior performance compared to other methods. The source code and dynamic demos are available at https://github.com/ivy9092111111/RSTAR.	Translated: 2024-11-09 03:48:22 Published: 2024-09-29
# Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations ( http://arxiv.org/abs/2407.04955v2 ) License: see linked page	Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang	Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality heterogeneity challenges remain in multimodal sequence fusion, causing adverse performance bottlenecks. To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities. On the one hand, MEA introduces a predictive self-attention module to capture reliable context dynamics within modalities and reinforce unique features over the modality-exclusive spaces. On the other hand, a hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities over the modality-agnostic space. Meanwhile, a double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner. Eventually, we propose a decoupled graph fusion mechanism to enhance knowledge exchange across heterogeneous modalities and learn robust multimodal representations for downstream tasks. Numerous experiments are implemented on three multimodal datasets with asynchronous sequences. Systematic analyses show the necessity of our approach.	Translated: 2024-11-08 23:35:45 Published: 2024-09-29
# CPM: Class-conditional Prompting Machine for Audio-visual Segmentation ( http://arxiv.org/abs/2407.05358v3 ) License: see linked page	Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro	Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is upgraded with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.	Translated: 2024-11-08 23:24:33 Published: 2024-09-29
# Unmasking Trees for Tabular Data ( http://arxiv.org/abs/2407.05593v2 ) License: see linked page	Calvin McCarter	Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. This approach offers state-of-the-art performance on imputation, and on generation given training data with missingness; and it has competitive performance on vanilla generation. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.	Translated: 2024-11-08 23:24:33 Published: 2024-09-29
# Unmasking Trees for Tabular Data ( http://arxiv.org/abs/2407.05593v3 ) License: see linked page	Calvin McCarter	Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. This approach offers state-of-the-art performance on imputation, and on generation given training data with missingness; and it has competitive performance on vanilla generation. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.	Translated: 2024-11-08 23:24:33 Published: 2024-09-29
# RadiomicsFill-Mammo: Synthetic Mammogram Mass Manipulation with Radiomics Features ( http://arxiv.org/abs/2407.05683v2 ) License: see linked page	Inye Na, Jonghun Kim, Eun Sook Ko, Hyunjin Park	Motivated by the question, "Can we generate tumors with desired attributes?", this study leverages radiomics features to explore the feasibility of generating synthetic tumor images. Characterized by its low-dimensional yet biologically meaningful markers, radiomics bridges the gap between complex medical imaging data and actionable clinical insights. We present RadiomicsFill-Mammo, the first of the RadiomicsFill series, an innovative technique that generates realistic mammogram mass images mirroring specific radiomics attributes using masked images and opposite breast images, leveraging a recent stable diffusion model. This approach also allows for the incorporation of essential clinical variables, such as BI-RADS and breast density, alongside radiomics features as conditions for mass generation. Results indicate that RadiomicsFill-Mammo effectively generates diverse and realistic tumor images based on various radiomics conditions. Results also demonstrate a significant improvement in mass detection capabilities, leveraging RadiomicsFill-Mammo as a strategy to generate simulated samples. Furthermore, RadiomicsFill-Mammo not only advances medical imaging research but also opens new avenues for enhancing treatment planning and tumor simulation. Our code is available at https://github.com/nainye/RadiomicsFill.	Translated: 2024-11-08 23:24:33 Published: 2024-09-29
# Large Language Model Recall Uncertainty is Modulated by the Fan Effect ( http://arxiv.org/abs/2407.06349v2 ) License: see linked page	Jesse Roberts, Kyle Moore, Thao Pham, Oseremhen Ewaleifoh, Doug Fisher	This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan effect. Our results show that removing uncertainty disrupts the observed effect. The experiments suggest the fan effect is consistent whether the fan value is induced in-context or in the pre-training data. Finally, these findings provide in-silico evidence that fan effects and typicality are expressions of the same phenomenon.	Translated: 2024-11-08 23:13:33 Published: 2024-09-29
# Graph Neural Networks with Model-based Reinforcement Learning for Multi-agent Systems ( http://arxiv.org/abs/2407.09249v2 ) License: see linked page	Hanxiao Chen	Multi-agent systems (MAS) play a significant role in exploring machine intelligence and advanced applications. To investigate the complicated interactions within MAS scenarios, we propose a "GNN for MBRL" model, which uses a state-space Graph Neural Network with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomous Driving Cars). In detail, we first use a GNN model to predict the future states and trajectories of multiple agents, then apply Model Predictive Control optimized by the Cross-Entropy Method (CEM) to help the ego-agent plan actions and successfully accomplish certain MAS tasks.	Translated: 2024-11-08 22:06:29 Published: 2024-09-29
# By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting ( http://arxiv.org/abs/2407.10385v2 ) License: see linked page	Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee	Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts and reducing token costs by 15.8 times. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks. The source code is available at https://github.com/diamond264/ByMyEyes.	Translated: 2024-11-08 21:43:45 Published: 2024-09-29
# Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections ( http://arxiv.org/abs/2407.12306v2 ) License: see linked page	Congrong Xu, Justin Kerr, Angjoo Kanazawa	Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance feature embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it for unconstrained image collections is non-trivial due to the substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, enhances training speed by 150 times compared to NeRF-based methods, and achieves a similar rendering speed to 3DGS. Additional video results and code integrated into Nerfstudio are available at https://kevinxu02.github.io/splatfactow/.	Translated: 2024-11-08 20:48:00 Published: 2024-09-29
# Non-Overlapping Placement of Macro Cells based on Reinforcement Learning in Chip Design ( http://arxiv.org/abs/2407.18499v3 ) License: see linked page	Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan	Due to the increasing complexity of chip design, existing placement methods still have many shortcomings in dealing with macro cell coverage and optimization efficiency. Aiming at the problems of layout overlap, inferior performance, and low optimization efficiency in existing chip design methods, this paper proposes an end-to-end placement method, SRLPlacer, based on reinforcement learning. First, the placement problem is transformed into a Markov decision process by establishing the coupling relationship graph model between macro cells to learn the strategy for optimizing layouts. Second, the whole placement process is optimized after integrating the standard cell layout. Evaluated on the public benchmark ISPD2005, the proposed SRLPlacer can effectively solve the overlap problem between macro cells while considering routing congestion and shortening the total wire length to ensure routability. Codes are available at https://github.com/zhouyusd/SRLPlacer.	Translated: 2024-11-08 14:50:05 Published: 2024-09-29
# ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models ( http://arxiv.org/abs/2407.21534v2 ) License: see linked page	Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji	In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlling which text prompt tokens attend to which visual tokens. We optimize a learnable visual token based on an energy function, enhancing the strength of referential regions in the attention map. This enables detailed region description and reasoning without the need for substantial training costs or model retraining. Our method offers a promising direction for integrating referential abilities into MLLMs. Our method supports referring with boxes, masks, scribbles, and points. The results demonstrate that our method exhibits controllability and interpretability.	Translated: 2024-11-08 13:40:32 Published: 2024-09-29
# Non-convolutional Graph Neural Networks ( http://arxiv.org/abs/2408.00165v3 ) License: see linked page	Yuanqing Wang, Kyunghyun Cho	Rethink convolution-based graph neural networks (GNN) -- they characteristically suffer from limited expressiveness, over-smoothing, and over-squashing, and require specialized sparse kernels for efficient computation. Here, we design a simple graph learning module entirely free of convolution operators, coined random walk with unifying memory (RUM) neural network, where an RNN merges the topological and semantic graph features along the random walks terminating at each node. Relating the rich literature on RNN behavior and graph topology, we theoretically show and experimentally verify that RUM attenuates the aforementioned symptoms and is more expressive than the Weisfeiler-Lehman (WL) isomorphism test. On a variety of node- and graph-level classification and regression tasks, RUM not only achieves competitive performance, but is also robust, memory-efficient, scalable, and faster than the simplest convolutional GNNs.	Translated: 2024-11-08 13:40:31 Published: 2024-09-29
# Infrequent Resolving Algorithm for Online Linear Programming ( http://arxiv.org/abs/2408.00465v3 ) License: see linked page	Guokai Li, Zizhuo Wang, Jingwei Zhang	Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance, even offering constant regret, but require solving a large number of LPs, which can be computationally expensive. In contrast, LP-free algorithms require only first-order computations but perform worse and lack a constant regret bound. In this work, we study the case where the inputs are drawn from an unknown finite-support distribution, and bridge the gap between these two extremes by proposing an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that can guarantee an $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.	Translated: 2024-11-08 13:29:21 Published: 2024-09-29
# GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making ( http://arxiv.org/abs/2408.09785v2 ) License: see linked page	Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt, Andris Freimanis, Patrick Andersson Rhodin, Dhasarathy Parthasarathy	Traditional methods for making software deployment decisions in the automotive industry typically rely on manual analysis of tabular software test data. These methods often lead to higher costs and delays in the software release cycle due to their labor-intensive nature. Large Language Models (LLMs) present a promising solution to these challenges. However, their application generally demands multiple rounds of human-driven prompt engineering, which limits their practical deployment, particularly for industrial end-users who need reliable and efficient results. In this paper, we propose GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to address domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Our results show that GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo represents an efficient and user-friendly LLM-based solution currently employed in our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.	Translated: 2024-11-08 06:55:48 Published: 2024-09-29
# Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis ( http://arxiv.org/abs/2409.01882v2 ) License: see linked page	Ray Umphrey, Jesse Roberts, Lindsey Roberts	This study explores the potential of large language models (LLMs) for identifying and examining intertextual relationships within biblical Koine Greek texts. By evaluating the performance of LLMs on various intertextuality scenarios, the study demonstrates that these models can detect direct quotations, allusions, and echoes between texts. The LLM's ability to generate novel intertextual observations and connections highlights its potential to uncover new insights. However, the model also struggles with long query passages and the inclusion of false intertextual dependences, emphasizing the importance of expert evaluation. The expert-in-the-loop methodology presented offers a scalable approach for intertextual research into the complex web of intertextuality within and beyond the biblical corpus.	Translated: 2024-11-07 23:56:04 Published: 2024-09-29
# E2CL: 身体的エージェントの探索に基づく誤り訂正学習 E2CL: Exploration-based Error Correction Learning for Embodied Agents ( http://arxiv.org/abs/2409.03256v2 ) ライセンス: Link先を確認	Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li,	(参考訳) 言語モデルは、知識利用と推論の能力が増大している。しかし、具体的環境においてエージェントとして適用された場合、本質的な知識と環境的な知識の相違に悩まされることがしばしばあり、実行不可能な行動を引き起こす。専門家軌道の教師付き学習や強化学習といった従来の環境アライメント手法は, 環境知識をカバーし, 効率の良い収束を達成する上での限界に遭遇する。人間の学習に触発されて,探索に基づく誤り訂正学習(E2CL)を提案する。 E2CLは、環境フィードバックを収集し、誤った行動を正すために、教師誘導と教師なしの探究を取り入れている。エージェントはフィードバックと自己修正を提供することを学び、それによってターゲット環境への適応性を高める。 VirtualHome環境における大規模な実験により、E2CL訓練エージェントはベースライン法で訓練されたエージェントよりも優れ、優れた自己補正能力を示すことが示された。 Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learning, encounter limitations in covering environmental knowledge and achieving efficient convergence, respectively. Inspired by human learning, we propose Exploration-based Error Correction Learning (E2CL), a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL incorporates teacher-guided and teacher-free explorations to gather environmental feedback and correct erroneous actions. The agent learns to provide feedback and self-correct, thereby enhancing its adaptability to target environments. Extensive experiments in the VirtualHome environment demonstrate that E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.	翻訳日:2024-11-07 23:23:02 公開日:2024-09-29
# オープンワールドダイナミックプロンプトと連続視覚表現学習 Open-World Dynamic Prompt and Continual Visual Representation Learning ( http://arxiv.org/abs/2409.05312v2 ) ライセンス: Link先を確認	Youngeun Kim, Jun Fang, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer,	(参考訳) オープンワールドは本質的に動的であり、絶え間なく進化する概念と分布によって特徴づけられる。この動的なオープンワールド環境における連続学習(CL)は、目に見えないテストタイムクラスに効果的に一般化する上で大きな課題となる。この課題に対処するために,オープンワールドの視覚表現学習に適した,実用的なCL設定を提案する。この設定では、後続のデータストリームは、以前のトレーニングフェーズで見られるクラスとは相容れない新しいクラスを体系的に導入する。そこで本研究では,シンプルなPrompt-based CL (PCL) 法である Dynamic Prompt and Representation Learner (DPaRL) を提案する。我々のDPaRLは、従来のPCL法で静的なプロンプトプールに依存するのとは対照的に、推論のための動的プロンプトを生成することを学ぶ。さらに、DPaRLはトレーニング段階ごとに動的プロンプト生成と識別表現を共同で学習するのに対し、PCL以前の手法はプロセス全体を通してのみプロンプト学習を洗練させる。 Recall@1の性能は平均4.7%向上した。 The open world is inherently dynamic, characterized by ever-evolving concepts and distributions. Continual learning (CL) in this dynamic open-world environment presents a significant challenge in effectively generalizing to unseen test-time classes. To address this challenge, we introduce a new practical CL setting tailored for open-world visual representation learning. In this setting, subsequent data streams systematically introduce novel classes that are disjoint from those seen in previous training phases, while also remaining distinct from the unseen test classes. In response, we present Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based CL (PCL) method. Our DPaRL learns to generate dynamic prompts for inference, as opposed to relying on a static prompt pool in previous PCL methods. In addition, DPaRL jointly learns dynamic prompt generation and discriminative representation at each training stage whereas prior PCL methods only refine the prompt learning throughout the process. 
Our experimental results demonstrate the superiority of our approach, surpassing state-of-the-art methods on well-established open-world image retrieval benchmarks by an average of 4.7% improvement in Recall@1 performance.	翻訳日:2024-11-07 22:38:45 公開日:2024-09-29
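The Recall@1 metric reported above can be illustrated with a minimal sketch. It assumes dot-product similarity over pre-computed embeddings and a gallery that does not contain the query itself; the function name and data layout are illustrative and are not taken from the paper.

```python
def recall_at_1(query_embs, query_labels, gallery_embs, gallery_labels):
    """Fraction of queries whose most similar gallery item shares the query label.

    Similarity is a plain dot product (an assumption made for this sketch).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    hits = 0
    for q, q_label in zip(query_embs, query_labels):
        # Index of the gallery embedding most similar to the query.
        best = max(range(len(gallery_embs)), key=lambda i: dot(q, gallery_embs[i]))
        hits += gallery_labels[best] == q_label
    return hits / len(query_embs)
```

A query counts as a hit only when its single nearest neighbour has the correct class, so the metric directly measures how well the learned representation separates unseen classes.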
# 衛星画像時系列に基づく作物分類用SITSMamba SITSMamba for Crop Classification based on Satellite Image Time Series ( http://arxiv.org/abs/2409.09673v2 ) ライセンス: Link先を確認	Xiaolei Qin, Xin Su, Liangpei Zhang,	(参考訳) 衛星画像時系列データ(SITS)は、季節や年を通して植生の変化や成長パターンを追跡することができる。 SITSを作物分類に用いた多くのディープラーニング(DL)アプローチが最近出現し、最新のアプローチではSITS分類にTransformerを採用している。しかし、Transformerにおける自己注意の二次的な複雑さは、時系列の分類に挑戦する。最先端のMambaアーキテクチャは、リモートセンシング画像解釈など様々な領域で強みを示してきたが、SITSデータの時間的表現を学習する能力は未解明のままである。さらに、既存のSITS分類法は、時間情報の完全活用に失敗する監視信号として、作物ラベルにのみ依存することが多い。本稿では,リモートセンシング時系列データに基づく作物分類のための衛星画像時系列マンバ(SITSMamba)手法を提案する。提案したSITSMambaは、畳み込みニューラルネットワーク(CNN)に基づく空間エンコーダと、マンバに基づく時間エンコーダを含む。 SITSからのより豊かな時間情報を活用するために、異なるタスクに使用するデコーダの2つのブランチを設計する。最初のブランチは、作物分類ブランチ(CBranch)で、作物マップに機能をデコードするConvBlockを含んでいる。第2のブランチはSITSレコンストラクションブランチで、線形層を使用してエンコードされた特徴を変換し、元の入力値を予測する。さらに、RBranchに適用された位置重み(PW)を設計し、SITSからリッチ潜在知識の学習を支援する。また、トレーニング中に2つの枝のバランスを制御するために2つの重み付け因子を設計する。 SITSMambaのコードは、https://github.com/XiaoleiQinn/SITSMambaで公開されている。 Satellite image time series (SITS) data provides continuous observations over time, allowing for the tracking of vegetation changes and growth patterns throughout the seasons and years. Numerous deep learning (DL) approaches using SITS for crop classification have emerged recently, with the latest approaches adopting Transformer for SITS classification. However, the quadratic complexity of self-attention in Transformer poses challenges for classifying long time series. While the cutting-edge Mamba architecture has demonstrated strength in various domains, including remote sensing image interpretation, its capacity to learn temporal representations in SITS data remains unexplored. Moreover, the existing SITS classification methods often depend solely on crop labels as supervision signals, which fails to fully exploit the temporal information. In this paper, we proposed a Satellite Image Time Series Mamba (SITSMamba) method for crop classification based on remote sensing time series data. 
The proposed SITSMamba contains a spatial encoder based on Convolutional Neural Networks (CNN) and a Mamba-based temporal encoder. To exploit richer temporal information from SITS, we design two decoder branches used for different tasks. The first branch is a crop Classification Branch (CBranch), which includes a ConvBlock to decode the feature to a crop map. The second branch is a SITS Reconstruction Branch (RBranch) that uses a Linear layer to transform the encoded feature to predict the original input values. Furthermore, we design a Positional Weight (PW) applied to the RBranch to help the model learn rich latent knowledge from SITS. We also design two weighting factors to control the balance of the two branches during training. The code of SITSMamba is available at: https://github.com/XiaoleiQinn/SITSMamba.	翻訳日:2024-11-07 20:46:36 公開日:2024-09-29
# Reinによる視覚基盤モデルのファインチューニングを用いたクロスオーガン・クロススキャナ腺癌セグメンテーション Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models ( http://arxiv.org/abs/2409.11752v3 ) ライセンス: Link先を確認	Pengzhou Cai, Xueyuan Zhang, Libin Lan, Ze Zhao,	(参考訳) 近年,デジタル病理学の分野において腫瘍のセグメンテーションは著しい進展を遂げている。しかし, 臓器, 組織調製法, 画像取得過程の違いは, デジタル病理画像間のドメイン差につながる可能性がある。そこで本論文では,MICCAI 2024 Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS2024) に向けて、微調整手法であるReinを用いて様々な視覚基盤モデル(VFM)をパラメトリックかつ効率的に微調整する。 Reinのコアは学習可能なトークンのセットで構成されており、これらはインスタンスに直接リンクされ、各レイヤのインスタンスレベルの機能を改善する。 COSAS2024 Challengeのデータ環境における広範な実験により、Reinで微調整したVFMが良好な結果を達成することが示された。具体的には、Reinを使ってConvNeXtとDINOv2を微調整した。前者はtask1の予備テスト段階と最終テスト段階でそれぞれ0.7719と0.7557のスコアを達成し、後者はtask2の予備テスト段階と最終テスト段階で0.8848と0.8192のスコアを達成した。コードはGitHubで入手できる。 In recent years, significant progress has been made in tumor segmentation within the field of digital pathology. However, variations in organs, tissue preparation methods, and image acquisition processes can lead to domain discrepancies among digital pathology images. To address this problem, in this paper, we use Rein, a fine-tuning method, to parametrically and efficiently fine-tune various vision foundation models (VFMs) for MICCAI 2024 Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS2024). The core of Rein consists of a set of learnable tokens, which are directly linked to instances, improving functionality at the instance level in each layer. In the data environment of the COSAS2024 Challenge, extensive experiments demonstrate that Rein fine-tuned the VFMs to achieve satisfactory results. Specifically, we used Rein to fine-tune ConvNeXt and DINOv2. Our team used the former to achieve scores of 0.7719 and 0.7557 on the preliminary test phase and final test phase in task1, respectively, while the latter achieved scores of 0.8848 and 0.8192 on the preliminary test phase and final test phase in task2. Code is available at GitHub.	翻訳日:2024-11-07 19:50:48 公開日:2024-09-29
# SnO$_2$薄膜特性のスマートデータ駆動型GRU予測器 Smart Data-Driven GRU Predictor for SnO$_2$ Thin films Characteristics ( http://arxiv.org/abs/2409.11782v2 ) ライセンス: Link先を確認	Faiza Bouamra, Mohamed Sayah, Labib Sadek Terrissa, Noureddine Zerhouni,	(参考訳) 材料物理学では、物性、構造、エレクトロニクス、磁気、光学、誘電体、分光特性に関する材料データを取得するために、キャラクタリゼーション技術が最も重要である。しかし、多くの材料にとって、可用性と安全なアクセシビリティを確保することは必ずしも容易ではなく、完全に保証されているわけではない。さらに、モデリングとシミュレーション技術の使用には、コストのかかる計算時間と大きな複雑さを伴うことに加えて、多くの理論的知識が必要である。したがって、複数のサンプルを同時に分析する異なる手法で材料を分析することは、技術者や研究者にとって非常に困難である。非常に危険であるにもかかわらず、X線回折は結晶性1d, 2d, 3d材料の構造特性からデータを収集する、よく知られ、広く使われているキャラクタリゼーション技術である。本稿では, 酸化スズSnO$_2$(110) 薄膜の構造特性や特性を予測するための Gated Recurrent Unit モデルのためのスマート GRU を提案する。実際、薄膜サンプルは実験的に精巧に管理され、収集されたデータ辞書は、スズ酸化物SnO$_2$(110)構造特性のキャラクタリゼーションのためのAI-人工知能-GRUモデルを生成するために使用される。 In material physics, characterization techniques are crucial for obtaining materials data on physical properties as well as structural, electronic, magnetic, optical, dielectric, and spectroscopic characteristics. However, for many materials, availability and safe accessibility are not always easy to ensure or fully guaranteed. Moreover, modeling and simulation techniques require a lot of theoretical knowledge, in addition to being associated with costly computation time and a great deal of complexity. Thus, analyzing multiple samples simultaneously with different techniques is still very challenging for engineers and researchers. It is worth noting that, although very risky, X-ray diffraction is a well-known and widely used characterization technique that gathers data on the structural properties of crystalline 1d, 2d or 3d materials. We propose in this paper a Smart GRU (Gated Recurrent Unit) model to forecast structural characteristics or properties of thin films of tin oxide SnO$_2$(110). 
Indeed, thin films samples are elaborated and managed experimentally and the collected data dictionary is then used to generate an AI -- Artificial Intelligence -- GRU model for the thin films of tin oxide SnO$_2$(110) structural property characterization.	翻訳日:2024-11-07 19:50:48 公開日:2024-09-29
# 多段階因子モデルに適合する Fitting Multilevel Factor Models ( http://arxiv.org/abs/2409.12067v2 ) ライセンス: Link先を確認	Tetiana Parshakova, Trevor Hastie, Stephen Boyd,	(参考訳) マルチレベル低ランク行列~\cite{parshakova2023factor} で与えられる共分散を持つ多レベル因子モデルの特別な場合について検討する。我々は,観測データの可能性を最大化するために,多レベル因子モデルに適した予測最大化(EM)アルゴリズムを高速に実装する。この方法は任意の階層構造を許容し、反復ごとに線形時間と保存の複雑さを維持する。これは、正定値MLR行列の逆数を計算するための新しい効率的な手法によって達成される。逆 PSD MLR 行列の逆行列は因子の間隔が同じである MLR 行列でもあることを示し、逆行列の因子を得るために、再帰的シャーマン・モリソン・ウードベリー行列恒等式を用いる。さらに、線形時間と空間の複素量を持つ拡張行列のコレスキー分解を計算し、共分散行列をシュア補数とするアルゴリズムを提案する。本稿では,提案手法を実装したオープンソースパッケージを添付する。 We examine a special case of the multilevel factor model, with covariance given by multilevel low rank (MLR) matrix~\cite{parshakova2023factor}. We develop a novel, fast implementation of the expectation-maximization (EM) algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of an invertible PSD MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.	翻訳日:2024-11-07 19:26:16 公開日:2024-09-29
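The role of the Sherman-Morrison-Woodbury identity mentioned above can be illustrated with a toy case. The sketch below assumes a single-level covariance of the form diag(d) + u uᵀ with one factor, which is far simpler than the multilevel low rank structure in the paper; it only shows why a diagonal-plus-low-rank system can be solved in linear time without forming the matrix. The function name is illustrative.

```python
def solve_diag_plus_rank1(d, u, b):
    """Solve (diag(d) + u u^T) x = b in O(n) using the Sherman-Morrison identity.

    (D + u u^T)^{-1} b = D^{-1} b - D^{-1} u (u^T D^{-1} b) / (1 + u^T D^{-1} u)
    Assumes every entry of d is positive, so the matrix is positive definite.
    """
    dinv_b = [bi / di for bi, di in zip(b, d)]   # D^{-1} b
    dinv_u = [ui / di for ui, di in zip(u, d)]   # D^{-1} u
    denom = 1.0 + sum(ui * wi for ui, wi in zip(u, dinv_u))
    coeff = sum(ui * yi for ui, yi in zip(u, dinv_b)) / denom
    return [yi - coeff * wi for yi, wi in zip(dinv_b, dinv_u)]
```

Every step is a single pass over vectors of length n, so time and storage stay linear; a rank-r factor would replace the scalar `denom` with a small r-by-r system in the same way.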
# 物理学インフォームドニューラルネットワーク(PINN)をより正確に、堅牢かつ高速に解くための高次ReLU-KAN(HRKAN) Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster ( http://arxiv.org/abs/2409.14248v1 ) ライセンス: Link先を確認	Chi Chiu So, Siu Pang Yung	(参考訳) 偏微分方程式(PDE)の解を見つけることは、多くの科学的・工学的な発見において重要な要素である。ディープラーニングによって強化される一般的なアプローチの1つは、物理情報ニューラルネットワーク(PINN)である。近年,MLP(Multilayer Perceptrons)の代替として,トレーニング可能な活性化関数を持つ新しい基本的なニューラルネットワークモデルKAN(Kolmogorov-Arnold Networks)が提案されている。適合精度を高めるため, 活性化関数の基底として「ReLUの二乗」を用いた「ReLU-KAN」と呼ばれるKANの修正が提案されている。本研究では, 活性化関数の別の基底として高次ReLU(HR)を提案する。これは, KANで使用される活性化関数の基底であるB-splinesよりも単純で, 効率的なKAN行列演算が可能であり, 物理インフォームドニューラルネットワークに必須な, 滑らかで非ゼロの高次微分を持つ。我々は、高次ReLU(HR)を活性化関数とするKANをHRKANと呼ぶ。線形ポアソン方程式と粘性項を持つ非線形バーガース方程式という2つの代表的なPDEに関する詳細な実験により,提案した高次ReLU-KAN(HRKAN)が,KAN,ReLU-KAN,HRKANの中で最も高い適合精度とトレーニングロバスト性,および最も短いトレーニング時間を達成することを明らかにした。実験を再現するコードは https://github.com/kelvinhkcs/HRKAN で公開されている。 Finding solutions to partial differential equations (PDEs) is an important and essential component in many scientific and engineering discoveries. One of the common approaches empowered by deep learning is Physics-informed Neural Networks (PINNs). Recently, a new type of fundamental neural network model, Kolmogorov-Arnold Networks (KANs), has been proposed as a substitute for Multilayer Perceptrons (MLPs), and possesses trainable activation functions. To enhance KANs in fitting accuracy, a modification of KANs, so-called ReLU-KANs, using "square of ReLU" as the basis of its activation functions, has been suggested. In this work, we propose another basis of activation functions, namely, Higher-order-ReLU (HR), which is simpler than the basis of activation functions used in KANs, namely, B-splines; allows efficient KAN matrix operations; and possesses smooth and non-zero higher-order derivatives, essential to physics-informed neural networks. We name such KANs with Higher-order-ReLU (HR) as their activations, HRKANs. 
Our detailed experiments on two famous and representative PDEs, namely, the linear Poisson equation and nonlinear Burgers' equation with viscosity, reveal that our proposed Higher-order-ReLU-KANs (HRKANs) achieve the highest fitting accuracy and training robustness and lowest training time significantly among KANs, ReLU-KANs and HRKANs. The codes to replicate our experiments are available at https://github.com/kelvinhkcs/HRKAN.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-29
# 物理学インフォームドニューラルネットワーク(PINN)をより正確に、堅牢かつ高速に解くための高次ReLU-KAN(HRKAN) Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster ( http://arxiv.org/abs/2409.14248v2 ) ライセンス: Link先を確認	Chi Chiu So, Siu Pang Yung,	(参考訳) 偏微分方程式(PDE)の解を見つけることは、多くの科学的・工学的な発見において重要な要素である。ディープラーニングによって強化される一般的なアプローチの1つは、物理情報ニューラルネットワーク(PINN)である。近年,MLP(Multilayer Perceptrons)の代替として,トレーニング可能な活性化関数を持つ新しい基本的なニューラルネットワークモデルKAN(Kolmogorov-Arnold Networks)が提案されている。適合精度を高めるため, 活性化関数の基底として「ReLUの二乗」を用いた「ReLU-KAN」と呼ばれるKANの修正が提案されている。本研究では, 活性化関数の別の基底として高次ReLU(HR)を提案する。これは, KANで使用される活性化関数の基底であるB-splinesよりも単純で, 効率的なKAN行列演算が可能であり, 物理インフォームドニューラルネットワークに必須な, 滑らかで非ゼロの高次微分を持つ。我々は、高次ReLU(HR)を活性化関数とするKANをHRKANと呼ぶ。線形ポアソン方程式と粘性項を持つ非線形バーガース方程式という2つの代表的なPDEに関する詳細な実験により,提案した高次ReLU-KAN(HRKAN)が,KAN,ReLU-KAN,HRKANの中で最も高い適合精度とトレーニングロバスト性,および最も短いトレーニング時間を達成することを明らかにした。実験を再現するコードは https://github.com/kelvinhkcs/HRKAN で公開されている。 Finding solutions to partial differential equations (PDEs) is an important and essential component in many scientific and engineering discoveries. One of the common approaches empowered by deep learning is Physics-informed Neural Networks (PINNs). Recently, a new type of fundamental neural network model, Kolmogorov-Arnold Networks (KANs), has been proposed as a substitute for Multilayer Perceptrons (MLPs), and possesses trainable activation functions. To enhance KANs in fitting accuracy, a modification of KANs, so-called ReLU-KANs, using "square of ReLU" as the basis of its activation functions, has been suggested. In this work, we propose another basis of activation functions, namely, Higher-order-ReLU (HR), which is simpler than the basis of activation functions used in KANs, namely, B-splines; allows efficient KAN matrix operations; and possesses smooth and non-zero higher-order derivatives, essential to physics-informed neural networks. We name such KANs with Higher-order-ReLU (HR) as their activations, HRKANs. 
Our detailed experiments on two famous and representative PDEs, namely, the linear Poisson equation and nonlinear Burgers' equation with viscosity, reveal that our proposed Higher-order-ReLU-KANs (HRKANs) achieve the highest fitting accuracy and training robustness and lowest training time significantly among KANs, ReLU-KANs and HRKANs. The codes to replicate our experiments are available at https://github.com/kelvinhkcs/HRKAN.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-29
# 物理学インフォームドニューラルネットワーク(PINN)をより正確に、堅牢かつ高速に解くための高次ReLU-KAN(HRKAN) Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster ( http://arxiv.org/abs/2409.14248v3 ) ライセンス: Link先を確認	Chi Chiu So, Siu Pang Yung,	(参考訳) 偏微分方程式(PDE)の解を見つけることは、多くの科学的・工学的な発見において重要な要素である。ディープラーニングによって強化される一般的なアプローチの1つは、物理情報ニューラルネットワーク(PINN)である。近年,MLP(Multilayer Perceptrons)の代替として,トレーニング可能な活性化関数を持つ新しい基本的なニューラルネットワークモデルKAN(Kolmogorov-Arnold Networks)が提案されている。適合精度を高めるため, 活性化関数の基底として「ReLUの二乗」を用いた「ReLU-KAN」と呼ばれるKANの修正が提案されている。本研究では, 活性化関数の別の基底として高次ReLU(HR)を提案する。これは, KANで使用される活性化関数の基底であるB-splinesよりも単純で, 効率的なKAN行列演算が可能であり, 物理インフォームドニューラルネットワークに必須な, 滑らかで非ゼロの高次微分を持つ。我々は、高次ReLU(HR)を活性化関数とするKANをHRKANと呼ぶ。線形ポアソン方程式と粘性項を持つ非線形バーガース方程式という2つの代表的なPDEに関する詳細な実験により,提案した高次ReLU-KAN(HRKAN)が,KAN,ReLU-KAN,HRKANの中で最も高い適合精度とトレーニングロバスト性,および最も短いトレーニング時間を達成することを明らかにした。実験を再現するコードは https://github.com/kelvinhkcs/HRKAN で公開されている。 Finding solutions to partial differential equations (PDEs) is an important and essential component in many scientific and engineering discoveries. One of the common approaches empowered by deep learning is Physics-informed Neural Networks (PINNs). Recently, a new type of fundamental neural network model, Kolmogorov-Arnold Networks (KANs), has been proposed as a substitute for Multilayer Perceptrons (MLPs), and possesses trainable activation functions. To enhance KANs in fitting accuracy, a modification of KANs, so-called ReLU-KANs, using "square of ReLU" as the basis of its activation functions, has been suggested. In this work, we propose another basis of activation functions, namely, Higher-order-ReLU (HR), which is simpler than the basis of activation functions used in KANs, namely, B-splines; allows efficient KAN matrix operations; and possesses smooth and non-zero higher-order derivatives, essential to physics-informed neural networks. We name such KANs with Higher-order-ReLU (HR) as their activations, HRKANs. 
Our detailed experiments on two famous and representative PDEs, namely, the linear Poisson equation and nonlinear Burgers' equation with viscosity, reveal that our proposed Higher-order-ReLU-KANs (HRKANs) achieve the highest fitting accuracy and training robustness and lowest training time significantly among KANs, ReLU-KANs and HRKANs. The codes to replicate our experiments are available at https://github.com/kelvinhkcs/HRKAN.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-29
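The idea of a "square of ReLU" basis and its higher-order generalization can be sketched as follows. The exact functional form and normalization are assumptions made for illustration (a product of two ReLUs on an interval [s, e], raised to a power m and scaled to peak at 1); the abstract does not spell them out, and the names `hr_basis`, `s`, `e`, and `m` are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def hr_basis(x, s, e, m=2):
    """Bump supported on [s, e]: [ReLU(e - x) * ReLU(x - s)]^m, scaled to peak at 1.

    m = 2 corresponds to a "square of ReLU" style basis; larger m gives a
    higher-order variant. Both the form and the scaling are assumptions.
    """
    norm = (2.0 / (e - s)) ** (2 * m)
    return (relu(e - x) * relu(x - s)) ** m * norm
```

Raising the product to a higher power makes more derivatives continuous at the interval ends, which is the property the abstract highlights as essential when a PINN loss differentiates the network output several times.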
# 開量子系におけるマルコフ-非マルコフ遷移のスペクトル信号 Spectral Signatures of the Markovian to Non-Markovian Transition in Open Quantum Systems ( http://arxiv.org/abs/2409.14661v1 ) ライセンス: Link先を確認	Zeng-Zhao Li, Cho-Tung Yip, Bo Li,	(参考訳) 本稿では, 線形吸収スペクトルの解析を通じて, 振動浴に強く結合した量子集合体におけるマルコフ的-非マルコフ的遷移を研究するための新しいアプローチを提案する。周波数領域における階層的代数方程式を用いて、これらのスペクトルが、散逸の複雑な相互作用、集合バスカップリング、およびアグリゲート内双極子-双極子相互作用によって、マルコフ型と非マルコフ型の間の遷移を効果的に明らかにする方法を解明する。以上の結果から, 消散効果の低下はスペクトルピークの分裂を引き起こすことが示され, ジポール-ジポール相互作用の強化によってさらに増幅される, 入浴による非マルコフ効果の出現が示唆された。さらに、集合バス結合強度が増大すると、最初は対称または非対称のピークとスペクトル振幅の異なるピークは弱い双極子-双極子相互作用の下で結合するが、強い双極子-双極子相互作用はピーク分裂を引き起こす。これらの現象はマルコフ的行動から非マルコフ的行動への移行の代替指標となる。さらに、スペクトルの特徴は、集合の幾何学的構造を識別するための感度の高いプローブとして機能する。この研究はマルコフから非マルコフ遷移への理解を深めるだけでなく、量子システムを最適化し制御するための堅牢な枠組みも提供する。 We present a new approach for investigating the Markovian to non-Markovian transition in quantum aggregates strongly coupled to a vibrational bath through the analysis of linear absorption spectra. Utilizing hierarchical algebraic equations in the frequency domain, we elucidate how these spectra can effectively reveal transitions between Markovian and non-Markovian regimes, driven by the complex interplay of dissipation, aggregate-bath coupling, and intra-aggregate dipole-dipole interactions. Our results demonstrate that reduced dissipation induces spectral peak splitting, signaling the emergence of bath-induced non-Markovian effects, which are further amplified by enhanced dipole-dipole interactions. Additionally, with an increase in aggregate-bath coupling strength, initially symmetric or asymmetric peaks with varying spectral amplitudes may merge under weak dipole-dipole interactions, whereas strong dipole-dipole interactions are more likely to cause peak splitting. These phenomena serve as alternative indicators of the shift from Markovian to non-Markovian behavior. Moreover, the spectral features can act as sensitive probes for distinguishing geometric structures of the aggregates. 
This study not only deepens our understanding of the Markovian to non-Markovian transition but also provides a robust framework for optimizing and controlling quantum systems.	翻訳日:2024-11-06 21:34:58 公開日:2024-09-29
# 開量子系におけるマルコフ変換と非マルコフ遷移のスペクトルシグネチャ Spectral signatures of the Markovian to Non-Markovian transition in open quantum systems ( http://arxiv.org/abs/2409.14661v2 ) ライセンス: Link先を確認	Zeng-Zhao Li, Cho-Tung Yip, Bo Li,	(参考訳) 本稿では, 線形吸収スペクトルの解析を通じて, 振動浴に強く結合した量子集合体におけるマルコフ的-非マルコフ的遷移を研究するための新しいアプローチを提案する。周波数領域における階層的代数方程式を用いて、これらのスペクトルが、散逸の複雑な相互作用、集合バスカップリング、およびアグリゲート内双極子-双極子相互作用によって、マルコフ型と非マルコフ型の間の遷移を効果的に明らかにする方法を解明する。以上の結果から,消散量の減少はスペクトルピークの分裂を誘導し,入浴による非マルコフ効果の出現を示唆することが明らかとなった。スペクトルピークスプリッティングは双極子-双極子相互作用の強化によっても駆動できるが、基礎となるメカニズムは散逸誘起スプリッティングと異なる。さらに、集合バス結合強度が増大すると、最初は対称または非対称のピークとスペクトル振幅の異なるピークは弱い双極子-双極子相互作用の下で結合するが、強い双極子-双極子相互作用はピーク分裂を引き起こす。さらに、スペクトル特徴は、凝集体の幾何学的構造を識別するための高感度な指標として機能し、非マルコフ的挙動を形成する上での幾何学的役割も明らかにしている。この研究はマルコフから非マルコフ遷移への理解を深めるだけでなく、量子システムを最適化し制御するための堅牢な枠組みも提供する。 We present a new approach for investigating the Markovian to non-Markovian transition in quantum aggregates strongly coupled to a vibrational bath through the analysis of linear absorption spectra. Utilizing hierarchical algebraic equations in the frequency domain, we elucidate how these spectra can effectively reveal transitions between Markovian and non-Markovian regimes, driven by the complex interplay of dissipation, aggregate-bath coupling, and intra-aggregate dipole-dipole interactions. Our results demonstrate that reduced dissipation induces spectral peak splitting, signaling the emergence of bath-induced non-Markovian effects. The spectral peak splitting can also be driven by enhanced dipole-dipole interactions, although the underlying mechanism differs from that of dissipation-induced splitting. Additionally, with an increase in aggregate-bath coupling strength, initially symmetric or asymmetric peaks with varying spectral amplitudes may merge under weak dipole-dipole interactions, whereas strong dipole-dipole interactions are more likely to cause peak splitting. 
Moreover, we find that spectral features serve as highly sensitive indicators for distinguishing the geometric structures of aggregates, while also unveiling the critical role geometry plays in shaping non-Markovian behavior. This study not only deepens our understanding of the Markovian to non-Markovian transition but also provides a robust framework for optimizing and controlling quantum systems.	翻訳日:2024-11-06 21:34:58 公開日:2024-09-29
# ロケーションが鍵:Verilogの機能バグローカライゼーションのための大規模言語モデルを活用する Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog ( http://arxiv.org/abs/2409.15186v2 ) ライセンス: Link先を確認	Bingkun Yao, Ning Wang, Jie Zhou, Xi Wang, Hong Gao, Zhe Jiang, Nan Guan,	(参考訳) Verilogコードのバグローカライゼーションは,ハードウェア設計の検証において重要かつ時間を要する課題である。導入以来、LLM(Large Language Models)はその強力なプログラミング能力を示している。しかしながら、VerilogコードのバグローカライゼーションにLLMを使うことを検討する作業はまだない。本稿では,Verilogスニペット内の機能的エラーを検出するオープンソースLLMソリューションであるLocation-is-Key(LiK)を提案する。 LiKは高いローカライゼーション精度を達成し、我々のテストデータセットでは、RTLLMに基づいて93.3%のパス@1ローカライゼーション精度を達成し、GPT-4の77.9%を超え、Claude-3.5の90.8%に匹敵する。さらに、LiK が取得したバグ位置は GPT-3.5 のバグ修正効率を大幅に改善し(Functional pass@1 は 40.39% から 58.92% に増加した)、LLM ベースの Verilog デバッグにおけるバグローカライゼーションの重要性を強調した。既存のメソッドと比較して、LiKはテストベンチやアサーション、その他のEDAツールを必要とせずに、設計仕様と誤ったコードスニペットのみを必要とする。本研究は,Verilog エラーローカライゼーションに LLM を用いることが可能であることを示す。 Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since their introduction, Large Language Models (LLMs) have shown strong programming capabilities. However, no work has yet considered using LLMs for bug localization in Verilog code. This paper presents Location-is-Key (LiK), an open-source LLM solution to locate functional errors in Verilog snippets. LiK achieves high localization accuracy, with a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%. Additionally, the bug location obtained by LiK significantly improves GPT-3.5's bug repair efficiency (Functional pass@1 increased from 40.39% to 58.92%), highlighting the importance of bug localization in LLM-based Verilog debugging. Compared to existing methods, LiK only requires the design specification and the erroneous code snippet, without the need for testbenches, assertions, or any other EDA tools. 
This research demonstrates the feasibility of using LLMs for Verilog error localization, thus providing a new direction for automatic Verilog code debugging.	翻訳日:2024-11-06 20:27:58 公開日:2024-09-29
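For readers unfamiliar with the pass@1 figures quoted above, here is a minimal sketch of a commonly used pass@k estimator. Whether the paper computes its numbers this way is an assumption; the abstract does not say. With k = 1 the estimate reduces to the fraction of correct samples.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n generated samples, c of which are correct.

    Returns the probability that at least one of k samples drawn without
    replacement from the n generated samples is correct.
    """
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, 3 correct localizations out of 10 samples give a pass@1 of 0.3, while the same samples give a higher pass@5 because any of five draws may succeed.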
# スペインの低リソース言語に対する多言語転移とドメイン適応 Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain ( http://arxiv.org/abs/2409.15924v2 ) ライセンス: Link先を確認	Yuanchang Luo, Zhanglin Wu, Daimeng Wei, Hengchao Shang, Zongyao Li, Jiaxin Guo, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Yuhao Xie, Jiawei Zheng Bin Wei, Hao Yang,	(参考訳) 本稿では,Huawei Translation Service Center (HW-TSC) による,WMT 2024のスペインにおける低リソース言語への翻訳タスクへの提出内容について紹介する。我々は,スペイン語からアラゴン語 (es-arg) ,スペイン語からアラン語 (es-arn) ,スペイン語からアストゥリアス語 (es-ast) の3つの翻訳タスクに参加した。これら3つの翻訳タスクでは、多言語転移、正規化ドロップアウト、順翻訳と逆翻訳、LaBSEによるノイズ除去、トランスダクション・アンサンブル学習などの学習戦略を、深層Transformer-bigアーキテクチャに基づくニューラル機械翻訳(NMT)モデルに適用する。これらの強化戦略を用いることで,最終評価において競争的な結果が得られた。 This article describes the submission by Huawei Translation Service Center (HW-TSC) to the Translation into Low-Resource Languages of Spain task at WMT 2024. We participated in three translation tasks: Spanish to Aragonese (es-arg), Spanish to Aranese (es-arn), and Spanish to Asturian (es-ast). For these three translation tasks, we apply training strategies such as multilingual transfer, regularized dropout, forward translation and back translation, LaBSE denoising, and transduction ensemble learning to a neural machine translation (NMT) model based on a deep Transformer-big architecture. By using these enhancement strategies, our submission achieved a competitive result in the final evaluation.	翻訳日:2024-11-06 19:21:13 公開日:2024-09-29
# 談話レベル文学翻訳のための文脈認識とスタイル関連インクリメンタルデコーディングフレームワーク Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation ( http://arxiv.org/abs/2409.16539v2 ) ライセンス: Link先を確認	Yuanchang Luo, Jiaxin Guo, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Hao Yang,	(参考訳) 本稿では,WMT24 Discourse-Level Literary Translation Taskに対する我々のアプローチについて,Constrained Trackの中国語-英語の言語対に焦点を当てて概説する。文学テキストの翻訳は、これらの作品に固有のニュアンスな意味、慣用的な表現、複雑な物語構造が原因で、大きな課題となっている。これらの課題に対処するために,我々はCPT(Continual Pre-training)とSFT(Supervised Fine-Tuning)を組み合わせることで,このタスクを特に強化した中国語-Llama2モデルを利用した。提案手法は,テキスト全体の結束性と一貫性を維持しつつ,各文がより広い文脈で翻訳されることを保証する新しいインクリメンタル・デコーディング・フレームワークを含む。このアプローチにより、モデルは長距離の依存関係とスタイル的要素をキャプチャし、元の文学的品質を忠実に保存する翻訳を生成することができる。本実験は,文レベルのBLEUスコアと文書レベルのBLEUスコアの両方において大幅な改善を示し,文書レベルの文学翻訳の複雑さに対処する上で,提案手法の有効性を実証するものである。 This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT). Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text. This approach allows the model to capture long-range dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality. Our experiments demonstrate significant improvements in both sentence-level and document-level BLEU scores, underscoring the effectiveness of our proposed framework in addressing the complexities of document-level literary translation.	翻訳日:2024-11-06 17:30:16 公開日:2024-09-29
# SynTQA: Text-to-SQLとE2E TQAの混合によるSynergistic Table-based Question Answering SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA ( http://arxiv.org/abs/2409.16682v2 ) ライセンス: Link先を確認	Siyue Zhang, Anh Tuan Luu, Chen Zhao,	(参考訳) Text-to-SQL解析とエンドツーエンド質問応答(E2E TQA)は、テーブルベースの質問応答タスクの2つの主要なアプローチである。複数のベンチマークで成功したが、まだ比較されておらず、相乗効果は未解明のままである。本稿では、ベンチマークデータセット上で最先端モデルを評価することで、それぞれの強みと弱みを明らかにする。Text-to-SQLは、算術演算や長いテーブルを含む問題を扱う上での優位性を示し、E2E TQAは曖昧な問題、非標準テーブルスキーマ、複雑なテーブル内容に対処する上で優れている。両長所を組み合わせるために,任意のモデルタイプに非依存な回答選択を通じて,異なるモデルを統合するSynergistic Tableベースの質問応答手法を提案する。さらに,特徴ベースまたはLLMベースの回答セレクタによるアンサンブルモデルにより,個々のモデルよりも性能が大幅に向上することが検証された。 Text-to-SQL parsing and end-to-end question answering (E2E TQA) are two main approaches for the Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. In this paper, we identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets: Text-to-SQL demonstrates superiority in handling questions involving arithmetic operations and long tables; E2E TQA excels in addressing ambiguous questions, non-standard table schema, and complex table contents. To combine both strengths, we propose a Synergistic Table-based Question Answering approach that integrates different models via answer selection, which is agnostic to model type. Further experiments validate that ensembling models by either feature-based or LLM-based answer selector significantly improves the performance over individual models.	翻訳日:2024-11-06 17:20:02 公開日:2024-09-29
# 時間依存質問応答のための時間的感度と推論の強化 Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering ( http://arxiv.org/abs/2409.16909v2 ) ライセンス: Link先を確認	Wanqi Yang, Yanda Li, Meng Fang, Ling Chen,	(参考訳) Time-Sensitive Question Answering (TSQA)は、時間に敏感な質問に対処するために、複数の時間的事実を含む特定の時間的文脈を効果的に活用することを要求する。このことは、質問の中の時間情報のパーシングだけでなく、正確な答えを生成するために、時間進化する事実の識別と理解も必要である。しかし、現在の大規模言語モデルは、時間的情報に対する感度が限られており、その時間的推論能力が不十分である。本稿では,時間的認知と推論を時間的情報認識の埋め込みとグラニュラコントラスト強化学習を通じて促進する新しい枠組みを提案する。 4つのTSQAデータセットによる実験結果から、我々のフレームワークは、TSQAタスクにおける既存のLLMよりも大幅に優れており、マシンと人間の時間的理解と推論のパフォーマンスギャップを埋める上での一歩であることが示された。 Time-Sensitive Question Answering (TSQA) demands the effective utilization of specific temporal contexts, encompassing multiple time-evolving facts, to address time-sensitive questions. This necessitates not only the parsing of temporal information within questions but also the identification and understanding of time-evolving facts to generate accurate answers. However, current large language models still have limited sensitivity to temporal information and inadequate temporal reasoning capabilities. In this paper, we propose a novel framework that enhances temporal awareness and reasoning through Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning. Experimental results on four TSQA datasets demonstrate that our framework significantly outperforms existing LLMs in TSQA tasks, marking a step forward in bridging the performance gap between machine and human temporal understanding and reasoning.	翻訳日:2024-11-06 17:10:14 公開日:2024-09-29
# 次トークン予測のためのTransformer学習の非漸近的収束 Non-asymptotic Convergence of Training Transformers for Next-token Prediction ( http://arxiv.org/abs/2409.17335v2 ) ライセンス: Link先を確認	Ruiquan Huang, Yingbin Liang, Jing Yang,	(参考訳) Transformerは、特に次トークン予測(NTP)タスクにおいて、系列データを処理する優れた能力により、現代の機械学習において驚くべき成功を収めている。しかし、NTPにおけるその性能の理論的理解は限られており、既存の研究は主に漸近的な性能に焦点を当てている。本稿では、自己注意モジュールとそれに続くフィードフォワード層からなる1層Transformerの学習ダイナミクスについて、きめ細かい非漸近解析を与える。まず、半順序に基づく数学的枠組みを用いて、NTPの学習データセットの本質的な構造特性を特徴付ける。次に、2段階の学習アルゴリズムを設計し、フィードフォワード層を学習する前処理段階と注意層を学習する主段階がともに高速な収束性能を示すことを明らかにする。具体的には、両方の層は対応する最大マージン解の方向に劣線形に収束する。また、クロスエントロピー損失が線形収束率を持つことを示す。さらに、学習されたTransformerがデータセットシフト下でも非自明な予測能力を示すことを示し、Transformerの顕著な汎化性能に光を当てる。本解析手法は、注意勾配に関する新しい性質の導出と、これらの性質が学習過程の収束にどのように寄与するかの詳細な分析を含む。実験により理論的結果をさらに検証する。 Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data, especially in next-token prediction (NTP) tasks. However, the theoretical understanding of their performance in NTP is limited, with existing studies focusing mainly on asymptotic performance. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer consisting of a self-attention module followed by a feed-forward layer. We first characterize the essential structural properties of training datasets for NTP using a mathematical framework based on partial orders. Then, we design a two-stage training algorithm, where the pre-processing stage for training the feed-forward layer and the main stage for training the attention layer exhibit fast convergence performance. Specifically, both layers converge sub-linearly to the direction of their corresponding max-margin solutions. We also show that the cross-entropy loss enjoys a linear convergence rate. 
Furthermore, we show that the trained transformer presents non-trivial prediction ability with dataset shift, which sheds light on the remarkable generalization performance of transformers. Our analysis technique involves the development of novel properties on the attention gradient and further in-depth analysis of how these properties contribute to the convergence of the training process. Our experiments further validate our theoretical findings.	翻訳日:2024-11-06 16:30:51 公開日:2024-09-29
# BEATS: BackVerifyとAdaptive Disambiguateに基づく効率的な木探索によるLLM数学能力の最適化 BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search ( http://arxiv.org/abs/2409.17972v2 ) ライセンス: Link先を確認	Linzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Conghui He, Zenan Zhou, Wentao Zhang,	(参考訳) 大規模言語モデル(LLM)は、幅広いタスクやドメインで例外的なパフォーマンスを示している。しかし、数学の厳密で論理的な性質のため、数学の問題を解くのに依然として苦労している。従来の研究では、教師付き微調整(SFT)、プロンプトエンジニアリング、探索に基づく手法などを用いて、LLMの数学的問題解決能力の改善が図られてきた。これらの努力にもかかわらず、その性能は依然として最適とは言えず、かなりの計算資源を必要としている。この問題に対処するために、数学的問題解決能力を高める新しい手法BEATSを提案する。提案手法では、モデルが反復的に書き直し、一歩前進し、前のステップに基づいて回答を生成するよう導く、新たに設計されたプロンプトを利用する。さらに、LLMを用いて生成された回答の正しさを検証する新たなバック検証手法を導入する。加えて、高い性能を達成しつつ探索時間を最適化するために、枝刈り木探索を用いる。特に、本手法はMATHベンチマークにおいてQwen2-7b-Instructのスコアを36.94から61.52に改善し、GPT4の42.5を上回った。 Large Language Models (LLMs) have exhibited exceptional performance across a broad range of tasks and domains. However, they still encounter difficulties in solving mathematical problems due to the rigorous and logical nature of mathematics. Previous studies have employed techniques such as supervised fine-tuning (SFT), prompt engineering, and search-based methods to improve the mathematical problem-solving abilities of LLMs. Despite these efforts, their performance remains suboptimal and demands substantial computational resources. To address this issue, we propose a novel approach, BEATS, to enhance mathematical problem-solving abilities. Our method leverages newly designed prompts that guide the model to iteratively rewrite, advance by one step, and generate answers based on previous steps. Additionally, we introduce a new back-verification technique that uses LLMs to validate the correctness of the generated answers. Furthermore, we employ a pruning tree search to optimize search time while achieving strong performance. Notably, our method improves Qwen2-7b-Instruct's score from 36.94 to 61.52, outperforming GPT4's 42.5 on the MATH benchmark.	
翻訳日:2024-11-06 16:00:56 公開日:2024-09-29
# Dicke超放射の厳密解 Exact solution for Dicke superradiance ( http://arxiv.org/abs/2409.19040v1 ) ライセンス: Link先を確認	Raphael Holzinger, Claudiu Genes,	(参考訳) 初期状態で完全に反転した$N$個の同一な2準位量子エミッタのアンサンブルの時間発展、すなわちDicke超放射問題に対する厳密な解析解を与える。系は対称な崩壊演算子による集団崩壊を受けると仮定するので、密度演算子の時間発展は$N+1$個のDicke状態が張る完全対称部分空間に還元される。この集団基底において、系は完全励起状態から励起数ゼロの基底状態へ、Dicke状態の完全な混合として緩和する。ここで導かれる解は、任意の$N$の値と発展中の任意の時刻$t$に対して厳密である。 We provide an exact analytical solution for the time evolution of an initially inverted ensemble of identical $N$ two-level quantum emitters, i.e. to the Dicke superradiance problem. It is assumed that the system undergoes collective decay with a symmetric collapse operator, thus reducing the time evolution of the density operator to the fully symmetric subspace spanned by $N+1$ Dicke states. In this collective basis, the system relaxes from the fully excited state to the zero excitation ground state as a full mixture of Dicke states. The solution derived here is exact for any value of $N$ and any time $t$ during the evolution.	翻訳日:2024-11-06 04:40:55 公開日:2024-09-29
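The abstract above restricts the dynamics to the $N+1$ symmetric Dicke states. As a rough illustration of that ladder picture (not the paper's analytical solution), the sketch below numerically integrates the standard population rate equations, assuming the textbook collective decay rate $\Gamma\, n (N - n + 1)$ out of the state with $n$ excitations. The function name, the Euler integrator, and the parameter values are illustrative assumptions.

```python
def dicke_populations(n_emitters, gamma=1.0, t_final=5.0, dt=1e-3):
    """Integrate Dicke-ladder rate equations with explicit Euler steps.

    Hypothetical helper for illustration only; the paper derives an exact
    closed-form solution instead of integrating numerically.
    """
    # p[n] is the population of the symmetric Dicke state with n excitations.
    p = [0.0] * (n_emitters + 1)
    p[n_emitters] = 1.0  # fully inverted initial state
    # Assumed collective decay rate from n to n-1 excitations.
    rates = [gamma * n * (n_emitters - n + 1) for n in range(n_emitters + 1)]
    steps = int(round(t_final / dt))
    for _ in range(steps):
        # Probability leaving each level during this time step.
        flow = [rates[n] * p[n] * dt for n in range(n_emitters + 1)]
        for n in range(1, n_emitters + 1):
            p[n] -= flow[n]
            p[n - 1] += flow[n]
    return p


populations = dicke_populations(10)
```

At intermediate times the populations are spread over several Dicke states, which is the "mixture of Dicke states" the abstract refers to; at long times everything ends up in the zero-excitation state.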
# 高純度ダイヤモンドにおける5 %の$^{13}$C核スピン超偏極を室温・低磁場で達成する Achieving 5 % $^{13}$C nuclear spin hyperpolarization in high-purity diamond at room temperature and low field ( http://arxiv.org/abs/2409.19489v1 ) ライセンス: Link先を確認	Vladimir V. Kavtanyuk, Changjae Lee, Keunhong Jeong, Jeong Hyun Shim,	(参考訳) ダイヤモンド中の光学的に偏極可能な窒素空孔(NV)中心は、低磁場・室温での$^{13}$C核スピンの超偏極を可能にする。しかし、従来の動的核偏極に匹敵する高い偏極率を達成することは依然として困難であった。ここでは、10 mT以下において、$7 \times 10^6$を超える増強比に相当する5 %の$^{13}$C偏極が得られることを示す。初期窒素濃度の低い($<$ 1 ppm)高純度ダイヤモンドを用いており、これにより保存時間も100分を超える。磁場を[100]に沿って配向させることで、偏極移動に関与するNVスピンの数は4倍になる。この磁場方位に対して、磁場強度とマイクロ波(MW)周波数掃引パラメータの包括的な最適化を行った。最適なMW掃引幅は、偏極移動が主にintegrated solid effectを通じてバルクの$^{13}$Cスピンへ起こり、その後に核スピン拡散が続くことを示唆している。 Optically polarizable nitrogen-vacancy (NV) center in diamond enables the hyperpolarization of $^{13}$C nuclear spins at low magnetic field and room temperature. However, achieving a high level of polarization comparable to conventional dynamic nuclear polarization has remained challenging. Here we demonstrate that, at below 10 mT, a $^{13}$C polarization of 5 % can be obtained, equivalent to an enhancement ratio over $7 \times 10^6$. We used high-purity diamond with a low initial nitrogen concentration ($<$ 1 ppm), which also results in a long storage time exceeding 100 minutes. By aligning the magnetic field along [100], the number of NV spins participating in polarization transfer increases fourfold. We conducted a comprehensive optimization of field intensity and microwave (MW) frequency-sweep parameters for this field orientation. The optimum MW sweep width suggests that polarization transfer occurs primarily to bulk $^{13}$C spins through the integrated solid effect followed by nuclear spin diffusion.	翻訳日:2024-11-05 22:57:44 公開日:2024-09-29
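The enhancement ratio quoted above can be sanity-checked against the thermal (Boltzmann) polarization of a spin-1/2 nucleus. The sketch below is only an order-of-magnitude check: the temperature of 300 K and the two field values are assumptions (the abstract says only "room temperature" and "below 10 mT"), and the $^{13}$C gyromagnetic ratio of about 10.7084 MHz/T is a standard literature value, not taken from the source.

```python
import math

H_PLANCK = 6.62607015e-34       # Planck constant, J*s
K_BOLTZMANN = 1.380649e-23      # Boltzmann constant, J/K
GAMMA_13C_HZ_PER_T = 10.7084e6  # 13C gyromagnetic ratio / (2*pi), Hz per tesla


def thermal_polarization(b_tesla, temp_kelvin):
    # Spin-1/2 Boltzmann polarization: tanh(h * f_Larmor / (2 k T)).
    larmor_hz = GAMMA_13C_HZ_PER_T * b_tesla
    return math.tanh(H_PLANCK * larmor_hz / (2 * K_BOLTZMANN * temp_kelvin))


def enhancement(polarization, b_tesla, temp_kelvin=300.0):
    # Ratio of the achieved polarization to the thermal one at the same field.
    return polarization / thermal_polarization(b_tesla, temp_kelvin)


# Field values are assumptions for illustration; the exact field is not stated.
ratio_10mT = enhancement(0.05, 10e-3)
ratio_8mT = enhancement(0.05, 8e-3)
```

Under these assumptions the ratio is a few times $10^6$ at 10 mT and grows as the field is lowered, which is the same order of magnitude as the figure in the abstract.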
# KineDepth:オンラインのメートル法深度推定にロボットキネマティクスを活用する KineDepth: Utilizing Robot Kinematics for Online Metric Depth Estimation ( http://arxiv.org/abs/2409.19490v1 ) ライセンス: Link先を確認	Soofiyan Atar, Yuheng Zhi, Florian Richter, Michael Yip,	(参考訳) 深度知覚はロボットが環境を空間的・幾何学的に理解するために不可欠であり、多くのタスクは伝統的にRGB-Dカメラやステレオカメラのようなハードウェアベースの深度センサーに依存している。しかし、これらのセンサーは、透明物体や反射物体での問題、高いコスト、キャリブレーションの複雑さ、空間的・エネルギー的制約、複合システムにおける故障率の増加など、実用上の制限に直面している。単眼深度推定法はコスト効率が高くより単純な代替手段を提供するが、その出力がロボット応用に不可欠なメートル法深度ではなく相対深度であるため、ロボット工学での採用は限られている。本稿では、キャリブレーション済みの単一カメラを用い、ロボットを「物差し」として機能させることで、タスクの実行中に相対深度推定をリアルタイムでメートル法深度に変換する手法を提案する。提案手法は、オンラインで学習され確率的フィルタリングによって洗練されるLSTMベースのメートル法深度回帰器を用いて、特にロボットの動きに近い領域において、単眼深度マップ全体のメートル法深度を正確に復元する。実ロボットを用いた実験により、本手法は現在の最先端の単眼メートル法深度推定手法を大幅に上回り、深度誤差を22.1%低減し、下流タスクの成功率を52%向上させることが示された。 Depth perception is essential for a robot's spatial and geometric understanding of its environment, with many tasks traditionally relying on hardware-based depth sensors like RGB-D or stereo cameras. However, these sensors face practical limitations, including issues with transparent and reflective objects, high costs, calibration complexity, spatial and energy constraints, and increased failure rates in compound systems. While monocular depth estimation methods offer a cost-effective and simpler alternative, their adoption in robotics is limited due to their output of relative rather than metric depth, which is crucial for robotics applications. In this paper, we propose a method that utilizes a single calibrated camera, enabling the robot to act as a "measuring stick" to convert relative depth estimates into metric depth in real-time as tasks are performed. Our approach employs an LSTM-based metric depth regressor, trained online and refined through probabilistic filtering, to accurately restore the metric depth across the monocular depth map, particularly in areas proximal to the robot's motion. 
Experiments with real robots demonstrate that our method significantly outperforms current state-of-the-art monocular metric depth estimation techniques, achieving a 22.1% reduction in depth error and a 52% increase in success rate for a downstream task.	翻訳日:2024-11-05 22:57:44 公開日:2024-09-29
# MedHalu:大規模言語モデルによるヘルスケアクエリへの応答における幻覚 MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models ( http://arxiv.org/abs/2409.19492v1 ) ライセンス: Link先を確認	Vibhor Agarwal, Yiqiao Jin, Mohit Chandra, Munmun De Choudhury, Srijan Kumar, Nishanth Sastry,	(参考訳) 言語理解と生成における大規模言語モデル(LLM)の顕著な能力をもってしても、幻覚を免れることはできていない。LLMは依然として、もっともらしく聞こえるが事実としては誤りである、あるいは捏造された情報を生成しうる。LLMを利用したチャットボットが普及するにつれて、一般の人々は健康関連の質問を頻繁に行い、こうしたLLMの幻覚の被害を受けるリスクを負うことになり、様々な社会的・医療的影響が生じる。本研究では、患者からの実世界の医療クエリに対するLLM生成応答における幻覚について、先駆的な研究を行う。我々は、健康に関する多様なトピックと、それに対応するLLMの幻覚応答を、幻覚タイプおよび幻覚テキストスパンのラベルとともに収録した、慎重に構築された初の医療幻覚データセットMedHaluを提案する。また、幻覚検出における様々なLLMの能力を評価するためのMedHaluDetectフレームワークを導入する。さらに、医療専門家、LLM、一般人という3つの評価者グループを用いて、誰がこれらの医療幻覚に対してより脆弱であるかを調べる。その結果、LLMは専門家よりはるかに劣ることが分かった。また、幻覚の検出においてLLMは一般人より優れているわけではなく、少数のケースではむしろ劣る。このギャップを埋めるために、専門家の推論を注入することでLLMによる幻覚検出を改善するexpert-in-the-loopアプローチを提案する。すべてのLLMで大幅な性能向上が見られ、GPT-4ではマクロF1が平均6.3ポイント改善した。 The remarkable capabilities of large language models (LLMs) in language understanding and generation have not rendered them immune to hallucinations. LLMs can still generate plausible-sounding but factually incorrect or fabricated information. As LLM-empowered chatbots become popular, laypeople may frequently ask health-related queries and risk falling victim to these LLM hallucinations, resulting in various societal and healthcare implications. In this work, we conduct a pioneering study of hallucinations in LLM-generated responses to real-world healthcare queries from patients. We propose MedHalu, a carefully crafted first-of-its-kind medical hallucination dataset with a diverse range of health-related topics and the corresponding hallucinated responses from LLMs with labeled hallucination types and hallucinated text spans. We also introduce MedHaluDetect framework to evaluate capabilities of various LLMs in detecting hallucinations. We also employ three groups of evaluators -- medical experts, LLMs, and laypeople -- to study who are more vulnerable to these medical hallucinations. 
We find that LLMs are much worse than the experts. They also perform no better than laypeople and even worse in few cases in detecting hallucinations. To fill this gap, we propose expert-in-the-loop approach to improve hallucination detection through LLMs by infusing expert reasoning. We observe significant performance gains for all the LLMs with an average macro-F1 improvement of 6.3 percentage points for GPT-4.	翻訳日:2024-11-05 22:57:44 公開日:2024-09-29
# OptiGrasp:倉庫用ピッキングロボットのためのRGB画像を用いた最適化された把持姿勢検出 OptiGrasp: Optimized Grasp Pose Detection Using RGB Images for Warehouse Picking Robots ( http://arxiv.org/abs/2409.19494v1 ) ライセンス: Link先を確認	Soofiyan Atar, Yi Li, Markus Grotz, Michael Wolf, Dieter Fox, Joshua Smith,	(参考訳) 倉庫環境では、ロボットは多種多様な物体を扱うために堅牢なピッキング能力を必要とする。効果的な導入には、最小限のハードウェア、新製品への強い汎化、多様な環境での頑健性が求められる。現在の手法は構造情報を得るために深度センサーに依存することが多いが、深度センサーには高コスト、複雑なセットアップ、技術的な制限といった問題がある。コンピュータビジョンの最近の進歩に着想を得て、基盤モデルを活用し、RGB画像のみを用いて吸着把持を向上させる革新的なアプローチを提案する。本手法は合成データセットのみで学習され、その把持予測能力を実世界のロボットと、学習セットに含まれない多様な新規物体に汎化する。我々のネットワークは実世界のアプリケーションで82.3%の成功率を達成した。コードとデータを備えたプロジェクトのWebサイトはhttp://optigrasp.github.ioで公開予定である。 In warehouse environments, robots require robust picking capabilities to manage a wide variety of objects. Effective deployment demands minimal hardware, strong generalization to new products, and resilience in diverse settings. Current methods often rely on depth sensors for structural information, which suffer from high costs, complex setups, and technical limitations. Inspired by recent advancements in computer vision, we propose an innovative approach that leverages foundation models to enhance suction grasping using only RGB images. Trained solely on a synthetic dataset, our method generalizes its grasp prediction capabilities to real-world robots and a diverse range of novel objects not included in the training set. Our network achieves an 82.3\% success rate in real-world applications. The project website with code and data will be available at http://optigrasp.github.io.	翻訳日:2024-11-05 22:57:44 公開日:2024-09-29
# 量子符号化のための量子スーパーポーシングアルゴリズム Quantum superposing algorithm for quantum encoding ( http://arxiv.org/abs/2409.19496v1 ) ライセンス: Link先を確認	Jaehee Kim, Taewan Kim, Kyunghyun Baek, Yongsoo Hwang, Joonsuk Huh, Jeongho Bang,	(参考訳) 現在量子符号化と呼ばれる量子状態への古典データの効率的な符号化は、量子計算において重要な意味を持つ。有限サイズのデータベースや量子ビットレジスタの場合、量子符号化の一般的な戦略は、マシン認識可能なデータアドレスと、後に重畳される量子ビットインデックスとを関連付ける古典的なマッピングを確立することである。ここで最も重要なのが、任意のキュービット指数の重ね合わせを生成するアルゴリズムを鋳造することである。このアルゴリズムは、正式には量子スーパーポーシングアルゴリズムとして知られている。本研究では,実効的な量子符号化シナリオにおいて,その有効性と優れた計算性能を実証する,効率的な量子スーパーポーザリングアルゴリズムを提案する。理論的および数値解析により,既存のアルゴリズムと比較して計算効率が大幅に向上したことを示す。特に、我々のアルゴリズムは最大2n-3制御ノット数(CNOT)を持ち、これまでで最も最適化された結果を示している。 Efficient encoding of classical data into quantum state -- currently referred to as quantum encoding -- holds crucial significance in quantum computation. For finite-size databases and qubit registers, a common strategy of the quantum encoding entails establishing a classical mapping that correlates machine-recognizable data addresses with qubit indices that are subsequently superposed. Herein, the most imperative lies in casting an algorithm for generating the superposition of any given number of qubit indices. This algorithm is formally known as quantum superposing algorithm. In this work, we present an efficient quantum superposing algorithm, affirming its effectiveness and superior computational performance in a practical quantum encoding scenario. Our theoretical and numerical analyses demonstrate a substantial enhancement in computational efficiency compared to existing algorithms. Notably, our algorithm has a maximum of 2n-3 controlled-not (CNOT) counts, representing the most optimized result to date.	翻訳日:2024-11-05 22:48:00 公開日:2024-09-29
# 音声駆動型トーキングヘッド生成のためのフレーム単位の感情強度の学習 Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation ( http://arxiv.org/abs/2409.19501v1 ) ライセンス: Link先を確認	Jingyi Xu, Hieu Le, Zhixin Shu, Yang Wang, Yi-Hsuan Tsai, Dimitris Samaras,	(参考訳) 人間の感情表現は本質的に動的で複雑かつ流動的であり、言語コミュニケーションを通じた強度の滑らかな遷移を特徴とする。しかし、そのような強度変動のモデル化は、従来の音声駆動トーキングヘッド生成法ではほとんど見過ごされており、その結果として静的な感情出力になることが多い。本稿では、発話中に感情の強度がどのように変動するかを考察し、トーキングヘッド生成においてこれらの微妙な変化を捉えて生成する手法を提案する。具体的には、強度レベルを精密に制御しながら多様な感情を生成できるトーキングヘッドフレームワークを開発する。これは、感情の種類が潜在ベクトルの方向に符号化され、感情の強度が潜在ベクトルのノルムに反映される連続的な感情潜在空間を学習することで達成される。さらに、動的な強度変動を捉えるために、強度を反映する話し方のトーンを考慮した音声から強度への予測器を採用する。この予測器の学習信号は、フレーム単位の強度ラベリングを必要とせずに、感情に依存しない強度擬似ラベリング法によって得られる。広範な実験と分析により、トーキングヘッド生成における感情強度の変動を正確に捉えて再現するうえでの提案手法の有効性が検証され、生成結果の表現力とリアリズムが大幅に向上することが示された。 Human emotional expression is inherently dynamic, complex, and fluid, characterized by smooth transitions in intensity throughout verbal communication. However, the modeling of such intensity fluctuations has been largely overlooked by previous audio-driven talking-head generation methods, which often results in static emotional outputs. In this paper, we explore how emotion intensity fluctuates during speech, proposing a method for capturing and generating these subtle shifts for talking-head generation. Specifically, we develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels. This is achieved by learning a continuous emotion latent space, where emotion types are encoded within latent orientations and emotion intensity is reflected in latent norms. In addition, to capture the dynamic intensity fluctuations, we adopt an audio-to-intensity predictor by considering the speaking tone that reflects the intensity. The training signals for this predictor are obtained through our emotion-agnostic intensity pseudo-labeling method without the need of frame-wise intensity labeling. 
Extensive experiments and analyses validate the effectiveness of our proposed method in accurately capturing and reproducing emotion intensity fluctuations in talking-head generation, thereby significantly enhancing the expressiveness and realism of the generated outputs.	翻訳日:2024-11-05 22:48:00 公開日:2024-09-29
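The latent-space idea in the abstract above (emotion type in the direction, intensity in the norm) can be illustrated with plain vector arithmetic. This is a toy sketch, not the authors' model: the two-dimensional latent vector and the helper names `split_latent` and `set_intensity` are hypothetical.

```python
import math


def split_latent(z):
    # Decompose a latent vector into a unit direction (emotion type)
    # and a norm (emotion intensity).
    norm = math.sqrt(sum(v * v for v in z))
    direction = [v / norm for v in z]
    return direction, norm


def set_intensity(z, intensity):
    # Keep the emotion type (direction) and replace only the intensity (norm).
    direction, _ = split_latent(z)
    return [intensity * v for v in direction]


direction, norm = split_latent([3.0, 4.0])
stronger = set_intensity([3.0, 4.0], 10.0)
```

Rescaling the norm frame by frame while holding the direction fixed is one simple way to picture how intensity could vary over an utterance without changing the emotion type.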
# NLPの性質:NLP論文における貢献の分析 The Nature of NLP: Analyzing Contributions in NLP Papers ( http://arxiv.org/abs/2409.19505v1 ) ライセンス: Link先を確認	Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych,	(参考訳) 自然言語処理(NLP)は、計算機科学、言語学、社会科学などの知的伝統を統合する、動的で学際的な分野である。確立された分野であるにもかかわらず、何がNLP研究を構成するのかという定義については議論が続いている。本研究では、研究論文を調べることにより、何がNLPを構成するのかを定量的に検討する。この目的のために分類法を提案し、約$2k$件の研究論文アブストラクトからなるデータセットNLPContributionsを導入する。これは、科学的貢献を特定し、その種類をこの分類法に従って分類するよう、専門家がアノテーションを付けたものである。また、これらの要素を自動的に識別する新しいタスクを提案し、データセット上で強力なベースラインを学習する。このタスクの実験結果を示すとともに、$\sim$$29k$件のNLP研究論文にモデルを適用してその貢献を分析し、NLP研究の性質の理解に役立てる。その結果、NLPにおける機械学習の関与は90年代初頭から増加し、それと並行して言語や人に関する知識の付加への注目は低下してきたこと、そして2020年以降は言語や人への注目が再び高まっていることが明らかになった。この研究がコミュニティの規範に関する議論を喚起し、未来を意識的に形作る取り組みを促すことを願っている。 Natural Language Processing (NLP) is a dynamic, interdisciplinary field that integrates intellectual traditions from computer science, linguistics, social science, and more. Despite its established presence, the definition of what constitutes NLP research remains debated. In this work, we quantitatively investigate what constitutes NLP by examining research papers. For this purpose, we propose a taxonomy and introduce NLPContributions, a dataset of nearly $2k$ research paper abstracts, expertly annotated to identify scientific contributions and classify their types according to this taxonomy. We also propose a novel task to automatically identify these elements, for which we train a strong baseline on our dataset. We present experimental results from this task and apply our model to $\sim$$29k$ NLP research papers to analyze their contributions, aiding in the understanding of the nature of NLP research. Our findings reveal a rising involvement of machine learning in NLP since the early nineties, alongside a declining focus on adding knowledge about language or people; again, in post-2020, there has been a resurgence of focus on language and people. 
We hope this work will spark discussions on our community norms and inspire efforts to consciously shape the future.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# IWN:冪等性(Idempotency)に基づく画像透かし IWN: Image Watermarking Based on Idempotency ( http://arxiv.org/abs/2409.19506v1 ) ライセンス: Link先を確認	Kaixin Deng,	(参考訳) 拡大するデジタルメディアの分野では、透かし技術の強度と完全性を維持することがますます困難になっている。本稿では、Idempotent Generative Network (IGN)に着想を得て、画像透かし処理に冪等性(idempotency)を導入する可能性を探り、革新的なニューラルネットワークモデルであるIdempotent Watermarking Network (IWN)を提案する。提案モデルはカラー画像透かしの復元品質の向上に焦点を当て、冪等性を活用して優れた画像可逆性を確保する。この性質により、カラー画像の透かしが攻撃されたり損傷されたりしても、効果的に射影され、元の状態へマッピングし直される。そのため、抽出された透かしの品質は明確に向上する。IWNモデルは埋め込み容量とロバスト性のバランスを達成し、従来の透かし技術やステガノグラフィー手法においてこの2つの要素の間に内在する矛盾をある程度緩和する。 In the expanding field of digital media, maintaining the strength and integrity of watermarking technology is becoming increasingly challenging. This paper, inspired by the Idempotent Generative Network (IGN), explores the prospects of introducing idempotency into image watermark processing and proposes an innovative neural network model - the Idempotent Watermarking Network (IWN). The proposed model, which focuses on enhancing the recovery quality of color image watermarks, leverages idempotency to ensure superior image reversibility. This feature ensures that, even if color image watermarks are attacked or damaged, they can be effectively projected and mapped back to their original state. Therefore, the extracted watermarks have unquestionably increased quality. The IWN model achieves a balance between embedding capacity and robustness, alleviating to some extent the inherent contradiction between these two factors in traditional watermarking techniques and steganography methods.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
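Idempotency means that applying a map twice gives the same result as applying it once, $f(f(x)) = f(x)$, so clean inputs are fixed points and damaged inputs are projected back onto them. The toy sketch below shows that property with a simple thresholding projector on a binary watermark. It is a stand-in chosen for illustration, not the IWN network; the function name and the sample values are hypothetical.

```python
def project_to_bits(pixels, threshold=0.5):
    # Toy idempotent projector: snap every value to the nearest clean bit.
    return [1.0 if v >= threshold else 0.0 for v in pixels]


watermark = [1.0, 0.0, 1.0, 1.0, 0.0]   # clean binary watermark
damaged = [0.8, 0.2, 0.6, 0.9, 0.3]     # same watermark after mild distortion
recovered = project_to_bits(damaged)
```

Here `project_to_bits(recovered)` returns `recovered` unchanged, which is the fixed-point behaviour that a learned idempotent network is trained to approximate on real images.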
# メタ評価要約評価尺度の批判的考察 A Critical Look at Meta-evaluating Summarisation Evaluation Metrics ( http://arxiv.org/abs/2409.19507v1 ) ライセンス: Link先を確認	Xiang Dai, Sarvnaz Karimi, Biaoyan Fang,	(参考訳) 効果的な要約評価指標により、研究者と実践者は異なる要約システムを効率的に比較することができる。メタ評価と呼ばれる自動評価尺度の有効性を推定することは、非常に重要な研究課題である。本稿では,最近,要約評価指標のメタ評価手法を概説し,(1)評価指標が主にニュース要約データセットの例からなるデータセット上でメタ評価されていること,(2)生成した要約の忠実度を評価することに焦点を当てた研究が注目されていること等について述べる。我々は、より堅牢な評価指標の開発を可能にし、既存の評価指標の一般化能力を分析するために、より多様なベンチマークを構築するのに時間がかかっていると論じる。さらに、生成した要約のコミュニケーション目標とワークフローにおける要約の役割を考慮し、ユーザ中心の品質次元に焦点を当てた研究を呼び掛けている。 Effective summarisation evaluation metrics enable researchers and practitioners to compare different summarisation systems efficiently. Estimating the effectiveness of an automatic evaluation metric, termed meta-evaluation, is a critically important research question. In this position paper, we review recent meta-evaluation practices for summarisation evaluation metrics and find that (1) evaluation metrics are primarily meta-evaluated on datasets consisting of examples from news summarisation datasets, and (2) there has been a noticeable shift in research focus towards evaluating the faithfulness of generated summaries. We argue that the time is ripe to build more diverse benchmarks that enable the development of more robust evaluation metrics and analyze the generalization ability of existing evaluation metrics. In addition, we call for research focusing on user-centric quality dimensions that consider the generated summary's communicative goal and the role of summarisation in the workflow.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
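Meta-evaluation, as described above, is commonly carried out by correlating a metric's scores with human judgments over the same set of summaries. The sketch below computes Kendall's tau (the tau-a variant, which ignores tie corrections) in plain Python. The choice of Kendall's tau and the sample scores are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def kendall_tau(human, metric):
    # Kendall's tau-a: (concordant - discordant) / total number of pairs.
    concordant = discordant = 0
    for i, j in combinations(range(len(human)), 2):
        sign = (human[i] - human[j]) * (metric[i] - metric[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(human) * (len(human) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five summaries.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.3, 0.2, 0.8, 0.9]
tau = kendall_tau(human_scores, metric_scores)
```

A metric that ranks summaries the way humans do yields a tau near 1. The paper's point is that such correlations are mostly measured on news summarisation data, so a high value there need not carry over to other domains.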
# 学術のランドスケープの変容 : 大規模言語モデルがコンピュータ科学以外の学術分野に与える影響 Transforming Scholarly Landscapes: Influence of Large Language Models on Academic Fields beyond Computer Science ( http://arxiv.org/abs/2409.19508v1 ) ライセンス: Link先を確認	Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych,	(参考訳) 大規模言語モデル(LLM)は、自然言語処理(NLP)に変革の時代をもたらし、研究を再構築し、NLPの影響を他の研究分野にまで広げてきた。しかし、LLMが他の研究分野にどの程度影響を及ぼしているかを調べた研究はほとんどない。本研究は、NLP以外の分野におけるLLMの影響と利用を経験的かつ体系的に検証する。$106$個のLLMを選定し、LLMを引用する$\sim$$148k$件の論文を分析して、その影響を定量化し、利用パターンの傾向を明らかにする。分析の結果、非CS分野におけるLLMの普及が進んでいるだけでなく、利用状況に差があることも明らかになった。2018年以降、一部の分野は他の分野より頻繁にLLMを利用しており、特に言語学と工学は合わせてLLM引用の$\sim$$45\%$を占めている。さらに、これらの分野の大半は、ドメイン固有の問題に対処するために、追加の微調整を必要としないゼロショットまたは少数ショット学習に長けたタスク非依存のLLMを主に採用していることが示唆された。本研究は、LLMを通じたNLPの学際的影響に光を当て、その機会と課題のより深い理解を与える。 Large Language Models (LLMs) have ushered in a transformative era in Natural Language Processing (NLP), reshaping research and extending NLP's influence to other fields of study. However, there is little to no work examining the degree to which LLMs influence other research fields. This work empirically and systematically examines the influence and use of LLMs in fields beyond NLP. We curate $106$ LLMs and analyze $\sim$$148k$ papers citing LLMs to quantify their influence and reveal trends in their usage patterns. Our analysis reveals not only the increasing prevalence of LLMs in non-CS fields but also the disparities in their usage, with some fields utilizing them more frequently than others since 2018, notably Linguistics and Engineering together accounting for $\sim$$45\%$ of LLM citations. Our findings further indicate that most of these fields predominantly employ task-agnostic LLMs, proficient in zero or few-shot learning without requiring further fine-tuning, to address their domain-specific problems. This study sheds light on the cross-disciplinary impact of NLP through LLMs, providing a better understanding of the opportunities and challenges.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# 不均一性を考慮した階層型エッジ学習のための資源配分とトポロジー設計 Heterogeneity-Aware Resource Allocation and Topology Design for Hierarchical Federated Edge Learning ( http://arxiv.org/abs/2409.19509v1 ) ライセンス: Link先を確認	Zhidong Gao, Yu Zhang, Yanmin Gong, Yuanxiong Guo,	(参考訳) Federated Learning (FL)は、モバイルデバイス上で機械学習モデルをトレーニングするためのプライバシー保護フレームワークを提供する。従来のFLアルゴリズム、例えばFedAvgは、これらのデバイスに大量の通信負荷をかける。この問題を軽減するため、階層型フェデレーションエッジラーニング(HFEL)が提案され、エッジサーバをモデルアグリゲーションの仲介手段として活用している。その効果にもかかわらず、HFELは、特にシステムやデータの不均一性の存在下で、収束速度の緩やかさや資源消費などの課題に直面している。しかし、既存の研究は主に従来のFLの訓練効率の改善に重点を置いており、HFELの効率は未調査のままである。本稿では、エッジデバイスをエッジサーバに接続し、エッジサーバをピアツーピア(P2P)エッジバックホールを介して相互接続する2層HFELシステムについて考察する。我々の目標は、戦略的資源配分とトポロジ設計により、HFELシステムの訓練効率を向上させることである。具体的には、計算と通信資源を割り当て、P2P接続を調整することにより、トレーニング全体の遅延を最小化する最適化問題を定式化する。動的トポロジ下で収束を確保するため,収束誤差を解析し,最適化問題にモデルコンセンサス制約を導入する。提案した問題はいくつかのサブプロブレムに分解され、代わりにオンラインで解決することができる。本手法は,エッジネットワークにおけるデータとシステムの不均一性を考慮した大規模FLの効率的な実装を容易にする。ベンチマークデータセットの総合的な実験評価は,提案手法の有効性を検証し,各種ベースラインと比較してモデルの精度を維持しつつ,トレーニングの遅延を著しく低減することを示した。 Federated Learning (FL) provides a privacy-preserving framework for training machine learning models on mobile edge devices. Traditional FL algorithms, e.g., FedAvg, impose a heavy communication workload on these devices. To mitigate this issue, Hierarchical Federated Edge Learning (HFEL) has been proposed, leveraging edge servers as intermediaries for model aggregation. Despite its effectiveness, HFEL encounters challenges such as a slow convergence rate and high resource consumption, particularly in the presence of system and data heterogeneity. However, existing works are mainly focused on improving training efficiency for traditional FL, leaving the efficiency of HFEL largely unexplored. In this paper, we consider a two-tier HFEL system, where edge devices are connected to edge servers and edge servers are interconnected through peer-to-peer (P2P) edge backhauls. Our goal is to enhance the training efficiency of the HFEL system through strategic resource allocation and topology design. 
Specifically, we formulate an optimization problem to minimize the total training latency by allocating the computation and communication resources, as well as adjusting the P2P connections. To ensure convergence under dynamic topologies, we analyze the convergence error bound and introduce a model consensus constraint into the optimization problem. The proposed problem is then decomposed into several subproblems, enabling us to alternatively solve it online. Our method facilitates the efficient implementation of large-scale FL at edge networks under data and system heterogeneity. Comprehensive experiment evaluation on benchmark datasets validates the effectiveness of the proposed method, demonstrating significant reductions in training latency while maintaining the model accuracy compared to various baselines.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# CoT-ST:マルチモーダル・チェーン・オブ・サートによるLLM音声翻訳の強化 CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought ( http://arxiv.org/abs/2409.19510v1 ) ライセンス: Link先を確認	Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin,	(参考訳) 音声言語モデル (SLM) は, 音声翻訳作業において顕著な性能を示した。しかし、既存の研究は主に直接指導の微調整に焦点を当てており、しばしばSLMの本質的な推論能力を見落としている。本稿では,SLMのチェーン・オブ・シント(CoT)機能を活性化する3段階のトレーニングフレームワークを提案する。本稿では,マルチモーダルCoTを用いた音声翻訳モデルであるCoT-STを提案する。提案手法の有効性を,CoVoST-2データセットとMuST-Cデータセットの2つのデータセットで検証した。実験の結果,CoT-STは従来の最先端手法よりも優れ,BLEUスコアは高い(CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2)。この作業はhttps://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2で公開されている。 Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that utilizes multimodal CoT to decompose speech translation into sequential steps of speech recognition and translation. We validated the effectiveness of our method on two datasets: the CoVoST-2 dataset and MuST-C dataset. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2). This work is open sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2 .	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# ユーザ毎のノード: グラフニューラルネットワークのためのノードレベルフェデレーション学習 One Node Per User: Node-Level Federated Learning for Graph Neural Networks ( http://arxiv.org/abs/2409.19513v1 ) ライセンス: Link先を確認	Zhidong Gao, Yuanxiong Guo, Yanmin Gong,	(参考訳) グラフニューラルネットワーク(GNN)トレーニングは、しばしば中央サーバに生のユーザデータを収集する必要がある。フェデレーション学習はソリューションとして登場し、ユーザが生データを直接共有することなく、協調的なモデルトレーニングを可能にする。しかし、GNNにフェデレートした学習を統合することは、特にクライアントがグラフノードを表現し、単に単一の特徴ベクトルを保持する場合、ユニークな課題を示す。本稿では,ノードレベルのフェデレーショングラフ学習のための新しいフレームワークを提案する。具体的には、第1のGNN層のメッセージパッシングと特徴ベクトル変換処理を分離し、ユーザデバイスとクラウドサーバ上で個別に実行されるようにする。さらに,特徴ベクトルの潜在表現に基づくグラフラプラシアン項を導入し,ユーザ側モデル更新を制御する。複数のデータセットに対する実験結果から,本手法はベースラインよりも性能がよいことが示された。 Graph Neural Networks (GNNs) training often necessitates gathering raw user data on a central server, which raises significant privacy concerns. Federated learning emerges as a solution, enabling collaborative model training without users directly sharing their raw data. However, integrating federated learning with GNNs presents unique challenges, especially when a client represents a graph node and holds merely a single feature vector. In this paper, we propose a novel framework for node-level federated graph learning. Specifically, we decouple the message-passing and feature vector transformation processes of the first GNN layer, allowing them to be executed separately on the user devices and the cloud server. Moreover, we introduce a graph Laplacian term based on the feature vector's latent representation to regulate the user-side model updates. The experiment results on multiple datasets show that our approach achieves better performance compared with baselines.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
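The abstract above mentions a graph Laplacian term on latent representations. A common form of such a regulariser is $\mathrm{tr}(H^\top L H) = \sum_{(i,j) \in E} \lVert h_i - h_j \rVert^2$, which penalises neighbouring nodes whose latent vectors differ. The sketch below evaluates that sum directly on a tiny graph. It assumes this standard unweighted form with each undirected edge listed once; the paper's exact term may differ, and the function name and example data are hypothetical.

```python
def laplacian_penalty(edges, h):
    # Sum of squared distances between latent vectors of connected nodes,
    # equal to trace(H^T L H) for an unweighted, undirected graph
    # when each edge appears once in the edge list.
    total = 0.0
    for i, j in edges:
        total += sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
    return total


# Three users (nodes) on a path graph 0 - 1 - 2, each with a 2-d latent vector.
edges = [(0, 1), (1, 2)]
latents = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
penalty = laplacian_penalty(edges, latents)
```

Nodes 1 and 2 share the same latent vector and contribute nothing, so the whole penalty comes from the edge between nodes 0 and 1.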
# KODA:Koopman演算子を用いた時系列予測とデータ同化のためのデータ駆動再帰モデル KODA: A Data-Driven Recursive Model for Time Series Forecasting and Data Assimilation using Koopman Operators ( http://arxiv.org/abs/2409.19518v1 ) ライセンス: Link先を確認	Ashutosh Singh, Ashish Singh, Tales Imbiriba, Deniz Erdogmus, Ricardo Borsoi,	(参考訳) クープマン演算子に基づくアプローチは、複雑な非線形力学系(NLDS)によって生成される時系列データの予測に大きな可能性を示してきた。このような手法はNLDSの潜在状態表現を捉えることができるが、実世界のデータに適用した場合、長期予測には依然として困難がある。具体的には、多くの現実世界のNLDSは時間変化する挙動を示し、そのようなモデルでは捉えにくい非定常性をもたらす。さらに、データ同化、すなわち予測タスクにおいてノイズを含む測定値をその場で(on the fly)活用するための体系的なデータ駆動アプローチを欠いている。これらの問題を緩和するために、NLDSにおける予測とデータ同化を統合したクープマン演算子ベースのアプローチ(KODA - Koopman Operator with Data Assimilation)を提案する。特に、フーリエ領域フィルタを用いて、データを、クープマン演算子によってダイナミクスを正確に表現できる物理成分と、柔軟で学習可能な再帰モデルによって捉えられる局所的あるいは時間変化する挙動を表す残差ダイナミクスとに分離する。この分解が安定した長期予測につながるよう、アーキテクチャと学習基準を慎重に設計する。さらに、推論時に新しい測定値を用いてデータ同化を行う軌道修正(course correction)戦略を導入する。提案手法は完全にデータ駆動であり、エンドツーエンドで学習できる。広範な実験比較により、KODAは電力、気温、気象、ローレンツ63、ダフィング振動子などの複数の時系列ベンチマークにおいて既存の最先端手法を上回り、a) 予測、b) データ同化、c) 状態予測の3つのタスクにわたって優れた性能と有効性を示すことを確認した。 Approaches based on Koopman operators have shown great promise in forecasting time series data generated by complex nonlinear dynamical systems (NLDS). Although such approaches are able to capture the latent state representation of a NLDS, they still face difficulty in long term forecasting when applied to real world data. Specifically many real-world NLDS exhibit time-varying behavior, leading to nonstationarity that is hard to capture with such models. Furthermore they lack a systematic data-driven approach to perform data assimilation, that is, exploiting noisy measurements on the fly in the forecasting task. To alleviate the above issues, we propose a Koopman operator-based approach (named KODA - Koopman Operator with Data Assimilation) that integrates forecasting and data assimilation in NLDS. 
In particular, we use a Fourier domain filter to disentangle the data into a physical component, whose dynamics can be accurately represented by a Koopman operator, and residual dynamics that represent the local or time-varying behavior and are captured by a flexible and learnable recursive model. We carefully design an architecture and training criterion that ensure this decomposition leads to stable and long-term forecasts. Moreover, we introduce a course correction strategy to perform data assimilation with new measurements at inference time. The proposed approach is completely data-driven and can be learned end-to-end. Through extensive experimental comparisons we show that KODA outperforms existing state-of-the-art methods on multiple time series benchmarks such as electricity, temperature, weather, Lorenz 63, and the Duffing oscillator, demonstrating its superior performance and efficacy across three tasks: a) forecasting, b) data assimilation, and c) state prediction.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
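The Fourier-domain split described in the KODA abstract can be sketched as follows. The abstract does not specify the filter, so this sketch assumes a simple low-pass mask: bins at or below a cutoff frequency form the "physical" component and everything else is the residual. The function names and the `cutoff` parameter are hypothetical, and the Koopman and recursive models themselves are not shown.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split_by_frequency(x, cutoff):
    """Split x into a low-frequency part and the remaining residual.

    Bins whose frequency index (counted symmetrically from 0) is at most
    `cutoff` are kept; this low-pass choice is an assumption of the sketch.
    """
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if min(k, n - k) <= cutoff else 0.0 for k in range(n)]
    smooth = idft(kept)
    residual = [a - b for a, b in zip(x, smooth)]
    return smooth, residual


# Toy signal: one slow oscillation plus a weaker fast one.
n = 32
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [0.3 * math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
signal = [a + b for a, b in zip(slow, fast)]
smooth, residual = split_by_frequency(signal, cutoff=2)
```

By construction the two parts always sum back to the input, so a downstream model for each part can be trained separately and their forecasts added.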
# GenTel-Safe: プロンプトインジェクション攻撃に対する防御のための統一ベンチマークとシールドフレームワーク GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks ( http://arxiv.org/abs/2409.19521v1 ) ライセンス: Link先を確認	Rongchang Li, Minjie Chen, Chang Hu, Han Chen, Wenpeng Xing, Meng Han,	(参考訳) GPT-4、LLaMA、Qwenのような大規模言語モデル(LLM)は、幅広いアプリケーションで顕著な成功を収めている。しかしながら、これらのモデルは、既存の安全性メカニズムを回避しうるプロンプトインジェクション攻撃に対して本質的に脆弱なままであり、より堅牢な攻撃検出方法と包括的な評価ベンチマークが緊急に必要とされている。これらの課題に対処するために、新しいプロンプトインジェクション攻撃検出方法であるGenTel-Shieldと、包括的な評価ベンチマークであるGenTel-Benchを含む統合フレームワークであるGenTel-Safeを紹介する。GenTel-Benchは、3つの主要カテゴリと28のセキュリティシナリオにわたる84812件のプロンプトインジェクション攻撃からなる。 GenTel-Shieldの有効性を証明するため,GenTel-Benchデータセットに対するバニラ安全ガードレールと併用して評価を行った。実証的に、GenTel-Shieldは最先端の攻撃検出成功率を達成することができ、有害なプロンプトに対する既存の保護技術の重要な弱点を明らかにする。再現性のために、コードとベンチマークデータセットをプロジェクトページのhttps://gentellab.github.io/gentel-safe.github.io/で公開しました。 Large Language Models (LLMs) like GPT-4, LLaMA, and Qwen have demonstrated remarkable success across a wide range of applications. However, these models remain inherently vulnerable to prompt injection attacks, which can bypass existing safety mechanisms, highlighting the urgent need for more robust attack detection methods and comprehensive evaluation benchmarks. To address these challenges, we introduce GenTel-Safe, a unified framework that includes a novel prompt injection attack detection method, GenTel-Shield, along with a comprehensive evaluation benchmark, GenTel-Bench, which comprises 84812 prompt injection attacks, spanning 3 major categories and 28 security scenarios. To prove the effectiveness of GenTel-Shield, we evaluate it together with vanilla safety guardrails against the GenTel-Bench dataset. Empirically, GenTel-Shield can achieve state-of-the-art attack detection success rates, which reveals the critical weakness of existing safeguarding techniques against harmful prompts. For reproducibility, we have made the code and benchmarking dataset available on the project page at https://gentellab.github.io/gentel-safe.github.io/.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# LANDeRMT:LLMを機械翻訳に選択的に微調整するための言語対応ニューロンの検出とルーティング LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation ( http://arxiv.org/abs/2409.19523v1 ) ライセンス: Link先を確認	Shaolin Zhu, Leiyu Pan, Bo Li, Deyi Xiong,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は,バイリンガルの監督が限定された場合でも,多言語翻訳において有望な結果を示している。主な課題は、並列トレーニングデータでLLMを微調整する際の破滅的忘却とパラメータ干渉である。これらの課題に対処するために,多様な翻訳訓練データを用いてLLMを機械翻訳向けに選択的に微調整するLanguage-Aware Neuron Detecting and RoutingフレームワークであるLANDeRMTを提案する。 LANDeRMTでは、MTタスクに対するニューロンの認識を評価し、それらを言語一般ニューロンと言語固有ニューロンに分類する。この分類により、微調整中の選択的なパラメータ更新が可能になり、パラメータ干渉と破滅的忘却の問題が緩和される。検出されたニューロンに対しては,LLM内の言語一般および言語固有能力を動的に調整し,翻訳信号で誘導する条件付き認識に基づくルーティング機構を提案する。実験の結果,提案するLANDeRMTは翻訳知識の学習に非常に有効であり,複数の言語対において様々な強力なベースラインよりも翻訳品質を大幅に向上させることが確認された。 Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision. The major challenges are catastrophic forgetting and parameter interference when finetuning LLMs on parallel training data. To address these challenges, we propose LANDeRMT, a Language-Aware Neuron Detecting and Routing framework that selectively finetunes LLMs for Machine Translation with diverse translation training data. In LANDeRMT, we evaluate the awareness of neurons to MT tasks and categorize them into language-general and language-specific neurons. This categorization enables selective parameter updates during finetuning, mitigating parameter interference and catastrophic forgetting issues. For the detected neurons, we further propose a conditional awareness-based routing mechanism to dynamically adjust language-general and language-specific capacity within LLMs, guided by translation signals. Experimental results demonstrate that the proposed LANDeRMT is very effective in learning translation knowledge, significantly improving translation quality over various strong baselines for multiple language pairs.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# マルチモーダル・コントラスト学習における効果的なバックドア・ディフェンス:脅威の軽減のためのトーケンレベル・アンラーニング手法 Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats ( http://arxiv.org/abs/2409.19526v1 ) ライセンス: Link先を確認	Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao,	(参考訳) マルチモーダルコントラスト学習は高品質な特徴を生み出すために様々なデータモダリティを使用するが、インターネット上の広範囲なデータソースに依存しているため、バックドア攻撃に弱い。これらの攻撃は、推論中に特定のトリガーによって起動されるトレーニング中に悪意のある振る舞いを挿入し、重大なセキュリティリスクを生じさせる。このような攻撃による悪意のある影響を減らすための微調整による既存の対策にもかかわらず、これらの防御は大規模な訓練時間を必要とし、クリーンな精度を低下させる。本研究では,マシン・アンラーニングという概念を用いて,バックドア・脅威に対する効果的な防御機構を提案する。これは、Unlearn Backdoor Threats(UBT)として知られる、モデルによるバックドア脆弱性の迅速な未学習を支援するために、小さな毒のサンプルを戦略的に作成することを必要とする。具体的には、バックドアショートカットの改善と、潜在的中毒データセットにおける疑わしいサンプルの正確な検出に、オーバーフィットトレーニングを使用します。そして, バックドア効果を排除し, バックドア防御効率を向上させるため, 不審な試料から, 急激な忘れがちな試料を選別する。バックドア・アンラーニング・プロセスでは,新しいトークン・ベースの非ラーニング・トレーニング・システムを提案する。このテクニックは、モデル全体の完全性を維持しながら、バックドアの相関関係を解離する、モデルの妥協された要素に焦点を当てる。実験結果から,CLIPモデルのバックドア攻撃手法を効果的に防御できることが示唆された。 SoTAのバックドア防御法と比較して、UBTはモデルのクリーンな精度を保ちながら最小の攻撃成功率を達成する(攻撃成功率はSOTAに比べて19%減少し、クリーンな精度は2.57%上昇する)。 Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training, which are activated by specific triggers during inference, posing significant security risks. Despite existing countermeasures through fine-tuning that reduce the malicious impacts of such attacks, these defenses frequently necessitate extensive training time and degrade clean accuracy. In this study, we propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities, known as Unlearn Backdoor Threats (UBT). 
We specifically use overfit training to improve backdoor shortcuts and accurately detect suspicious samples in the potentially poisoned dataset. Then, we select fewer unlearned samples from suspicious samples for rapid forgetting in order to eliminate the backdoor effect and thus improve backdoor defense efficiency. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime. This technique focuses on the model's compromised elements, dissociating backdoor correlations while maintaining the model's overall integrity. Extensive experimental results show that our method effectively defends against various backdoor attack methods in the CLIP model. Compared to SoTA backdoor defense methods, UBT achieves the lowest attack success rate while maintaining a high clean accuracy of the model (attack success rate decreases by 19% compared to SoTA, while clean accuracy increases by 2.57%).	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# BuildingView:ストリートビュー画像とマルチモーダル大言語モードによる都市ビルの外部データベースの構築 BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode ( http://arxiv.org/abs/2409.19527v1 ) ライセンス: Link先を確認	Zongrong Li, Yunlei Su, Chenyuan Zhu, Wufan Zhao,	(参考訳) 都市ビルの外観は、ストリートビュー画像の進歩と都市研究との統合によって、都市分析においてますます重要になっている。マルチモーダル大言語モデル(LLM)は都市アノテーションのための強力なツールを提供し、都市環境に対する深い洞察を可能にする。しかし、正確な都市ビルの外装データベースの作成、エネルギー効率、環境の持続可能性、人間中心の設計の重要指標の特定、これらの指標の体系的な整理といった課題が残されている。これらの課題に対処するために,Googleストリートビューの高解像度視覚データをOpenStreetMapの空間情報とOverpass APIを介して統合する,新しいアプローチであるBuildingViewを提案する。本研究は,都市の建築外装データの精度を向上し,キーサステナビリティと設計指標を特定し,その抽出と分類のための枠組みを開発する。本手法は,ChatGPT-4O APIを用いた文献の体系的レビュー,ビルディングとストリートビューのサンプリング,アノテーションを含む。結果として得られたデータベースは、ニューヨーク市、アムステルダム、シンガポールからのデータで検証され、都市計画、建築設計、環境政策における情報的意思決定をサポートする都市研究のための総合的なツールを提供する。 BuildingViewのコードはhttps://github.com/Jasper0122/BuildingViewで入手できる。 Urban Building Exteriors are increasingly important in urban analytics, driven by advancements in Street View Imagery and its integration with urban research. Multimodal Large Language Models (LLMs) offer powerful tools for urban annotation, enabling deeper insights into urban environments. However, challenges remain in creating accurate and detailed urban building exterior databases, identifying critical indicators for energy efficiency, environmental sustainability, and human-centric design, and systematically organizing these indicators. To address these challenges, we propose BuildingView, a novel approach that integrates high-resolution visual data from Google Street View with spatial information from OpenStreetMap via the Overpass API. This research improves the accuracy of urban building exterior data, identifies key sustainability and design indicators, and develops a framework for their extraction and categorization. Our methodology includes a systematic literature review, building and Street View sampling, and annotation using the ChatGPT-4O API. 
The resulting database, validated with data from New York City, Amsterdam, and Singapore, provides a comprehensive tool for urban studies, supporting informed decision-making in urban planning, architectural design, and environmental policy. The code for BuildingView is available at https://github.com/Jasper0122/BuildingView.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# 従来の東アジア医学におけるディメンダリティ・リダクションによる臨床的意思決定の理解--実証的研究 Understanding Clinical Decision-Making in Traditional East Asian Medicine through Dimensionality Reduction: An Empirical Investigation ( http://arxiv.org/abs/2409.19531v1 ) ライセンス: Link先を確認	Hyojin Bae, Bongsu Kang, Chang-Eop Kim,	(参考訳) 本研究では,従来の東アジア医学(TEAM)における臨床意思決定過程について,次元減少のレンズを通してパターン識別(PI)を再解釈することにより検討した。 8原則パターン同定(EPPI)システムに着目し,Shang-Han-Lunの実証データを活用することにより,診断と治療選択における外部パターンの優先順位付けの必要性と意義を検討する。 Ext-Intパターンが患者の症状に関する情報を最も多く含んでいるか,最も抽象的で一般化可能な症状情報を示し,適切な処方薬の選択を容易にするか,という3つの仮説を検証した。解析指標,クロスコンディショナライゼーション性能,決定木回帰などの定量的指標を用いて,Exterior-Interiorパターンは最も抽象的で一般化可能な症状情報を表現し,症状と草本処方薬の効率的なマッピングに寄与することを示した。本研究は、TEAMの基礎となる認知過程を理解するための客観的な枠組みを提供し、現代の計算手法で伝統的な医療実践をブリッジする。この発見は、TEAMおよび従来の医学におけるAI駆動診断ツールの開発に関する洞察を与え、臨床実践、教育、研究を進展させる可能性がある。 This study examines the clinical decision-making processes in Traditional East Asian Medicine (TEAM) by reinterpreting pattern identification (PI) through the lens of dimensionality reduction. Focusing on the Eight Principle Pattern Identification (EPPI) system and utilizing empirical data from the Shang-Han-Lun, we explore the necessity and significance of prioritizing the Exterior-Interior pattern in diagnosis and treatment selection. We test three hypotheses: whether the Ext-Int pattern contains the most information about patient symptoms, represents the most abstract and generalizable symptom information, and facilitates the selection of appropriate herbal prescriptions. Employing quantitative measures such as the abstraction index, cross-conditional generalization performance, and decision tree regression, our results demonstrate that the Exterior-Interior pattern represents the most abstract and generalizable symptom information, contributing to the efficient mapping between symptom and herbal prescription spaces. 
This research provides an objective framework for understanding the cognitive processes underlying TEAM, bridging traditional medical practices with modern computational approaches. The findings offer insights into the development of AI-driven diagnostic tools in TEAM and conventional medicine, with the potential to advance clinical practice, education, and research.	翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
# Video DataFlywheel:ビデオ言語理解における不可能なデータのトリニティを解決する Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding ( http://arxiv.org/abs/2409.19532v1 ) ライセンス: Link先を確認	Xiao Wang, Jianlong Wu, Zijia Lin, Fuzheng Zhang, Di Zhang, Liqiang Nie,	(参考訳) 近年,ビデオ言語理解は大規模事前学習によって大きな成功を収めている。しかし、データの不足は依然として大きな課題だ。本研究では,事前学習データセットにおけるデータ量,多様性,品質の「不可能トリニティ」を定量的に明らかにする。近年の取り組みは、合成アノテーションによって低品質で妥協された大規模で多様なASRデータセットを改良することを目指している。これらの手法は、オリジナルのアノテーションを洗練させるために、マルチモーダルなビデオコンテンツ(フレーム、タグ、ASR transcriptsなど)で有用な情報を活用することに成功した。それでも彼らは、合成アノテーション内のノイズを軽減し、データセットのサイズが拡大するにつれてスケーラビリティを欠いている。これらの問題に対処するために,ビデオアノテーションを改良されたノイズコントロール手法で反復的に洗練するVideo DataFlywheelフレームワークを導入する。反復的改良のために、まずビデオ言語モデルを用いて合成アノテーションを生成し、洗練されたデータセットを生成する。そして,それを事前訓練し,より強力なモデルのための人間の洗練例を微調整する。これらのプロセスは継続的改善のために繰り返されます。ノイズ制御のための新しいノイズ制御手法であるAda TaiLrを提案する。反復リファインメントとAdaTaiLrを組み合わせることで、ビデオ言語理解のスケーラビリティが向上する。大規模な実験により、我々のフレームワークは既存のデータ改善ベースラインよりも優れており、3%のパフォーマンス向上と、多様性の損失を最小限に抑えてデータセットの品質の向上を実現している。さらに、改良されたデータセットは、ビデオ質問応答やテキストビデオ検索など、様々なビデオ言語理解タスクの大幅な改善を促進する。 Recently, video-language understanding has achieved great success through large-scale pre-training. However, data scarcity remains a prevailing challenge. This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets. Recent efforts seek to refine large-scale, diverse ASR datasets compromised by low quality through synthetic annotations. These methods successfully leverage useful information in multimodal video content (frames, tags, ASR transcripts, etc.) to refine the original annotations. Nevertheless, they struggle to mitigate noise within synthetic annotations and lack scalability as the dataset size expands. To address these issues, we introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods. 
For iterative refinement, we first leverage a video-language model to generate synthetic annotations, resulting in a refined dataset. Then, we pre-train on it and fine-tune on human refinement examples for a stronger model. These processes are repeated for continuous improvement. For noise control, we present AdaTaiLr, a novel noise control method that requires weaker assumptions on noise distribution, thereby proving more effective in large datasets with theoretical guarantees. The combination of iterative refinement and AdaTaiLr can achieve better scalability in video-language understanding. Extensive experiments show that our framework outperforms existing data refinement baselines, delivering a 3% performance boost and improving dataset quality with minimal diversity loss. Furthermore, our refined dataset facilitates significant improvements in various video-language understanding tasks, including video question answering and text-video retrieval.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# 感情支援型チャットボットのための混合型心理療法 Mixed Chain-of-Psychotherapies for Emotional Support Chatbot ( http://arxiv.org/abs/2409.19533v1 ) ライセンス: Link先を確認	Siyuan Chen, Cong Ming, Zhiling Zhang, Yanyi Chen, Kenny Q. Zhu, Mengyue Wu,	(参考訳) メンタルヘルス支援チャットボットの領域では、共感を示し、適切なソリューションを提供するための自己探索を促進することが不可欠である。しかし、現在のアプローチは、ヘルプ・シーカーの状況を完全に理解することなく、一般的な洞察や解決策を提供する傾向にある。そこで我々は, 心理療法(Chain-of-Psychotherapies, CoP) の観点から, 探索者の状態分析を統合したチャットボットPsyMixを提案する。包括的評価により,PsyMixはChatGPTベースラインを上回り,ヒトカウンセラーに対する共感度は同等であった。 In the realm of mental health support chatbots, it is vital to show empathy and encourage self-exploration to provide tailored solutions. However, current approaches tend to provide general insights or solutions without fully understanding the help-seeker's situation. Therefore, we propose PsyMix, a chatbot that integrates the analyses of the seeker's state from the perspective of a psychotherapy approach (Chain-of-Psychotherapies, CoP) before generating the response, and learns to incorporate the strength of various psychotherapies by fine-tuning on a mixture of CoPs. Through comprehensive evaluation, we found that PsyMix can outperform the ChatGPT baseline, and demonstrate a comparable level of empathy in its responses to that of human counselors.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# 非局所クラマース-モヤル式に基づく非ガウス確率力学系発見への進化的アプローチ An evolutionary approach for discovering non-Gaussian stochastic dynamical systems based on nonlocal Kramers-Moyal formulas ( http://arxiv.org/abs/2409.19534v1 ) ライセンス: Link先を確認	Yang Li, Shengyuan Xu, Jinqiao Duan,	(参考訳) (ガウス的)ブラウンノイズと(ガウス的でない)Lévyノイズの両方を持つ確率力学系の明示的な支配方程式をデータから発見することは、複雑になりうる関数形とLévy運動の本質的な複雑さのために困難である。本研究では,非局所クラマース・モヤル式,遺伝的プログラミング,スパース回帰に基づいて,非ガウス確率力学系をサンプルパスデータから抽出する進化的シンボルスパース回帰(ESSR)手法を提案する。より具体的には、遺伝的プログラミングは多様な候補関数を生成するために使用され、スパース回帰法はこれらの候補に関連する係数を学習することを目的としており、非局所クラマース・モヤル式は、遺伝的プログラミングにおける適合度尺度とスパース回帰における損失関数を構築する基盤となる。このアプローチの有効性と能力は、いくつかの例示的なモデルに適用することで示される。このアプローチは、利用可能なデータセットから非ガウス確率力学を解読するための強力な手段であり、様々な分野にまたがる幅広い応用を示す。 Discovering explicit governing equations of stochastic dynamical systems with both (Gaussian) Brownian noise and (non-Gaussian) Lévy noise from data is challenging due to possible intricate functional forms and the inherent complexity of Lévy motion. This research develops an evolutionary symbol sparse regression (ESSR) approach to extract non-Gaussian stochastic dynamical systems from sample path data, based on nonlocal Kramers-Moyal formulas, genetic programming, and sparse regression. More specifically, genetic programming is employed to generate a diverse array of candidate functions, the sparse regression technique learns the coefficients associated with these candidates, and the nonlocal Kramers-Moyal formulas serve as the foundation for constructing the fitness measure in genetic programming and the loss function in sparse regression. The efficacy and capabilities of this approach are showcased through its application to several illustrative models. This approach stands out as a potent instrument for deciphering non-Gaussian stochastic dynamics from available datasets, indicating a wide range of applications across different fields.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# 量子鍵分布における位相誤差推定のポストセレクションセキュリティ解析の改善 Improved postselection security analysis of phase error estimation in quantum key distribution ( http://arxiv.org/abs/2409.19538v1 ) ライセンス: Link先を確認	Yang-Guang Shan, Zhen-Qiang Yin, Shuang Wang, Wei Chen, De-Yong He, Guang-Can Guo, Zheng-Fu Han,	(参考訳) 量子鍵分散(QKD)は、2つの離れたユーザ間でセキュアな鍵を生成する。一般的なコヒーレントな攻撃に対するQKDのセキュリティ証明は難しいが、集団攻撃に対する証明はずっと容易である。ポストセレクション法は,効果的で汎用的な手法として,集団攻撃に対するセキュリティ解析をコヒーレント攻撃に対するものへ拡張しようとするものである。しかし、その性能は低い。この欠点を克服するために、ポストセレクション法でキーレートを直接計算するのではなく、集合的およびコヒーレント攻撃に対する位相誤差推定の失敗確率を関連づける手法を提案し、コヒーレント攻撃に対するパラメータ推定において、独立的および同一に分布した仮定を用いることを可能にした。すると、キーレートはエントロピーの不確実性関係によって得られる。提案手法は様々なQKDプロトコルに適用可能であり,従来のポストセレクション法と比較して性能が向上する。例えば、サイドチャネルセキュア(SCS)QKDとノーフェーズポストセレクション(NPP)ツインフィールド(TF)QKDの有限鍵解析を行い、提案手法による性能改善を示す。 Quantum key distribution (QKD) enables the generation of secure keys between two distant users. Proving the security of QKD against general coherent attacks is challenging, while proving it against collective attacks is much easier. As an effective and general solution, the postselection method extends security analyses against collective attacks to coherent attacks. However, it yields poor performance. To overcome this drawback, instead of directly calculating the key rate by the postselection method, we propose a method correlating the failure probabilities of phase error estimation against collective and coherent attacks, enabling the use of the independent and identically distributed assumption in parameter estimation against coherent attacks. The key rate can then be obtained from the entropic uncertainty relation. Our method can be applied to various QKD protocols, providing better performance compared with the traditional postselection method. For instance, we give the finite-key analyses of the side-channel-secure (SCS) QKD and the no-phase-postselection (NPP) twin-field (TF) QKD to show their performance improvements with the proposed method.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# LoRKD:医療ファウンデーションモデルのための低レベル知識分割 LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models ( http://arxiv.org/abs/2409.19540v1 ) ライセンス: Link先を確認	Haolin Li, Yuhang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya Zhang, Yanfeng Wang,	(参考訳) 大規模プレトレーニング技術の普及により、医療基盤モデルの開発が大幅に進展し、幅広い医療タスクにおいて汎用的なツールとして機能することができるようになった。しかし、その強力な一般化能力にもかかわらず、大規模なデータセットで事前訓練された医療基礎モデルは、異種データ間のドメインギャップに悩まされがちであり、以前の研究で証明されたように、専門的なモデルと比較して特定のタスクに対する準最適性能をもたらす。本稿では, 特定の医療課題における「知識分解」と呼ばれる新たな視点を探求し, 基礎モデルを複数の軽量専門家モデルに分解し, それぞれが特定の解剖学的領域に特化して, 専門性を高め, 資源消費を同時に低減することを目的としている。この目的を達成するために,ローランク知識分解(LoRKD)と呼ばれる新しいフレームワークを提案する。低ランクの専門家モジュールは、異なる解剖学的領域の異種データ間の勾配の衝突を解消し、低コストで強力な特殊化を提供する。効率的な知識分離畳み込みは、単一の前方伝播における知識分離を達成することにより、アルゴリズム効率を著しく向上させる。セグメンテーションおよび分類タスクに関する大規模な実験結果から, 分割されたモデルが最先端の性能を達成するだけでなく, 下流タスクに優れた伝達性を示すことが示され, タスク固有の評価において, 元の基礎モデルを上回る結果が得られた。コードはここにある。 The widespread adoption of large-scale pre-training techniques has significantly advanced the development of medical foundation models, enabling them to serve as versatile tools across a broad range of medical tasks. However, despite their strong generalization capabilities, medical foundation models pre-trained on large-scale datasets tend to suffer from domain gaps between heterogeneous data, leading to suboptimal performance on specific tasks compared to specialist models, as evidenced by previous studies. In this paper, we explore a new perspective called "Knowledge Decomposition" to improve the performance on specific medical tasks, which deconstructs the foundation model into multiple lightweight expert models, each dedicated to a particular anatomical region, with the aim of enhancing specialization and simultaneously reducing resource consumption. To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution. 
The low-rank expert modules resolve gradient conflicts between heterogeneous data from different anatomical regions, providing strong specialization at lower costs. The efficient knowledge separation convolution significantly improves algorithm efficiency by achieving knowledge separation within a single forward propagation. Extensive experimental results on segmentation and classification tasks demonstrate that our decomposed models not only achieve state-of-the-art performance but also exhibit superior transferability on downstream tasks, even surpassing the original foundation models in task-specific evaluations. The code is available here.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# BiPC: 教師なしドメイン適応のための双方向確率校正 BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption ( http://arxiv.org/abs/2409.19542v1 ) ライセンス: Link先を確認	Wenlve Zhou, Zhiheng Zhou, Junyuan Shang, Chang Niu, Mingyue Zhang, Xiyuan Tao, Tianlei Wang,	(参考訳) Unsupervised Domain Adaptation (UDA)はラベル付きソースドメインを利用してラベルなしのターゲットドメインのタスクを解決する。 Transformer ベースの手法は UDA の有望性を示しているが、その応用は Convolutional Neural Networks (CNN) と階層型 Transformer を除いてプレーンな Transformer に限られている。この問題に対処するため,確率空間の観点からBidirectional Probability Calibration (BiPC)を提案する。本研究では,事前学習した頭部からの確率出力が領域ギャップに対して頑健であり,タスクヘッドの確率分布を調整できることを実証する。さらに、タスクヘッドは、適応訓練中に事前訓練されたヘッドを強化することができ、双方向補完によるモデル性能を向上させることができる。技術的には、ImageNet-1kプリトレーニングされた分類器などの事前学習された頭部の確率を調整するために、校正確率アライメント(CPA)を導入する。さらに,事前学習した分類器から学習した校正係数を用いて,タスクヘッドを改良するキャリブレーションギニ不純物(CGI)損失を設計する。 BiPCは、CNNやTransformerなど、さまざまなネットワークに適用可能な、シンプルで効果的な方法である。実験の結果、複数のUDAタスクにまたがる顕著なパフォーマンスが示された。私たちのコードは、https://github.com/Wenlve-Zhou/BiPC で公開されます。 Unsupervised Domain Adaptation (UDA) leverages a labeled source domain to solve tasks in an unlabeled target domain. While Transformer-based methods have shown promise in UDA, their application is limited to plain Transformers, excluding Convolutional Neural Networks (CNNs) and hierarchical Transformers. To address this issue, we propose Bidirectional Probability Calibration (BiPC) from a probability space perspective. We demonstrate that the probability outputs from a pre-trained head, after extensive pre-training, are robust against domain gaps and can adjust the probability distribution of the task head. Moreover, the task head can enhance the pre-trained head during adaptation training, improving model performance through bidirectional complementation. Technically, we introduce Calibrated Probability Alignment (CPA) to adjust the pre-trained head's probabilities, such as those from an ImageNet-1k pre-trained classifier. 
Additionally, we design a Calibrated Gini Impurity (CGI) loss to refine the task head, with calibrated coefficients learned from the pre-trained classifier. BiPC is a simple yet effective method applicable to various networks, including CNNs and Transformers. Experimental results demonstrate its remarkable performance across multiple UDA tasks. Our code will be available at: https://github.com/Wenlve-Zhou/BiPC.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
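For readers unfamiliar with the quantity behind the CGI loss, the plain (uncalibrated) Gini impurity of a predicted probability vector is 1 minus the sum of squared probabilities. The sketch below shows only that base quantity. The calibrated coefficients learned from the pre-trained classifier are not described in the abstract and are deliberately left out, and the function name is hypothetical.

```python
def gini_impurity(probs):
    """Plain Gini impurity of a probability vector: 1 - sum(p_i^2).

    This is the uncalibrated quantity only; the calibration used by
    the CGI loss is not specified in the abstract and is not modeled.
    """
    return 1.0 - sum(p * p for p in probs)


confident = gini_impurity([1.0, 0.0, 0.0])          # one-hot prediction
two_way = gini_impurity([0.5, 0.5])                 # evenly split, 2 classes
uniform4 = gini_impurity([0.25, 0.25, 0.25, 0.25])  # evenly split, 4 classes
```

Minimizing a quantity of this kind pushes predictions on unlabeled target data toward confident, low-impurity outputs.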
# 協調型企業間労働市場予測のための収束型クラスタ化グラフ学習フレームワーク Convergence-aware Clustered Federated Graph Learning Framework for Collaborative Inter-company Labor Market Forecasting ( http://arxiv.org/abs/2409.19545v1 ) ライセンス: Link先を確認	Zhuoning Guo, Hao Liu, Le Zhang, Qi Zhang, Hengshu Zhu, Hui Xiong,	(参考訳) 人材需要と供給を予測する労働市場は、経営管理と経済発展に不可欠である。正確でタイムリーな予測では、雇用主は採用戦略を進化する労働市場に合わせて調整することができ、雇用者は将来の需要と供給に応じて積極的なキャリアパス計画を行うことができる。しかし、従来の研究では、異なる企業間の需要供給シーケンスと変動予測の立場の相互関係は無視されていた。さらに企業は、競争上の優位性やセキュリティ上の脅威、倫理的または法的違反を危険にさらす懸念から、グローバルな労働市場分析のためにプライベートな人的資源データを共有することに消極的だ。そこで本稿では,FedLMF(Federated Labor Market Forecasting)の問題を定式化し,MPCAC-FL(Meta-personalized Convergence-aware Clustered Federated Learning)フレームワークを提案する。まず、需要と供給の順序と企業配置のペアの間に固有の相関関係を捉えるグラフベースのシーケンシャルモデルを設計する。第2に,企業間で共有可能な効果的な初期モデルパラメータを学習するためにメタラーニング手法を採用し,異種データを持つ企業であっても,企業固有の需要と供給を予測するためにパーソナライズされたモデルを最適化する。第3に,モデル類似性に応じて企業を動的にグループに分割し,各グループにフェデレーションアグリゲーションを適用するコンバージェンス対応クラスタリングアルゴリズムを考案する。不均一性はより安定した収束とより良い性能のために緩和することができる。大規模な実験では、MPCAC-FLは3つの実世界のデータセットのベースラインを比較し、DH-GEMという最先端モデルの97%以上を非公開企業データを公開せずに達成している。 Labor market forecasting on talent demand and supply is essential for business management and economic development. With accurate and timely forecasts, employers can adapt their recruitment strategies to align with the evolving labor market, and employees can have proactive career path planning according to future demand and supply. However, previous studies ignore the interconnection between demand-supply sequences among different companies and positions for predicting variations. Moreover, companies are reluctant to share their private human resource data for global labor market analysis due to concerns over jeopardizing competitive advantage, security threats, and potential ethical or legal violations. 
To this end, in this paper, we formulate the Federated Labor Market Forecasting (FedLMF) problem and propose a Meta-personalized Convergence-aware Clustered Federated Learning (MPCAC-FL) framework to provide accurate and timely collaborative talent demand and supply prediction in a privacy-preserving way. First, we design a graph-based sequential model to capture the inherent correlation between demand and supply sequences and company-position pairs. Second, we adopt meta-learning techniques to learn effective initial model parameters that can be shared across companies, allowing personalized models to be optimized for forecasting company-specific demand and supply, even when companies have heterogeneous data. Third, we devise a Convergence-aware Clustering algorithm to dynamically divide companies into groups according to model similarity and apply federated aggregation in each group. The heterogeneity can be alleviated for more stable convergence and better performance. Extensive experiments demonstrate that MPCAC-FL outperforms compared baselines on three real-world datasets and achieves over 97% of the state-of-the-art model, i.e., DH-GEM, without exposing private company data.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
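The third step above, federated aggregation applied within each group of companies, can be sketched as follows. This is a simplified illustration under stated assumptions: cluster assignments are passed in rather than computed (the convergence-aware clustering rule is not detailed in the abstract), every client is weighted equally, and model parameters are flat lists of floats. All names are hypothetical.

```python
def average(vectors):
    """Element-wise unweighted mean of equally long parameter vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def clustered_fedavg(client_params, cluster_of):
    """Average client parameters separately inside each cluster.

    client_params: dict mapping client id -> list of floats
    cluster_of:    dict mapping client id -> cluster id (assumed given)
    Returns a dict mapping cluster id -> averaged parameter list.
    """
    groups = {}
    for client, params in client_params.items():
        groups.setdefault(cluster_of[client], []).append(params)
    return {cluster: average(members) for cluster, members in groups.items()}


# Toy example: companies a and b are similar, c is different.
client_params = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [10.0, 10.0]}
cluster_of = {"a": 0, "b": 0, "c": 1}
cluster_models = clustered_fedavg(client_params, cluster_of)
```

Averaging only within a cluster keeps dissimilar companies from pulling each other's models apart, which is the motivation the abstract gives for grouping by model similarity.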
# 類似行列補完のためのテーラー低ランク行列分解法 Tailed Low-Rank Matrix Factorization for Similarity Matrix Completion ( http://arxiv.org/abs/2409.19550v1 ) ライセンス: Link先を確認	Changyi Ma, Runsheng Yu, Xiao Chen, Youzhi Zhang,	(参考訳) 類似度行列は、多くの下流機械学習タスクの中核にある基本的なツールとして機能する。しかし、欠落したデータは避けられず、しばしば不正確な類似性行列をもたらす。この問題に対処するため, 類似行列補完法(SMC)が提案されているが, Singular Value Decomposition (SVD) 演算による計算の複雑さに悩まされている。計算複雑性を低減するため、行列因子化(MF)技術はより明示的で、低ランクなソリューションを提供するために頻繁に適用されるが、非凸構造に苦しむため、正確な低ランクの最適解を保証することはできない。本稿では,より信頼性が高く効率的なソリューションを提供する新しいSMCフレームワークを提案する。具体的には,PSD(Positive Semi-Definiteness)特性を利用して完成過程を導出するだけでなく,最適かつ低ランクな解を実現するために,慎重に設計されたランク最小化正規化器をさらに補完する。基礎となるPSD特性と低ランク特性がSMC性能を改善するというキーインサイトに基づいて、PSD特性を探索し、非凸低ランク正規化器を組み込んで低ランク解を確実にする2つの新しい、スケーラブルで効果的なアルゴリズムSMCNNとSMCNmFを提案する。理論的解析により、より良い推定性能と収束速度が保証される。実世界のデータセットにおける実験結果から,提案手法が様々なベースライン手法よりも優れていることを示す。 The similarity matrix serves as a fundamental tool at the core of numerous downstream machine-learning tasks. However, missing data is inevitable and often results in an inaccurate similarity matrix. To address this issue, Similarity Matrix Completion (SMC) methods have been proposed, but they suffer from high computational complexity due to the Singular Value Decomposition (SVD) operation. To reduce the computational complexity, Matrix Factorization (MF) techniques are more explicit and frequently applied to provide a low-rank solution, but the exact low-rank optimal solution cannot be guaranteed since it suffers from a non-convex structure. In this paper, we introduce a novel SMC framework that offers a more reliable and efficient solution. Specifically, beyond simply utilizing the unique Positive Semi-definiteness (PSD) property to guide the completion process, our approach further complements a carefully designed rank-minimization regularizer, aiming to achieve an optimal and low-rank solution. 
Based on the key insight that the underlying PSD property and low-rank property improve SMC performance, we present two novel, scalable, and effective algorithms, SMCNN and SMCNmF, which investigate the PSD property to guide the estimation process and incorporate a nonconvex low-rank regularizer to ensure a low-rank solution. Theoretical analysis ensures better estimation performance and convergence speed. Empirical results on real-world datasets demonstrate the superiority and efficiency of our proposed methods compared to various baseline methods.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# 擬リーマン計量--量子領域の新しい視点 Pseudo-Riemannian metric: a new perspective on the quantum realm ( http://arxiv.org/abs/2409.19551v1 ) ライセンス: Link先を確認	Miaomiao Wei, Longjun Xiang, Fuming Xu, Baigeng Wang, Jian Wang,	(参考訳) 凝縮物質物理学の基本的な概念として、リーマン計量の量子幾何学はベリー曲率と量子計量によって駆動されるホール効果を含む様々なエキゾチックな現象を解明する。本研究では,量子物質の特異な性質を探求するために,擬リーマン的枠組み内での新しい量子幾何学を提案する。擬リーマン多様体上の異なる距離を定義し、スピン自由度を導入することにより、パウリ量子幾何テンソルを導入する。このテンソルの虚部は、パウリ・ベリー曲率に対応し、新しい量子相: PT対称系におけるパウリ半金属の発見につながる。この位相は、位相的パウリ・チャーン数によって特徴づけられ、ヘリカルエッジ状態を持つ2次元のパウリ・チャーン絶縁体として現れる。これらの位相位相は、パウリ・リーマン計量によって一意に明らかにされ、PT対称性によりベリー曲率が消えるよく知られたリーマン計量を超えるものである。パウリ・チャーン数(Pauli Chern number)は、時間反転対称性の有無にかかわらずヘリカルトポロジカル絶縁体を分類することができる。擬リーマン計量は、量子材料に対する新たな洞察を与え、量子幾何学の範囲を広げる。 As a fundamental concept in condensed matter physics, quantum geometry within the Riemannian metric elucidates various exotic phenomena, including the Hall effects driven by Berry curvature and quantum metric. In this work, we propose novel quantum geometries within a pseudo-Riemannian framework to explore unique characteristics of quantum matter. By defining distinct distances on pseudo-Riemannian manifolds and incorporating the spin degree of freedom, we introduce the Pauli quantum geometric tensor. The imaginary part of this tensor corresponds to the Pauli Berry curvature, leading to the discovery of a novel quantum phase: the Pauli semimetal in PT-symmetric systems. This phase, characterized by the topological Pauli Chern number, manifests as a two-dimensional Pauli Chern insulator with helical edge states. These topological phases, uniquely revealed by the Pauli-Riemannian metric, go beyond the familiar Riemannian metric, where Berry curvature vanishes due to PT-symmetry. The Pauli Chern number can classify helical topological insulators with or without time reversal symmetry. Pseudo-Riemannian metrics offer new insights into quantum materials and extend the scope of quantum geometry.	翻訳日:2024-11-05 22:38:15 公開日:2024-09-29
# 材料X線吸収スペクトルの普遍的深層学習フレームワーク A Universal Deep Learning Framework for Materials X-ray Absorption Spectra ( http://arxiv.org/abs/2409.19552v1 ) ライセンス: Link先を確認	Shubha R. Kharel, Fanchen Meng, Xiaohui Qu, Matthew R. Carbone, Deyu Lu,	(参考訳) X線吸収分光法(XAS)は、吸収する原子の局所的な化学的環境を調べるための強力な特徴付け技術である。しかしながら、XASデータの解析には重大な課題が伴い、多くの場合、広範囲で計算集約的なシミュレーションと重要なドメインの専門知識が必要である。これらの制限は、高速で堅牢なXAS分析パイプラインの開発を妨げる。 8個の3d遷移金属(Ti-Cu)をカバーするK-edge Spectraデータベースに示すように,これらの課題をXAS予測のための伝達学習アプローチを用いて解決し,それぞれが精度と効率の向上に一意に寄与する。私たちのフレームワークは3つの異なる戦略に基づいて構築されています。まず,M3GNetを用いて,吸収部位の局所化学環境の潜在的表現をXAS予測の入力として導出し,従来の工法よりも高次化を達成している。第二に、我々は階層的な伝達学習戦略を採用し、要素ごとの予測を微調整する前に、要素間で普遍的なマルチタスクモデルを訓練する。要素ワイド・ファインターン後のこのケースケードアプローチでは、要素固有モデルを最大31.5%上回るモデルが得られる。第3に、計算コストがはるかに高い異なるフィリティのシミュレーションによって生成されるスペクトルを予測するために、普遍モデルを適用し、クロスフィデリティ変換学習を実装した。このアプローチは、ターゲット忠実度だけで訓練されたモデルよりも最大24倍の精度で予測精度を向上させる。我々のアプローチは、幅広い要素に対してXAS予測に拡張可能であり、物質科学における他の深層学習モデルを強化するための一般化可能な伝達学習フレームワークを提供する。 X-ray absorption spectroscopy (XAS) is a powerful characterization technique for probing the local chemical environment of absorbing atoms. However, analyzing XAS data presents with significant challenges, often requiring extensive, computationally intensive simulations, as well as significant domain expertise. These limitations hinder the development of fast, robust XAS analysis pipelines that are essential in high-throughput studies and for autonomous experimentation. We address these challenges with a suite of transfer learning approaches for XAS prediction, each uniquely contributing to improved accuracy and efficiency, as demonstrated on K-edge spectra database covering eight 3d transition metals (Ti-Cu). Our framework is built upon three distinct strategies. First, we use M3GNet to derive latent representations of the local chemical environment of absorption sites as input for XAS prediction, achieving up to order-of-magnitude improvements over conventional featurization techniques. 
Second, we employ a hierarchical transfer learning strategy, training a universal multi-task model across elements before fine-tuning for element-specific predictions. After element-wise fine-tuning, this cascaded approach yields models that outperform element-specific models by up to 31%. Third, we implement cross-fidelity transfer learning, adapting a universal model to predict spectra generated by simulation of a different fidelity with a much higher computational cost. This approach improves prediction accuracy by up to 24% over models trained on the target fidelity alone. Our approach is extendable to XAS prediction for a broader range of elements and offers a generalizable transfer learning framework to enhance other deep-learning models in materials science.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 自律運転における高速収束とコミュニケーションの両立した不均一な階層的フェデレーション学習 Fast-Convergent and Communication-Alleviated Heterogeneous Hierarchical Federated Learning in Autonomous Driving ( http://arxiv.org/abs/2409.19560v1 ) ライセンス: Link先を確認	Wei-Bin Kou, Qingfeng Lin, Ming Tang, Rongguang Ye, Shuai Wang, Guangxu Zhu, Yik-Chung Wu,	(参考訳) ストリートシーンセマンティック理解(Street Scene Semantic Understanding、TriSU)は、自動運転(AD)の複雑なタスクである。しかし、特定の地理的領域のデータから訓練された推論モデルは、都市間データドメインシフトによって他の領域に適用された場合、一般化が不十分である。 Hierarchical Federated Learning (HFL)は、異なる都市の分散データセット上での協調的なプライバシ保存トレーニングによって、TriSUモデルの一般化を改善する潜在的なソリューションを提供する。残念なことに、異なる都市のデータは異なる統計特性を持つため、収束が遅い。既存のHFL法を超えて,都市間データの不均一性に対処し,収束を加速するガウス異質HFLアルゴリズム(FedGau)を提案する。提案したFedGauアルゴリズムでは、単一のRGB画像とRGBデータセットの両方をガウス分布としてモデル化し、集約重み付け設計を行う。このアプローチは、各RGB画像を統計分布で区別するだけでなく、従来検討されていたデータ量に加えて、各都市からのデータセットの統計を利用する。提案手法では既存のSOTA法と比較して35.5 %-40.6 %の収束が加速される。一方,通信資源の削減のため,新たなアダプティブ・アダプティブ・リソース・スケジューリング(AdapRS)ポリシーを導入する。隣接する2つのアグリゲーション間で一定の数のモデルを交換する従来の静的リソーススケジューリングポリシーとは異なり、AdapRSは不必要な通信を最小限に抑えるために異なるレベルのHFLのモデルアグリゲーション数を調整している。大規模な実験では、AdapRSは従来の静的リソーススケジューリングポリシーと比べて29.65 %の通信オーバーヘッドを節約し、ほぼ同じ性能を維持している。 Street Scene Semantic Understanding (denoted as TriSU) is a complex task for autonomous driving (AD). However, inference model trained from data in a particular geographical region faces poor generalization when applied in other regions due to inter-city data domain-shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization by collaborative privacy-preserving training over distributed datasets from different cities. Unfortunately, it suffers from slow convergence because data from different cities are with disparate statistical properties. Going beyond existing HFL methods, we propose a Gaussian heterogeneous HFL algorithm (FedGau) to address inter-city data heterogeneity so that convergence can be accelerated. 
In the proposed FedGau algorithm, both individual RGB images and RGB datasets are modelled as Gaussian distributions for aggregation weight design. This approach not only differentiates each RGB image by its statistical distribution, but also exploits the statistics of the dataset from each city in addition to the conventionally considered data volume. With the proposed approach, convergence is accelerated by 35.5%-40.6% compared to existing state-of-the-art (SOTA) HFL methods. In addition, to reduce the communication resources involved, we introduce a novel performance-aware adaptive resource scheduling (AdapRS) policy. Unlike the traditional static resource scheduling policy that exchanges a fixed number of models between two adjacent aggregations, AdapRS adjusts the number of model aggregations at different levels of HFL so that unnecessary communications are minimized. Extensive experiments demonstrate that AdapRS saves 29.65% of the communication overhead compared to the conventional static resource scheduling policy while maintaining almost the same performance.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# モデル予測制御によるバックプロパゲーションとフォワードフォワードアルゴリズムの統合 Unifying back-propagation and forward-forward algorithms through model predictive control ( http://arxiv.org/abs/2409.19561v1 ) ライセンス: Link先を確認	Lianhai Ren, Qianxiao Li,	(参考訳) 本稿では,深いニューラルネットワークをトレーニングするためのモデル予測制御(MPC)フレームワークを導入し,バックプロパゲーション(BP)アルゴリズムとフォワードフォワード(FF)アルゴリズムを体系的に統一する。同時に、様々なルックフォワードの水平線を持つ様々な中間トレーニングアルゴリズムが生まれ、パフォーマンス効率のトレードオフにつながります。定性的な結論が一般的なネットワークに渡される深層線形ネットワーク上で、このトレードオフを正確に解析する。そこで本研究では,目的とモデル仕様に基づいて最適化地平線を選択するための原理的手法を提案する。各種モデルおよびタスクの数値計算結果から,本手法の汎用性を示す。 We introduce a Model Predictive Control (MPC) framework for training deep neural networks, systematically unifying the Back-Propagation (BP) and Forward-Forward (FF) algorithms. At the same time, it gives rise to a range of intermediate training algorithms with varying look-forward horizons, leading to a performance-efficiency trade-off. We perform a precise analysis of this trade-off on a deep linear network, where the qualitative conclusions carry over to general networks. Based on our analysis, we propose a principled method to choose the optimization horizon based on given objectives and model specifications. Numerical results on various models and tasks demonstrate the versatility of our method.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# カメラ内人物再同定のためのCLIPに基づくカメラ非依存の特徴学習 CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification ( http://arxiv.org/abs/2409.19563v1 ) ライセンス: Link先を確認	Xuan Tan, Xun Gong, Yang Xiang,	(参考訳) コントラスト言語-画像事前訓練(CLIP)モデルは、歩行者画像のテキスト記述の生成に固有の利点があるため、従来の人物再識別(ReID)タスクに優れる。しかし、CLIPをカメラ内監督者再識別(ICS ReID)に直接適用することは、課題を提起する。 ICS ReIDは、カメラ間での関連なしに、各カメラ内で独立したIDラベリングを必要とする。これにより、テキストベースの拡張の有効性が制限される。そこで我々は,ICS ReIDのためのCLIPベースのカメラ非依存特徴学習(CCAFL)という新しいフレームワークを提案する。そのため、カメラ非依存の歩行者特徴を積極的に学習するためのモデルとして、ICDL(Intra-Camera Discriminative Learning)とIC(Inter-Camera Adversarial Learning)の2つのカスタムモジュールが設計されている。具体的には、まず、カメラ内歩行者画像の学習可能なテキストプロンプトを確立し、その後のカメラ内およびカメラ間学習において重要な意味的監視信号を得る。そこで, ICDLを設計し, 各カメラ内の強正・強負のサンプルを考慮し, より微細な歩行者特性を学習することで, クラス間変動を増大させる。さらに、歩行者画像が生み出すカメラの予測能力をペナルティ化し、異なる視点から歩行者を識別する能力を高めることにより、カメラ間歩行者特徴差の低減を図ることを提案する。一般的なReIDデータセットに関する大規模な実験は、我々のアプローチの有効性を実証している。特に、挑戦的なMSMT17データセットでは、mAPの精度で58.9\%に達し、最先端の手法を7.6\%上回る。コードは、https://github.com/Trangle12/CCAFL.comから入手できる。 Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. 
Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning finer-grained intra-camera pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model's ability to predict the camera from which a pedestrian image originates, thus enhancing the model's capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. In particular, on the challenging MSMT17 dataset, we reach 58.9% mAP, surpassing state-of-the-art methods by 7.6%. Code will be available at: https://github.com/Trangle12/CCAFL.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 多言語変換器を用いた低資源ネパール語の抽象要約 Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers ( http://arxiv.org/abs/2409.19566v1 ) ライセンス: Link先を確認	Prakash Dhakal, Daya Sagar Baral,	(参考訳) ネパール語におけるテキストの自動要約は、自然言語処理(NLP)における未探索領域である。抽出的な要約を専門とする研究が盛んに行われているが、抽象的な要約の領域、特にネパール語のような低リソース言語については、ほとんど探索されていない。本研究では,多言語トランスフォーマーモデル,特にmBARTとmT5を用いて,抽象要約によるネパールのニュース記事の見出しを生成する。この研究は、ネパールの様々なニュースポータルからのWebスクレイピングを通じて、まず要約データセットを作成することで、ネパールのテキストの要約に関連する重要な課題に対処する。これらの多言語モデルは異なる戦略を用いて微調整された。次に、ROUGEスコアと人的評価を用いて微調整モデルの性能を評価し、生成した要約が一致していることを確認し、本来の意味を伝達した。被験者は, 妥当性, 流布度, 簡潔さ, 情報性, 事実的正確性, カバレッジなどの基準に基づいて, モデルが生成したモデルの中から, 最高の要約を選択するよう依頼された。 ROUGEスコアを用いた評価では、LoRAモデルを用いた4ビット量子化mBARTは、他のモデルと比較してネパールのニュースの見出しを生成するのに有効であることが判明した。 Automatic text summarization in Nepali language is an unexplored area in natural language processing (NLP). Although considerable research has been dedicated to extractive summarization, the area of abstractive summarization, especially for low-resource languages such as Nepali, remains largely unexplored. This study explores the use of multilingual transformer models, specifically mBART and mT5, for generating headlines for Nepali news articles through abstractive summarization. The research addresses key challenges associated with summarizing texts in Nepali by first creating a summarization dataset through web scraping from various Nepali news portals. These multilingual models were then fine-tuned using different strategies. The performance of the fine-tuned models were then assessed using ROUGE scores and human evaluation to ensure the generated summaries were coherent and conveyed the original meaning. During the human evaluation, the participants were asked to select the best summary among those generated by the models, based on criteria such as relevance, fluency, conciseness, informativeness, factual accuracy, and coverage. 
In the ROUGE-based evaluation, the 4-bit quantized mBART model with LoRA generated better Nepali news headlines than the other models. It was also selected 34.05% of the time during the human evaluation, outperforming all other fine-tuned models created for Nepali news headline generation.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
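The abstract above reports ROUGE scores but does not say which ROUGE variant or tokenizer was used. As a rough illustration only, a unigram-overlap ROUGE-1 F1 can be sketched as follows; the function name `rouge1_f1` and the whitespace tokenization are assumptions made for this sketch, not the authors' evaluation code.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate headline.

    Assumption: tokens are separated by whitespace, which is a
    simplification for any language, including Nepali.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Clipped overlap: each token counts at most as often as in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A production evaluation would normally rely on an established ROUGE package with stemming and language-aware tokenization; this sketch only shows the overlap arithmetic.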
# 画像分割参照のための完全アライメントネットワーク Fully Aligned Network for Referring Image Segmentation ( http://arxiv.org/abs/2409.19569v1 ) ライセンス: Link先を確認	Yong Liu, Ruihao Xu, Yansong Tang,	(参考訳) 本稿では、与えられた言語記述に基づいて画像からオブジェクトをセグメント化することを目的とした参照イメージセグメンテーション(RIS)タスクに焦点を当てる。 RISの重要な問題は、ターゲットオブジェクトを認識し、セグメント化するために、異なるモダリティ間のきめ細かいアライメントを達成することである。近年,モーダル間相互作用におけるアテンション機構の進歩は大きな進歩を遂げている。しかしながら、現在の手法は、ガイドラインとして相互作用設計の明確な原則を欠く傾向にあり、モダル間の理解が不十分になる。さらに、以前のほとんどの作品では、予測に単一モードマスクデコーダを使用しており、完全なクロスモーダルアライメントの利点を失っている。これらの課題に対処するために,4つのモード間相互作用の原則に従うフルアラインドネットワーク(FAN)を提案する。合理的なルールのガイダンスにより、我々のFANは、一般的なRISベンチマーク(RefCOCO、RefCOCO+、G-Ref)の最先端のパフォーマンスをシンプルなアーキテクチャで達成する。 This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description. The critical problem of RIS is achieving fine-grained alignment between different modalities to recognize and segment the target object. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Under the guidance of reasonable rules, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 会話クエリ生成におけるオーバーアソシエーションの負の影響について Mitigating the Negative Impact of Over-association for Conversational Query Production ( http://arxiv.org/abs/2409.19572v1 ) ライセンス: Link先を確認	Ante Wang, Linfeng Song, Zijun Min, Ge Xu, Xiaoli Wang, Junfeng Yao, Jinsong Su,	(参考訳) 会話クエリ生成は、対話履歴から検索クエリを生成することを目的としており、このクエリは、知識に基づく対話システムを支援するために、検索エンジンから関連する知識を取得するために使用される。金のクエリの可能性を最大化するために訓練された以前のモデルは、データ飢餓の問題に悩まされ、対話履歴から重要な概念を落とし、推論時に無関係な概念を生成する傾向がある。これらの問題は、多くのゴールドクエリが対話トピックと間接的に関連しているオーバー・アソシエーション現象によるもので、アノテータは、これらのゴールドクエリを生成する際に、その背景知識で無意識に推論を行う可能性があるためである。この現象が事前訓練したSeq2seqクエリー生成者に与える影響を慎重に分析し、これらの問題を複数の視点から緩和するための効果的なインスタンスレベルの重み付け戦略を提案する。 Wizard-of-InternetとDuSincという2つのベンチマークの実験は、私たちの戦略が負の効果を効果的に軽減し、パフォーマンスが大幅に向上することを示しています。さらに,本モデルでは,対話履歴からより良い概念を選択し,ベースラインの10倍のデータ効率を示す。コードはhttps://github.com/DeepLearnXMU/QG-OverAssoで公開されている。 Conversational query generation aims at producing search queries from dialogue histories, which are then used to retrieve relevant knowledge from a search engine to help knowledge-based dialogue systems. Trained to maximize the likelihood of gold queries, previous models suffer from the data hunger issue, and they tend to both drop important concepts from dialogue histories and generate irrelevant concepts at inference time. We attribute these issues to the over-association phenomenon where a large number of gold queries are indirectly related to the dialogue topics, because annotators may unconsciously perform reasoning with their background knowledge when generating these gold queries. We carefully analyze the negative effects of this phenomenon on pretrained Seq2seq query producers and then propose effective instance-level weighting strategies for training to mitigate these issues from multiple perspectives. Experiments on two benchmarks, Wizard-of-Internet and DuSinc, show that our strategies effectively alleviate the negative effects and lead to significant performance gains (2%-5% across automatic metrics and human evaluation). 
Further analysis shows that our model selects better concepts from dialogue histories and is 10 times more data efficient than the baseline. The code is available at https://github.com/DeepLearnXMU/QG-OverAsso.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 視覚的接地による鍵情報抽出の強化 See then Tell: Enhancing Key Information Extraction with Vision Grounding ( http://arxiv.org/abs/2409.19573v1 ) ライセンス: Link先を確認	Shuhang Liu, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Qing Wang, Jianshu Zhang, Chenyu Liu,	(参考訳) デジタル時代には、テキスト、複雑なレイアウト、画像を統合する視覚的にリッチな文書を理解する能力が不可欠である。従来のキー情報抽出(KIE)手法は主に光学文字認識(OCR)に依存しており、大きなレイテンシ、計算オーバーヘッド、エラーをもたらすことが多い。現在の高度な画像からテキストへのアプローチは、OCRをバイパスし、通常、対応する視覚的接地を伴わない平易なテキスト出力を出力する。本稿では,視覚基盤の正確な答えを提供するために設計された,新しいエンドツーエンドモデルSTNet(See then Tell Net)を紹介する。直感的には、STNetは固有の<see>トークンを使用して、関連する画像領域を観察し、このトークンにリンクされた物理座標を解釈するデコーダによって支援される。応答テキストの先頭に配置された<see>トークンは、まず入力された質問に関連する画像の領域を保存し、次に、指示されたテキスト応答を提供する。モデルの可視性を高めるため、広範囲に構造化されたテーブル認識データセットを収集する。 GPT-4の高度なテキスト処理技術を生かしたTVG(TableQA with Vision Grounding)データセットを開発した。提案手法は, CORD, SROIE, DocVQAなどの公開データセットに対して, 最先端の成果を達成し, KIE性能の大幅な向上を示す。コードは一般公開される予定だ。 In the digital era, the ability to understand visually rich documents that integrate text, complex layouts, and imagery is critical. Traditional Key Information Extraction (KIE) methods primarily rely on Optical Character Recognition (OCR), which often introduces significant latency, computational overhead, and errors. Current advanced image-to-text approaches, which bypass OCR, typically yield plain text outputs without corresponding vision grounding. In this paper, we introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding. Distinctively, STNet utilizes a unique <see> token to observe pertinent image areas, aided by a decoder that interprets physical coordinates linked to this token. Positioned at the outset of the answer text, the <see> token allows the model to first see--observing the regions of the image related to the input question--and then tell--providing articulated textual responses. To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets. 
Leveraging the advanced text processing prowess of GPT-4, we develop the TVG (TableQA with Vision Grounding) dataset, which not only provides text-based Question Answering (QA) pairs but also incorporates precise vision grounding for these pairs. Our approach demonstrates substantial advancements in KIE performance, achieving state-of-the-art results on publicly available datasets such as CORD, SROIE, and DocVQA. The code will also be made publicly available.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 音声視覚課題の定量的分析 : 情報理論の視点から Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective ( http://arxiv.org/abs/2409.19575v1 ) ライセンス: Link先を確認	Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang,	(参考訳) 音声言語処理の分野では、音声・視覚音声処理が研究の注目を集めている。本研究の主な構成要素は, 唇読解, 音声・視覚音声認識, 音声合成などである。かなりの成功を収めたものの、理論的解析は未だに音声・視覚のタスクには不十分である。本稿では,異なるモーダル間の情報交差に着目し,情報理論に基づく定量的解析を行う。この分析は,音声・視覚処理タスクの難易度や,モダリティ統合によって得られるメリットを理解する上で有用であることを示す。 In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis. Although significant success has been achieved, theoretical analysis is still insufficient for audio-visual tasks. This paper presents a quantitative analysis based on information theory, focusing on information intersection between different modalities. Our results show that this analysis is valuable for understanding the difficulties of audio-visual processing tasks as well as the benefits that could be obtained by modality integration.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
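The abstract above speaks of the "information intersection" between modalities without giving formulas. One common way to quantify shared information is mutual information; the sketch below computes it for a small discrete joint distribution. Treating the paper's quantity as mutual information, and the name `mutual_information`, are assumptions of this sketch rather than details from the paper.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint table joint[x][y].

    Assumption: `joint` is a list of rows whose entries sum to 1, e.g.
    X = a discretised audio feature, Y = a discretised visual feature.
    """
    px = [sum(row) for row in joint]        # marginal over X
    py = [sum(col) for col in zip(*joint)]  # marginal over Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi
```

Independent modalities give zero bits, while two perfectly correlated binary variables share exactly one bit.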
# 地域スーパービジョンとモーションブラー条件を用いた高品質な人体画像アニメーション High Quality Human Image Animation using Regional Supervision and Motion Blur Condition ( http://arxiv.org/abs/2409.19580v1 ) ライセンス: Link先を確認	Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou,	(参考訳) 近年,映像拡散モデルの進歩により,時間的コヒーレンスを伴う現実的で制御可能な人間の画像アニメーションが実現されている。合理的な結果を生み出すが、既存の手法は、顔や手などの重要な領域における地域監督の必要性を無視し、動きのぼやけを明示的にモデル化することを無視し、非現実的な低品質合成に繋がる。これらの制限に対処するために、我々はまず、顔と手の忠実度を高めるために、詳細領域の地域監督を活用する。第二に、動作のぼかしを明示的にモデル化し、外観の質をさらに向上させる。第3に,高精細な人体アニメーションのための新しいトレーニング戦略を探求し,全体の忠実度を向上する。実験の結果,提案手法は最先端の手法よりも優れており,HumanDanceデータセットの再現精度 (L1) と知覚品質 (FVD) において,最強のベースラインを21.0%以上,57.4%以上向上した。コードとモデルは利用可能になる。 Recent advances in video diffusion models have enabled realistic and controllable human image animation with temporal coherence. Although generating reasonable results, existing methods often overlook the need for regional supervision in crucial areas such as the face and hands, and neglect the explicit modeling for motion blur, leading to unrealistic low-quality synthesis. To address these limitations, we first leverage regional supervision for detailed regions to enhance face and hand faithfulness. Second, we model the motion blur explicitly to further improve the appearance quality. Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity. Experimental results demonstrate that our proposed method outperforms state-of-the-art approaches, achieving significant improvements upon the strongest baseline by more than 21.0% and 57.4% in terms of reconstruction precision (L1) and perceptual quality (FVD) on HumanDance dataset. Code and model will be made available.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# DiMB-RE: ダイエットマイクロバイオーム協会のための科学文献のマイニング DiMB-RE: Mining the Scientific Literature for Diet-Microbiome Associations ( http://arxiv.org/abs/2409.19581v1 ) ライセンス: Link先を確認	Gibong Hong, Veronica Hindle, Nadine M. Veasley, Hannah D. Holscher, Halil Kilicoglu,	(参考訳) モチベーション:腸内微生物は、最近、食事と人間の健康の特定の関係を支えている重要な要因として現れました。食事、ヒトの代謝、微生物に関する実験研究から、膨大な量の知識が集められている。しかし、この証拠はほとんど科学論文に埋もれており、この領域の生物医学文献の採掘は少ない。 DMB-REは15の実体型(例えば栄養素,微生物)と13の関連型(例:増加,改善)をアノテートした包括的コーパスである。また,名前付きエンティティ,トリガ,関係抽出のための最先端自然言語処理(NLP)モデルや,DMB-REを用いた事実検出の訓練と評価を行った。結果: DiMB-REは165記事から14,450のエンティティと4,206のリレーションシップで構成されている。 NLPモデルは、名前付きエンティティ認識(0.760 F$_{1}$)に対して合理的に動作したが、エンティティとトリガの欠如と、クロス文関係のため、エンドツーエンドの関係抽出性能は控えめであった(0.356 F$_{1}$)。結論: 我々の知る限り、ダイエットと微生物の相互作用に焦点を当てたDiMB-REは最大かつ最も多様なデータセットである。バイオメディカル文献採掘のためのベンチマークコーパスとして機能する。 DiMB-REとNLPモデルはhttps://github.com/ScienceNLP-Lab/DiMB-REで入手できる。 Motivation: The gut microbiota has recently emerged as a key factor that underpins certain connections between diet and human health. A tremendous amount of knowledge has been amassed from experimental studies on diet, human metabolism and microbiome. However, this evidence remains mostly buried in scientific publications, and biomedical literature mining in this domain remains scarce. We developed DiMB-RE, a comprehensive corpus annotated with 15 entity types (e.g., Nutrient, Microorganism) and 13 relation types (e.g., increases, improves) capturing diet-microbiome associations. We also trained and evaluated state-of-the-art natural language processing (NLP) models for named entity, trigger, and relation extraction as well as factuality detection using DiMB-RE. Results: DiMB-RE consists of 14,450 entities and 4,206 relationships from 165 articles. While NLP models performed reasonably well for named entity recognition (0.760 F$_{1}$), end-to-end relation extraction performance was modest (0.356 F$_{1}$), partly due to missed entities and triggers as well as cross-sentence relations. 
Conclusions: To our knowledge, DiMB-RE is the largest and most diverse dataset focusing on diet-microbiome interactions. It can serve as a benchmark corpus for biomedical literature mining. Availability: DiMB-RE and the NLP models are available at https://github.com/ScienceNLP-Lab/DiMB-RE.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# テクスチャとモデルに基づくハイブリッドロバストのための自己教師付き補助学習と顔分析における公正な特徴 Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis ( http://arxiv.org/abs/2409.19582v1 ) ライセンス: Link先を確認	Shukesh Reddy, Nishit Poddar, Srijan Das, Abhijit Das,	(参考訳) 本研究では,テクスチャベースの局所記述子を特徴モデリングにブレンドし,効率的な顔分析を行うための補助課題として,自己教師あり学習(SSL)について検討する。主タスクと自己監督型補助タスクを組み合わせることは、堅牢な表現に有用である。そこで我々は,マスクオートエンコーダ(MAE)のSSLタスクを,局所パターンなどのテクスチャの特徴を再構築する補助タスクとして使用した。顔属性と顔に基づく感情分析,深度検出という,顔分析の3つの主要なパラダイムを仮説として検討した。実験結果から,提案モデルからより優れた特徴表現を抽出し,不公平かつ偏りのない顔分析を行うことができた。 In this work, we explore Self-supervised Learning (SSL) as an auxiliary task to blend the texture-based local descriptors into feature modelling for efficient face analysis. Combining a primary task and a self-supervised auxiliary task is beneficial for robust representation. Therefore, we used the SSL task of mask auto-encoder (MAE) as an auxiliary task to reconstruct texture features such as local patterns along with the primary task for robust and unbiased face analysis. We experimented with our hypothesis on three major paradigms of face analysis: face attribute and face-based emotion analysis, and deepfake detection. Our experiment results exhibit that better feature representation can be gleaned from our proposed model for fair and bias-less face analysis.	翻訳日:2024-11-05 22:28:30 公開日:2024-09-29
# 分子マーカーを用いたMRIの脳腫瘍分類 Brain Tumor Classification on MRI in Light of Molecular Markers ( http://arxiv.org/abs/2409.19583v1 ) ライセンス: Link先を確認	Jun Liu, Geng Yuan, Weihao Zeng, Hao Tang, Wenbin Zhang, Xue Lin, XiaoLin Xu, Dong Huang, Yanzhi Wang,	(参考訳) 研究報告では,1p/19q遺伝子の同時欠失は低次グリオーマの臨床成績と関連している。 1p19qの状態を予測できる能力は、治療計画と患者の追跡に重要である。本研究の目的は,MRIを用いた畳み込みニューラルネットワークを脳がん検出に活用することである。 ResNetやAlexNetのような公開ネットワークは、転移学習を用いて脳がんを効果的に診断できるが、このモデルには医療画像とは無関係な重みが含まれている。その結果、転移学習モデルでは診断結果は信頼できない。信頼性の問題に対処するため、事前訓練されたモデルに依存するのではなく、ゼロからモデルを作成する。柔軟性を実現するため, オーバーフィッティングを低減し, コンボリューションスタックとドロップアウトとフル接続操作を併用し, 性能を向上した。モデルトレーニング中、与えられたデータセットを補完し、ガウスノイズを注入する。最適な選択モデルをトレーニングするために、3分割交差検証を使用します。 InceptionV3,VGG16,MobileNetV2を事前学習モデルで微調整した場合と比較して,より優れた結果が得られる。 125のコードレプション対31の検証セットでは、コードレプション画像ではなく、1p/19qのコードレプションを分類すると96.37のF1スコア、97.46の精度、96.34のリコールを達成する。 In research findings, co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. The ability to predict 1p19q status is critical for treatment planning and patient follow-up. This study aims to utilize a specially designed MRI-based convolutional neural network for brain cancer detection. Although public networks such as ResNet and AlexNet can effectively diagnose brain cancers using transfer learning, the model includes quite a few weights that have nothing to do with medical images. As a result, the diagnostic results of the transfer learning model are unreliable. To deal with the problem of trustworthiness, we create the model from the ground up, rather than depending on a pre-trained model. To enable flexibility, we combined convolution stacking with dropout and fully connected operations, which improved performance by reducing overfitting. During model training, we also supplement the given dataset and inject Gaussian noise. We use three-fold cross-validation to train the best selection model. Compared with InceptionV3, VGG16, and MobileNetV2 fine-tuned from pre-trained models, our model produces better results. 
On a validation set of 125 codeletion vs. 31 non-codeletion images, the proposed network achieves a 96.37% F1-score, 97.46% precision, and 96.34% recall when classifying 1p/19q codeletion and non-codeletion images.	翻訳日:2024-11-05 22:18:46 公開日:2024-09-29
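The abstract above reports precision, recall, and F1-score for a binary codeletion classifier. As a reminder of how these three metrics relate, here is a minimal sketch that derives them from confusion-matrix counts. The function name and the example counts in the test are hypothetical; the abstract does not provide the underlying confusion matrix, so the paper's figures are not reproduced here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) for the positive class.

    tp, fp, fn are true-positive, false-positive, and false-negative
    counts; all values used with this sketch are illustrative only.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it always lies between precision and recall when computed for a single class on a single confusion matrix.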
# ターゲット話者抽出を用いたロバスト音声認識のための2段階フレームワーク Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions ( http://arxiv.org/abs/2409.19585v1 ) ライセンス: Link先を確認	Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda,	(参考訳) 雑音条件下での頑健な音声感情認識(SER)システムの開発は、異なる雑音特性によって生じる課題に直面している。従来の研究は人間の音声雑音の影響を考慮していないため、SERの適用範囲は制限されている。本稿では,ターゲット話者抽出法(TSE)とSERを用いて,この問題に対する新たな2段階の枠組みを提案する。まず、TSEモデルを訓練し、混合からターゲット話者の音声を抽出する。そして第2段階で,抽出した音声をSER訓練に用いる。さらに,第2段階におけるTSEモデルとSERモデルの共同トレーニングについて検討する。提案手法は,TSE法を使わずにベースラインと比較して14.33%の精度向上を実現し,人間の音声雑音の影響を緩和する枠組みの有効性を示した。さらに, 話者の性別を考慮した実験を行い, 異なるジェンダーの混合において, フレームワークが特に良好に機能することを示した。 Developing a robust speech emotion recognition (SER) system in noisy conditions faces challenges posed by different noise properties. Most previous studies have not considered the impact of human speech noise, thus limiting the application scope of SER. In this paper, we propose a novel two-stage framework for the problem by cascading target speaker extraction (TSE) method and SER. We first train a TSE model to extract the speech of target speaker from a mixture. Then, in the second stage, we utilize the extracted speech for SER training. Additionally, we explore a joint training of TSE and SER models in the second stage. Our developed system achieves a 14.33% improvement in unweighted accuracy (UA) compared to a baseline without using TSE method, demonstrating the effectiveness of our framework in mitigating the impact of human speech noise. Moreover, we conduct experiments considering speaker gender, showing that our framework performs particularly well in different-gender mixture.	翻訳日:2024-11-05 22:18:46 公開日:2024-09-29
# ヒト・イン・ザ・ループトレーニングによる全スライド画像の品質管理 Efficient Quality Control of Whole Slide Pathology Images with Human-in-the-loop Training ( http://arxiv.org/abs/2409.19587v1 ) ライセンス: Link先を確認	Abhijeet Patil, Harsh Diwakar, Jay Sawant, Nikhil Cherian Kurian, Subhash Yadav, Swapnil Rane, Tripti Bameta, Amit Sethi,	(参考訳) 病理組織学全体のスライド画像(WSI)は、特に精密腫瘍学において深層学習に基づく診断ソリューションの開発に広く利用されている。これらの診断ソフトウェアのほとんどは、トレーニングやテストデータにおけるバイアスや不純物に弱いため、不正確な診断につながる可能性がある。例えば、WSIには複数の種類の組織領域が含まれており、少なくともそのいくつかは診断に関連しないかもしれない。我々は,WSIを上皮,線条体,リンパ球,脂肪体,人工物,雑多な6つの組織領域に分離する,頑健で軽量なディープラーニングベースの分類器であるHistoROIを紹介した。 HistoROIは、ラベル付け効率のよい一般化のためのトレーニングデータのバリエーションを保証する新しいヒューマン・イン・ザ・ループ・アクティブ・ラーニング・パラダイムを用いて訓練されている。 HistoROIは、単一のデータセットでのみトレーニングされているにも関わらず、複数の臓器で一貫して良好に機能し、強力な一般化を示している。さらに,CAMELYON乳がんリンパ節とTGA肺がんデータセットを用いて,下流深層学習タスクの性能向上のためのHistoROIの有用性を検討した。前者のデータセットでは、弱い教師付き学習を用いて訓練したニューラルネットワークの転移と正常組織に対するレシーバ操作特性曲線(AUC)の下の領域は、HistoROIを用いてデータをフィルタリングすることにより0.88から0.92に増加した。同様に、AUCは肺がんデータセット上の腺癌と扁平上皮癌の分類において0.88から0.93に増加した。また,93個のアノテートされたWSIの試験データセット上で,HistoQCによるアーティファクト検出の性能向上も確認した。提案モデルの限界を解析し,潜在的な拡張についても論じる。 Histopathology whole slide images (WSIs) are being widely used to develop deep learning-based diagnostic solutions, especially for precision oncology. Most of these diagnostic softwares are vulnerable to biases and impurities in the training and test data which can lead to inaccurate diagnoses. For instance, WSIs contain multiple types of tissue regions, at least some of which might not be relevant to the diagnosis. We introduce HistoROI, a robust yet lightweight deep learning-based classifier to segregate WSI into six broad tissue regions -- epithelium, stroma, lymphocytes, adipose, artifacts, and miscellaneous. HistoROI is trained using a novel human-in-the-loop and active learning paradigm that ensures variations in training data for labeling-efficient generalization. 
HistoROI consistently performs well across multiple organs, despite being trained on only a single dataset, demonstrating strong generalization. Further, we have examined the utility of HistoROI in improving the performance of downstream deep learning-based tasks using the CAMELYON breast cancer lymph node and TCGA lung cancer datasets. For the former dataset, the area under the receiver operating characteristic curve (AUC) for metastasis versus normal tissue of a neural network trained using weakly supervised learning increased from 0.88 to 0.92 by filtering the data using HistoROI. Similarly, the AUC increased from 0.88 to 0.93 for the classification between adenocarcinoma and squamous cell carcinoma on the lung cancer dataset. We also found that HistoROI outperforms HistoQC for artifact detection on a test dataset of 93 annotated WSIs. The limitations of the proposed model are analyzed, and potential extensions are also discussed.	翻訳日:2024-11-05 22:18:46 公開日:2024-09-29
# Effective Diffusion Transformer Architecture for Image Super-Resolution ( http://arxiv.org/abs/2409.19589v1 ) License: see link	Kun Cheng, Lei Yu, Zhijun Tu, Xiao He, Liyu Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu	Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there have been few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods while being trained from scratch. In practice, DiT-SR leverages an overall U-shaped architecture and adopts a uniform isotropic design for all the transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module, enhancing the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing training-from-scratch diffusion-based SR methods, and even beats some of the prior-based methods built on pretrained Stable Diffusion, proving the superiority of the diffusion transformer in image super-resolution.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model ( http://arxiv.org/abs/2409.19592v1 ) License: see link	Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu	Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration with an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experimentation, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP can reduce communication costs by 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# MASKDROID: Robust Android Malware Detection with Masked Graph Representations ( http://arxiv.org/abs/2409.19594v1 ) License: see link	Jingnan Zheng, Jiaohao Liu, An Zhang, Jun Zeng, Ziqi Yang, Zhenkai Liang, Tat-Seng Chua	Android malware attacks pose a severe threat to mobile users, creating significant demand for automated detection systems. Among the various tools employed in malware detection, graph representations (e.g., function call graphs) have played a pivotal role in characterizing the behaviors of Android apps. However, although they achieve impressive detection performance, current state-of-the-art graph-based malware detectors are vulnerable to adversarial examples. These adversarial examples are meticulously crafted by introducing specific perturbations to normal malicious inputs. To defend against adversarial attacks, existing defensive mechanisms are typically supplementary additions to detectors and exhibit significant limitations, often relying on prior knowledge of adversarial examples and failing to defend effectively against unseen types of attacks. In this paper, we propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware and remarkable robustness against adversarial attacks. Specifically, we introduce a masking mechanism into the Graph Neural Network (GNN) based framework, forcing MASKDROID to recover the whole input graph using a small portion (e.g., 20%) of randomly selected nodes. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks. While capturing stable malicious semantics in the form of dependencies inside the graph structures, we further employ a contrastive module to encourage MASKDROID to learn more compact representations for both the benign and malicious classes, boosting its discriminative power in distinguishing malware from benign apps and adversarial examples.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
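The masking step in the MASKDROID abstract above (keep a small random portion of nodes visible, e.g. 20%, and ask the model to recover the rest) can be sketched as follows. This is only an illustration of the node-selection step under stated assumptions: the function names `mask_nodes` and `mask_features`, the zero placeholder value, and the toy feature matrix are hypothetical, and the GNN encoder/decoder that would actually reconstruct the graph is not shown.

```python
import random


def mask_nodes(num_nodes, visible_ratio=0.2, seed=0):
    """Randomly split node ids into a visible set and a masked set."""
    rng = random.Random(seed)
    node_ids = list(range(num_nodes))
    rng.shuffle(node_ids)
    # Keep at least one node visible so there is something to reconstruct from.
    num_visible = max(1, round(num_nodes * visible_ratio))
    visible = sorted(node_ids[:num_visible])
    masked = sorted(node_ids[num_visible:])
    return visible, masked


def mask_features(features, masked, mask_value=0.0):
    """Replace the feature vectors of masked nodes with a constant placeholder."""
    masked_set = set(masked)
    return [
        [mask_value] * len(row) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]


# Toy graph with 10 nodes and 2 features per node (hypothetical data).
features = [[float(i), float(i) + 0.5] for i in range(10)]
visible, masked = mask_nodes(len(features), visible_ratio=0.2, seed=0)
masked_features = mask_features(features, masked)
```

A reconstruction loss would then compare the model's output for the masked nodes against the original `features`; how MASKDROID defines that loss is not stated in the abstract.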
# Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024 ( http://arxiv.org/abs/2409.19595v1 ) License: see link	Haowei Gu, Weihao Zhu, Yang Yang	This report proposes an improved method for the Temporal Sound Localisation (TSL) task, which localizes and classifies the sound events occurring in a video according to a predefined set of sound classes. The champion solution from last year's first competition explored TSL by fusing the audio and video modalities with equal weights. Considering that the TSL task aims to localize sound events, we conduct experiments that demonstrate the superiority of sound features (Section 3). Based on these findings, to enhance the audio modality features, we employ various models to extract audio features, such as the InterVideo, CaVMAE, and VideoMAE models. Our approach ranks first in the final test with a score of 0.4925.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# Gradient is All You Need: Gradient-Based Attention Fusion for Infrared Small Target Detection ( http://arxiv.org/abs/2409.19599v1 ) License: see link	Chen Hu, Yian Huang, Kexuan Li, Luping Zhang, Yiming Zhu, Yufei Peng, Tian Pu, Zhenming Peng	Infrared small target detection (IRSTD) is widely used in civilian and military applications. However, IRSTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Gradient Network (GaNet), which aims to extract and preserve edge and gradient information of small targets. GaNet employs the Gradient Transformer (GradFormer) module, simulating central difference convolutions (CDC) to extract and integrate gradient features with deeper features. Furthermore, we propose a global feature extraction model (GFEM) that offers a comprehensive perspective to prevent the network from focusing solely on details while neglecting the background information. We compare the network with state-of-the-art (SOTA) approaches, and the results demonstrate that our method performs effectively. Our source code is available at https://github.com/greekinRoma/Gradient-Transformer.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
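For readers unfamiliar with the central difference convolution (CDC) that the GaNet abstract above refers to, here is a minimal sketch of the commonly cited CDC form, in which each kernel weight multiplies the difference between a neighbour and the centre pixel. It assumes a 3x3 kernel and no padding; the function name and toy inputs are hypothetical, and this is not the GradFormer module itself, whose design the abstract does not describe.

```python
def central_difference_conv(image, kernel):
    """3x3 central difference convolution without padding.

    Each output value is sum_n w_n * (x_neighbour_n - x_centre), so flat
    regions give zero response and isolated bright pixels give a strong one.
    """
    height, width = len(image), len(image[0])
    output = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = image[r][c]
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    weight = kernel[dr + 1][dc + 1]
                    acc += weight * (image[r + dr][c + dc] - center)
            row.append(acc)
        output.append(row)
    return output


ones_kernel = [[1.0] * 3 for _ in range(3)]
flat_patch = [[5.0] * 3 for _ in range(3)]            # uniform background
target_patch = [[0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]                      # single bright pixel

flat_response = central_difference_conv(flat_patch, ones_kernel)
target_response = central_difference_conv(target_patch, ones_kernel)
```

With the all-ones kernel, the uniform patch yields a zero response while the single bright pixel yields a large-magnitude response, which is the gradient-sensitivity property that makes this operator attractive for small, dim targets.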
# An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes ( http://arxiv.org/abs/2409.19600v1 ) License: see link	Jiayu Hu, Senlin Shu, Beibei Li, Tao Xiang, Zhongshi He	Partial Label Learning (PLL) is a typical weakly supervised learning task, which assumes that each training instance is annotated with a set of candidate labels containing the ground-truth label. Recent PLL methods adopt identification-based disambiguation to alleviate the influence of false positive labels and achieve promising performance. However, they require all classes in the test set to have appeared in the training set, ignoring the fact that new classes keep emerging in real applications. To address this issue, we focus on the problem of Partial Label Learning with Augmented Class (PLLAC), where one or more augmented classes are not visible in the training stage but appear in the inference stage. Specifically, we propose an unbiased risk estimator with theoretical guarantees for PLLAC, which estimates the distribution of augmented classes by differentiating the distribution of known classes from unlabeled data and can be equipped with arbitrary PLL loss functions. In addition, we provide a theoretical analysis of the estimation error bound of the estimator, which guarantees the convergence of the empirical risk minimizer to the true risk minimizer as the amount of training data tends to infinity. Furthermore, we add a risk-penalty regularization term to the optimization objective to alleviate the over-fitting caused by negative empirical risk. Extensive experiments on benchmark, UCI, and real-world datasets demonstrate the effectiveness of the proposed approach.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# Infighting in the Dark: Multi-Labels Backdoor Attack in Federated Learning ( http://arxiv.org/abs/2409.19601v1 ) License: see link	Ye Li, Yanchao Zhao, Chengcheng Zhu, Jiale Zhang	Federated Learning (FL) has been demonstrated to be vulnerable to backdoor attacks. Although FL is a decentralized machine learning framework, most research focuses on the Single-Label Backdoor Attack (SBA), where adversaries share the same target, and neglects the fact that adversaries may be unaware of each other's existence and hold different targets, i.e., the Multi-Label Backdoor Attack (MBA). Unfortunately, directly applying prior work to the MBA would not only be ineffective but could also cause the attacks to mitigate each other. In this paper, we first investigate the limitations of applying previous work to the MBA. Subsequently, we propose M2M, a novel multi-label backdoor attack in FL, which adversarially adapts the backdoor trigger to ensure that backdoored samples are processed as clean target samples in the global model. Our key intuition is to establish a connection between the trigger pattern and the target class distribution, allowing different triggers to activate backdoors along clean activation paths of the target class without concerns about potential mitigation. Extensive evaluations demonstrate that M2M outperforms various state-of-the-art attack methods. This work aims to alert researchers and developers to this potential threat and to inspire the design of effective detection methods. Our code will be made available later.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ( http://arxiv.org/abs/2409.19603v1 ) License: see link	Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou	We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions. Existing image-based methods, such as LISA, struggle with video tasks due to the additional temporal dimension, which requires temporal dynamic understanding and consistent segmentation across frames. VideoLISA addresses these challenges by integrating a Sparse Dense Sampling strategy into the video-LLM, which balances temporal context and spatial detail within computational constraints. Additionally, we propose a One-Token-Seg-All approach using a specially designed <TRK> token, enabling the model to segment and track objects across multiple frames. Extensive evaluations on diverse benchmarks, including our newly introduced ReasonVOS benchmark, demonstrate VideoLISA's superior performance in video object segmentation tasks involving complex reasoning, temporal understanding, and object tracking. While optimized for videos, VideoLISA also shows promising generalization to image segmentation, revealing its potential as a unified foundation model for language-instructed object segmentation. Code and model will be available at: https://github.com/showlab/VideoLISA.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# Hyper-Connections ( http://arxiv.org/abs/2409.19606v1 ) License: see link	Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou	We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.	Translated: 2024-11-05 22:18:46 Published: 2024-09-29
# Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method ( http://arxiv.org/abs/2409.19610v1 ) License: see link	Bikang Pan, Wei Huang, Ye Shi	Integrating pretrained vision-language foundation models like CLIP into federated learning has attracted significant attention for enhancing generalization across diverse tasks. Typically, federated learning of vision-language models employs prompt learning to reduce communication and computational costs, i.e., prompt-based federated learning. However, there is limited theoretical analysis to understand the performance of prompt-based federated learning. In this work, we construct a theoretical analysis framework for prompt-based federated learning via feature learning theory. Specifically, we monitor the evolution of signal learning and noise memorization in prompt-based federated learning, demonstrating that performance can be assessed by the ratio of task-relevant to task-irrelevant coefficients. Furthermore, we draw an analogy between income and risk in portfolio optimization and the task-relevant and task-irrelevant terms in feature learning. Inspired by the observation from portfolio optimization that combining two independent assets maintains income while reducing risk, we introduce two prompts, a global prompt and a local prompt, to construct a prompt portfolio that balances generalization and personalization. We then show the performance advantage of the prompt portfolio and derive the optimal mixing coefficient. These theoretical claims are further supported by empirical experiments.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
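The portfolio intuition invoked in the abstract above (mixing two independent assets keeps the income while lowering the risk) can be checked numerically with the textbook two-asset case. This sketch is the classic minimum-variance result for two uncorrelated assets, not the paper's derived mixing coefficient for prompts, which the abstract does not state; the function names and the variances 4.0 and 1.0 are illustrative assumptions.

```python
def portfolio_variance(theta, var_a, var_b):
    """Variance of theta * A + (1 - theta) * B for independent assets A and B."""
    return theta ** 2 * var_a + (1.0 - theta) ** 2 * var_b


def min_variance_weight(var_a, var_b):
    """Weight on asset A that minimises the variance of the mixture."""
    return var_b / (var_a + var_b)


# Two independent assets with the same expected income but different risk.
var_a, var_b = 4.0, 1.0
theta_star = min_variance_weight(var_a, var_b)
mixed_variance = portfolio_variance(theta_star, var_a, var_b)
```

Because both assets are assumed to have the same expected income, any mixture keeps that income, while the optimally mixed variance falls below the variance of either asset alone. The paper applies an analogous argument to a global and a local prompt.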
# Learning Attentional Mixture of LoRAs for Language Model Continual Learning ( http://arxiv.org/abs/2409.19611v1 ) License: see link	Jialin Liu, Jianhua Wu, Jie Liu, Yutai Duan	Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach to continual learning on new tasks. However, it often suffers from catastrophic forgetting when dealing with multiple tasks sequentially. To this end, we propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs. Specifically, AM-LoRA learns a sequence of LoRAs for a series of tasks to continually learn knowledge from different tasks. The key to our approach is an attention mechanism that serves as a knowledge mixture module to adaptively integrate information from each LoRA. With the attention mechanism, AM-LoRA can efficiently leverage the distinctive contributions of each LoRA while mitigating the risk of mutually negative interactions among them that may lead to catastrophic forgetting. Moreover, we introduce an $L1$ norm in the learning process to make the attention vector more sparse. The sparsity constraint enables the model to lean towards selecting a few highly relevant LoRAs, rather than aggregating and weighting all LoRAs collectively, which further reduces the impact of mutual interference. Experimental results on continual learning benchmarks indicate the superiority of our proposed method.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
# Hybrid Mamba for Few-Shot Segmentation ( http://arxiv.org/abs/2409.19613v1 ) License: see link	Qianxiong Xu, Xuanyi Liu, Lanyun Zhu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao	Many few-shot segmentation (FSS) methods use cross attention to fuse support foreground (FG) into query features, despite its quadratic complexity. Mamba, a recent advance, can also capture intra-sequence dependencies well, yet its complexity is only linear. Hence, we aim to devise a cross (attention-like) Mamba to capture inter-sequence dependencies for FSS. A simple idea is to scan the support features to selectively compress them into the hidden state, which is then used as the initial hidden state to sequentially scan the query features. Nevertheless, this suffers from (1) a support forgetting issue: query features are also gradually compressed as they are scanned, so the support features in the hidden state keep diminishing, and many query pixels cannot fuse sufficient support features; and (2) an intra-class gap issue: query FG is essentially more similar to itself than to support FG, i.e., query pixels may prefer to fuse their own features from the hidden state rather than support features, yet the success of FSS relies on the effective use of support information. To tackle these issues, we design a hybrid Mamba network (HMNet), including (1) a support recapped Mamba that periodically recaps the support features while scanning the query, so the hidden state always contains rich support information; and (2) a query intercepted Mamba that forbids mutual interactions among query pixels and encourages them to fuse more support features from the hidden state. Consequently, the support information is better utilized, leading to better performance. Extensive experiments on two public benchmarks show the superiority of HMNet. The code is available at https://github.com/Sam1224/HMNet.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
# Discerning the Chaos: Detecting Adversarial Perturbations while Disentangling Intentional from Unintentional Noises ( http://arxiv.org/abs/2409.19619v1 ) License: see link	Anubhooti Jain, Susim Roy, Kwanit Gupta, Mayank Vatsa, Richa Singh	Deep learning models, such as those used for face recognition and attribute prediction, are susceptible to manipulations like adversarial noise and unintentional noise, including Gaussian and impulse noise. This paper introduces CIAI, a Class-Independent Adversarial Intent detection network built on a modified vision transformer with detection layers. CIAI employs a novel loss function that combines Maximum Mean Discrepancy and Center Loss to detect both intentional (adversarial attacks) and unintentional noise, regardless of the image class. It is trained in a multi-step fashion. We also introduce the aspect of intent during detection, which can act as an added layer of security. We further showcase the performance of our proposed detector on the CelebA, CelebA-HQ, LFW, AgeDB, and CIFAR-10 datasets. Our detector is able to detect both intentional (like FGSM, PGD, and DeepFool) and unintentional (like Gaussian and Salt & Pepper noise) perturbations.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
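The CIAI abstract above names two loss ingredients, Maximum Mean Discrepancy (MMD) and Center Loss, without saying how they are combined. The following is a hedged sketch of the two standard terms and one plausible weighted sum. The RBF kernel, the `gamma` and `weight` parameters, the function names, and the toy feature vectors are all assumptions made for illustration, not details taken from the paper.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of features."""
    def mean_kernel(a_set, b_set):
        total = sum(rbf_kernel(a, b, gamma) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def center_loss(features, labels, centers):
    """Half the mean squared distance of each feature to its class centre."""
    total = 0.0
    for feat, label in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(feat, centers[label]))
    return 0.5 * total / len(features)


def combined_loss(clean, noisy, features, labels, centers, weight=1.0):
    """Assumed combination: MMD term plus a weighted centre-loss term."""
    return mmd_squared(clean, noisy) + weight * center_loss(features, labels, centers)


# Toy 2-D features (hypothetical): clean and noisy samples are well separated.
clean = [[0.0, 0.0], [0.1, 0.0]]
noisy = [[2.0, 2.0], [2.1, 2.0]]
same_set_mmd = mmd_squared(clean, clean)
cross_set_mmd = mmd_squared(clean, noisy)
toy_center_loss = center_loss(
    [[1.0, 0.0], [0.0, 1.0]], [0, 1], {0: [0.0, 0.0], 1: [0.0, 0.0]}
)
```

MMD is zero when both sets are identical and grows as the two feature distributions separate, while the centre term pulls features of the same label together. Whether CIAI minimises or maximises each term at each training step is not specified in the abstract.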
# Programming on Bitcoin: A Survey of Layer 1 and Layer 2 Technologies in Bitcoin Ecosystem ( http://arxiv.org/abs/2409.19622v1 ) License: see link	Guofu Liao, Taotao Wang, Qing Yang, Yihan Xia, Long Shi, Xiang Zhao, Xiaoxiao Wu, Shengli Zhang, Anthony Chan, Richard Yuen	This paper surveys innovative protocols that enhance the programming functionality of the Bitcoin blockchain, a key part of the "Bitcoin Ecosystem." Bitcoin utilizes the Unspent Transaction Output (UTXO) model and a stack-based script language for efficient peer-to-peer payments, but it faces limitations in programming capability and throughput. The 2021 Taproot upgrade introduced the Schnorr signature algorithm and the P2TR transaction type, significantly improving Bitcoin's privacy and programming capabilities. This upgrade has led to the development of protocols like Ordinals, Atomicals, and BitVM, which enhance Bitcoin's programming functionality and enrich its ecosystem. We explore the technical aspects of the Taproot upgrade and examine Bitcoin Layer 1 protocols that leverage Taproot's features to program non-fungible tokens (NFTs) into transactions, including Ordinals and Atomicals, along with the fungible token standards BRC-20 and ARC-20. Additionally, we categorize certain Bitcoin ecosystem protocols as Layer 2 solutions similar to Ethereum's, analyzing their impact on Bitcoin's performance. By analyzing data from the Bitcoin blockchain, we gather metrics on block capacity, miner fees, and the growth of Taproot transactions. Our findings confirm the positive effects of these protocols on Bitcoin's mainnet, bridging gaps in the literature regarding Bitcoin's programming capabilities and ecosystem protocols and providing valuable insights for practitioners and researchers.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
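To make the phrase "stack-based script language" in the abstract above concrete, here is a toy stack evaluator. It is a deliberately simplified illustration: the opcode names `OP_DUP`, `OP_ADD`, and `OP_EQUAL` exist in Bitcoin Script, but real Script operates on byte vectors, enforces size and opcode limits, and verifies signatures, none of which is modelled here. The function `run_script` and the example scripts are hypothetical.

```python
def run_script(script):
    """Evaluate a toy script; succeed if the final top-of-stack value is non-zero."""
    stack = []
    for token in script:
        if isinstance(token, int):
            stack.append(token)              # plain integers are pushed as data
        elif token == "OP_DUP":
            stack.append(stack[-1])          # duplicate the top item
        elif token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)              # replace two items with their sum
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0) # push 1 on equality, else 0
        else:
            raise ValueError(f"unsupported opcode: {token}")
    return bool(stack) and stack[-1] != 0


# "2 + 3 equals 5" succeeds; "2 + 2 equals 5" fails.
script_ok = [2, 3, "OP_ADD", 5, "OP_EQUAL"]
script_bad = [2, "OP_DUP", "OP_ADD", 5, "OP_EQUAL"]
result_ok = run_script(script_ok)
result_bad = run_script(script_bad)
```

The absence of loops and persistent state in this execution model is one reason the survey describes Bitcoin's native programming capability as limited.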
# MCDDPM: Multichannel Conditional Denoising Diffusion Model for Unsupervised Anomaly Detection in Brain MRI ( http://arxiv.org/abs/2409.19623v1 ) License: see link	Vivek Kumar Trivedi, Bheeshm Sharma, P. Balamurugan	Detecting anomalies in brain MRI scans using supervised deep learning methods presents challenges due to anatomical diversity and the labor-intensive requirement for pixel-level annotations. Generative models like the Denoising Diffusion Probabilistic Model (DDPM) and variants such as pDDPM, mDDPM, and cDDPM have recently emerged as powerful alternatives for unsupervised anomaly detection in brain MRI scans. These methods leverage frame-level labels of healthy brains to generate healthy tissues in brain MRI scans. During inference, when an anomalous (or unhealthy) scan image is presented as input, these models generate a healthy scan image corresponding to the input anomalous scan, and the difference map between the generated healthy scan image and the original anomalous scan image provides the necessary pixel-level identification of abnormal tissues. The healthy images generated by the DDPM, pDDPM, and mDDPM models, however, suffer from fidelity issues and contain artifacts that do not have medical significance. While cDDPM achieves slightly better fidelity and artifact suppression, it requires a huge memory footprint and is computationally more expensive than the other DDPM-based models. In this work, we propose an improved version of DDPM called the Multichannel Conditional Denoising Diffusion Probabilistic Model (MCDDPM) for unsupervised anomaly detection in brain MRI scans. Our proposed model achieves high fidelity by making use of additional information from the healthy images during the training process, enriching the representation power of DDPM models, with computational cost and memory requirements on par with the DDPM, pDDPM, and mDDPM models. Experimental results on multiple datasets (e.g., BraTS20, BraTS21) demonstrate the promising performance of the proposed method. The code is available at https://github.com/vivekkumartri/MCDDPM.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
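The inference step described in the abstract above (generate a healthy reconstruction, then take the difference map against the input scan) can be sketched as follows. The reconstruction itself would come from a trained diffusion model, which is not shown; the function names, the toy 2x2 "images", and the threshold of 0.5 are assumptions for illustration only.

```python
def anomaly_map(original, reconstruction):
    """Pixel-wise absolute difference between a scan and its healthy reconstruction."""
    return [
        [abs(o - r) for o, r in zip(orig_row, recon_row)]
        for orig_row, recon_row in zip(original, reconstruction)
    ]


def threshold_mask(diff_map, threshold):
    """Binary mask marking pixels whose difference exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in diff_map]


# Toy intensities in [0, 1]; the top-right pixel is the simulated lesion.
scan = [[0.1, 0.9],
        [0.2, 0.2]]
healthy_reconstruction = [[0.1, 0.2],
                          [0.2, 0.25]]

diff = anomaly_map(scan, healthy_reconstruction)
mask = threshold_mask(diff, threshold=0.5)
```

The quality of such a mask depends directly on the fidelity of the reconstruction: artifacts in the generated healthy image show up as false positives in the difference map, which is the problem the abstract says MCDDPM targets.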
# Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection ( http://arxiv.org/abs/2409.19624v1 ) License: see link	Yuhang Ma, Wenting Xu, Chaoyi Zhao, Keqiang Sun, Qinfeng Jin, Zeng Zhao, Changjie Fan, Zhipeng Hu	Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation, vividly representing their postures and backgrounds. The ID-Injector utilizes a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images. This dataset contains single and multiple-character sets in diverse environments, layouts, and gestures with detailed descriptions. Experimental results indicate that Storynizor demonstrates superior coherent story generation with high-fidelity character consistency, flexible postures, and vivid backgrounds compared to other character-specific methods.	Translated: 2024-11-05 22:09:00 Published: 2024-09-29
# 抽象的議論フレームワークのアクション言語に基づく形式化 An action language-based formalisation of an abstract argumentation framework ( http://arxiv.org/abs/2409.19625v1 ) ライセンス: Link先を確認	Yann Munro, Camilo Sarmiento, Isabelle Bloch, Gauvain Bourgne, Catherine Pelachaud, Marie-Jeanne Lesot,	(参考訳) 抽象的議論フレームワークは、対話の静的表現を提供するために一般的に使用される形式主義である。しかし、議論的対話における議論の列挙は非常に重要であり、この対話の結果に影響を与える可能性がある。本稿では,抽象的議論グラフをモデル化するための新しいフレームワークを提案する。この順序を考慮に入れれば、拡張(extension)と呼ばれる対話毎にユニークな結果を導出する手段が得られます。また、終端や正しさといったいくつかの性質を確立し、完全性の2つの概念について議論する。特に、「最後の最終更新」戦略に基づく前回の変換の修正を提案し、完全性の第2形態を検証した。 An abstract argumentation framework is a commonly used formalism to provide a static representation of a dialogue. However, the order of enunciation of the arguments in an argumentative dialogue is very important and can affect the outcome of this dialogue. In this paper, we propose a new framework for modelling abstract argumentation graphs, a model that incorporates the order of enunciation of arguments. By taking this order into account, we have the means to deduce a unique outcome for each dialogue, called an extension. We also establish several properties, such as termination and correctness, and discuss two notions of completeness. In particular, we propose a modification of the previous transformation based on a "last enunciated last updated" strategy, which verifies the second form of completeness.	翻訳日:2024-11-05 22:09:00 公開日:2024-09-29
# IDEAW: 可逆的デュアル埋め込みによるロバストなニューラルオーディオ透かし IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding ( http://arxiv.org/abs/2409.19627v1 ) ライセンス: Link先を確認	Pengcheng Li, Xulong Zhang, Jing Xiao, Jianzong Wang,	(参考訳) 音声透かし技術は、メッセージをオーディオに埋め込んで、透かし付きオーディオからメッセージを正確に抽出する。従来の手法では、専門的な経験に基づくアルゴリズムを開発し、透かしを信号の時間領域や変換領域に埋め込む。ディープニューラルネットワークの開発に伴い、ディープラーニングベースのニューラルオーディオ透かしが登場している。従来のアルゴリズムと比較して、ニューラルオーディオの透かしは、トレーニング中に様々な攻撃を考慮することにより、より堅牢性を達成する。しかし、現在のニューラルウォーターマーキング法は、低容量で不満足な非受容性に悩まされている。さらに、ニューラルオーディオの透かしにおいて非常に重要であり、さらに顕著である透かし位置の問題は十分に研究されていない。本稿では,効率的な位置決めのための二重埋め込み型透かしモデルの設計を行う。また、ロバストネストレーニングにおいて、攻撃層が非可逆ニューラルネットワークに与える影響についても検討し、その妥当性と安定性の両方を高めるためにモデルを改善した。実験の結果,提案モデルであるIDEAWは,既存の手法と比較して,高いキャパシティと効率的な位置決め能力を備えた様々な攻撃に耐えることができることがわかった。 The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional methods develop algorithms based on expert experience to embed watermarks into the time-domain or transform-domain of signals. With the development of deep neural networks, deep learning-based neural audio watermarking has emerged. Compared to traditional algorithms, neural audio watermarking achieves better robustness by considering various attacks during training. However, current neural watermarking methods suffer from low capacity and unsatisfactory imperceptibility. Additionally, the issue of watermark locating, which is extremely important and even more pronounced in neural audio watermarking, has not been adequately studied. In this paper, we design a dual-embedding watermarking model for efficient locating. We also consider the impact of the attack layer on the invertible neural network in robustness training, improving the model to enhance both its reasonableness and stability. Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient locating ability compared to existing methods.	翻訳日:2024-11-05 22:09:00 公開日:2024-09-29
# 生活予測を継続するグラフニューラルネットワークに関する調査研究:方法論,評価,今後の展望 A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends ( http://arxiv.org/abs/2409.19629v1 ) ライセンス: Link先を確認	Yucheng Wang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen,	(参考訳) Remaining Useful Life (RUL) prediction is a critical aspects of Prognostics and Health Management (PHM)。既存のディープラーニング手法は将来性を示しているが、複雑なシステムに固有の空間情報の活用に苦慮し、RUL予測の有効性を制限していることが多い。この課題に対処するため、最近の研究では、より正確なRUL予測のために空間情報をモデル化するためのグラフニューラルネットワーク(GNN)の利用について検討している。本稿では、RUL予測に適用されたGNN手法の総合的なレビューを行い、既存の手法を要約し、今後の研究のためのガイダンスを提供する。まず、GNNをRUL予測に適用する段階に基づいて、グラフ構築、グラフモデリング、グラフ情報処理、グラフ読み出しの4つの重要な段階に体系的にアプローチを分類する新しい分類法を提案する。このようにしてフィールドを整理することで、GNNパイプラインの各ステージにおけるユニークな課題と考慮点を強調します。さらに, 各種SOTA(State-of-the-art) GNN法を徹底的に評価し, 公正比較のための一貫した実験的設定を確実にする。この厳密な分析は、異なるアプローチの強みと弱みに関する貴重な洞察を与え、この分野で働く研究者や実践者の実験的ガイドとなる。最後に、GNNがRUL予測に革命をもたらし、PHM戦略の有効性を高める可能性を強調し、フィールドをさらに前進させるいくつかの有望な研究方向を特定し、議論する。ベンチマークコードはGitHubで公開されている。 Remaining Useful Life (RUL) prediction is a critical aspect of Prognostics and Health Management (PHM), aimed at predicting the future state of a system to enable timely maintenance and prevent unexpected failures. While existing deep learning methods have shown promise, they often struggle to fully leverage the spatial information inherent in complex systems, limiting their effectiveness in RUL prediction. To address this challenge, recent research has explored the use of Graph Neural Networks (GNNs) to model spatial information for more accurate RUL prediction. This paper presents a comprehensive review of GNN techniques applied to RUL prediction, summarizing existing methods and offering guidance for future research. We first propose a novel taxonomy based on the stages of adapting GNNs to RUL prediction, systematically categorizing approaches into four key stages: graph construction, graph modeling, graph information processing, and graph readout. 
By organizing the field in this way, we highlight the unique challenges and considerations at each stage of the GNN pipeline. Additionally, we conduct a thorough evaluation of various state-of-the-art (SOTA) GNN methods, ensuring consistent experimental settings for fair comparisons. This rigorous analysis yields valuable insights into the strengths and weaknesses of different approaches, serving as an experimental guide for researchers and practitioners working in this area. Finally, we identify and discuss several promising research directions that could further advance the field, emphasizing the potential for GNNs to revolutionize RUL prediction and enhance the effectiveness of PHM strategies. The benchmarking codes are available on GitHub: https://github.com/Frank-Wang-oss/GNN_RUL_Benchmarking.	翻訳日:2024-11-05 22:09:00 公開日:2024-09-29
# 古典的浅波における量子粒子統計 Quantum Particle Statistics in Classical Shallow Water Waves ( http://arxiv.org/abs/2409.19632v1 ) ライセンス: Link先を確認	Idan Ceausu, Yuval Dagan,	(参考訳) 我々は、ポテンシャル井戸における非相対論的量子粒子の新しい流体力学的類似性を示す。 Schrödinger方程式の実際の変種と重力キャピラリー浅瀬波の類似性を報告し、解析した。実波勾配によって局所的に振動する粒子が導かれると、粒子は波動ポテンシャルを増大させながら周期的あるいはカオス的ダイナミクスの軌跡を示す可能性がある。このアナログの粒子確率分布関数はシュレーディンガー方程式の標準解の量子統計を明らかにし、したがってボルン則の古典的決定論的解釈として表す。最後に、準定常状態間の遷移の古典的なメカニズムを提案する。 We present a new hydrodynamic analogy of nonrelativistic quantum particles in potential wells. Similarities between a real variant of the Schrödinger equation and gravity-capillary shallow water waves are reported and analyzed. We show that when locally oscillating particles are guided by real wave gradients, particles may exhibit trajectories of alternating periodic or chaotic dynamics while increasing the wave potential. The particle probability distribution function of this analogy reveals the quantum statistics of the standard solutions of the Schrödinger equation and thus manifests as a classical deterministic interpretation of Born's rule. Finally, a classical mechanism for the transition between quasi-stationary states is proposed.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# 時系列非教師なし領域適応のための時間的ソース復元 Temporal Source Recovery for Time-Series Source-Free Unsupervised Domain Adaptation ( http://arxiv.org/abs/2409.19635v1 ) ライセンス: Link先を確認	Yucheng Wang, Peiliang Gong, Min Wu, Felix Ott, Xiaoli Li, Lihua Xie, Zhenghua Chen,	(参考訳) Source-Free Unsupervised Domain Adaptation (SFUDA)は、ソースドメインにアクセスすることなく、トレーニング済みモデルをターゲットドメインに適応し、ソースデータのプライバシを確保する能力で人気を集めている。 SFUDAは視覚的なタスクでよく開発されているが、ドメイン間で重要な時間的依存関係を転送することの難しさから、時系列SFUDA(TS-SFUDA)への応用は制限されている。少数の研究者がこの領域を探求し始めたが、特定のソースドメイン設計に依存しており、ソースデータ所有者は特定の事前学習プロトコルに従うことが期待できないため、現実的ではない。そこで本稿では,効率的なTS-SFUDAの時間依存性をソース固有の設計を必要とせずに転送するフレームワークであるTemSRを提案する。 TemSRは、マスキング、リカバリ、最適化を活用して、ソース時間依存性を回復したソース風のディストリビューションを生成するリカバリプロセスを備えている。効率的な回復を確保するため,局所的な依存関係を復元するためのセグメントベース正規化と,ソースライクな分布の多様性を高めるためにアンカーベースリカバリの多様性の最大化を更に設計する。ソースライクな分布は、従来のUDA技術を使用してターゲットドメインに適合する。複数のTSタスクにわたる大規模な実験は、ソースドメイン設計を必要とする既存のTS-SFUDAメソッドを超越したTemSRの有効性を示す。コードはhttps://github.com/Frank-Wang-oss/TemSRで入手できる。 Source-Free Unsupervised Domain Adaptation (SFUDA) has gained popularity for its ability to adapt pretrained models to target domains without accessing source domains, ensuring source data privacy. While SFUDA is well-developed in visual tasks, its application to Time-Series SFUDA (TS-SFUDA) remains limited due to the challenge of transferring crucial temporal dependencies across domains. Although a few researchers begin to explore this area, they rely on specific source domain designs, which are impractical as source data owners cannot be expected to follow particular pretraining protocols. To solve this, we propose Temporal Source Recovery (TemSR), a framework that transfers temporal dependencies for effective TS-SFUDA without requiring source-specific designs. TemSR features a recovery process that leverages masking, recovery, and optimization to generate a source-like distribution with recovered source temporal dependencies. 
To ensure effective recovery, we further design segment-based regularization to restore local dependencies and anchor-based recovery diversity maximization to enhance the diversity of the source-like distribution. The source-like distribution is then adapted to the target domain using traditional UDA techniques. Extensive experiments across multiple TS tasks demonstrate the effectiveness of TemSR, which even surpasses existing TS-SFUDA methods that require source domain designs. Code is available at https://github.com/Frank-Wang-oss/TemSR.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# BadHMP:人間の動き予測に対するバックドア攻撃 BadHMP: Backdoor Attack against Human Motion Prediction ( http://arxiv.org/abs/2409.19638v1 ) ライセンス: Link先を確認	Chaohui Xu, Si Wang, Chip-Hong Chang,	(参考訳) 過去の観測から得られたサブ秒間地平線上での人体運動の正確な予測は、様々な安全クリティカルな応用に不可欠である。これまで、回避攻撃に対する人間の動き予測の脆弱性を調べる研究は1つしかなかった。本稿では,人間の動作予測を対象とする最初のバックドアアタックであるBadHMPを提案する。我々のアプローチは、骨格の片腕に局所的なバックドアトリガーを埋め込むことで、有毒なトレーニングサンプルを生成し、選択された関節が比較的静止状態に留まったり、過去のステップで事前に定義された動きに追従したりすることである。その後、将来のシーケンスをターゲットシーケンスにグローバルに修正し、トレーニングデータセット全体をトラバースして、最も適した毒素サンプルを選択する。筆者らが設計したバックドアトリガとターゲットは, 有毒試料の滑らかさと自然さを保証し, モデルトレーナーによる検出を回避できるほどステルス性が高く, また, 有毒モデルが不確定な配列に対する予測忠実度を保ちつつも, その検出を回避できる。ターゲットシーケンスは、低毒性試料注入比であっても、設計された入力シーケンスによって正常に活性化することができる。 Human3.6MとCMU-Mocapの2つのデータセットと2つのネットワークアーキテクチャ(LTDとHRI)の実験結果は、BadHMPの高忠実性、有効性、ステルス性を示している。微調整防御に対する我々の攻撃のロバスト性も検証された。 Precise future human motion prediction over subsecond horizons from past observations is crucial for various safety-critical applications. To date, only one study has examined the vulnerability of human motion prediction to evasion attacks. In this paper, we propose BadHMP, the first backdoor attack that targets specifically human motion prediction. Our approach involves generating poisoned training samples by embedding a localized backdoor trigger in one arm of the skeleton, causing selected joints to remain relatively still or follow predefined motion in historical time steps. Subsequently, the future sequences are globally modified to the target sequences, and the entire training dataset is traversed to select the most suitable samples for poisoning. Our carefully designed backdoor triggers and targets guarantee the smoothness and naturalness of the poisoned samples, making them stealthy enough to evade detection by the model trainer while keeping the poisoned model unobtrusive in terms of prediction fidelity to untainted sequences. The target sequences can be successfully activated by the designed input sequences even with a low poisoned sample injection ratio. 
Experimental results on two datasets (Human3.6M and CMU-Mocap) and two network architectures (LTD and HRI) demonstrate the high fidelity, effectiveness, and stealthiness of BadHMP. The robustness of our attack against the fine-tuning defense is also verified.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# fCOP:カテゴリーレベルのオブジェクトからの焦点長推定 fCOP: Focal Length Estimation from Category-level Object Priors ( http://arxiv.org/abs/2409.19641v1 ) ライセンス: Link先を確認	Xinyue Zhang, Jiaqi Yang, Xiangting Meng, Abdelrahman Mohamed, Laurent Kneip,	(参考訳) コンピュータビジョンの領域では、視覚信号による3D世界の認識と再構築は、長い間コミュニティ内で激しい研究の対象となっていたカメラ固有のパラメータに大きく依存している。現実的な応用では、マンハッタン・ワールドの仮定や特別な人工キャリブレーションパターンのような強いシーン幾何学がなければ、単眼焦点距離推定は難しい課題となる。本稿では,カテゴリレベルの対象先行値を用いた単眼焦点距離推定手法を提案する。単眼深度推定とカテゴリーレベルの対象標準表現学習という,既存の2つの課題に基づいて,対象を含む画像から奥行き先と対象形状先を抽出し,クローズド形式の三重項から焦点長を推定する。シミュレーションおよび実世界データを用いた実験により,提案手法は現状よりも優れており,長期の単分子焦点長推定問題に対する有望な解であることが示された。 In the realm of computer vision, the perception and reconstruction of the 3D world through vision signals heavily rely on camera intrinsic parameters, which have long been a subject of intense research within the community. In practical applications, without a strong scene geometry prior like the Manhattan World assumption or special artificial calibration patterns, monocular focal length estimation becomes a challenging task. In this paper, we propose a method for monocular focal length estimation using category-level object priors. Based on two well-studied existing tasks: monocular depth estimation and category-level object canonical representation learning, our focal solver takes depth priors and object shape priors from images containing objects and estimates the focal length from triplets of correspondences in closed form. Our experiments on simulated and real world data demonstrate that the proposed method outperforms the current state-of-the-art, offering a promising solution to the long-standing monocular focal length estimation problem.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# 自動車ダイナミクスモデル推定のための微調整ハイブリッド物理インフォームドニューラルネットワーク Fine-Tuning Hybrid Physics-Informed Neural Networks for Vehicle Dynamics Model Estimation ( http://arxiv.org/abs/2409.19647v1 ) ライセンス: Link先を確認	Shiming Fang, Kaiyan Yu,	(参考訳) 正確なダイナミックモデリングは、特に安全のために正確な動き予測が不可欠である高速かつアジャイルな操作において、自動運転車にとって重要なものである。従来のパラメータ推定手法では、初期推定への依存、労働集約的な適合手順、複雑なテスト設定などの制限に直面している。一方、純粋にデータ駆動機械学習手法は、固有の物理的制約を捉えるのに苦労し、通常、最適なパフォーマンスのために大きなデータセットを必要とする。これらの課題に対処するために,物理に基づくモデリングとデータ駆動技術を組み合わせた,教師付きおよび教師なしの物理情報ニューラルネットワーク(PINN)を統合したFTHD(Fin-Tuning Hybrid Dynamics)手法を提案する。 FTHDは、より小さなトレーニングデータセットを使用して、トレーニング済みのDeep Dynamics Model(DDM)を微調整し、Deep Pacejka Model(DPM)のような最先端の手法よりも優れたパフォーマンスを提供し、オリジナルのDDMよりも優れたパフォーマンスを提供する。さらに、拡張カルマンフィルタ(EKF)をFTHD(EKF-FTHD)内に埋め込んで、ノイズの多い実世界のデータを効果的に管理し、車両の本質的な物理的特性を保ちながら正確な復調を保証する。提案するFTHDフレームワークは,BayesRace Physics-based Simulator を用いた大規模シミュレーションと,Indy Autonomous Challenge による実世界の実環境実験により検証された。その結果, パラメータ推定精度は従来のモデルより大幅に向上し, 既存のモデルよりも優れていた。 EKF-FTHDは、物理的洞察を維持しながら現実世界のデータをノイズ化することでロバスト性を高める。 Accurate dynamic modeling is critical for autonomous racing vehicles, especially during high-speed and agile maneuvers where precise motion prediction is essential for safety. Traditional parameter estimation methods face limitations such as reliance on initial guesses, labor-intensive fitting procedures, and complex testing setups. On the other hand, purely data-driven machine learning methods struggle to capture inherent physical constraints and typically require large datasets for optimal performance. To address these challenges, this paper introduces the Fine-Tuning Hybrid Dynamics (FTHD) method, which integrates supervised and unsupervised Physics-Informed Neural Networks (PINNs), combining physics-based modeling with data-driven techniques. FTHD fine-tunes a pre-trained Deep Dynamics Model (DDM) using a smaller training dataset, delivering superior performance compared to state-of-the-art methods such as the Deep Pacejka Model (DPM) and outperforming the original DDM. 
Furthermore, an Extended Kalman Filter (EKF) is embedded within FTHD (EKF-FTHD) to effectively manage noisy real-world data, ensuring accurate denoising while preserving the vehicle's essential physical characteristics. The proposed FTHD framework is validated through scaled simulations using the BayesRace Physics-based Simulator and full-scale real-world experiments from the Indy Autonomous Challenge. Results demonstrate that the hybrid approach significantly improves parameter estimation accuracy, even with reduced data, and outperforms existing models. EKF-FTHD enhances robustness by denoising real-world data while maintaining physical insights, representing a notable advancement in vehicle dynamics modeling for high-speed autonomous racing.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# OrientedFormer: リモートセンシング画像におけるEnd-to-End変換器に基づくオブジェクト指向物体検出器 OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images ( http://arxiv.org/abs/2409.19648v1 ) ライセンス: Link先を確認	Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wen-Liang Du, Rui Yao, Abdulmotaleb El Saddik,	(参考訳) リモートセンシング画像におけるオブジェクト指向物体検出は、複数方向のオブジェクトが分散しているため、難しい課題である。近年,従来のCNN方式と比較して,後処理演算子の必要性を排除して,エンドツーエンドトランスフォーマーベースの手法が成功を収めている。しかし、変換器を直接オブジェクト指向オブジェクト検出に拡張することは、次の3つの主要な問題をもたらす。 1) 物体は任意に回転し,位置及び大きさとともに角度の符号化を必要とする。 2 配向対象物の幾何学的関係は、内容と位置関係の相互作用が欠如しているため、自己注意が欠如している。 3) 対象物は, 主に位置関係における値と位置関係の相違を生じ, 正確な分類と局所化を困難にしている。本稿では,これらの問題に対処する3つの専用モジュールからなる,エンドツーエンドのトランスフォーマーに基づくオブジェクト指向検出器を提案する。まず、ガウス分布を用いた配向箱の角度、位置、大きさを符号化するガウス位置符号化を提案する。第二に、ワッサーシュタインの自己注意は幾何学的関係を導入し、ガウス的ワッサーシュタイン距離スコアを利用して、内容と位置的クエリ間の相互作用を促進する。第3に、位置問合せの周囲のサンプリング点を角度に応じて回転させることにより、値と位置問合せを整列させる指向的相互注意を提案する。 DOTA,HRSC2016, ICDAR2015のシリーズであるDIOR-Rによる6つのデータセットの実験は、我々のアプローチの有効性を示している。従来のエンドツーエンド検出器と比較して、OrientedFormerはDIOR-RとDOTA-v1.0でそれぞれ1.16および1.21 AP$_{50}$を獲得し、トレーニングエポックを3$\times$から1$\times$に下げる。コードはhttps://github.com/wokaikaixinxin/OrientedFormerで入手できる。 Oriented object detection in remote sensing images is a challenging task due to objects being distributed in multi-orientation. Recently, end-to-end transformer-based methods have achieved success by eliminating the need for post-processing operators compared to traditional CNN-based methods. However, directly extending transformers to oriented object detection presents three main issues: 1) objects rotate arbitrarily, necessitating the encoding of angles along with position and size; 2) the geometric relations of oriented objects are lacking in self-attention, due to the absence of interaction between content and positional queries; and 3) oriented objects cause misalignment, mainly between values and positional queries in cross-attention, making accurate classification and localization difficult. 
In this paper, we propose an end-to-end transformer-based oriented object detector, consisting of three dedicated modules to address these issues. First, Gaussian positional encoding is proposed to encode the angle, position, and size of oriented boxes using Gaussian distributions. Second, Wasserstein self-attention is proposed to introduce geometric relations and facilitate interaction between content and positional queries by utilizing Gaussian Wasserstein distance scores. Third, oriented cross-attention is proposed to align values and positional queries by rotating sampling points around the positional query according to their angles. Experiments on six datasets (DIOR-R, the DOTA series, HRSC2016, and ICDAR2015) show the effectiveness of our approach. Compared with previous end-to-end detectors, OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0, respectively, while reducing training epochs from 3$\times$ to 1$\times$. The code is available at https://github.com/wokaikaixinxin/OrientedFormer.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# エゴセントリック相互作用による3次元シーンのグラウンディング Grounding 3D Scene Affordance From Egocentric Interactions ( http://arxiv.org/abs/2409.19650v1 ) ライセンス: Link先を確認	Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha,	(参考訳) 3Dシーンの空き地は、3D環境における対話的な領域を見つけることを目的としており、エージェントが周囲と知的に対話することが重要である。既存のほとんどのアプローチは、静的な幾何学的構造と視覚的外観に基づいてセマンティクスを3Dインスタンスにマッピングすることでこれを達成している。この受動的戦略は、エージェントが環境を積極的に知覚し、関与する能力を制限し、事前に定義された意味的指示に依存する。対照的に、人間は周囲との相互作用を観察し模倣することで複雑な相互作用のスキルを発達させる。このような能力でモデルを強化するために,エゴセントリックなインタラクションから3Dシーンのアベイランスを基盤として,インタラクションのエゴセントリックなビデオに基づいて,対応する3Dシーンのアベイランス領域を特定するという,新しいタスクを導入する。このタスクは、複数のソースにわたる空間的複雑さとアライメント複雑性の課題に直面する。これらの課題に対処するために,インタラクション関連サブリージョンに着目し,双方向クエリデコーダ機構を通じて異なるソースからのアプライアンス機能を調整することを目的とした,インタラクションインテンションを利用したEgocentric Interaction-driven 3D Scene Affordance Grounding(Ego-SAG)フレームワークを提案する。さらに,エゴセントリックなビデオ3D Scene Affordance Dataset (VSAD)を導入し,多種多様なインタラクションタイプと多種多様な3D環境をカバーした。 VSADにおける広範囲な実験により,提案課題の実現可能性と提案手法の有効性が検証された。 Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the environment, making it reliant on predefined semantic instructions. In contrast, humans develop complex interaction skills by observing and imitating how others interact with their surroundings. To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction. This task faces the challenges of spatial complexity and alignment complexity across multiple sources. 
To address these challenges, we propose the Egocentric Interaction-driven 3D Scene Affordance Grounding (Ego-SAG) framework, which utilizes interaction intent to guide the model in focusing on interaction-relevant sub-regions and aligns affordance features from different sources through a bidirectional query decoder mechanism. Furthermore, we introduce the Egocentric Video-3D Scene Affordance Dataset (VSAD), covering a wide range of common interaction types and diverse 3D environments to support this task. Extensive experiments on VSAD validate both the feasibility of the proposed task and the effectiveness of our approach.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# 心理測定尺度を用いた事前学習言語モデルにおける潜時構造の評価と操作 Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales ( http://arxiv.org/abs/2409.19655v1 ) ライセンス: Link先を確認	Maor Reuben, Ortal Slobodin, Aviad Elyshar, Idan-Chaim Cohen, Orna Braun-Lewensohn, Odeya Cohen, Rami Puzis,	(参考訳) 人間のような性格特性は、最近、大きな言語モデルで発見され、その(未知の)バイアスが人間の潜伏した心理的構造に一致するという仮説を提起した。大きな会話モデルは心理測定のアンケートに答えるのに騙されるかもしれないが、他のタスクのために訓練された何千もの単純なトランスフォーマーの潜在的な心理的構成は、現在適切な心理測定方法が欠如しているため評価できない。本稿では,標準的な心理アンケートを自然言語推論のプロンプトに再構成する方法を示し,任意のモデルの心理指標評価を支援するためのコードライブラリを提供する。我々は、88の公開モデルを用いて、人間の心理学における標準的な理論に準拠し、類似の相関関係と緩和戦略を示す、人間に似た精神保健関連構造(不安、抑うつ、一貫性の感覚を含む)の存在を実証する。心理的ツールを使用して言語モデルのパフォーマンスを解釈し、修正する能力は、より説明可能な、制御可能な、信頼できるモデルの開発を促進することができる。 Human-like personality traits have recently been discovered in large language models, raising the hypothesis that their (known and as yet undiscovered) biases conform with human latent psychological constructs. While large conversational models may be tricked into answering psychometric questionnaires, the latent psychological constructs of thousands of simpler transformers, trained for other tasks, cannot be assessed because appropriate psychometric methods are currently lacking. Here, we show how standard psychological questionnaires can be reformulated into natural language inference prompts, and we provide a code library to support the psychometric assessment of arbitrary models. We demonstrate, using a sample of 88 publicly available models, the existence of human-like mental health-related constructs (including anxiety, depression, and Sense of Coherence) which conform with standard theories in human psychology and show similar correlations and mitigation strategies. The ability to interpret and rectify the performance of language models by using psychological tools can boost the development of more explainable, controllable, and trustworthy models.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# マルチモーダルLLMを用いた合成データからの学習によるマルチモーダル誤情報検出 Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs ( http://arxiv.org/abs/2409.19656v1 ) ライセンス: Link先を確認	Fengzhu Zeng, Wenqian Li, Wei Gao, Yan Pang,	(参考訳) マルチモーダルな誤情報の検出,特に画像とテキストのペアによる検出が重要である。大規模で高品質な実世界のファクトチェックデータセットをトレーニングするには、コストがかかるため、研究者はAI技術によって生成された合成データセットを使用することができる。しかし、合成データに基づいて訓練された検出器の現実シナリオへの一般化性は、分布ギャップのため不明である。そこで本研究では,合成データと実世界のデータ分布を一致させる2つのモデルに依存しないデータ選択手法を用いて,実世界のマルチモーダル誤情報を検出するための合成データからの学習を提案する。 GPT-4V を超越して実世界のファクトチェックデータセット上でのMLLM (13B) の性能を向上させる実験を行った。 Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# 関節分割と変形性医用画像登録のためのマルチスケールデュアルアテンション周波数フュージョン Dual-Attention Frequency Fusion at Multi-Scale for Joint Segmentation and Deformable Medical Image Registration ( http://arxiv.org/abs/2409.19658v1 ) ライセンス: Link先を確認	Hongchao Zhou, Shunbo Hu,	(参考訳) 変形可能な医用画像登録は、医用画像解析の重要な側面である。近年, 医用画像登録における複雑な変形問題に対処するため, 補助的タスク(教師付きセグメンテーションなど)を活用して, 一次登録タスクの解剖学的構造情報の提供を開始している。本研究では,マルチスケールデュアルアテンション周波数融合(DAFF-Net)に基づくマルチタスク学習フレームワークを提案する。 DAFF-Netはグローバルエンコーダ、セグメンテーションデコーダ、粗大なピラミッド登録デコーダで構成される。登録復号処理中、我々は2つのタスク間の相関関係を完全に活用し、異なるスケールでの登録とセグメント化機能を融合するためのデュアルアテンション周波数特徴融合(DAFF)モジュールを設計する。 DAFFモジュールはグローバルおよびローカルな重み付け機構を通じて機能を最適化する。局所重み付けでは、高周波情報と低周波情報の両方を組み込んで、登録作業に不可欠な特徴をさらに捉えている。セグメンテーションの助けを借りて、登録はより正確な解剖学的構造情報を学び、これにより登録後の歪んだ画像の解剖学的整合性を高める。さらに,DAFFモジュールが効果的な特徴情報を抽出する能力に優れており,その応用を教師なし登録に拡張する。 3つのパブリック3次元脳磁気共鳴画像(MRI)データセットの広範囲な実験により,提案したDAFF-Netとその教師なし変種は,いくつかの評価指標において,最先端の登録方法よりも優れており,変形可能な医用画像登録におけるアプローチの有効性が実証されている。 Deformable medical image registration is a crucial aspect of medical image analysis. In recent years, researchers have begun leveraging auxiliary tasks (such as supervised segmentation) to provide anatomical structure information for the primary registration task, addressing complex deformation challenges in medical image registration. In this work, we propose a multi-task learning framework based on multi-scale dual attention frequency fusion (DAFF-Net), which simultaneously achieves the segmentation masks and dense deformation fields in a single-step estimation. DAFF-Net consists of a global encoder, a segmentation decoder, and a coarse-to-fine pyramid registration decoder. During the registration decoding process, we design the dual attention frequency feature fusion (DAFF) module to fuse registration and segmentation features at different scales, fully leveraging the correlation between the two tasks. The DAFF module optimizes the features through global and local weighting mechanisms. 
During local weighting, it incorporates both high-frequency and low-frequency information to further capture the features that are critical for the registration task. With the aid of segmentation, the registration learns more precise anatomical structure information, thereby enhancing the anatomical consistency of the warped images after registration. Additionally, due to the DAFF module's outstanding ability to extract effective feature information, we extend its application to unsupervised registration. Extensive experiments on three public 3D brain magnetic resonance imaging (MRI) datasets demonstrate that the proposed DAFF-Net and its unsupervised variant outperform state-of-the-art registration methods across several evaluation metrics, demonstrating the effectiveness of our approach in deformable medical image registration.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# マルチパスアグリゲーションを用いたヒト・マシーン共同視覚のためのオールインワン画像符号化 All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation ( http://arxiv.org/abs/2409.19660v1 ) ライセンス: Link先を確認	Xu Zhang, Peiyao Guo, Ming Lu, Zhan Ma,	(参考訳) マルチタスクアプリケーションのための画像符号化は、人間の知覚とマシンビジョンの両方に対応し、広範囲に研究されている。既存の手法は複数のタスク固有のエンコーダとデコーダのペアに依存しており、パラメータとビットレートの使用のオーバーヘッドが高くなり、あるいは統一された表現の下で多目的最適化の課題に直面し、性能と効率の両方を達成できなかった。そこで本研究では,マルチパスアグリゲーション(MPA, Multi-Path Aggregation)を協調型ヒューマンマシンビジョンのための既存の符号化モデルに統合し,特徴表現をオールインワンアーキテクチャで統一する。 MPAはタスクごとに異なる特徴量に基づいてタスク固有のパスに潜時的特徴を割り当てる予測器を使用し、タスク固有の特徴を後続の改善のために保存しながら、共有機能の有用性を最大化する。特徴相関を利用して、マルチタスク性能劣化を軽減するための2段階最適化戦略を開発する。共有機能の再利用では、1.89%のパラメータが特定のタスクのためにさらに拡張され、微調整され、モデル全体の広範囲な最適化が完全に回避される。実験結果から,MPAは人間の視界と機械分析タスクをまたいだタスク固有および多目的の最適化において,最先端の手法に匹敵する性能を達成していることがわかった。さらに、我々のオールインワン設計は、人間と機械指向の再構築間のシームレスな遷移をサポートし、統一モデルを変更することなくタスク制御可能な解釈を可能にする。コードは https://github.com/NJUVISION/MPA で入手できる。 Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high overhead of parameter and bitrate usage, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA) integrated into existing coding models for joint human-machine vision, unifying the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance varied across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. 
By reusing shared features, as few as 1.89% of the parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://github.com/NJUVISION/MPA.	翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
# 整数二次計画法の局所探索 Local Search for Integer Quadratic Programming ( http://arxiv.org/abs/2409.19668v1 ) ライセンス: Link先を確認	Xiang He, Peng Lin, Shaowei Cai,	(参考訳) Integer Quadratic Programming (IQP) はオペレーション研究において重要な問題である。局所探索は難しい問題を解くための強力な手法であるが、IQP解決のための局所探索アルゴリズムの研究はまだ初期段階にある。本稿では、LS-IQCQPと呼ばれる一般IQPを解くための効率的な局所探索解法を開発する。目的関数,制約,あるいはその両方で2次項を扱えるIQP用の新しい局所探索演算子を4つ提案する。さらに、2モードの局所探索アルゴリズムを導入し、新たに設計されたスコアリング機能を利用して探索プロセスを強化する。標準IQPベンチマークQPLIBとMINLPLIBで実験を行い、LS-IQCQPと最先端IQPソルバを比較した。実験の結果、LS-IQCQPは最も強力な商用解法であるGurobiと競合し、他の最先端解法よりも優れていることが示された。さらに、LS-IQCQPはQPLIBとMINLPLIBのオープンインスタンスに6つの新しいレコードを樹立した。 Integer Quadratic Programming (IQP) is an important problem in operations research. Local search is a powerful method for solving hard problems, but research on local search algorithms for IQP solving is still in its early stage. This paper develops an efficient local search solver for general IQP, called LS-IQCQP. We propose four new local search operators for IQP that can handle quadratic terms in the objective function, the constraints, or both. Furthermore, a two-mode local search algorithm is introduced, utilizing newly designed scoring functions to enhance the search process. Experiments are conducted on the standard IQP benchmarks QPLIB and MINLPLIB, comparing LS-IQCQP with several state-of-the-art IQP solvers. Experimental results demonstrate that LS-IQCQP is competitive with the most powerful commercial solver, Gurobi, and outperforms other state-of-the-art solvers. Moreover, LS-IQCQP has established 6 new records for QPLIB and MINLPLIB open instances.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 非理想性認識トレーニングは、敵対的攻撃に対してメモリスティブネットワークをより堅牢にする Nonideality-aware training makes memristive networks more robust to adversarial attacks ( http://arxiv.org/abs/2409.19671v1 ) ライセンス: Link先を確認	Dovydas Joksas, Luis Muñoz-González, Emil Lupu, Adnan Mehonic,	(参考訳) ニューラルネットワークは現在、オブジェクト分類から自然言語システムまで、幅広い領域に展開されている。メモリスタのようなアナログデバイスを使った実装は、より優れた電力効率を約束し、これらのアプリケーションをより多くの環境にもたらす可能性がある。しかし、このようなシステムではデバイス障害がより頻繁に発生し、敵対的攻撃への脆弱性は広く研究されていない。本研究では、物理的な非理想性に対処する一般的な手法である非理想性認識トレーニングが、敵対的ロバスト性にどのように影響するかを検討する。テスト時にどのような非理想性に遭遇するかについての知識が限られている場合でも、敵対的ロバスト性が著しく改善されることが分かった。 Neural networks are now deployed in a wide number of areas from object classification to natural language systems. Implementations using analog devices like memristors promise better power efficiency, potentially bringing these applications to a greater number of environments. However, such systems suffer from more frequent device faults and overall, their exposure to adversarial attacks has not been studied extensively. In this work, we investigate how nonideality-aware training - a common technique to deal with physical nonidealities - affects adversarial robustness. We find that adversarial robustness is significantly improved, even with limited knowledge of what nonidealities will be encountered during test time.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 視覚豊かな文書理解のための順序関係としてのレイアウト読解順序のモデル化 Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding ( http://arxiv.org/abs/2409.19672v1 ) ライセンス: Link先を確認	Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, Yue Zhang, Mingxu Chai, Ya Guo, Huijia Zhu, Qi Zhang, Tao Gui,	(参考訳) 視覚的にリッチなドキュメント(VrD)におけるレイアウト読み込み順序のモデル化と活用は、ドキュメント内のリッチな構造セマンティクスを捉えるため、ドキュメントインテリジェンスにおいて重要である。以前の作業は通常、レイアウト要素の置換、すなわちすべてのレイアウト要素を含むシーケンスとしてレイアウト読み込み順序を定式化した。しかし、この定式化はレイアウトの完全な読み出し順序情報を適切に伝達しないため、下流のVrDタスクの性能低下につながる可能性がある。この問題に対処するために、レイアウト要素の集合上の順序関係としてレイアウト読み出し順序をモデル化し、完全な読み出し順序情報に十分な表現能力を有することを提案する。改良型読み順序予測(ROP)に向けた手法の実証評価を可能にするため,レイアウト要素上の関係として読み順序アノテーションを含む包括的なベンチマークデータセットと,従来手法よりも優れた関係抽出に基づく手法を構築した。そこで本研究では,任意のVrDタスク上でのモデル性能向上のために,読み出し順序関係入力を導入することで,読み出し順序対応型パイプラインを提案する。総合的な結果から,パイプラインは一般的に下流VrDタスクに有効であることが示された。(1)読み出し順序関係情報を活用することにより,対象データセットの2つのタスク設定でSOTA結果が得られること,(2)提案したROPモデルによって生成された擬似読み出し順序情報を活用することにより,拡張モデルの性能は3つのモデルすべてと8つのクロスドメインVrD-IE/QAタスク設定で目標最適化なしで改善されている。 Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e. a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream VrD tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information. 
To enable empirical evaluation on methods towards the improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the reading order annotation as relations over layout elements, together with a relation-extraction-based method that outperforms previous methods. Moreover, to highlight the practical benefits of introducing the improved form of layout reading order, we propose a reading-order-relation-enhancing pipeline to improve model performance on any arbitrary VrD task by introducing additional reading order relation inputs. Comprehensive results demonstrate that the pipeline generally benefits downstream VrD tasks: (1) with utilizing the reading order relation information, the enhanced downstream models achieve SOTA results on both two task settings of the targeted dataset; (2) with utilizing the pseudo reading order information generated by the proposed ROP model, the performance of the enhanced models has improved across all three models and eight cross-domain VrD-IE/QA task settings without targeted optimization.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 計算生物学におけるシミュレーションに基づく推論の包括的ガイド A Comprehensive Guide to Simulation-based Inference in Computational Biology ( http://arxiv.org/abs/2409.19675v1 ) ライセンス: Link先を確認	Xiaoyu Wang, Ryan P. Kelly, Adrianne L. Jenner, David J. Warne, Christopher Drovandi,	(参考訳) 計算モデルは現実世界の生物学的プロセスの複雑さを捉えるのに有用である。しかし、特に実世界の観測データを扱う際には、推論タスクに適したアルゴリズムの選択は困難で未探索の領域のままである。このギャップは、特にSBI(Simulation-Based Inference)の領域において、ニューラルSBI法や統計的SBI法のような様々なパラメータ推定アルゴリズムの開発を加速させた。実世界のデータは何らかのモデル誤特定を伴うことが多いが、そうしたデータに対してSBI手法を適切に選択する方法に関する研究は限られている。本稿では,複雑な生物モデルに対するSBIアプローチを決定するための包括的ガイドラインを提供する。本ガイドラインを,実世界のデータを用いて細胞動態を記述する2つのエージェントベースモデルに適用する。我々の研究では、ニューラルSBI手法は推論に必要なシミュレーション数がはるかに少ない一方で、偏りのある推定を生み出す傾向があり、この傾向はこれらのアルゴリズムのロバストな変種でも持続する。一方,統計的SBI法の精度はシミュレーションの数が増えるにつれて著しく向上する。この結果から, 計算予算が十分であれば, 統計的SBIがニューラルSBIを超えうることが示唆された。我々の結果は、実世界のシナリオにおける異なるSBI手法の有効性を明らかにするだけでなく、ニューラルSBIアプローチを強化するための潜在的な道筋も示唆している。本研究は,生物モデルにおけるSBIの複雑な景観をナビゲートする計算生物学者にとって有用な資源であると考えられる。 Computational models are invaluable in capturing the complexities of real-world biological processes. Yet, the selection of appropriate algorithms for inference tasks, especially when dealing with real-world observational data, remains a challenging and underexplored area. This gap has spurred the development of various parameter estimation algorithms, particularly within the realm of Simulation-Based Inference (SBI), such as neural and statistical SBI methods. Limited research exists on how to make informed choices on SBI methods when faced with real-world data, which often results in some form of model misspecification. In this paper, we provide comprehensive guidelines for deciding between SBI approaches for complex biological models. We apply the guidelines to two agent-based models that describe cellular dynamics using real-world data. Our study unveils a critical insight: while neural SBI methods demand significantly fewer simulations for inference results, they tend to yield biased estimations, a trend persistent even with robust variants of these algorithms. 
On the other hand, the accuracy of statistical SBI methods enhances substantially as the number of simulations increases. This finding suggests that, given a sufficient computational budget, statistical SBI can surpass neural SBI in performance. Our results not only shed light on the efficacy of different SBI methodologies in real-world scenarios but also suggest potential avenues for enhancing neural SBI approaches. This study is poised to be a useful resource for computational biologists navigating the intricate landscape of SBI in biological modeling.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# SemiDDM-Weather:オールインワン悪天候除去のための半教師付き学習フレームワーク SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal ( http://arxiv.org/abs/2409.19679v1 ) ライセンス: Link先を確認	Fang Long, Wenkang Su, Zixuan Li, Lei Cai, Mingjie Li, Yuan-Gen Wang, Xiaochun Cao,	(参考訳) 悪天候除去は、悪天候下で鮮明な視界を回復することを目的としている。既存の方法は、主に特定の気象タイプに合わせて調整されており、広範囲のラベル付きデータに大きく依存している。この2つの制約に対処するために,デノイジング拡散モデル(DDM)をバックボーンとし、教師学生ネットワーク上に構築された半教師付きオールインワン悪天候除去フレームワークSemiDDM-Weatherを提案する。SemiDDM-WeatherにおけるDDMバックボーンの設計については,限定されたラベル付きデータを用いた効率的なオールインワン悪天候除去のための多対一マッピング分布の学習を容易にすることを目的として,カスタマイズされた入力と損失関数を備えたSOTAウェーブレット拡散モデルWavediffを採用している。半教師学習において教師ネットワークが生み出す潜在的に不正確な擬似ラベルによる誤学習のリスクを軽減するため,品質評価と内容整合性制約を導入して教師ネットワークからの「最適」な出力を擬似ラベルとして選別し,未ラベルデータによる学生ネットワークトレーニングをより効果的に指導する。実験結果から,SemiDDM-Weatherは,合成および実世界の両方のデータセットにおいて,完全教師ありの競合手法と比較しても,常に高い視覚的品質と優れた悪天候除去を提供することが示された。私たちのコードと事前訓練されたモデルは、このリポジトリで利用可能です。 Adverse weather removal aims to restore clear vision under adverse weather conditions. Existing methods are mostly tailored for specific weather types and rely heavily on extensive labeled data. In dealing with these two limitations, this paper presents a pioneering semi-supervised all-in-one adverse weather removal framework built on the teacher-student network with a Denoising Diffusion Model (DDM) as the backbone, termed SemiDDM-Weather. As for the design of DDM backbone in our SemiDDM-Weather, we adopt the SOTA Wavelet Diffusion Model-Wavediff with customized inputs and loss functions, devoted to facilitating the learning of many-to-one mapping distributions for efficient all-in-one adverse weather removal with limited label data. 
To mitigate the risk of misleading model training due to potentially inaccurate pseudo-labels generated by the teacher network in semi-supervised learning, we introduce quality assessment and content consistency constraints to screen the "optimal" outputs from the teacher network as the pseudo-labels, thus more effectively guiding the student network training with unlabeled data. Experimental results show that on both synthetic and real-world datasets, our SemiDDM-Weather consistently delivers high visual quality and superior adverse weather removal, even when compared to fully supervised competitors. Our code and pre-trained model are available at this repository.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# インストラクション埋め込み:タスク識別に向けてのインストラクションの潜在表現 Instruction Embedding: Latent Representations of Instructions Towards Task Identification ( http://arxiv.org/abs/2409.19680v1 ) ライセンス: Link先を確認	Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li,	(参考訳) インストラクションデータは、人間レベルのパフォーマンスと整合する大規模言語モデル(LLM)の能力を改善するために不可欠である。近年のLIMAは、アライメントは、モデルが様々なタスクを解決し、事前訓練された知識とスキルを活用するために、命令のインタラクションスタイルやフォーマットを適応させるプロセスであることを示した。したがって、インストラクションデータにとって最も重要な側面は、特定の意味論や知識情報ではなく、それが表すタスクである。命令の潜在表現は、データ選択やデモ検索のような、命令関連のタスクで役割を果たす。しかし、それらは常にテキスト埋め込みから派生しており、タスクカテゴリの表現に影響を与える全体的な意味情報を含んでいる。本研究では,新しい概念である命令埋め込みを導入し,そのトレーニングと評価のための命令埋め込みベンチマーク(IEB)を構築する。そこで本研究では,表現がよりタスクに注目するようにするベースラインとして,PIE(Prompt-based Instruction Embedding)法を提案する。 IEB上で設計した2つのタスクにおいて、PIEを他の埋め込み手法とともに評価した結果、タスクカテゴリを正確に識別する上で優れた性能を示した。さらに、4つの下流タスクへの命令埋め込みの適用は、命令関連タスクへの有効性と適合性を示している。 Instruction data is crucial for improving the capability of Large Language Models (LLMs) to align with human-level performance. Recent research LIMA demonstrates that alignment is essentially a process where the model adapts instructions' interaction style or format to solve various tasks, leveraging pre-trained knowledge and skills. Therefore, for instructional data, the most important aspect is the task it represents, rather than the specific semantics and knowledge information. The latent representations of instructions play roles for some instruction-related tasks like data selection and demonstrations retrieval. However, they are always derived from text embeddings, encompass overall semantic information that influences the representation of task categories. In this work, we introduce a new concept, instruction embedding, and construct Instruction Embedding Benchmark (IEB) for its training and evaluation. Then, we propose a baseline Prompt-based Instruction Embedding (PIE) method to make the representations more attention on tasks. 
The evaluation of PIE, alongside other embedding methods on IEB with two designed tasks, demonstrates its superior performance in accurately identifying task categories. Moreover, the application of instruction embeddings in four downstream tasks showcases its effectiveness and suitability for instruction-related tasks.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 拡散モデルの簡易・高速蒸留 Simple and Fast Distillation of Diffusion Models ( http://arxiv.org/abs/2409.19681v1 ) ライセンス: Link先を確認	Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu,	(参考訳) 拡散に基づく生成モデルは、様々なタスクにまたがって強力な性能を示すが、これにはサンプリング速度が遅いというコストが伴う。効率的かつ高品質な合成を実現するため, 近年, 蒸留法に基づく加速サンプリング法が開発されている。しかし、一般に、特定の関数評価回数(NFE)において満足な性能を達成するために、精巧な設計による時間のかかる微調整を必要とするため、実際の使用は困難である。この問題に対処するため,拡散モデルの簡易・高速蒸留(SFD)を提案し,既存の手法で用いられるパラダイムを単純化し,微調整時間を最大1000$\times$短縮する。本研究は,バニラ蒸留法に基づくサンプリング法から始まり,合成効率と品質に影響を及ぼすいくつかの小さいが重要な要因を特定し,対処することにより,その性能を最先端に向上する。また, 単一蒸留モデルを用いて, 可変NFEを用いたサンプリングも行うことができる。大規模な実験により、SFDは、数ステップの画像生成タスクにおいて、サンプル品質と微調整コストのバランスが良好であることを実証した。例えば、SFDは単一のNVIDIA A100 GPUでわずか0.64時間の微調整で、CIFAR-10上で4.53 FID(NFE=2)を達成する。私たちのコードはhttps://github.com/zju-pi/diff-samplerから入手可能です。 Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at a cost of the slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine tuning with elaborate designs to achieve satisfactory performance in a specific number of function evaluation (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and largely shortens their fine-tuning time up to 1000$\times$. We begin with a vanilla distillation-based sampling method and boost its performance to state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between the sample quality and fine-tuning costs in few-step image generation task. 
For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU. Our code is available at https://github.com/zju-pi/diff-sampler.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 半導体二重量子ドットHeisenbergスピントリマーから導出される真のデコヒーレンス自由部分空間 True decoherence-free-subspace derived from a semiconductor double quantum dot Heisenberg spin-trimer ( http://arxiv.org/abs/2409.19683v1 ) ライセンス: Link先を確認	Wonjin Jang, Jehyun Kim, Jaemin Park, Min-Kyun Cho, Hyeongyu Jang, Sangwoo Sim, Hwanchul Jung, Vladimir Umansky, Dohun Kim,	(参考訳) 固体系のスピンは本質的に量子シミュレーションや量子情報処理の量子ビットとして機能する。スピン量子ビットは通常、環境磁場のゆらぎの影響を受けやすいが、デコヒーレンスフリーサブスペース(DFS)に符号化されたスピン量子ビットは、DFSの具体的な構造に応じて、ある程度の環境ノイズから保護することができる。ここでは、反強磁性ハイゼンベルクスピン-1/2トリマーから「真の」DFSを導出し、短波長および長波長の磁場変動の両方に対して量子ビット状態を保護する。 3つの電子をゲート定義のGaAs二重量子ドット(DQD)に閉じ込めてスピントリマーを定義し、量子ドットの1つでウィグナー分子化を利用する。まず、トリマーを動的核偏極(DNP)に利用し、DQD内に大きな磁場差$\Delta B_\mathrm{z}$を生成する。大きな$\Delta B_\mathrm{z}$はトリマーの固有スペクトルを著しく変化させ、DQDにおける「真の」DFSをもたらすことを示す。 DFSエネルギーギャップのリアルタイムベイズ推定は、長波長のものに加えて短波長の磁場変動に対するDFSの保護を明示的に示している。我々の研究は、交換結合型量子ドットスピン鎖のコンパクトなDFS構造への道を開くものであり、その内部構造は環境磁場から完全に切り離された状態でコヒーレントに制御できる。 Spins in solid systems can inherently serve as qubits for quantum simulation or quantum information processing. Spin qubits are usually prone to environmental magnetic field fluctuations; however, a spin qubit encoded in a decoherence-free-subspace (DFS) can be protected from certain degrees of environmental noise depending on the specific structure of the DFS. Here, we derive the "true" DFS from an antiferromagnetic Heisenberg spin-1/2 trimer, which protects the qubit states against both short- and long-wavelength magnetic field fluctuations. We define the spin trimer with three electrons confined in a gate-defined GaAs double quantum dot (DQD) where we exploit Wigner-molecularization in one of the quantum dots. We first utilize the trimer for dynamic nuclear polarization (DNP), which generates a sizable magnetic field difference, $\Delta B_\mathrm{z}$, within the DQD. We show that large $\Delta B_\mathrm{z}$ significantly alters the eigenspectrum of the trimer and results in the "true" DFS in the DQD. 
Real-time Bayesian estimation of the DFS energy gap explicitly demonstrates protection of the DFS against short-wavelength magnetic field fluctuations in addition to long-wavelength ones. Our findings pave the way toward compact DFS structures for exchange-coupled quantum dot spin chains, the internal structure of which can be coherently controlled completely decoupled from environmental magnetic fields.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# MedViLaM:医療データ理解と生成のための高度な一般化性と説明可能性を備えた多モード大言語モデル MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation ( http://arxiv.org/abs/2409.19684v1 ) ライセンス: Link先を確認	Lijian Xu, Hao Sun, Ziyu Ni, Hongsheng Li, Shaoting Zhang,	(参考訳) 医学は本質的にマルチモーダルでマルチタスクであり、テキストや画像にまたがる多様なデータモダリティがある。しかし、医療分野のほとんどのモデルは単一モーダルの単一タスクモデルであり、優れた一般化性と説明性に欠ける。本研究では,医療データに対する汎用モデルに向けた統一視覚言語モデルであるMedViLaMを紹介する。このモデルは、同一のモデル重みを用いて、臨床言語や画像を含む様々な形式の医療データを柔軟に符号化し解釈できる。このようなマルチタスクモデルの作成を容易にするため,我々は,連続質問応答,マルチラベル疾患分類,疾患の局所化,放射線診断レポートの生成と要約という,いくつかの異なるタスクからなる総合的な事前学習データセットとベンチマークであるMultiMedBenchをキュレートした。 MedViLaMは、すべてのMultiMedBenchタスクで強力なパフォーマンスを示し、多くの場合、他のジェネラリストモデルを大幅に上回る。さらに,新たな医療概念やタスクへのゼロショット一般化,タスク間の効果的な伝達学習,ゼロショット医学推論の出現について紹介する。 Medicine is inherently multimodal and multitask, with diverse data modalities spanning text, imaging. However, most models in medical field are unimodal single tasks and lack good generalizability and explainability. In this study, we introduce MedViLaM, a unified vision-language model towards a generalist model for medical data that can flexibly encode and interpret various forms of medical data, including clinical language and imaging, all using the same set of model weights. To facilitate the creation of such multi-task model, we have curated MultiMedBench, a comprehensive pretraining dataset and benchmark consisting of several distinct tasks, i.e., continuous question-answering, multi-label disease classification, disease localization, generation and summarization of radiology reports. MedViLaM demonstrates strong performance across all MultiMedBench tasks, frequently outpacing other generalist models by a significant margin. Additionally, we present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# カラーコード分解・適応・補間による水中生物色強調 Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation ( http://arxiv.org/abs/2409.19685v1 ) ライセンス: Link先を確認	Xiaofeng Cong, Jing Zhang, Yeying Jin, Junming Hou, Yu Zhao, Jie Gui, James Tin-Yau Kwok, Yuan Yan Tang,	(参考訳) 水中画像は、しばしば吸収と散乱効果による品質劣化に悩まされる。既存の水中画像強調アルゴリズムのほとんどは、単一の固定色画像を生成し、ユーザの柔軟性と応用を制限している。この制限に対処するため,制御可能な色出力の範囲を提供しながら,水中画像の強調を行う \textit{ColorCode} という手法を提案する。提案手法では、教師付きトレーニングにより水中画像を基準強調画像に復元し、自己再構成とクロス再構成により色コードと内容コードに分解する。カラーコードはガウス分布に従うように明示的に制約され、推論中に効率的なサンプリングと補間を可能にする。 ColorCodeには3つの重要な機能がある。 1) 色強調,固定色による強調画像の作成 2) 誘導画像を用いた長波長色成分の制御可能な調整を可能にする色適応,及び 3)カラーコードの連続サンプリングにより複数色をスムーズに生成できるカラー補間。人気のある、挑戦的なベンチマークデータセットに対する定量的かつ視覚的な評価は、多様な、制御可能な、色現実的な強調結果を提供することにおいて、既存のメソッドよりもColorCodeの方が優れていることを示している。ソースコードはhttps://github.com/Xiaofeng-life/ColorCodeで入手できる。 Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called \textit{ColorCode}, which enhances underwater images while offering a range of controllable color outputs. Our approach involves recovering an underwater image to a reference enhanced image through supervised training and decomposing it into color and content codes via self-reconstruction and cross-reconstruction. The color code is explicitly constrained to follow a Gaussian distribution, allowing for efficient sampling and interpolation during inference. ColorCode offers three key features: 1) color enhancement, producing an enhanced image with a fixed color; 2) color adaptation, enabling controllable adjustments of long-wavelength color components using guidance images; and 3) color interpolation, allowing for the smooth generation of multiple colors through continuous sampling of the color code. 
Quantitative and visual evaluations on popular and challenging benchmark datasets demonstrate the superiority of ColorCode over existing methods in providing diverse, controllable, and color-realistic enhancement results. The source code is available at https://github.com/Xiaofeng-life/ColorCode.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# 運動マスク付き拡散モデルを用いたテキスト駆動型人体運動生成 Text-driven Human Motion Generation with Motion Masked Diffusion Model ( http://arxiv.org/abs/2409.19686v1 ) ライセンス: Link先を確認	Xingyu Chen,	(参考訳) テキスト駆動型ヒューマンモーション生成は、自然言語で条件付けられた人間のモーションシーケンスを合成するマルチモーダルタスクである。多様な条件入力の下でテキスト記述を満足すると同時に、多様で現実的な人間の行動を生成する必要がある。既存の拡散モデルに基づくアプローチは、生成の多様性と多モード性において優れた性能を持つ。しかし, 自己回帰法と比較して, 拡散法は不満足なFIDスコアにつながる人間の動作特徴の分布に適合しない。 1つの洞察は、拡散モデルは文脈的推論を通して時空間意味論の運動関係を学習する能力が欠けていることである。そこで本研究では,移動列間の文脈的関節から時空間的関係を学習する能力を明確に向上させる,新しい人体運動マスク機構である運動マスク拡散モデル(MMDM)を提案する。また、動的時間特性と空間構造を持つ人間の動作データの複雑さを考慮し、2つのマスクモデリング戦略を設計した: \textbf{time frames mask} と \textbf{body parts mask}。トレーニング中、MMDMはモーション埋め込み空間内の特定のトークンをマスクする。そして、拡散復号器は、各サンプリングステップのマスク埋め込みから全動作シーケンスを学習するように設計され、不完全表現から完全シーケンスを復元することができる。 HumanML3DとKIT-MLデータセットの実験では、動作品質とテキスト-モーションの一貫性のバランスをとることで、マスク戦略が効果的であることが示された。 Text-driven human motion generation is a multimodal task that synthesizes human motion sequences conditioned on natural language. It requires the model to satisfy textual descriptions under varying conditional inputs, while generating plausible and realistic human actions with high diversity. Existing diffusion model-based approaches have outstanding performance in the diversity and multimodality of generation. However, compared to autoregressive methods that train motion encoders before inference, diffusion methods lack in fitting the distribution of human motion features which leads to an unsatisfactory FID score. One insight is that the diffusion model lack the ability to learn the motion relations among spatio-temporal semantics through contextual reasoning. To solve this issue, in this paper, we proposed Motion Masked Diffusion Model \textbf{(MMDM)}, a novel human motion masked mechanism for diffusion model to explicitly enhance its ability to learn the spatio-temporal relationships from contextual joints among motion sequences. 
Besides, considering the complexity of human motion data with dynamic temporal characteristics and spatial structure, we designed two mask modeling strategies: \textbf{time frames mask} and \textbf{body parts mask}. During training, MMDM masks certain tokens in the motion embedding space. Then, the diffusion decoder is designed to learn the whole motion sequence from masked embedding in each sampling step, this allows the model to recover a complete sequence from incomplete representations. Experiments on HumanML3D and KIT-ML dataset demonstrate that our mask strategy is effective by balancing motion quality and text-motion consistency.	翻訳日:2024-11-05 21:49:14 公開日:2024-09-29
# ラマン分光法による魚介類生化学組成分析のための機械学習 Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis ( http://arxiv.org/abs/2409.19688v1 ) ライセンス: Link先を確認	Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen,	(参考訳) 魚の生化学成分の迅速かつ正確な検出は,魚介類産業における高付加価値製品の最適利用と抽出を容易にする重要な実世界の課題である。ラマン分光法は、機械学習回帰モデルを用いて、ラマンスペクトルと生化学参照データとを関連付けることにより、魚の生化学組成を迅速かつ非破壊的に分析するための有望なソリューションを提供する。本稿では, この課題に対処するさまざまな回帰モデルについて検討し, 水, タンパク質, 脂質の収量を予測するために, 畳み込みニューラルネットワーク(CNN)の新たな設計を提案する。我々の知る限りでは、非常に小さなラマン分光データセットに基づいて魚の生化学的組成を分析するために、CNNを用いて成功した研究を最初に行った。当社のアプローチでは,CNNアーキテクチャと包括的データ準備手順を組み合わせることで,極端なデータ不足による課題を効果的に軽減する。その結果、我々のCNNは2つの最先端のCNNモデルと複数の従来の機械学習モデルを大きく上回り、魚の生化学組成の正確かつ自動分析の道を開くことができた。 The rapid and accurate detection of biochemical compositions in fish is a crucial real-world task that facilitates optimal utilization and extraction of high-value products in the seafood industry. Raman spectroscopy provides a promising solution for quickly and non-destructively analyzing the biochemical composition of fish by associating Raman spectra with biochemical reference data using machine learning regression models. This paper investigates different regression models to address this task and proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipids yield. To the best of our knowledge, we are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset. Our approach combines a tailored CNN architecture with the comprehensive data preparation procedure, effectively mitigating the challenges posed by extreme data scarcity. The results demonstrate that our CNN can significantly outperform two state-of-the-art CNN models and multiple traditional machine learning models, paving the way for accurate and automated analysis of fish biochemical composition.	
翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# InfantCryNet: 幼児の泣き声をインテリジェントに分析するためのデータ駆動フレームワーク InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries ( http://arxiv.org/abs/2409.19689v1 ) ライセンス: Link先を確認	Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang,	(参考訳) 幼児の泣き声の意味を理解することは、新生児の世話をする若い親にとって重要な課題である。背景雑音の存在とラベル付きデータの欠如は、泣き声を検知し、その根本原因を分析するシステム開発における実践的な課題である。本稿では,これらのタスクを実現するための新しいデータ駆動フレームワーク"InfantCryNet"を提案する。データ不足の問題に対処するために、事前学習された音声モデルを用いて、事前知識をモデルに組み込む。本稿では,より効率的に特徴を抽出するために,統計的プーリングとマルチヘッドアテンションプーリング手法を提案する。さらに、知識蒸留とモデル量子化を適用して、モデル効率を高め、モデルサイズを小さくし、モバイルデバイスの産業展開をより良く支援する。実生活データセットの実験では、提案フレームワークの優れた性能を示し、分類精度が4.4%向上した。モデル圧縮は、性能を損なうことなくモデルサイズを7%、精度を8%低下させるだけで最大28%削減し、モデル選択とシステム設計の実践的な洞察を提供する。 Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# Neural-Polyptych: 多様なジャンルに対するコンテンツ制御可能な絵画再現 Neural-Polyptych: Content Controllable Painting Recreation for Diverse Genres ( http://arxiv.org/abs/2409.19690v1 ) ライセンス: Link先を確認	Yiming Zhao, Dewen Guo, Zhouhui Lian, Yue Gao, Jianhong Han, Jie Feng, Guoping Wang, Bingfeng Zhou, Sheng Li,	(参考訳) アーティストと非スペシャリストのギャップを埋めるため,原画の断片にインタラクティブな手描きスケッチをシームレスに組み込むことで,広範かつ高解像度な絵画の作成を容易にする統一的な枠組みであるNeural-Polyptychを提案する。我々は、生成プロセスを2つの部分に分割し、それぞれがグローバルな特徴とローカルな特徴の識別を担うマルチスケールのGANベースアーキテクチャを設計した。ユーザによるスケッチアウトラインから生成されたセマンティックディテールの忠実性を高めるため,我々の参照バンク戦略を利用した対応注意モジュールを提案する。これにより、高品質で複雑な要素をアートワーク内で作成することができる。最終的な結果は、一貫したグローバルな整合性を保ちながら、これらの局所的な要素を慎重にブレンドすることで達成される。これにより、メガピクセルスケールでデジタル絵画を制作し、多様な芸術表現を収容し、ユーザーが制御された方法でコンテンツを再現することができる。我々は東洋絵画と西洋絵画の両方の多様なジャンルで本手法を検証する。大規模絵画の拡張, テクスチャシャッフル, ジャンル変更, 壁画の復元, 再構成などの応用は, 我々の枠組みに基づいて実現可能である。 To bridge the gap between artists and non-specialists, we present a unified framework, Neural-Polyptych, to facilitate the creation of expansive, high-resolution paintings by seamlessly incorporating interactive hand-drawn sketches with fragments from original paintings. We have designed a multi-scale GAN-based architecture to decompose the generation process into two parts, each responsible for identifying global and local features. To enhance the fidelity of semantic details generated from users' sketched outlines, we introduce a Correspondence Attention module utilizing our Reference Bank strategy. This ensures the creation of high-quality, intricately detailed elements within the artwork. The final result is achieved by carefully blending these local elements while preserving coherent global consistency. Consequently, this methodology enables the production of digital paintings at megapixel scale, accommodating diverse artistic expressions and enabling users to recreate content in a controlled manner. We validate our approach to diverse genres of both Eastern and Western paintings. 
Applications such as large painting extension, texture shuffling, genre switching, mural art restoration, and recomposition can be successfully based on our framework.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# CERD:エッセイにおける修辞的理解と生成のための総合的な中国語修辞データセット CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays ( http://arxiv.org/abs/2409.19691v1 ) ライセンス: Link先を確認	Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia,	(参考訳) 既存の修辞的理解と生成データセットやコーパスは、主に単一の粗いカテゴリまたは細かなカテゴリに焦点を当て、独立したサブタスクとして扱うことで、異なる修辞的装置間の共通的な相互関係を無視している。本稿では, 比喩, 擬人法, 誇張法, 並列法という一般的に用いられる4つの粗いカテゴリと, 形式と内容の両レベルにわたる23の細かなカテゴリから構成される中国語エッセイ修辞データセット(CERD)を提案する。 CERDは、手動で注釈付けされた包括的な中国語修辞データセットで、5つの相互関連サブタスクがある。過去の研究と異なり,我々のデータセットは,様々な修辞装置を理解し,対応する修辞要素を認識し,与えられた条件下で修辞文を生成することを支援し、それによって著者の文章力と言語運用能力を向上させる。 CERDにおける複数のタスク間の相互関係を実証し、将来の修辞学研究のためのベンチマークを確立するために、広範囲な実験を行った。実験結果から,大規模言語モデルがほとんどのタスクで最高のパフォーマンスを達成し,複数のタスクを共同で微調整することで,パフォーマンスがさらに向上することが示唆された。 Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories including metaphor, personification, hyperbole and parallelism and 23 fine-grained categories across both form and content levels. CERD is a manually annotated and comprehensive Chinese rhetoric dataset with five interrelated sub-tasks. Unlike previous work, our dataset aids in understanding various rhetorical devices, recognizing corresponding rhetorical components, and generating rhetorical sentences under given conditions, thereby improving the author's writing proficiency and language usage skills. Extensive experiments are conducted to demonstrate the interrelations between multiple tasks in CERD, as well as to establish a benchmark for future research on rhetoric. 
The experimental results indicate that Large Language Models achieve the best performance across most tasks, and jointly fine-tuning with multiple tasks further enhances performance.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# 強雑音ラベル検出器としての視覚言語モデル Vision-Language Models are Strong Noisy Label Detectors ( http://arxiv.org/abs/2409.19696v1 ) ライセンス: Link先を確認	Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang,	(参考訳) 微調整型視覚言語モデルに関する最近の研究は、様々な下流タスクにおいて印象的な性能を示している。しかし、実世界のアプリケーションで正確にラベル付けされたデータを得るという課題は、微調整の過程で大きな障害となる。この課題に対処するために、視覚言語モデルに適応するためのDeFTと呼ばれるDenoising Fine-Tuningフレームワークを提案する。 DeFTは、何百万もの補助的な画像テキストペアで事前訓練されたテキストと視覚的特徴のロバストなアライメントを利用して、ノイズの多いラベルを抽出する。提案フレームワークは,各クラスに対して正および負のテキストプロンプトを学習することにより,ノイズのあるラベル検出を行う。正のプロンプトはクラスの特徴を明らかにしようとするが、負のプロンプトはクリーンでノイズの多いサンプルを分離するための学習可能なしきい値となる。我々は、学習したテキストプロンプトとのアライメントを促進するために、事前学習されたビジュアルエンコーダの適応にパラメータ効率の微調整を用いる。一般的なフレームワークとして、DeFTは慎重に選択されたクリーンサンプルを利用して、多くの事前訓練されたモデルを下流タスクにシームレスに微調整することができる。 7つの合成および実世界のノイズデータセットの実験結果から,ノイズラベル検出と画像分類の両方においてDeFTの有効性が検証された。 Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples. We employ parameter-efficient fine-tuning for the adaptation of a pre-trained visual encoder to promote its alignment with the learned textual prompts. As a general framework, DeFT can seamlessly fine-tune many pre-trained models to downstream tasks by utilizing carefully selected clean samples. 
Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# フォック状態格子におけるダークステートエンジニアリング Dark-state engineering in Fock-state lattices ( http://arxiv.org/abs/2409.19697v1 ) ライセンス: Link先を確認	Xuan Zhao, Yi Xu, Le-Man Kuang, Jie-Qiao Liao,	(参考訳) フォック状態格子(FSL)は、量子物理学において新たなホットスポットになりつつある。これは、FSLが原子-場相互作用の研究の新しい視点を提供するだけでなく、量子光学と凝縮物質物理学の接続を構築するためでもある。格子内の複数の遷移経路のため、これらの系には固有の量子干渉効果が存在し、新しい量子コヒーレント現象を発見し、それらの応用を利用する方法がこの分野で重要かつ望ましい課題となっている。本研究では,マルチモードJaynes-Cummings(JC)モデルにより生成されたFSLの暗黒状態効果について検討する。ある種の励起数部分空間におけるFSLを考慮し、矢頭行列法を用いて原子励起状態に関連する状態に関する暗黒状態について検討する。直交ダークステートの数によって決定される次元を持つダークステート部分空間が存在することが分かる。次元が 1 より大きいとき、これらのダークステート基底の形式はユニークではない。さらに,2モード,3モード,4モードのJCモデルにおいて,直交暗黒状態の数と形状を求める。さらに、一般的な$N$-mode JCモデルに対して、$n$-励起部分空間に$C_{N+n-2}^{N-2}$直交ダーク状態が存在することが分かる。ダークモードとダークステートの関係も構築しています。我々の研究は、FSLに基づく量子光学効果と量子情報処理の探求の道を開く。 Fock-state lattices (FSLs) are becoming an emerging research hotspot in quantum physics, not only because the FSLs provide a new perspective for studying atom-field interactions, but also because they build the connection between quantum optics and condensed matter physics. Owing to the multiple transition paths in the lattices, inherent quantum interference effect exists in these systems, and hence how to find new quantum coherent phenomena and exploit their applications becomes a significant and desired task in this field. In this work, we study the dark-state effect in the FSLs generated by the multimode Jaynes-Cummings (JC) models. By considering the FSLs in certain-excitation-number subspaces, we study the dark states with respect to the states associated with the atomic excited state using the arrowhead-matrix method. We find that there exist dark-state subspaces with the dimensions determined by the number of orthogonal dark states. When the dimension is larger than one, the forms of these dark-state bases are not unique. Further, we obtain the number and form of the orthogonal dark states in the two-, three-, and four-mode JC models. 
In addition, we find that for a general $N$-mode JC model, there are $C_{N+n-2}^{N-2}$ orthogonal dark states in the $n$-excitation subspace. We also build the relationship between the dark modes and dark states. Our work will pave the way for exploring quantum optical effects and quantum information processing based on the FSLs.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
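As a quick check of the counting formula quoted in the abstract above, the sketch below evaluates $C_{N+n-2}^{N-2}$ for small cases. It assumes that $C_a^b$ denotes the binomial coefficient "a choose b" and that the formula is meant for $N \ge 2$ modes and $n \ge 1$ excitations; the function name `dark_state_count` is ours and does not come from the paper.

```python
from math import comb


def dark_state_count(num_modes: int, num_excitations: int) -> int:
    """Evaluate C(N + n - 2, N - 2), the dark-state count quoted in the abstract.

    num_modes is N (number of field modes), num_excitations is n.
    The validity range N >= 2, n >= 1 is an assumption made for this sketch.
    """
    if num_modes < 2 or num_excitations < 1:
        raise ValueError("sketch assumes N >= 2 modes and n >= 1 excitations")
    return comb(num_modes + num_excitations - 2, num_modes - 2)


if __name__ == "__main__":
    # Tabulate the two-, three-, and four-mode cases mentioned in the abstract.
    for n_modes in (2, 3, 4):
        counts = [dark_state_count(n_modes, n) for n in range(1, 5)]
        print(f"N={n_modes}: n=1..4 ->", counts)
```

Under this reading, the two-mode model has a single dark state in every excitation subspace, the three-mode model has $n+1$, and the four-mode model has $(n+2)(n+1)/2$.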
# 局所測定による安定化器符号部分空間の有効検証 Efficient Verification of Stabilizer Code Subspaces with Local Measurements ( http://arxiv.org/abs/2409.19699v1 ) ライセンス: Link先を確認	Congcong Zheng, Xutao Yu, Zaichen Zhang, Ping Xu, Kun Wang,	(参考訳) 本稿では、特定の安定化器符号で保護されるように設計された量子コンピュータが、対応する論理量子ビットを正しく符号化するかどうかを検証するタスクに対処する。これを実現するために,サブスペース検証のための汎用フレームワークを開発し,実用上重要な安定化符号部分空間を探索する。まず、安定化器コード部分空間に対する2つの効率的な検証方法を提案し、安定化器生成器と安定化器群の測定をそれぞれ利用した。次に,その部分空間が特定の構造特性を示すとき,ある試験を並列に行うことができるという観測に基づいて,グラフコード部分空間に適した着色戦略と,Calderbank-Shor-Steane (CSS)コード部分空間に適したXZ戦略を提案する。安定化器ベースの戦略と比較して、これらの新しい戦略は測定設定を著しく減らし、状態コピーを減らし、準グローバルな最適性に近づいた。特に、全ての戦略は限られた数のパウリ測度を使用し、非適応的であり、混合状態に作用し、雑音量子コンピュータにおける論理量子ビットと論理演算の両方の効率的な実験的な証明を可能にする。この研究は、局所測定による安定化符号部分空間の効率的な検証に関する最初の体系的研究に寄与する。 We address the task of verifying whether a quantum computer, designed to be protected by a specific stabilizer code, correctly encodes the corresponding logical qubits. To achieve this, we develop a general framework for subspace verification and explore several stabilizer code subspaces of practical significance. First, we present two efficient verification strategies for general stabilizer code subspaces, utilizing measurements of their stabilizer generators and stabilizer groups, respectively. Then, building on the observation that certain tests can be conducted in parallel when the subspace exhibits specific structural properties, we propose a coloring strategy tailored to graph code subspaces and an XZ strategy tailored to Calderbank-Shor-Steane (CSS) code subspaces. Compared to stabilizer-based strategies, these new strategies require significantly fewer measurement settings and consume fewer state copies, approaching near-global optimality. Notably, all the strategies employ a limited number of Pauli measurements, are non-adaptive, and work on mixed states, enabling efficient experimental certification of both logical qubits and logical operations in noisy quantum computers. 
This work contributes to the first systematic study of efficient verification of stabilizer code subspaces with local measurements.	翻訳日:2024-11-05 21:39:30 公開日:2024-09-29
# 適応型U-Netアーキテクチャを用いたUAVからの農業画像のハイパースペクトルアンミックス Hyperspectral Unmixing of Agricultural Images taken from UAV Using Adapted U-Net Architecture ( http://arxiv.org/abs/2409.19701v1 ) ライセンス: Link先を確認	Vytautas Paura, Virginijus Marcinkevičius,	(参考訳) ハイパースペクトルアンミキシング法(Hyperspectral unmixing method)は、ハイパースペクトルデータ立方体ピクセルから物質(通常はエンドメンバーと呼ばれる)データをその存在量とともに抽出するアルゴリズムである。ハイパースペクトルセンサの空間分解能が低いため、各画素のデータは複数のエンドメンバーからの混合情報を含むことがある。本稿では,UAVに搭載されたハイパースペクトルカメラによって収集されたブルーベリーフィールドデータからハイパースペクトルアンミックスデータセットを作成する。また,U-Netネットワークアーキテクチャに基づくハイパースペクトルアンミキシングアルゴリズムを提案し,既存および新規に作成したハイパースペクトルアンミキシングデータセットに対してより正確なアンミキシング結果を実現する。 The hyperspectral unmixing method is an algorithm that extracts material (usually called endmember) data from hyperspectral data cube pixels along with their abundances. Due to the lower spatial resolution of hyperspectral sensors, the data in each pixel may contain mixed information from multiple endmembers. In this paper, we create a hyperspectral unmixing dataset from blueberry field data gathered by a hyperspectral camera mounted on a UAV. We also propose a hyperspectral unmixing algorithm based on the U-Net network architecture to achieve more accurate unmixing results on existing and newly created hyperspectral unmixing datasets.	翻訳日:2024-11-05 21:29:26 公開日:2024-09-29
# 大規模言語モデルのための認証されたロバストな透かし A Certified Robust Watermark For Large Language Models ( http://arxiv.org/abs/2409.19708v1 ) ライセンス: Link先を確認	Xianheng Feng, Jian Liu, Kui Ren, Chun Chen,	(参考訳) AI生成テキスト識別における透かしアルゴリズムの有効性は注目されている。同時に、様々なウォーターマーク攻撃に対するロバスト性を高めるために、ウォーターマークアルゴリズムの数が増加している。しかし、これらの透かしアルゴリズムは、適応攻撃や見当たらない攻撃の影響を受けやすいままである。この問題に対処するため,我々は,ランダムな平滑化に基づく大規模言語モデルに対して,最初の確証付き頑健な透かしアルゴリズムを提案し,透かし付きテキストの保証を提供する。具体的には、透かし生成と検出にそれぞれ2つの異なるモデルを使用し、透かし検出器のトレーニングおよび推論段階における埋め込みおよび置換空間にガウスノイズと均一ノイズを加え、透かし検出器の信頼性を向上し、認証半径を導出する。透かしアルゴリズムの実証的ロバスト性および証明的ロバスト性を評価するため,包括的実験を行った。その結果,本アルゴリズムはベースラインアルゴリズムに匹敵する性能を示す一方で,精度の高いロバスト性が得られることが示唆された。 The effectiveness of watermark algorithms in AI-generated text identification has garnered significant attention. Concurrently, an increasing number of watermark algorithms have been proposed to enhance the robustness against various watermark attacks. However, these watermark algorithms remain susceptible to adaptive or unseen attacks. To address this issue, to our best knowledge, we propose the first certified robust watermark algorithm for large language models based on randomized smoothing, which can provide provable guarantees for watermarked text. Specifically, we utilize two different models respectively for watermark generation and detection and add Gaussian and Uniform noise respectively in the embedding and permutation space during the training and inference stages of the watermark detector to enhance the certified robustness of our watermark detector and derive certified radius. To evaluate the empirical robustness and certified robustness of our watermark algorithm, we conducted comprehensive experiments. The results indicate that our watermark algorithm shows comparable performance to baseline algorithms while our algorithm can derive substantial certified robustness, which means that our watermark can not be removed even under significant alterations.	翻訳日:2024-11-05 21:29:26 公開日:2024-09-29
# 脳記録からの音声テキストの非侵襲復号のためのマルチモーダルLLM A multimodal LLM for the non-invasive decoding of spoken text from brain recordings ( http://arxiv.org/abs/2409.19710v1 ) ライセンス: Link先を確認	Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni,	(参考訳) 人工知能における脳関連研究トピックは、特にコンピュータビジョンから自然言語処理へのマルチモーダルアーキテクチャの拡張により、最近人気が高まっている。本研究の主な目的は、非侵襲的なfMRI記録からの音声テキスト復号におけるこれらのアーキテクチャの可能性と限界を探ることである。視覚とテキストデータとは対照的に、fMRIデータは脳スキャナーの多様性による複雑なモダリティを表し、このことは (i) 記録される信号形式の多様性、(ii) 生信号の低分解能と雑音、(iii) 生成学習の基礎モデルとして活用できる事前学習モデルの不足を意味する。これらの点は、fMRI記録からのテキストの非侵襲的復号化の問題を非常に困難にしている。本稿では,fMRI信号から音声テキストを復号するエンドツーエンドのマルチモーダルLLMを提案する。提案するアーキテクチャは、(i) エンコーダ用の拡張埋め込み層と、最先端のものよりもよく調整された注意機構を組み込んだ特定のTransformerから派生したエンコーダ、および (ii) 入力テキストの埋め込みと脳活動の符号化埋め込みを整合させて出力テキストを復号するように適応された凍結済みの大規模言語モデルに基づいている。 fMRIと会話信号が同期的に記録された、人間-人間および人間-ロボットの相互作用からなるコーパス上でベンチマークを行った。提案手法は評価されたモデルより優れており, 正解に含まれる意味をより正確に捉えたテキストを生成することができるため, 得られた結果は非常に有望である。実装コードは https://github.com/Hmamouche/brain_decode で提供されている。 Brain-related research topics in artificial intelligence have recently gained popularity, particularly due to the expansion of what multimodal architectures can do from computer vision to natural language processing. Our main goal in this work is to explore the possibilities and limitations of these architectures in spoken text decoding from non-invasive fMRI recordings. Contrary to vision and textual data, fMRI data represent a complex modality due to the variety of brain scanners, which implies (i) the variety of the recorded signal formats, (ii) the low resolution and noise of the raw signals, and (iii) the scarcity of pretrained models that can be leveraged as foundation models for generative learning. These points make the problem of the non-invasive decoding of text from fMRI recordings very challenging. In this paper, we propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on (i) an encoder derived from a specific transformer incorporating an augmented embedding layer for the encoder and a better-adjusted attention mechanism than that present in the state of the art, and (ii) a frozen large language model adapted to align the embedding of the input text and the encoded embedding of brain activity to decode the output text. A benchmark is performed on a corpus consisting of a set of human-human and human-robot interactions in which fMRI and conversational signals are recorded synchronously. The obtained results are very promising, as our proposal outperforms the evaluated models and is able to generate text that more accurately captures the semantics present in the ground truth. The implementation code is provided at https://github.com/Hmamouche/brain_decode.	翻訳日:2024-11-05 21:29:26 公開日:2024-09-29
# 後方コンフォーマル予測 Posterior Conformal Prediction ( http://arxiv.org/abs/2409.19712v1 ) ライセンス: Link先を確認	Yao Zhang, Emmanuel J. Candès,	(参考訳) コンフォーマル予測は、分布のないカバレッジ保証を伴う予測区間を構築するための一般的な手法である。カバー範囲は極端であり、人口全体に対して平均的にしか持たないが、必ずしも特定のサブグループに限らない。本稿では,データから自然に発見されたクラスタ(あるいはサブグループ)に対して,境界条件と近似条件の双方で予測間隔を生成する手法である後部共形予測(PCP)を提案する。 PCPは、クラスタ分布の混合として条件整合スコア分布をモデル化することにより、これらの保証を達成する。近似条件妥当性を持つ他の手法と比較して、本手法は特に検証データによく表されるクラスタからテストデータが引き出される場合、より厳密な間隔を生じる。 PCPはまた、ユーザが指定したサブグループの条件付きカバレッジを保証するためにも適用できる。分類において、PCPの基礎となる理論は、分類器の信頼度に基づいてカバレッジレベルを調整でき、標準の共形予測セットよりもはるかに小さな集合を達成できる。社会経済・科学・医療分野の多様なデータセットを対象としたPCPの性能評価を行った。 Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, meaning it only holds on average over the entire population but not necessarily for any specific subgroup. This article introduces a new method, posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional validity for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional conformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional validity, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it achieves robust coverage on smaller subgroups within the specified subgroups. In classification, the theory underlying PCP allows for adjusting the coverage level based on the classifier's confidence, achieving significantly smaller sets than standard conformal prediction sets. We evaluate the performance of PCP on diverse datasets from socio-economic, scientific and healthcare applications.	翻訳日:2024-11-05 21:29:26 公開日:2024-09-29
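To make the "marginal coverage" baseline in the Posterior Conformal Prediction abstract above concrete, here is a minimal sketch of standard split conformal prediction with absolute-residual scores. It is not PCP itself: the mixture-of-clusters modelling described in the abstract is not implemented, and the names `conformal_quantile` and `split_conformal_interval` are hypothetical helpers chosen for this illustration.

```python
import math
from typing import Callable, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the interval must be infinite to keep the
    finite-sample marginal coverage guarantee.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(
    predict: Callable[[float], float],
    calib_x: Sequence[float],
    calib_y: Sequence[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Prediction interval for x_new built from a held-out calibration set."""
    # Conformity scores: absolute residuals of the fitted model on calibration data.
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


if __name__ == "__main__":
    model = lambda x: 2.0 * x  # stand-in for any pre-fitted regressor
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 2.0, 9.0, 12.0]
    print(split_conformal_interval(model, xs, ys, x_new=5.0, alpha=0.5))
```

The interval width here is the same for every test point, which is why coverage holds only on average; methods with conditional validity, such as the one described in the abstract, adapt the width to the subgroup a test point belongs to.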
# 配電系統運用者のメタデータを用いた低電圧給電装置のピーク対応擬似測定生成 Generating peak-aware pseudo-measurements for low-voltage feeders using metadata of distribution system operators ( http://arxiv.org/abs/2409.19713v1 ) ライセンス: Link先を確認	Manuel Treutlein, Marc Schmidt, Roman Hahn, Matthias Hertel, Benedikt Heidrich, Ralf Mikut, Veit Hagenmeyer,	(参考訳) 配電系統オペレーター(DSO)は、気候中立性経路に沿った配電網の再構築や、配電網の消費と発生を管理し制御する能力など、新たな課題に対処しなければならない。課題を満たすため、分散グリッド内の測定は、しばしばDSOの基礎を形成する。したがって、測定装置が多くの低電圧(LV)グリッドにインストールされていないことは緊急の問題である。この問題を解決するために,回帰モデルを用いて,各供給者のメタデータに基づいて,非測定型LVフィードの擬似測定を推定する手法を提案する。供給者メタデータは、グリッド接続点数、消費者及び生産者の設置電力、下流のLVグリッドにおける請求データを含む。さらに、天気データ、カレンダーデータ、タイムスタンプ情報をモデル機能として使用しています。既存の測定値はモデルターゲットとして使用される。 2,323LVのフィードインを特徴とする大規模実世界のデータセット上での擬似測定を広範囲に評価した。この目的のために,BigDEAL チャレンジにインスパイアされたピークメトリクスを導入し,消費とフィードインの両面でのピークサイズ,タイミング,形状について検討する。回帰モデルとして、XGBoost、多層パーセプトロン(MLP)、線形回帰(LR)を用いる。 XGBoost と MLP は LR よりも優れていた。さらに, 気象, カレンダー, タイムスタンプの異なる条件に適応し, フィードのメタデータに基づいて現実的な負荷曲線を生成できることを示した。将来的には、サブステーション変圧器のような他のグリッドレベルにも適用でき、負荷モデリングや状態推定、LV負荷予測といった研究分野を補うことができる。 Distribution system operators (DSOs) must cope with new challenges such as the reconstruction of distribution grids along climate neutrality pathways or the ability to manage and control consumption and generation in the grid. In order to meet the challenges, measurements within the distribution grid often form the basis for DSOs. Hence, it is an urgent problem that measurement devices are not installed in many low-voltage (LV) grids. In order to overcome this problem, we present an approach to estimate pseudo-measurements for non-measured LV feeders based on the metadata of the respective feeder using regression models. The feeder metadata comprise information about the number of grid connection points, the installed power of consumers and producers, and billing data in the downstream LV grid. Additionally, we use weather data, calendar data and timestamp information as model features. The existing measurements are used as model target. 
We extensively evaluate the estimated pseudo-measurements on a large real-world dataset with 2,323 LV feeders characterized by both consumption and feed-in. For this purpose, we introduce peak metrics inspired by the BigDEAL challenge for the peak magnitude, timing and shape for both consumption and feed-in. As regression models, we use XGBoost, a multilayer perceptron (MLP) and a linear regression (LR). We observe that XGBoost and MLP outperform the LR. Furthermore, the results show that the approach adapts to different weather, calendar and timestamp conditions and produces realistic load curves based on the feeder metadata. In the future, the approach can be adapted to other grid levels like substation transformers and can supplement research fields like load modeling, state estimation and LV load forecasting.	翻訳日:2024-11-05 21:29:26 公開日:2024-09-29
# 安全ヒートポンプ制御のための拘束強化学習 Constrained Reinforcement Learning for Safe Heat Pump Control ( http://arxiv.org/abs/2409.19716v1 ) ライセンス: Link先を確認	Baohe Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker,	(参考訳) 制約強化学習(RL:Constrained Reinforcement Learning)は、様々な制御タスクにおける安全性とパフォーマンスを高めるために、報酬への制約の統合が不可欠であるRL内の重要な研究領域として登場した。建物内の暖房システムの文脈では、住民の熱快適性を保ちながらエネルギー効率を最適化することは、制約付き最適化問題として直感的に定式化することができる。しかし、それをRLで解くには大量のデータが必要になるかもしれない。そのため、正確で多用途なシミュレータが好まれる。本稿では,異なる用途のインタフェースを提供する新しいビルディングシミュレータI4Bを提案するとともに,線形平滑ログバリア関数(CSAC-LB)を用いた制約付きソフトアクター・クリティカルというモデルレス制約付きRLアルゴリズムを加熱最適化問題に適用する。ベースラインアルゴリズムに対するベンチマークは、CSAC-LBのデータ探索、制約満足度、性能における効率を示す。 Constrained Reinforcement Learning (RL) has emerged as a significant research area within RL, where integrating constraints with rewards is crucial for enhancing safety and performance across diverse control tasks. In the context of heating systems in buildings, optimizing the energy efficiency while maintaining the residents' thermal comfort can be intuitively formulated as a constrained optimization problem. However, solving it with RL may require a large amount of data. Therefore, an accurate and versatile simulator is favored. In this paper, we propose a novel building simulator, I4B, which provides interfaces for different usages, and apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem. Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction and performance.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
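The abstract above names a "Linear Smoothed Log Barrier" but does not state its formula. The sketch below shows one form of log-barrier extension known from the constrained-optimization literature, for a constraint written as z <= 0. Whether CSAC-LB uses exactly this expression is an assumption, and the function name `smoothed_log_barrier` and the parameter `mu` are ours.

```python
import math


def smoothed_log_barrier(z: float, mu: float) -> float:
    """Log barrier for the constraint z <= 0, extended linearly near and past 0.

    For z <= -1/mu**2 the standard barrier -log(-z)/mu is used. Beyond that
    point the function continues as a straight line with matching value and
    slope, so it stays finite (and differentiable) even when the constraint
    is violated. This specific form is an assumption, not taken from the paper.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    threshold = -1.0 / mu**2
    if z <= threshold:
        return -math.log(-z) / mu
    # Linear continuation: value and slope (equal to mu) match at the threshold.
    return mu * z - math.log(1.0 / mu**2) / mu + 1.0 / mu


if __name__ == "__main__":
    # Penalty grows smoothly as the constraint value z moves from safe to violated.
    for z in (-1.0, -0.25, 0.0, 1.0):
        print(z, smoothed_log_barrier(z, mu=2.0))
```

The design point is that a plain log barrier is undefined once a constraint is violated, which is awkward for a learning agent that starts from an unsafe policy; a linear continuation keeps a usable gradient in that region.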
# 分布シフト下における時系列予測のためのマルチスケール正規化の展開 Evolving Multi-Scale Normalization for Time Series Forecasting under Distribution Shifts ( http://arxiv.org/abs/2409.19718v1 ) ライセンス: Link先を確認	Dalin Qin, Yehui Li, Weiqi Chen, Zhaoyang Zhu, Qingsong Wen, Liang Sun, Pierre Pinson, Yi Wang,	(参考訳) 複雑な分布シフトは、正確な長期時系列予測を達成するための主要な障害である。分布特性を把握し、分布シフトの影響を軽減するための適応正規化手法を提案するために、いくつかの試みがなされている。しかし、これらの手法は、様々なスケールで観測される複雑な分布ダイナミクスと、分布力学と正規化された写像関係の進化する機能を無視している。そこで本研究では,分散シフト問題に対処するモデル非依存型マルチスケール正規化(EvoMSN)フレームワークを提案する。マルチスケール統計予測モジュールと適応アンサンブルに基づくフレキシブル正規化と非正規化を提案する。予測モデルと統計予測モジュールを協調的に更新して、シフトする分布を追跡するために、進化的最適化戦略が設計されている。ベンチマークデータセット上での5つの主流予測手法の性能向上におけるEvoMSNの有効性を評価するとともに,既存の高度正規化やオンライン学習手法と比較して,その優位性を示す。コードは https://github.com/qindalin/EvoMSN で公開されている。 Complex distribution shifts are the main obstacle to achieving accurate long-term time series forecasting. Several efforts have been conducted to capture the distribution characteristics and propose adaptive normalization techniques to alleviate the influence of distribution shifts. However, these methods neglect the intricate distribution dynamics observed from various scales and the evolving functions of distribution dynamics and normalized mapping relationships. To this end, we propose a novel model-agnostic Evolving Multi-Scale Normalization (EvoMSN) framework to tackle the distribution shift problem. Flexible normalization and denormalization are proposed based on the multi-scale statistics prediction module and adaptive ensembling. An evolving optimization strategy is designed to update the forecasting model and statistics prediction module collaboratively to track the shifting distributions. We evaluate the effectiveness of EvoMSN in improving the performance of five mainstream forecasting methods on benchmark datasets and also show its superiority compared to existing advanced normalization and online learning approaches. The code is publicly available at https://github.com/qindalin/EvoMSN.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# FAST:全スライド画像分類のための2層Few-Shot学習パラダイム FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification ( http://arxiv.org/abs/2409.19720v1 ) ライセンス: Link先を確認	Kexue Fu, Xiaoyuan Luo, Linhao Qu, Shuo Wang, Ying Xiong, Ilias Maglogiannis, Longxiang Gao, Manning Wang,	(参考訳) 高価な微粒化アノテーションとデータ不足は、ディープラーニングベースのWSI分類アルゴリズムを臨床実践で広く採用する上で、主要な障害となっている。各画像のラベルを活用できる自然画像の少数ショット学習法とは異なり、既存の少数ショットWSI分類法では、高価な微粒なアノテーションを避けるために、少数の微粒なラベルまたは弱い教師付きスライドラベルしか使用していない。利用可能なWSIを十分にマイニングすることができず、WSI分類性能を著しく制限しています。上記の課題に対処するため,WSI分類のためのFASTという,新規で効率的な2層複数ショット学習パラダイムを提案する。 FASTはデュアルレベルアノテーション戦略とデュアルブランチ分類フレームワークで構成されている。まず、高価なきめ細かいアノテーションを避けるために、スライドレベルで非常に少数のWSIを集め、非常に少数のパッチを注釈付けします。そして、利用可能なWSIを完全にマイニングするために、すべてのパッチと利用可能なパッチラベルを使用してキャッシュブランチを構築します。また、キャッシュブランチに加えて、学習可能なプロンプトベクトルを含む事前ブランチを構築し、パッチ分類のための視覚言語モデルのテキストエンコーダを使用する。最後に、WSI分類を達成するために、両方のブランチの結果を統合する。バイナリおよびマルチクラスデータセットに対する大規模な実験により,提案手法は既存の少数ショット分類法をはるかに上回り,アノテーションコストがわずか0.22$\%$の完全教師付き手法の精度にアプローチすることを示した。すべてのコードとモデルはhttps://github.com/fukexue/FASTで公開される。 The expensive fine-grained annotation and data scarcity have become the primary obstacles for the widespread adoption of deep learning-based Whole Slide Images (WSI) classification algorithms in clinical practice. Unlike few-shot learning methods in natural images that can leverage the labels of each image, existing few-shot WSI classification methods only utilize a small number of fine-grained labels or weakly supervised slide labels for training in order to avoid expensive fine-grained annotation. They lack sufficient mining of available WSIs, severely limiting WSI classification performance. To address the above issues, we propose a novel and efficient dual-tier few-shot learning paradigm for WSI classification, named FAST. FAST consists of a dual-level annotation strategy and a dual-branch classification framework.
Firstly, to avoid expensive fine-grained annotation, we collect a very small number of WSIs at the slide level, and annotate an extremely small number of patches. Then, to fully mine the available WSIs, we use all the patches and available patch labels to build a cache branch, which uses the labeled patches to learn the labels of unlabeled patches and classifies patches through knowledge retrieval. In addition to the cache branch, we also construct a prior branch that includes learnable prompt vectors, using the text encoder of visual-language models for patch classification. Finally, we integrate the results from both branches to achieve WSI classification. Extensive experiments on binary and multi-class datasets demonstrate that our proposed method significantly surpasses existing few-shot classification methods and approaches the accuracy of fully supervised methods with only 0.22$\%$ annotation costs. All codes and models will be publicly available at https://github.com/fukexue/FAST.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# パーソナリティ特性の解明:対話における説明可能なパーソナリティ認識のための新しいベンチマークデータセット Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues ( http://arxiv.org/abs/2409.19723v1 ) ライセンス: Link先を確認	Lei Sun, Jinming Zhao, Qin Jin,	(参考訳) パーソナリティ認識は,対話やソーシャルメディア投稿などのユーザデータに含まれる性格特性を識別することを目的としている。現在の研究では、主にパーソナリティを分類課題として扱い、認識されたパーソナリティを裏付ける証拠を明らかにしていない。本稿では,人格特性の証拠として推論過程を明らかにすることを目的とした,説明可能なパーソナリティ認識という新しいタスクを提案する。人格論に触発された性格特性は、特定の瞬間における具体的な状況下での思考、感情、行動の短期的な特徴パターンである人格状態の安定パターンで構成されている。本稿では、特定の文脈から短期的な人格状態、長期的な人格特性への推論プロセスを含む、CoPE(Chain-of-Personality-Evidence)と呼ばれる説明可能な人格認識フレームワークを提案する。さらに,CoPEフレームワークをベースとして,対話から説明可能なパーソナリティ認識データセットであるPersonalityEvdを構築した。本稿では2つのパーソナリティ状態認識タスクと説明可能なパーソナリティ特性認識タスクを導入し,モデルがパーソナリティ状態と特徴ラベルとそれに対応するサポートエビデンスを認識することを要求する。この2つの課題に関する大規模言語モデルに基づく広範な実験により,人格の特徴を明らかにすることは極めて困難であることが示され,今後の研究にいくつかの知見が提示される。私たちのデータとコードは https://github.com/Lei-Sun-RUC/PersonalityEvd で公開されています。 Personality recognition aims to identify the personality traits implied in user data such as dialogues and social media posts. Current research predominantly treats personality recognition as a classification task, failing to reveal the supporting evidence for the recognized personality. In this paper, we propose a novel task named Explainable Personality Recognition, aiming to reveal the reasoning process as supporting evidence of the personality trait. Inspired by personality theories, personality traits are made up of stable patterns of personality state, where the states are short-term characteristic patterns of thoughts, feelings, and behaviors in a concrete situation at a specific moment in time. We propose an explainable personality recognition framework called Chain-of-Personality-Evidence (CoPE), which involves a reasoning process from specific contexts to short-term personality states to long-term personality traits.
Furthermore, based on the CoPE framework, we construct an explainable personality recognition dataset from dialogues, PersonalityEvd. We introduce two tasks, explainable personality state recognition and explainable personality trait recognition, which require models to recognize the personality state and trait labels and their corresponding supporting evidence. Our extensive experiments based on Large Language Models on the two tasks show that revealing personality traits is very challenging, and we present some insights for future research. Our data and code are available at https://github.com/Lei-Sun-RUC/PersonalityEvd.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# DataDRILL:掘削リグの生成圧力予測とキック検出 DataDRILL: Formation Pressure Prediction and Kick Detection for Drilling Rigs ( http://arxiv.org/abs/2409.19724v1 ) ライセンス: Link先を確認	Murshedul Arifeen, Andrei Petrovski, Md Junayed Hasan, Igor Kotenko, Maksim Sletov, Phil Hassard,	(参考訳) 製造工程の意思決定とコスト効率を大幅に向上させるため, 掘削作業において, 生成圧力とキック検出の正確なリアルタイム予測が不可欠である。データ駆動型モデルは、形成圧力を予測し、キックを検出することによって掘削作業を自動化することで人気を集めている。しかし、現在の文献では、掘削リグの分野での研究を進めるために、サポートデータセットが公開されていないため、この領域の技術的進歩を妨げている。本稿では,石油・ガス掘削研究を強化するインテリジェントアルゴリズムの開発を支援するために,新たな2つのデータセットを提案する。データセットには、28の掘削変数と2000以上のデータサンプルを備えた、形成圧力予測とキック検出のためのデータサンプルが含まれている。主成分回帰は生成圧力を予測するために使用され、主成分分析はデータセットの技術検証のためのキックを特定するために使用される。特に、主成分回帰に対するR2と残留予測偏差スコアはそれぞれ0.78と0.922である。 Accurate real-time prediction of formation pressure and kick detection is crucial for drilling operations, as it can significantly improve decision-making and the cost-effectiveness of the process. Data-driven models have gained popularity for automating drilling operations by predicting formation pressure and detecting kicks. However, the current literature does not make supporting datasets publicly available to advance research in the field of drilling rigs, thus impeding technological progress in this domain. This paper introduces two new datasets to support researchers in developing intelligent algorithms to enhance oil/gas well drilling research. The datasets include data samples for formation pressure prediction and kick detection with 28 drilling variables and more than 2000 data samples. Principal component regression is employed to forecast formation pressure, while principal component analysis is utilized to identify kicks for the dataset's technical validation. Notably, the R2 and Residual Predictive Deviation scores for principal component regression are 0.78 and 0.922, respectively.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# ネットワーク・プルーニングが性能と解釈可能性に及ぼす影響の検討 Investigating the Effect of Network Pruning on Performance and Interpretability ( http://arxiv.org/abs/2409.19727v1 ) ライセンス: Link先を確認	Jonathan von Rad, Florian Seuffert,	(参考訳) ディープニューラルネットワーク(DNN)は、タスクに対して過度にパラメータ化され、重みを取り除くことで大幅に圧縮される、プルーニングと呼ばれるプロセスである。異なる刈り取り技術がGoogLeNetの分類性能と解釈性に与える影響について検討する。ネットワークに非構造的および構造的プルーニングと接続間隔(入力重みのプルーニング)手法を体系的に適用し、ImageNetの検証セット上でのネットワークの性能に関する結果を分析する。また,反復刈り込みやワンショット刈り込みなど,異なるトレーニング戦略を比較した。十分な再トレーニングエポックがあれば、ネットワークのパフォーマンスはデフォルトのGoogLeNetのパフォーマンスを近似することができます。解釈可能性を評価するために、Zimmermannらが開発したメカニスティック解釈可能性スコア(MIS)を用いる。実験の結果,MISを指標とした場合,解釈可能性とプルーニング率との間に有意な相関は認められなかった。さらに、極めて低い精度のネットワークは高いMISスコアを達成でき、MISは正しい決定の基盤を理解するなど、直感的な解釈可能性の概念と常に一致するとは限らないことを示唆している。 Deep Neural Networks (DNNs) are often over-parameterized for their tasks and can be compressed quite drastically by removing weights, a process called pruning. We investigate the impact of different pruning techniques on the classification performance and interpretability of GoogLeNet. We systematically apply unstructured and structured pruning, as well as connection sparsity (pruning of input weights) methods to the network and analyze the outcomes regarding the network's performance on the validation set of ImageNet. We also compare different retraining strategies, such as iterative pruning and one-shot pruning. We find that with sufficient retraining epochs, the performance of the networks can approximate the performance of the default GoogLeNet - and even surpass it in some cases. To assess interpretability, we employ the Mechanistic Interpretability Score (MIS) developed by Zimmermann et al. Our experiments reveal that there is no significant relationship between interpretability and pruning rate when using MIS as a measure.
Additionally, we observe that networks with extremely low accuracy can still achieve high MIS scores, suggesting that the MIS may not always align with intuitive notions of interpretability, such as understanding the basis of correct decisions.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# 残留幾何強化を用いた一元化勾配型機械学習 Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement ( http://arxiv.org/abs/2409.19732v1 ) ライセンス: Link先を確認	Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang,	(参考訳) 深層ニューラルネットワークのプライバシーと信頼性を高めるために、機械学習(MU)が登場した。近似MUは大規模モデルの実用的手法である。近似MUに関する我々の研究は、パラメータの近傍の正確なMUへの出力Kullback-Leiblerの発散を最小限に抑え、最も急降下方向を特定することから始まる。このプローブ方向は、重み付き忘れ勾配上昇、微調整による勾配降下、重み付き塩分濃度行列の3つの成分に分解される。ユークリッド計量から導かれるそのような分解は、既存の勾配に基づくMU法の大半を包含する。それでもユークリッド空間に付着すると、出力確率空間の見過ごされた幾何学的構造のために、準最適反復軌道が生じる可能性がある。残りの幾何によって表現された多様体に、未学習の更新を埋め込むことを提案し、残りのデータから2階ヘッセンを組み込む。効果的なアンラーニングが維持されたパフォーマンスに干渉するのを防ぐのに役立つ。しかし、大規模モデルに対する2階Hessianの計算は難解である。ヘッセン変調の利点を効果的に活用するために,最新の正解な未学習方向を暗黙的に近似する高速スローパラメータ更新戦略を提案する。特定のモーダル制約がなければ、我々のアプローチは、分類や生成を含む、コンピュータビジョンの未学習タスクに適応できる。大規模な実験は、我々の有効性と効率を検証します。特に,DiTを用いたImageNetのクラスフォゲッティングに成功し,DDPMを用いたCIFAR-10のクラスを50ステップで忘れることに成功した。 Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. Our investigation into approximate MU starts with identifying the steepest descent direction, minimizing the output Kullback-Leibler divergence to exact MU inside a parameters' neighborhood. This probed direction decomposes into three components: weighted forgetting gradient ascent, fine-tuning retaining gradient descent, and a weight saliency matrix. Such decomposition derived from Euclidean metric encompasses most existing gradient-based MU methods. Nevertheless, adhering to Euclidean space may result in sub-optimal iterative trajectories due to the overlooked geometric structure of the output probability space. We suggest embedding the unlearning update into a manifold rendered by the remaining geometry, incorporating second-order Hessian from the remaining data. It helps prevent effective unlearning from interfering with the retained performance.
However, computing the second-order Hessian for large-scale models is intractable. To efficiently leverage the benefits of Hessian modulation, we propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction. Free from specific modal constraints, our approach is adaptable across computer vision unlearning tasks, including classification and generation. Extensive experiments validate our efficacy and efficiency. Notably, our method successfully performs class-forgetting on ImageNet using DiT and forgets a class on CIFAR-10 using DDPM in just 50 steps, compared to thousands of steps required by previous methods.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# Pear: ビジュアルパラメータ効率の良いファインチューニングにおけるプルーニングとアダプタの共有 Pear: Pruning and Sharing Adapters in Visual Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2409.19733v1 ) ライセンス: Link先を確認	Yibo Zhong, Yao Zhou,	(参考訳) アダプタは、微調整された基礎モデルにおいて計算と記憶のコストを軽減するために広く研究されてきた。しかし、アダプタ自体が冗長性を示し、不要なストレージオーバーヘッドと性能の低下につながる。本稿では,事前学習した視覚基盤モデルの高精度な微調整を行うための,新しいアダプタ・プルーニングフレームワークであるPrune and Share (Pear)を提案する。具体的には、特定のアダプタをプルークし、より重要でないアダプタをプルークされた位置と共有し、プルーニング後のこれらの位置への連続的な適応を可能にする。さらに、プルーンドアダプタの情報を保存する知識チェックポイント戦略を導入し、パフォーマンスをさらに向上させる。視覚適応ベンチマークの実験結果は、他の競合手法と比較して、提案したPearの有効性と効率を検証した。コードは https://github.com/yibozhong/pear にある。 Adapters have been widely explored to alleviate computational and storage costs when fine-tuning pretrained foundation models. However, the adapter itself can exhibit redundancy, leading to unnecessary storage overhead and inferior performance. In this paper, we propose Prune and Share (Pear), a novel adapter-pruning framework for efficient fine-tuning of pretrained visual foundation models. Specifically, we prune certain adapters and share the more important unpruned ones with positions where adapters are pruned, allowing continual adaptation at these positions after pruning. Additionally, a knowledge checkpoint strategy is introduced, which preserves the information of the pruned adapters and further boosts performance. Experimental results on a visual adaptation benchmark validate the effectiveness and efficiency of the proposed Pear compared with other competitive methods. Code is available at https://github.com/yibozhong/pear.	翻訳日:2024-11-05 21:19:41 公開日:2024-09-29
# Scrambled text: training Language Models to correct OCR errors using synthetic data ( http://arxiv.org/abs/2409.19735v1 ) License: see link	Jonathan Bourne	OCR errors are common in digitised historical archives, significantly affecting their usability and value. Generative Language Models (LMs) have shown potential for correcting these errors using the context provided by the corrupted text and the broader socio-cultural context, a process called Context Leveraging OCR Correction (CLOCR-C). However, getting sufficient training data for fine-tuning such models can prove challenging. This paper shows that fine-tuning a language model on synthetic data, created using an LM and a character-level Markov corruption process, can significantly improve the ability to correct OCR errors. Models trained on synthetic data reduce the character error rate by 55% and the word error rate by 32% over the base LM, and outperform models trained on real data. Key findings include: training on under-corrupted data is better than training on over-corrupted data; non-uniform character-level corruption is better than uniform corruption; and more tokens per observation outperforms more observations for a fixed token budget. The outputs of this paper are a set of 8 heuristics for training effective CLOCR-C models, a dataset of 11,000 synthetic 19th-century newspaper articles, and scrambledtext, a Python library for creating synthetic corrupted data.	Translated: 2024-11-05 21:19:41 Published: 2024-09-29
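As a rough illustration of the two ideas named in the abstract above, the sketch below corrupts text with a small character-level Markov chain and scores the result with character error rate (CER). It is not the scrambledtext API: the state names, the transition probabilities, and the helper names `corrupt`, `edit_distance`, and `cer` are all assumptions made for this example.

```python
import random

# Hypothetical transition table: each state maps to the probabilities of the
# next state. The numbers are illustrative only, not taken from the paper.
TRANSITIONS = {
    "keep":       {"keep": 0.90, "substitute": 0.05, "delete": 0.03, "insert": 0.02},
    "substitute": {"keep": 0.60, "substitute": 0.25, "delete": 0.10, "insert": 0.05},
    "delete":     {"keep": 0.60, "substitute": 0.10, "delete": 0.25, "insert": 0.05},
    "insert":     {"keep": 0.70, "substitute": 0.10, "delete": 0.05, "insert": 0.15},
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def corrupt(text, rng, transitions=TRANSITIONS):
    """Corrupt text one character at a time; the action taken for each
    character depends on the action taken for the previous one."""
    state = "keep"
    out = []
    for ch in text:
        probs = transitions[state]
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        if state == "keep":
            out.append(ch)
        elif state == "substitute":
            out.append(rng.choice(ALPHABET))
        elif state == "insert":
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # state == "delete": emit nothing for this character
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "the quick brown fox jumps over the lazy dog"
    noisy = corrupt(clean, rng)
    print(noisy, round(cer(clean, noisy), 3))
```

Because the next action is drawn from a row that depends on the previous action, errors tend to cluster, which is one simple way to obtain the non-uniform corruption the abstract contrasts with uniform corruption.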
# A Systematic Review of NLP for Dementia - Tasks, Datasets and Opportunities ( http://arxiv.org/abs/2409.19737v1 ) License: see link	Lotem Peled-Cohen, Roi Reichart	The close link between cognitive decline and language has fostered long-standing collaboration between the NLP and medical communities in dementia research. To examine this, we reviewed over 200 papers applying NLP to dementia-related efforts, drawing from medical, technological, and NLP-focused literature. We identify key research areas, including dementia detection, linguistic biomarker extraction, caregiver support, and patient assistance, showing that half of all papers focus solely on dementia detection using clinical data. However, many directions remain unexplored: artificially degraded language models, synthetic data, digital twins, and more. We highlight gaps and opportunities around trust, scientific rigor, applicability, and cross-community collaboration, and showcase the diverse datasets encountered throughout our review: recorded, written, structured, spontaneous, synthetic, clinical, social-media-based, and more. This review aims to inspire more creative approaches to dementia research within the medical and NLP communities.	Translated: 2024-11-05 21:19:41 Published: 2024-09-29
# ANN-Enhanced Detection of Multipartite Entanglement in a Three-Qubit NMR Quantum Processor ( http://arxiv.org/abs/2409.19739v1 ) License: see link	Vaishali Gulati, Shivanshu Siyanwal, Arvind, Kavita Dorai	We use an artificial neural network (ANN) model to identify the entanglement class of an experimentally generated three-qubit pure state drawn from one of the six inequivalent classes under stochastic local operations and classical communication (SLOCC). The ANN model is also able to detect the presence of genuine multipartite entanglement (GME) in the state. We apply data science techniques to reduce the dimensionality of the problem, which corresponds to a reduction in the number of density matrix elements that must be computed. The ANN model is first trained on a simulated dataset containing randomly generated states, and is later tested and validated on noisy experimental three-qubit states cast in the canonical form and generated on a nuclear magnetic resonance (NMR) quantum processor. We benchmark the ANN model against Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms and compare the results of our ANN-based entanglement classification with existing three-qubit SLOCC entanglement classification schemes such as 3-tangle and correlation tensors. Our results demonstrate that the ANN model can perform GME detection and SLOCC class identification with high accuracy, using a priori knowledge of only a few density matrix elements as inputs. Since the ANN model works well with a reduced input dataset, it is an attractive method for entanglement classification in real-life situations with limited experimental data sets.	Translated: 2024-11-05 21:19:41 Published: 2024-09-29
# When Molecular GAN Meets Byte-Pair Encoding ( http://arxiv.org/abs/2409.19740v1 ) License: see link	Huidong Tang, Chen Li, Yasuhiko Morimoto	Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle to identify novel and complex sub-structures in molecular data. In contrast, alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by detailed visualization analysis, robustly demonstrate the effectiveness of our GAN.	Translated: 2024-11-05 21:19:41 Published: 2024-09-29
# Tailored Federated Learning: Leveraging Direction Regulation & Knowledge Distillation ( http://arxiv.org/abs/2409.19741v1 ) License: see link	Huidong Tang, Chen Li, Huachong Yu, Sayaka Kamei, Yasuhiko Morimoto	Federated learning (FL) has emerged as a transformative training paradigm, particularly invaluable in privacy-sensitive domains like healthcare. However, client heterogeneity in data, computing power, and tasks poses a significant challenge. To address this challenge, we propose an FL optimization algorithm that integrates model delta regularization, personalized models, federated knowledge distillation, and mix-pooling. Model delta regularization optimizes model updates centrally on the server, efficiently updating clients with minimal communication costs. Personalized models and federated knowledge distillation strategies are employed to tackle task heterogeneity effectively. Additionally, mix-pooling is introduced to accommodate variations in the sensitivity of readout operations. Experimental results demonstrate the remarkable accuracy and rapid convergence achieved by model delta regularization. Additionally, the federated knowledge distillation algorithm notably improves FL performance, especially in scenarios with diverse data. Moreover, mix-pooling readout operations provide tangible benefits for clients, showing the effectiveness of our proposed methods.	Translated: 2024-11-05 21:19:41 Published: 2024-09-29
# Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions ( http://arxiv.org/abs/2409.19747v1 ) License: see link	Enamul Hoque, Mohammed Saidul Islam	Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is a growing interest in automatically creating natural language descriptions for visualizations, which can serve as chart captions, answer questions about charts, or tell data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow down the scope of the survey, we primarily concentrate on research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions: why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, and where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these five Wh-questions. Finally, we discuss the key challenges and potential avenues for future research in this domain.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization ( http://arxiv.org/abs/2409.19749v1 ) License: see link	Duy-Tung Pham, Thien Trang Nguyen Vu, Tung Nguyen, Linh Ngo Van, Duc Anh Nguyen, Thien Huu Nguyen	Recent advances in neural topic models have concentrated on two primary directions: the integration of the inference network (encoder) with a pre-trained language model (PLM) and the modeling of the relationship between words and topics in the generative model (decoder). However, the use of large PLMs significantly increases inference costs, making them less practical for situations requiring low inference times. Furthermore, it is crucial to simultaneously model the relationships between topics and words as well as the interrelationships among topics themselves. In this work, we propose a novel framework called NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization) to address these challenges. NeuroMax maximizes the mutual information between the topic representation obtained from the encoder in neural topic models and the representation derived from the PLM. Additionally, NeuroMax employs optimal transport to learn the relationships between topics by analyzing how information is transported among them. Experimental results indicate that NeuroMax reduces inference time, generates more coherent topics and topic groups, and produces more representative document embeddings, thereby enhancing performance on downstream tasks.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy ( http://arxiv.org/abs/2409.19750v1 ) License: see link	Rui Pan, Tuan Dung Nguyen, Hardik Arora, Alberto Accomazzi, Tirthankar Ghosal, Yuan-Sen Ting	Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLMs. Leveraging a recent initiative to curate high-quality astronomical multiple-choice questions (MCQs), this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model. We demonstrate that this performance degradation can be partially mitigated by utilizing high-quality data for continual pretraining, such as summarized text from arXiv. Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised fine-tuning dataset still constrains the performance of instruct models. In conjunction with this study, we introduce a new set of models, AstroLLaMA-3-8B and AstroLLaMA-2-70B, building upon the previous AstroLLaMA series.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification ( http://arxiv.org/abs/2409.19751v1 ) License: see link	Mohamed Abdelhamid, Abhyuday Desai	Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely used strategies for handling class imbalance: Synthetic Minority Over-sampling Technique (SMOTE), Class Weights tuning, and Decision Threshold Calibration. We compare these methods against a baseline scenario of no intervention across 15 diverse machine learning models and 30 datasets from various domains, conducting a total of 9,000 experiments. Performance was primarily assessed using the F1-score, although our study also tracked results on nine additional metrics including F2-score, precision, recall, Brier-score, PR-AUC, and AUC. Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently effective technique. However, we observed substantial variability in the best-performing method across datasets, highlighting the importance of testing multiple approaches for specific problems. This study provides valuable insights for practitioners dealing with imbalanced datasets and emphasizes the need for dataset-specific analysis in evaluating class imbalance handling techniques.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
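Decision threshold calibration, the strategy the abstract above reports as most consistently effective, can be sketched in a few lines: rather than cutting predicted probabilities at 0.5, pick the cutoff that maximizes F1 on held-out data. This is a minimal sketch assuming binary 0/1 labels and scores in [0, 1]; the function names `f1_score` and `calibrate_threshold` and the toy data are invented for this example and are not taken from the study.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(y_true, scores, candidates=None):
    """Return (threshold, f1) for the candidate cutoff with the highest F1.

    By default every distinct score is tried as a cutoff; ties keep the
    lowest threshold because candidates are scanned in ascending order.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Toy imbalanced validation set: 2 positives out of 10, with low scores overall.
    y_val = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    s_val = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.35, 0.40, 0.30]
    default_preds = [1 if s >= 0.5 else 0 for s in s_val]
    print("F1 at 0.5:", f1_score(y_val, default_preds))
    print("calibrated:", calibrate_threshold(y_val, s_val))
```

The threshold should be chosen on validation data and then frozen before touching the test set; tuning it on the test set would overstate the gain.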
# Offline Signature Verification Based on Feature Disentangling Aided Variational Autoencoder ( http://arxiv.org/abs/2409.19754v1 ) License: see link	Hansong Zhang, Jiangjian Guo, Kun Li, Yang Zhang, Yimei Zhao	Offline handwritten signature verification systems are used to verify the identity of individuals by classifying a handwritten signature image as genuine or forged. The main tasks of signature verification systems include extracting features from signature images and training a classifier for classification. The challenges of these tasks are twofold. First, genuine signatures and skilled forgeries are highly similar in appearance, resulting in a small inter-class distance. Second, instances of skilled forgeries are often unavailable when signature verification models are being trained. To tackle these problems, this paper proposes a new signature verification method. It is the first model that employs a variational autoencoder (VAE) to extract features directly from signature images. To make the features more discriminative, it improves on traditional VAEs by introducing a new loss function for feature disentangling. In addition, it relies on a Support Vector Machine (SVM) for classification according to the extracted features. Extensive experiments are conducted on two public datasets, MCYT-75 and GPDS-synthetic, where the proposed method significantly outperformed 13 representative offline signature verification methods. The improvement achieved on distinct datasets indicates the robustness and great potential of the developed system in real applications.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# Probing Lambda-Gravity with Bose-Einstein Condensate ( http://arxiv.org/abs/2409.19755v1 ) License: see link	Hector A. Fernandez-Melendez, Alexander Belyaev, Vahe Gurzadyan, Ivette Fuentes	We propose a precise test of two fundamental gravitational constants using a novel detector concept that exploits the dynamics of quantum phononic excitations in a trapped Bose-Einstein condensate (BEC), operable at the scale of table-top experiments. In this setup, the sensitivity is enhanced by approximately two orders of magnitude through the use of a tritter operation, which mixes phononic excitations with the BEC's ground state. The BEC exhibits unique sensitivity to the two key components of the gravitational potential in $\Lambda$-gravity: the Newtonian $GM/r$ term and the cosmological constant $\Lambda r^2$ term. Using state-of-the-art experimental design, we predict that the gravitational constant $G$ could be measured with an accuracy up to $10^{-17}$ N m$^2$/kg$^2$, representing an improvement by two orders of magnitude over current measurements. Moreover, this experiment could establish the best Earth-based upper limit on $\Lambda$ at $<10^{-31}$ m$^{-2}$, marking the first laboratory-based probe of the cosmological constant. Additionally, the setup allows for the measurement of the distance-dependent behaviour of each term in the gravitational potential, providing a novel means to test modified gravity theories.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# Advances in Privacy Preserving Federated Learning to Realize a Truly Learning Healthcare System ( http://arxiv.org/abs/2409.19756v1 ) License: see link	Ravi Madduri, Zilinghan Li, Tarak Nandi, Kibaek Kim, Minseok Ryu, Alex Rodriguez	The concept of a learning healthcare system (LHS) envisions a self-improving network where multimodal data from patient care are continuously analyzed to enhance future healthcare outcomes. However, realizing this vision faces significant challenges in data sharing and privacy protection. Privacy-Preserving Federated Learning (PPFL) is a transformative and promising approach that has the potential to address these challenges by enabling collaborative learning from decentralized data while safeguarding patient privacy. This paper proposes a vision for integrating PPFL into the healthcare ecosystem to achieve a true LHS as defined by the Institute of Medicine (IOM) Roundtable.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# Shared entanglement for three-party causal order guessing game ( http://arxiv.org/abs/2409.19762v1 ) License: see link	Ryszard Kukulski, Paulina Lewandowska, Karol Życzkowski	In a variant of communication tasks, players cooperate in choosing their local strategies to compute a given task later, working separately. Utilizing quantum bits for communication and sharing entanglement between parties is a recognized method to enhance performance in these situations. In this work, we introduce a game in which three parties, Alice, Bob and Charlie, would like to discover the hidden order in which they make their moves. We show the advantage of quantum strategies that use shared entanglement and local operations over classical setups for discriminating the composition order of operations. The role of quantum resources in improving the probability of successful discrimination is also investigated.	Translated: 2024-11-05 17:49:48 Published: 2024-09-29
# Spiking Transformer with Spatial-Temporal Attention ( http://arxiv.org/abs/2409.19764v1 ) License: see link	Donghyun Lee, Yuhang Li, Youngeun Kim, Shiting Xiao, Priyadarshini Panda	Spiking Neural Networks (SNNs) present a compelling and energy-efficient alternative to traditional Artificial Neural Networks (ANNs) due to their sparse binary activation. Leveraging the success of the transformer architecture, the spiking transformer architecture is explored to scale up dataset size and performance. However, existing works only consider the spatial self-attention in spiking transformers, neglecting the inherent temporal context across the timesteps. In this work, we introduce Spiking Transformer with Spatial-Temporal Attention (STAtten), a simple and straightforward architecture designed to integrate spatial and temporal information in self-attention with negligible additional computational load. STAtten divides the temporal or token index and calculates the self-attention in a cross-manner to effectively incorporate spatial-temporal information. We first verify our spatial-temporal attention mechanism's ability to capture long-term temporal dependencies using sequential datasets. Moreover, we validate our approach through extensive experiments on varied datasets, including CIFAR10/100, ImageNet, CIFAR10-DVS, and N-Caltech101. Notably, our cross-attention mechanism achieves an accuracy of 78.39% on the ImageNet dataset.	Translated: 2024-11-05 17:39:51 Published: 2024-09-29
# Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology ( http://arxiv.org/abs/2409.19766v1 ) License: see link	Son Quoc Tran, Matt Kretchmar	This paper proposes a novel training method to improve the robustness of Extractive Question Answering (EQA) models. Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness against distribution shifts and adversarial attacks. Despite this, the inclusion of unanswerable questions in EQA training datasets is essential for ensuring real-world reliability. Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets. Models trained with our method maintain in-domain performance while achieving a notable improvement on out-of-domain datasets. This results in an overall F1 score improvement of 5.7 across all testing sets. Furthermore, our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease only about one third that of the default models.	Translated: 2024-11-05 17:39:51 Published: 2024-09-29
# Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems ( http://arxiv.org/abs/2409.19769v1 ) License: see link	Umer Siddique, Abhinav Sinha, Yongcan Cao	In this paper, we propose an adaptive event-triggered reinforcement learning control for continuous-time nonlinear systems, subject to bounded uncertainties, characterized by complex interactions. Specifically, the proposed method is capable of jointly learning both the control policy and the communication policy, thereby reducing the number of parameters and the computational overhead compared with learning them separately or learning only one of them. By augmenting the state space with accrued rewards that represent the performance over the entire trajectory, we show that accurate and efficient determination of triggering conditions is possible without the need to learn triggering conditions explicitly, thereby leading to an adaptive non-stationary policy. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed approach.	Translated: 2024-11-05 17:39:51 Published: 2024-09-29
# PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond ( http://arxiv.org/abs/2409.19772v1 ) License: see link	Chen Song, Zhenxiao Liang, Bo Sun, Qixing Huang	We present Parametric Piecewise Linear Networks (PPLNs) for temporal vision inference. Motivated by the neuromorphic principles that regulate biological neural behaviors, PPLNs are ideal for processing data captured by event cameras, which are built to simulate neural activities in the human retina. We discuss how to represent the membrane potential of an artificial neuron by a parametric piecewise linear function with learnable coefficients. This design echoes the idea of building deep models from learnable parametric functions recently popularized by Kolmogorov-Arnold Networks (KANs). Experiments demonstrate the state-of-the-art performance of PPLNs in event-based and image-based vision applications, including steering prediction, human pose estimation, and motion deblurring. The source code of our implementation is available at https://github.com/chensong1995/PPLN.	Translated: 2024-11-05 17:39:51 Published: 2024-09-29
# Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization ( http://arxiv.org/abs/2409.19774v1 ) License: see link	Nikos Efthymiadis, Giorgos Tolias, Ondřej Chum	Single-source domain generalization attempts to learn a model on a source domain and deploy it to unseen target domains. Limiting access only to source domain data imposes two key challenges: how to train a model that can generalize and how to verify that it does. The standard practice of validation on the training distribution does not accurately reflect the model's generalization ability, while validation on the test distribution is a malpractice to be avoided. In this work, we construct an independent validation set by transforming source domain images with a comprehensive list of augmentations, covering a broad spectrum of potential distribution shifts in target domains. We demonstrate a high correlation between validation and test performance for multiple methods and across various datasets. The proposed validation achieves a relative accuracy improvement over the standard validation equal to 15.4% or 1.6% when used for method selection or learning rate tuning, respectively. Furthermore, we introduce a novel family of methods that increase the shape bias through enhanced edge maps. To benefit from the augmentations during training and preserve the independence of the validation set, a k-fold validation process is designed to separate the augmentation types used in training and validation. The method that achieves the best performance on the augmented validation is selected from the proposed family. It achieves state-of-the-art performance on various standard benchmarks. Code at: https://github.com/NikosEfth/crafting-shifts	Translated: 2024-11-05 17:39:51 Published: 2024-09-29
# モーメント制約学習によるニューラルネットワークの自動脱バイアス Automatic debiasing of neural networks via moment-constrained learning ( http://arxiv.org/abs/2409.19777v1 ) ライセンス: Link先を確認	Christian L. Hines, Oliver J. Hines,	(参考訳) 経済学やバイオ統計学における因果推定と非パラメトリック推定は、未知の結果回帰関数に適用される線形汎関数の平均と見なされることが多い。回帰関数を素朴に学習して対象汎関数のサンプル平均をとるとバイアスのある推定器が得られるため、ターゲット推定量のいわゆるRiesz表現器(RR)を追加で学習する豊富な脱バイアス文献が発展してきた(ターゲット学習、ダブルML、自動脱バイアスなど)。 RRをその導出関数形式で学習することは、例えば、極端な逆確率重みや条件密度関数を学習する必要があるため、困難である。このような課題は、専用の損失の最小化によってRRを直接学習するオートマチックデバイアス(AD)の最近の進歩を動機付けている。我々は、ADのいくつかの欠点に対処する新しいRR学習手法として、モーメント制約学習を提案する。これは予測モーメントを制約し、最適化ハイパーパラメータに対するRR推定の頑健性を改善する。本手法は学習者の特定のクラスに縛られていないが,ニューラルネットワークを用いてこれを説明し,半合成データを用いた平均処置効果・平均微分効果の推定問題で評価する。我々の数値実験は、最先端のベンチマークよりも性能が向上したことを示している。 Causal and nonparametric estimands in economics and biostatistics can often be viewed as the mean of a linear functional applied to an unknown outcome regression function. Naively learning the regression function and taking a sample mean of the target functional results in biased estimators, and a rich debiasing literature has developed where one additionally learns the so-called Riesz representer (RR) of the target estimand (targeted learning, double ML, automatic debiasing etc.). Learning the RR via its derived functional form can be challenging, e.g. due to extreme inverse probability weights or the need to learn conditional density functions. Such challenges have motivated recent advances in automatic debiasing (AD), where the RR is learned directly via minimization of a bespoke loss. We propose moment-constrained learning as a new RR learning approach that addresses some shortcomings in AD, constraining the predicted moments and improving the robustness of RR estimates to optimization hyperparameters. Though our approach is not tied to a particular class of learner, we illustrate it using neural networks, and evaluate on the problems of average treatment/derivative effect estimation using semi-synthetic data. 
Our numerical experiments show improved performance versus state of the art benchmarks.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# DNA分類の逆例 Adversarial Examples for DNA Classification ( http://arxiv.org/abs/2409.19788v1 ) ライセンス: Link先を確認	Hyunwoo Yoo,	(参考訳) DNA配列に基づいて訓練されたDNABERT2やヌクレオチドトランスフォーマーなどの事前訓練された言語モデルは、DNA配列分類タスクにおいて有望な性能を示した。これらのモデルの分類能力は、大量のDNA配列サンプルに基づいて訓練された言語モデルと、比較的小さな分類データセットによる微調整に起因している。しかし、これらのテキストベースのシステムは十分に堅牢ではなく、敵の例に対して脆弱である。逆行攻撃はテキスト分類において広く研究されているが、DNA配列分類では限定的な研究がなされている。本稿では,DNAシークエンス分類のためのテキスト分類において,一般的な攻撃アルゴリズムを適用した。様々な攻撃方法が文字・単語・文レベルでのDNA配列分類に与える影響について検討した。以上の結果から,実際のDNA言語モデル配列分類器はこれらの攻撃に対して脆弱であることが示唆された。 Pre-trained language models such as DNABERT2 and Nucleotide Transformer, which are trained on DNA sequences, have shown promising performance in DNA sequence classification tasks. The classification ability of these models stems from language models trained on vast amounts of DNA sequence samples, followed by fine-tuning with relatively smaller classification datasets. However, these text-based systems are not robust enough and can be vulnerable to adversarial examples. While adversarial attacks have been widely studied in text classification, there is limited research in DNA sequence classification. In this paper, we adapt commonly used attack algorithms in text classification for DNA sequence classification. We evaluated the impact of various attack methods on DNA sequence classification at the character, word, and sentence levels. Our findings indicate that actual DNA language model sequence classifiers are vulnerable to these attacks.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# クロスエントロピー最適化と推論を用いたリーマン仮説の解析 Analysis on Riemann Hypothesis with Cross Entropy Optimization and Reasoning ( http://arxiv.org/abs/2409.19790v1 ) ライセンス: Link先を確認	Kevin Li, Fulu Li,	(参考訳) 本稿では,リーマン仮説 [27] の解析のための新しい枠組みについて述べる。 a) クロスエントロピー最適化及び推論による確率的モデリング b) 大数の法則の適用 c) 数学的誘導の適用この分析は主に、クロスエントロピー最適化の確率論的モデリングと、まれな事象シミュレーション手法による推論によって行われる。大きな数 [2, 3, 6] の法則の適用と数学的帰納法の応用は、リーマン仮説の分析を自己完結させ、複素平面全体がリーマン仮説で予想されるようにカバーされることを保証する。また,大規模言語モデル (LLMs) を用いた拡張トップpサンプリング手法についても論じる。次のトークン予測は,現在のラウンドにおける各トークンの予測確率だけでなく,複数のトップkチェーンの思考(CoTs)パスの蓄積経路確率にも基づく。クロスエントロピー最適化と推論の確率的モデリングは、リーマンゼータ函数が本質的に複素数級数の無限成分の和を扱うので、リーマン仮説の分析に相応しい。この論文における我々の分析が、リーマン仮説のいくつかの洞察に光を当てることを願っている。本稿では,大規模言語モデル (LLM) における強化学習 (RL) [1, 7, 18, 21, 24, 34, 39-41] による思考の連鎖 (CoT) や思考の図 (DoT) による最近の発展と合わせて, リーマン仮説の最終的な証明の道を開いた。 In this paper, we present a novel framework for the analysis of Riemann Hypothesis [27], which is composed of three key components: a) probabilistic modeling with cross entropy optimization and reasoning; b) the application of the law of large numbers; c) the application of mathematical inductions. The analysis is mainly conducted by virtue of probabilistic modeling of cross entropy optimization and reasoning with rare event simulation techniques. The application of the law of large numbers [2, 3, 6] and the application of mathematical inductions make the analysis of Riemann Hypothesis self-contained and complete to make sure that the whole complex plane is covered as conjectured in Riemann Hypothesis. We also discuss the method of enhanced top-p sampling with large language models (LLMs) for reasoning, where next token prediction is not just based on the estimated probabilities of each possible token in the current round but also based on accumulated path probabilities among multiple top-k chain of thoughts (CoTs) paths. 
The probabilistic modeling of cross entropy optimization and reasoning may suit well with the analysis of Riemann Hypothesis as Riemann Zeta functions are inherently dealing with the sums of infinite components of a complex number series. We hope that our analysis in this paper could shed some light on some of the insights of Riemann Hypothesis. The framework and techniques presented in this paper, coupled with recent developments with chain of thought (CoT) or diagram of thought (DoT) reasoning in large language models (LLMs) with reinforcement learning (RL) [1, 7, 18, 21, 24, 34, 39-41], could pave the way for eventual proof of Riemann Hypothesis [27].	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# 適応的ステップサイズを用いた勾配降下は(ほぼ)4次成長の下で線形に収束する Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth ( http://arxiv.org/abs/2409.19791v1 ) ライセンス: Link先を確認	Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang,	(参考訳) 最適化スペシャリストの間では、勾配降下の線形収束は、関数がその最小化子から離れるにつれて2次で成長することを条件とする、という考えが有力である。この研究では、この信念は不正確であると主張する。適応的ステップサイズを用いた勾配降下は、最小化子から離れるにつれて4次の成長しか示さない任意の滑らかな関数に対して、局所的に(ほぼ)線形の速度で収束することを示す。我々が提案する適応的ステップサイズは、興味深い分解定理から生じる。すなわち、任意のそのような関数は最適解の周りに滑らかな多様体(これをravine、渓谷と呼ぶ)を持ち、関数は渓谷から離れる方向には少なくとも2次で成長し、渓谷に沿っては一定の次数で成長する。渓谷により、多数の短い勾配ステップと1回の長いPolyak勾配ステップを交互に行うことができ、これらが合わせて最小化子への急速な収束を保証する。本稿では,行列センシングと行列分解の問題、および過剰パラメータ化領域における単一ニューロンの学習について、理論とアルゴリズムを例示する。 A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
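To make the role of the long Polyak step concrete, here is a minimal one-dimensional sketch in Python. It is not the authors' algorithm (which interlaces short gradient steps with a Polyak step along the ravine); it only compares plain constant-stepsize gradient descent with the textbook Polyak stepsize on the toy function f(x) = x^4, which has fourth-order growth around its minimizer. The function names, the starting point, and the stepsize 0.1 are illustrative assumptions.

```python
def f(x):
    # Toy objective with fourth-order growth around the minimizer x* = 0.
    return x ** 4


def grad(x):
    return 4 * x ** 3


def constant_step_gd(x0, eta, steps):
    # Plain gradient descent with a fixed stepsize (assumed eta, not from the paper).
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x


def polyak_step_gd(x0, steps, f_star=0.0):
    # Polyak stepsize: (f(x) - f*) / |f'(x)|^2, which requires knowing f* (here 0).
    x = x0
    for _ in range(steps):
        g = grad(x)
        if g == 0.0:
            break
        x = x - (f(x) - f_star) / (g * g) * g
    return x


x_const = constant_step_gd(1.0, eta=0.1, steps=50)
x_polyak = polyak_step_gd(1.0, steps=50)
print(x_const, x_polyak)
```

On this toy problem the Polyak update simplifies to x - x/4, so each iteration contracts the iterate by a factor of 3/4 (a linear rate), whereas the constant-stepsize iterates shrink only sublinearly because the gradient vanishes cubically near the minimizer.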
# 電子カルテのブラックボックスセグメンテーション Black-Box Segmentation of Electronic Medical Records ( http://arxiv.org/abs/2409.19796v1 ) ライセンス: Link先を確認	Hongyi Yuan, Sheng Yu,	(参考訳) 電子カルテ(EMR)には、患者の医療詳細の大半が含まれている。それは、自動医療システムを開発するための豊富な資源である。概念抽出などのEMR処理に関する自然言語処理(NLP)研究のほとんどは、EMRセクションの不正確なセグメンテーションの影響を受けている。同時に、EMRの正確なセクション分割には十分な注意が払われていない。セクション構造に含まれうる情報は、十分に評価されていない。本研究は、EMRのセグメンテーションに焦点を当て、簡単な文埋め込みモデルとニューラルネットワークを用いたブラックボックスセグメンテーション法と適切なトレーニング手法を提案する。普遍的な適応性を達成するために、異なるセクション見出し形式を持つデータセット上でモデルをトレーニングする。先進的な深層学習に基づくNLP法をいくつか比較し,本手法は適切な学習コーパスを用いることで、各種試験データに対して最良のセグメンテーション精度(98%以上)を達成する。 Electronic medical records (EMRs) contain the majority of patients' healthcare details. They are an abundant resource for developing an automatic healthcare system. Most of the natural language processing (NLP) studies on EMR processing, such as concept extraction, are adversely affected by the inaccurate segmentation of EMR sections. At the same time, not enough attention has been given to the accurate sectioning of EMRs. The information that may occur in section structures is undervalued. This work focuses on the segmentation of EMRs and proposes a black-box segmentation method using a simple sentence embedding model and neural network, along with a proper training method. To achieve universal adaptivity, we train our model on the dataset with different section heading formats. We compare several advanced deep learning-based NLP methods, and our method achieves the best segmentation accuracies (above 98%) on various test data with a proper training corpus.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# 無向グラフ上のスピン相互作用によって生成される動的リー代数の分類 Classification of dynamical Lie algebras generated by spin interactions on undirected graphs ( http://arxiv.org/abs/2409.19797v1 ) ライセンス: Link先を確認	Efekan Kökcü, Roeland Wiersema, Alexander F. Kemper, Bojko N. Bakalov,	(参考訳) 非方向グラフ上の2-局所スピン相互作用によって生成されるすべての動的リー代数の分類を提供する。スピン鎖のそのような分類を提供するこれまでの研究に基づいて、非方向グラフのより一般的な場合を考える。他のグラフに対して、動的リー代数はグラフが双分数であるかどうかにのみ依存する。この結果の重要な結果は、力学リー代数が大きさの多項式である場合が特別であり、一次元に制限されることである。 We provide a classification of all dynamical Lie algebras generated by 2-local spin interactions on undirected graphs. Building on our previous work where we provided such a classification for spin chains, here we consider the more general case of undirected graphs. As it turns out, the one-dimensional case is special; for any other graph, the dynamical Lie algebra solely depends on whether the graph is bipartite or not. An important consequence of this result is that the cases where the dynamical Lie algebra is polynomial in size are special and restricted to one dimension.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# メンバーシップ推論攻撃は、モデルがデータでトレーニングされたことを証明できない Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data ( http://arxiv.org/abs/2409.19798v1 ) ライセンス: Link先を確認	Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr,	(参考訳) データ作成者や所有者が、ある機械学習モデルが自身のデータでトレーニングされたことを第三者に示したいという、トレーニングデータ証明の問題について検討する。ウェブスケールのデータでトレーニングされた基礎モデルに対する最近の訴訟では、トレーニングデータ証明が重要な役割を担っている。多くの先行研究は、メンバシップ推論攻撃を用いてトレーニングデータ証明を実現することを提案している。我々は、このアプローチは根本的に健全でないと主張する。確証のある証拠を提供するためには、データ作成者は攻撃の偽陽性率が低いこと、すなわち、モデルがターゲットデータでトレーニングされていないという帰無仮説の下では攻撃の出力が起こりにくいことを示す必要がある。しかし、トレーニングセットの正確な内容が分かっておらず、大きな基礎モデルを(効率的に)再訓練することもできないため、この帰無仮説からのサンプリングは不可能である。最後に、データ抽出攻撃と特別なカナリアデータに対するメンバーシップ推論を用いれば健全なトレーニングデータ証明を作成できることを示し、今後の2つの道筋を提示する。 We consider the problem of a training data proof, where a data creator or owner wants to demonstrate to a third party that some machine learning model was trained on their data. Training data proofs play a key role in recent lawsuits against foundation models trained on web-scale data. Many prior works suggest instantiating training data proofs using membership inference attacks. We argue that this approach is fundamentally unsound: to provide convincing evidence, the data creator needs to demonstrate that their attack has a low false positive rate, i.e., that the attack's output is unlikely under the null hypothesis that the model was not trained on the target data. Yet, sampling from this null hypothesis is impossible, as we do not know the exact contents of the training set, nor can we (efficiently) retrain a large foundation model. We conclude by offering two paths forward, by showing that data extraction attacks and membership inference on special canary data can be used to create sound training data proofs.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# 差分プライベートな二段階最適化 Differentially Private Bilevel Optimization ( http://arxiv.org/abs/2409.19800v1 ) ライセンス: Link先を確認	Guy Kornowski,	(参考訳) 近年,様々な機械学習アプリケーションで注目されている問題クラスである,二段階最適化のための差分プライベート(DP)アルゴリズムを提案する。これらは、このタスクのための最初のDPアルゴリズムであり、任意の所望のプライバシを提供すると同時に、大規模な設定では実行困難なヘッセン計算を避けることができる。上層が必ずしも凸ではなく、下層問題が強凸であるようなよく研究された設定の下で、提案された勾配ベースの$(\epsilon,\delta)$-DPアルゴリズムは、超勾配ノルムが最大$\widetilde{\mathcal{O}}\left((\sqrt{d_\mathrm{up}}/\epsilon n)^{1/2}+(\sqrt{d_\mathrm{low}}/\epsilon n)^{1/3}\right)$である点を返す。ここで$n$はデータセットのサイズ、$d_\mathrm{up}/d_\mathrm{low}$は上層/下層の次元である。本分析は, 制約付き, 制約なしの問題の両方を網羅し, ミニバッチ勾配を考慮し, 経験的損失と母集団損失の両方に適用できる。 We present differentially private (DP) algorithms for bilevel optimization, a problem class that has received significant attention lately in various machine learning applications. These are the first DP algorithms for this task that are able to provide any desired privacy, while also avoiding Hessian computations which are prohibitive in large-scale settings. Under the well-studied setting in which the upper-level is not necessarily convex and the lower-level problem is strongly-convex, our proposed gradient-based $(\epsilon,\delta)$-DP algorithm returns a point with hypergradient norm at most $\widetilde{\mathcal{O}}\left((\sqrt{d_\mathrm{up}}/\epsilon n)^{1/2}+(\sqrt{d_\mathrm{low}}/\epsilon n)^{1/3}\right)$ where $n$ is the dataset size, and $d_\mathrm{up}/d_\mathrm{low}$ are the upper/lower level dimensions. Our analysis covers constrained and unconstrained problems alike, accounts for mini-batch gradients, and applies to both empirical and population losses.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# CRScore: コードクレームとコードスメルに基づくコードレビューコメントの自動評価 CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells ( http://arxiv.org/abs/2409.19801v1 ) ライセンス: Link先を確認	Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rose,	(参考訳) 自動コードレビューのタスクは最近、機械学習コミュニティから多くの注目を集めています。しかしながら、現在のレビューコメント評価指標は、コードレビューが生成や要約と同様に1つのdiffに対して多数の「妥当なレビュー」が存在する1対多の問題であるにもかかわらず、与えられたコード変更(diffとも呼ばれる)に対する人間による参照との比較に依存している。これらの問題に対処するため、私たちは、簡潔さ、包括性、および関連性といったレビュー品質の次元を測定する参照不要の指標であるCRScoreを開発します。我々はCRScoreを設計し、LLMと静的アナライザによって検出されたコードにおけるクレームや潜在的な問題に基づいてレビューを評価する。 CRScoreは人間の判断(0.54 Spearman相関)に最もよく一致し、参照ベースの指標よりも敏感な、妥当できめ細かなレビュー品質スコアを得られることを示す。また、自動メトリクスの開発をサポートするために、機械生成およびGitHubのレビューコメントに対する2.6kの人手アノテーション付きレビュー品質スコアのコーパスもリリースしました。 The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff), even though code review is a one-to-many problem like generation and summarization with many "valid reviews" for a diff. To tackle these issues, we develop CRScore, a reference-free metric that measures dimensions of review quality such as conciseness, comprehensiveness, and relevance. We design CRScore to evaluate reviews in a way that is grounded in claims and potential issues detected in the code by LLMs and static analyzers. We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment (0.54 Spearman correlation) and are more sensitive than reference-based metrics. We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# RAGはLLMに不公平をもたらすか? Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems ( http://arxiv.org/abs/2409.19804v1 ) ライセンス: Link先を確認	Xuyang Wu, Shuowei Li, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang,	(参考訳) RAG(Retrieval-Augmented Generation)は、最近、オープンドメイン質問応答(QA)タスクにおいて外部知識ソースを統合する能力の強化により、大きな注目を集めている。しかし、これらのモデルがどのように公正な懸念に対処しているかは、特に性別、地理的な位置、その他の人口統計学的要因などのセンシティブな属性に関して、はっきりしない。第一に、言語モデルが完全一致精度の向上などの実用性を優先するように進化するにつれて、公正性はほとんど見過ごされてきた可能性がある。第2に、RAGメソッドは複雑なパイプラインであり、各コンポーネントが異なる目標に最適化されているため、バイアスの特定と対処が難しい。本稿では,複数のRAG法における公平性を実証的に評価することを目的とする。本稿では,シナリオベースの質問を用い、人口統計属性間の格差を分析する、RAG法に適した公平度評価フレームワークを提案する。実験の結果、ユーティリティ駆動最適化の最近の進歩にもかかわらず、検索と生成の両方の段階でフェアネスの問題が続いており、RAGパイプライン内でより的を絞ったフェアネス介入の必要性が強調された。論文の受理後、データセットとコードを公開します。 Retrieval-Augmented Generation (RAG) methods have recently gained significant attention for their enhanced ability to integrate external knowledge sources in open-domain question answering (QA) tasks. However, it remains unclear how these models address fairness concerns, particularly with respect to sensitive attributes such as gender, geographic location, and other demographic factors. First, as language models evolve to prioritize utility, like improving exact match accuracy, fairness may have been largely overlooked. Second, RAG methods are complex pipelines, making it hard to identify and address biases, as each component is optimized for different goals. In this paper, we aim to empirically evaluate fairness in several RAG methods. We propose a fairness evaluation framework tailored to RAG methods, using scenario-based questions and analyzing disparities across demographic attributes. The experimental results indicate that, despite recent advances in utility-driven optimization, fairness issues persist in both the retrieval and generation stages, highlighting the need for more targeted fairness interventions within RAG pipelines. We will release our dataset and code upon acceptance of the paper.	翻訳日:2024-11-05 17:39:51 公開日:2024-09-29
# PALM: 音声言語モデルのためのFew-Shot Prompt Learning PALM: Few-Shot Prompt Learning for Audio Language Models ( http://arxiv.org/abs/2409.19806v1 ) ライセンス: Link先を確認	Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, Hanan Aldarmaki,	(参考訳) 音声言語モデル(ALM)は,視覚言語モデル(VLM)の進歩に触発されて,音声波形の特徴とクラス固有のテキストプロンプト機能とを一致させるゼロショット音声認識タスクにおいて,近年顕著な成功を収めている。手作りテキストプロンプトの選択に対するゼロショット性能の感度を考慮すると、VLM向けに多くの素早い学習技術が開発されている。本稿では,ALMにおけるこれらの手法の有効性について検討し,テキストエンコーダブランチの機能空間を最適化する新しい手法であるPrompt Learning in Audio Language Models (PALM)を提案する。入力空間で動作する既存の手法とは異なり、我々の手法はトレーニング効率を向上する。我々は,11の音声認識データセットに対するアプローチの有効性を実証し,様々な音声処理タスクを包含し,その結果を数ショットの学習設定で3つのベースラインと比較した。我々の手法は計算量が少なく、他の手法と同等か優れる。コードはhttps://asif-hanif.github.io/palm/で入手できる。 Audio-Language Models (ALMs) have recently achieved remarkable success in zero-shot audio recognition tasks, which match features of audio waveforms with class-specific text prompt features, inspired by advancements in Vision-Language Models (VLMs). Given the sensitivity of zero-shot performance to the choice of hand-crafted text prompts, many prompt learning techniques have been developed for VLMs. We explore the efficacy of these approaches in ALMs and propose a novel method, Prompt Learning in Audio Language Models (PALM), which optimizes the feature space of the text encoder branch. Unlike existing methods that work in the input space, our approach results in greater training efficiency. We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup. Our method is either on par with or outperforms other approaches while being computationally less demanding. Code is available at https://asif-hanif.github.io/palm/	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# モデルが例からスキル構成を学べるか? Can Models Learn Skill Composition from Examples? ( http://arxiv.org/abs/2409.19808v1 ) ライセンス: Link先を確認	Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora,	(参考訳) 大規模言語モデル(LLM)がますます進歩するにつれて、学習スキルをトレーニング中に遭遇しない新しい方法で組み合わせる能力である作曲の一般化を示す能力は、大きな注目を集めている。この種の一般化、特にトレーニングデータ以外のシナリオでは、AIの安全性とアライメントの研究にも大きな関心がある。最近の研究では、SKILL-MIXの評価を導入し、モデルが特定の言語スキルを1k$-tupleで使用することを実証する短い段落を構成することを課題としている。小型モデルは$k=3$でも組み立てに苦労したが、GPT-4のような大型モデルは$k=5$と$6$の順調に動作した。本稿では,SKILL-MIXに類似した設定を用いて,より小さなモデルのキャパシティを評価し,例から構成一般化を学習する。 GPT-4は、修辞学、文学、推論、心の理論、常識を含む多様な言語スキルのセットを利用して、ランダムに$k$スキルのサブセットを示すテキストサンプルを生成する。 1)$k=2$と$3$の組み合わせのトレーニングの結果、トレーニング中にそのような例を見たことのないモデルにもかかわらず、テキストを$k=4$と$5$のスキルで組み立てる能力が顕著に向上した。 2) スキルカテゴリーをトレーニンググループと保持グループに分けた場合, 微調整中のトレーニングスキルしか見ていないにもかかわらず, テスト中の保持スキルのあるテキストの作曲において, 従来は見つからなかったスキルであっても, トレーニングアプローチの有効性を示唆するモデルが有意に向上する。また,本研究では,スキルリッチ(潜在的に合成的な)テキストをトレーニングに取り入れることで,モデルの構成能力を大幅に向上させることが示唆された。 As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified $k$-tuple of language skills. While small models struggled with composing even with $k=3$, larger models like GPT-4 performed reasonably well with $k=5$ and $6$. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of $k$ skills. 
Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of $k$, revealed the following findings: (1) Training on combinations of $k=2$ and $3$ skills results in noticeable improvements in the ability to compose texts with $k=4$ and $5$ skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# ハイブリッド特徴をもつロバストインクリメンタル構造 Robust Incremental Structure-from-Motion with Hybrid Features ( http://arxiv.org/abs/2409.19811v1 ) ライセンス: Link先を確認	Shaohui Liu, Yidan Gao, Tianyi Zhang, Rémi Pautrat, Johannes L. Schönberger, Viktor Larsson, Marc Pollefeys,	(参考訳) Structure-from-Motion (SfM) は、カメラキャリブレーションとシーン再構築のためのユビキタスツールとなり、コンピュータビジョンなど多くのダウンストリームアプリケーションで使われている。最先端のSfMパイプラインは、何十年もの間、よくコンテクストされた、よく構成されたシーンで高い成熟度に達してきたが、それでも、挑戦的なシナリオでSfMの問題を堅牢に解決するには至っていない。特に、弱いテクスチャ化されたシーンと制約の弱い構成は、しばしば破滅的な失敗や、主にキーポイントベースのパイプラインの大きなエラーを引き起こす。これらのシナリオでは、線分はしばしば豊富であり、相補的な幾何学的制約を与えることができる。それらの大きな空間範囲と典型的に構造化された構成は、伝統的なキーポイントベースの手法と比較して、より強い幾何学的制約をもたらす。本研究では、点に加えて、線とその構造的幾何関係を利用する漸進的なSfMシステムを導入する。技術的なコントリビューションはパイプライン全体(マッピング、三角測量、登録)に及び、これらを総合的なエンドツーエンドのSfMシステムに統合し、コミュニティとオープンソースソフトウェアとして共有しています。また, 感度解析による3次元最適化線の不確かさを伝搬する最初の解析手法を提案する。実験により、我々のシステムは、SfMの広く使われているポイントベースの最先端技術と比較して、一貫して堅牢で正確であることが示され、よりリッチなマップとより正確なカメラ登録を実現している。さらに、我々の不確実性を考慮したローカライゼーションモジュールだけでは、ポイントアローンとハイブリッドの両方のセットアップの下で、最先端よりも一貫して改善できる。 Structure-from-Motion (SfM) has become a ubiquitous tool for camera calibration and scene reconstruction with many downstream applications in computer vision and beyond. While the state-of-the-art SfM pipelines have reached a high level of maturity in well-textured and well-configured scenes over the last decades, they still fall short of robustly solving the SfM problem in challenging scenarios. In particular, weakly textured scenes and poorly constrained configurations oftentimes cause catastrophic failures or large errors for the primarily keypoint-based pipelines. In these scenarios, line segments are often abundant and can offer complementary geometric constraints. Their large spatial extent and typically structured configurations lead to stronger geometric constraints as compared to traditional keypoint-based methods. In this work, we introduce an incremental SfM system that, in addition to points, leverages lines and their structured geometric relations. 
Our technical contributions span the entire pipeline (mapping, triangulation, registration) and we integrate these into a comprehensive end-to-end SfM system that we share as an open-source software with the community. We also present the first analytical method to propagate uncertainties for 3D optimized lines via sensitivity analysis. Experiments show that our system is consistently more robust and accurate compared to the widely used point-based state of the art in SfM -- achieving richer maps and more precise camera registrations, especially under challenging conditions. In addition, our uncertainty-aware localization module alone is able to consistently improve over the state of the art under both point-alone and hybrid setups.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 隠れ状態から二項意味的特徴への変換 Transforming Hidden States into Binary Semantic Features ( http://arxiv.org/abs/2409.19813v1 ) ライセンス: Link先を確認	Tomáš Musil, David Mareček,	(参考訳) 大規模言語モデルは、分布意味論から直接的にインスパイアされた多くのNLPアプリケーションの系統に従っているが、もはやそれと密接な関係はないようである。本稿では,再び意味の分布論を採用することを提案する。独立成分分析を用いて、その難易度を克服し、大きな言語モデルがそれらの隠れ状態における意味的特徴を表現することを示す。 Large language models follow a lineage of many NLP applications that were directly inspired by distributional semantics, but do not seem to be closely related to it anymore. In this paper, we propose to employ the distributional theory of meaning once again. Using Independent Component Analysis to overcome some of its challenging aspects, we show that large language models represent semantic features in their hidden states.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 接地型カリキュラム学習 Grounded Curriculum Learning ( http://arxiv.org/abs/2409.19816v1 ) ライセンス: Link先を確認	Linji Wang, Zifan Xu, Peter Stone, Xuesu Xiao,	(参考訳) ロボット工学の強化学習(Reinforcement Learning, RL)における実世界のデータの高コスト化により、シミュレータが広く使われるようになる。シミュレーターのためのより良いダイナミクスモデルを構築するための広範な研究にもかかわらず、シミュレーションと現実の世界、すなわち利用可能なトレーニングタスクの分布との間には、しばしば見過ごされがちなミスマッチがある。このようなミスマッチは、実世界との関係を考慮せずに自動的にシミュレーションタスク分布を変化させる既存のカリキュラム学習技術によってさらに悪化する。これらの課題を考慮すると、ロボット工学RLのカリキュラム学習は現実世界のタスク分布に基礎を置く必要があると仮定する。そこで本研究では,カリキュラムにおける模擬課題分布を実世界と整合させるグラウンドドカリキュラムラーニング(GCL)を提案する。複雑なナビゲーションタスクに対してBARNデータセットを用いてGCLを検証し、最先端のCL法と人間専門家が設計したカリキュラムと比較して、成功率6.8%と6.5%を達成した。これらの結果から,GCLは実世界におけるシミュレーションタスクの分布を適応的なカリキュラム内で基礎づけることで,学習効率とナビゲーション性能を向上させることができることがわかった。 The high cost of real-world data for robotics Reinforcement Learning (RL) leads to the wide usage of simulators. Despite extensive work on building better dynamics models for simulators to match with the real world, there is another, often-overlooked mismatch between simulations and the real world, namely the distribution of available training tasks. Such a mismatch is further exacerbated by existing curriculum learning techniques, which automatically vary the simulation task distribution without considering its relevance to the real world. Considering these challenges, we posit that curriculum learning for robotics RL needs to be grounded in real-world task distributions. To this end, we propose Grounded Curriculum Learning (GCL), which aligns the simulated task distribution in the curriculum with the real world, as well as explicitly considers what tasks have been given to the robot and how the robot has performed in the past. We validate GCL using the BARN dataset on complex navigation tasks, achieving a 6.8% and 6.5% higher success rate compared to a state-of-the-art CL method and a curriculum designed by human experts, respectively. 
These results show that GCL can enhance learning efficiency and navigation performance by grounding the simulation task distribution in the real world within an adaptive curriculum.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 適応的温度スケーリングによる言語モデルの校正 Calibrating Language Models with Adaptive Temperature Scaling ( http://arxiv.org/abs/2409.19817v1 ) ライセンス: Link先を確認	Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn,	(参考訳) 大規模言語モデル(LLM)の有効性は、正確な出力を生成する能力だけでなく、その校正、すなわち信頼性スコアが出力の正しさの確率をどれだけよく反映しているかによっても測定される。教師なし事前学習ではよく校正された条件付き確率を持つLLMが得られることが示されているが、最近の研究では、人間のフィードバックからの強化学習(RLHF)による微調整の後、これらのモデルの校正が著しく低下することが示されている。本研究では,各トークン予測に対する温度スケーリングパラメータを予測するポストホックキャリブレーション法であるAdaptive Temperature Scaling (ATS)を導入する。予測温度値はトークンレベルの特徴に基づいて適応し、標準的な教師付き微調整(SFT)データセット上で学習される。 ATSの適応性は、RLHF微調整後に起こりうる様々な程度のキャリブレーションシフトに対処する。 ATSは、従来のキャリブレーション手法と比較して、3つの下流自然言語評価ベンチマークで10-50%以上のキャリブレーションを改善し、RLHFによる性能改善を阻害しない。 The effectiveness of large language models (LLMs) is not only measured by their ability to generate accurate outputs but also by their calibration, i.e., how well their confidence scores reflect the probability of their outputs being correct. While unsupervised pre-training has been shown to yield LLMs with well-calibrated conditional probabilities, recent studies have shown that after fine-tuning with reinforcement learning from human feedback (RLHF), the calibration of these models degrades significantly. In this work, we introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction. The predicted temperature values adapt based on token-level features and are fit over a standard supervised fine-tuning (SFT) dataset. The adaptive nature of ATS addresses the varying degrees of calibration shift that can occur after RLHF fine-tuning. ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods and does not impede performance improvements from RLHF.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
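As a rough illustration of what per-token temperature scaling does to confidence scores, the Python sketch below divides each token's logits by its own temperature before the softmax. It does not reproduce ATS itself: in the paper the temperatures are predicted from token-level features, whereas here they are simply supplied as inputs. The function names and the example logits and temperatures are assumptions made for this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_confidences(token_logits, temperatures):
    # One temperature per token position; confidence is the top probability.
    return [max(softmax(logits, t)) for logits, t in zip(token_logits, temperatures)]


# Hypothetical two-class logits for three token positions.
logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
confidences = per_token_confidences(logits, temperatures=[0.5, 1.0, 2.0])
print(confidences)
```

A temperature above 1 flattens the distribution and lowers the reported confidence without changing which token is ranked first, which is why this kind of post-hoc rescaling can adjust calibration while leaving greedy predictions unchanged.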
# Qompose: 中性原子量子アーキテクチャのための最適アルゴリズム特化レイアウトの選択手法 Qompose: A Technique to Select Optimal Algorithm-Specific Layout for Neutral Atom Quantum Architectures ( http://arxiv.org/abs/2409.19820v1 ) ライセンス: Link先を確認	Daniel Silver, Tirthak Patel, Devesh Tiwari,	(参考訳) 量子コンピューティングアーキテクチャが成熟するにつれて、ユニークな利点をもたらす新しい技術を研究することが重要である。本研究では、中性原子の2次元トポロジ上の量子回路を効率的に構成するための中性原子量子コンピューティングフレームワークであるQomposeを提案する。 Qomposeは任意の回路に対して効率的なトポロジを選択し、効率的な並列処理による実行時間と全体的な忠実度を最適化する。我々の広範な評価は、Qomposeがランダムに生成された量子回路の大規模なコレクションや、VQE、ISING、QAOAなどの実世界のベンチマークに有効であることを示している。 As quantum computing architecture matures, it is important to investigate new technologies that lend unique advantages. In this work, we propose Qompose, a neutral atom quantum computing framework for efficiently composing quantum circuits on 2-D topologies of neutral atoms. Qompose selects an efficient topology for any given circuit in order to optimize for length of execution through efficient parallelism and for overall fidelity. Our extensive evaluation demonstrates that Qompose is effective for a large collection of randomly-generated quantum circuits and a range of real-world benchmarks including VQE, ISING, and QAOA.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# ロボットによる外科手術におけるすべての追跡 Tracking Everything in Robotic-Assisted Surgery ( http://arxiv.org/abs/2409.19821v1 ) ライセンス: Link先を確認	Bohan Zhan, Wang Zhao, Yi Fang, Bo Du, Francisco Vasconcelos, Danail Stoyanov, Daniel S. Elson, Baoru Huang,	(参考訳) ビデオ内の組織や器具の正確な追跡は、ロボットによる最小侵襲手術(RAMIS)にとって重要である。従来のキーポイントベースのスパーストラッキングは特徴点によって制限されるが、フローベースの密度の高い2ビューマッチングは長期的なドリフトに悩まされる。近年,これらの制限を克服し,高精度な長期追跡を実現するため,TAPアルゴリズムが提案されている。しかし、手術シナリオにおけるその有効性は、主に評価のための包括的な手術追跡データセットが欠如していることから、未検証のままである。このギャップに対処するために,我々は,複雑な組織と楽器の動きを持つ実世界の手術映像を含む,手術シナリオの追跡方法のベンチマークを行うための,新しい注釈付き手術追跡データセットを導入した。我々は,このデータセット上で最先端(SOTA)のTAPベースのアルゴリズムを広範囲に評価し,高速計器動作,重度閉塞,動きのぼけなど,困難な手術シナリオにおけるそれらの限界を明らかにする。さらに,新たなトラッキング手法であるSurgMotionを提案し,その課題の解決とトラッキング性能の向上を図る。提案手法は, 手術器具追跡におけるTAPベースのアルゴリズムよりも優れており, 特に, 挑戦的な医用ビデオのベースラインよりも顕著に改善されている。 Accurate tracking of tissues and instruments in videos is crucial for Robotic-Assisted Minimally Invasive Surgery (RAMIS), as it enables the robot to comprehend the surgical scene with precise locations and interactions of tissues and tools. Traditional keypoint-based sparse tracking is limited by featured points, while flow-based dense two-view matching suffers from long-term drifts. Recently, the Tracking Any Point (TAP) algorithm was proposed to overcome these limitations and achieve dense accurate long-term tracking. However, its efficacy in surgical scenarios remains untested, largely due to the lack of a comprehensive surgical tracking dataset for evaluation. To address this gap, we introduce a new annotated surgical tracking dataset for benchmarking tracking methods for surgical scenarios, comprising real-world surgical videos with complex tissue and instrument motions. We extensively evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios, including fast instrument motion, severe occlusions, and motion blur, etc. 
Furthermore, we propose a new tracking method, namely SurgMotion, to solve the challenges and further improve the tracking performance. Our proposed method outperforms most TAP-based algorithms in surgical instruments tracking, and especially demonstrates significant improvements over baselines in challenging medical videos.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# OrganiQ: NISQ時代のマシンにおける量子敵対的生成ネットワークの古典的リソースボトルネックの緩和 OrganiQ: Mitigating Classical Resource Bottlenecks of Quantum Generative Adversarial Networks on NISQ-Era Machines ( http://arxiv.org/abs/2409.19823v1 ) ライセンス: Link先を確認	Daniel Silver, Tirthak Patel, Aditya Ranjan, William Cutler, Devesh Tiwari,	(参考訳) ハードウェア能力の急速な進歩によって、量子機械学習は研究分野として注目されている。近年、量子画像生成は有望な結果を生み出している。しかし、従来の量子画像生成技術は古典的なニューラルネットワークに依存しており、量子ポテンシャルと画質を制限している。そこで我々は,古典的なニューラルネットワークを使わずに高品質な画像を生成可能な,最初の量子GANであるOrganiQを紹介する。 Driven by swift progress in hardware capabilities, quantum machine learning has emerged as a research area of interest. Recently, quantum image generation has produced promising results. However, prior quantum image generation techniques rely on classical neural networks, limiting their quantum potential and image quality. To overcome this, we introduce OrganiQ, the first quantum GAN capable of producing high-quality images without using classical neural networks.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# ドメイン適応による広告ランキングモデルの反実仮想評価 Counterfactual Evaluation of Ads Ranking Models through Domain Adaptation ( http://arxiv.org/abs/2409.19824v1 ) ライセンス: Link先を確認	Mohamed A. Radwan, Himaghna Bhattacharjee, Quinn Lanners, Jiasheng Zhang, Serkan Karakulak, Houssam Nassif, Murat Ali Bayir,	(参考訳) ランキングモデルを評価するために,オフラインA/Bテストシステムと連携して機能するドメイン適応型報酬モデルを提案する。このアプローチは、IPSのようなモデルフリー手法が実現不可能な大規模広告レコメンデータシステムにおいて、ランキングモデルの変更に対する報酬を効果的に測定する。実験の結果、提案手法はバニラIPS法と非一般化報酬モデルを用いたアプローチの両方を上回ることを示した。 We propose a domain-adapted reward model that works alongside an Offline A/B testing system for evaluating ranking models. This approach effectively measures reward for ranking model changes in large-scale Ads recommender systems, where model-free methods like IPS are not feasible. Our experiments demonstrate that the proposed technique outperforms both the vanilla IPS method and approaches using non-generalized reward models.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# PhishGuard: 最適なフィッシングサイト検出のための多層アンサンブルモデル PhishGuard: A Multi-Layered Ensemble Model for Optimal Phishing Website Detection ( http://arxiv.org/abs/2409.19825v1 ) ライセンス: Link先を確認	Md Sultanul Islam Ovi, Md. Hasibur Rahman, Mohammad Arif Hossain,	(参考訳) フィッシング攻撃はサイバーセキュリティの脅威の増大であり、悪意のあるウェブサイトを通じて機密情報を盗むのに詐欺的手法を活用している。これらの攻撃に対処するために,フィッシングサイト検出を改善するために設計された最適なカスタムアンサンブルモデルであるPhishGuardを紹介した。このモデルは、ランダムフォレスト、グラディエントブースティング、CatBoost、XGBoostを含む複数の機械学習分類器を組み合わせて、検出精度を高める。 SelectKBestやRFECVといった高度な機能選択方法や、ハイパーパラメータチューニングやデータバランシングといった最適化を通じて、モデルをトレーニングし、4つの公開データセットで評価した。 PhishGuardは最先端のモデルよりも優れており、データセットの1つで99.05%の精度を達成し、他のデータセットでも同様に高い結果を得た。本研究は,アンサンブル学習と組み合わせた最適化手法がフィッシング検出性能を大幅に向上することを示す。 Phishing attacks are a growing cybersecurity threat, leveraging deceptive techniques to steal sensitive information through malicious websites. To combat these attacks, this paper introduces PhishGuard, an optimal custom ensemble model designed to improve phishing site detection. The model combines multiple machine learning classifiers, including Random Forest, Gradient Boosting, CatBoost, and XGBoost, to enhance detection accuracy. Through advanced feature selection methods such as SelectKBest and RFECV, and optimizations like hyperparameter tuning and data balancing, the model was trained and evaluated on four publicly available datasets. PhishGuard outperformed state-of-the-art models, achieving a detection accuracy of 99.05% on one of the datasets, with similarly high results across other datasets. This research demonstrates that optimization methods in conjunction with ensemble learning greatly improve phishing detection performance.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 教育コンテンツアセスメントプラットフォームにおけるブロックチェーンによる完全性検証:軽量で費用効率の良いアプローチ Blockchain-enhanced Integrity Verification in Educational Content Assessment Platform: A Lightweight and Cost-Efficient Approach ( http://arxiv.org/abs/2409.19828v1 ) ライセンス: Link先を確認	Talgar Bayan, Richard Banach, Askar Nurbekov, Makhmud Mustafabek Galy, Adi Sabyrbayev, Zhanat Nurbekova,	(参考訳) 教育のデジタル化の増大は、教育コンテンツの完全性と信頼性を維持する上で大きな課題となっている。従来のシステムでは、特に透明で安全な評価メカニズムの需要が増大している教師の職業的活動の評価において、データの信頼性の確保と不正な変更の防止に失敗することが多い。このような状況下では、ブロックチェーン技術はこれらの問題に対処するための新しいソリューションを提供する。本稿では,教育資料のレビューと評価に使用されるEPEC (Electronic Platform for Expertise of Content) のためのブロックチェーンフレームワークを提案する。我々のアプローチでは、Ethereum用のLayer-2ソリューションであるPolygonネットワークを統合して、暗号化されたレビューをセキュアに保存および取得し、プライバシと説明責任の両方を保証する。 Python、Flask、Web3.pyを活用することで、Solidityベースのスマートコントラクトと対話し、各レビューを、オンチェーンデータを現実世界のデータベースに接続するユニークな識別子(UID)に安全にリンクする。 Dockerを使ってコンテナ化されたこのシステムは、APIエンドポイントによるデプロイと統合を容易にする。我々の実装は,Ethereumと比較して98%のガス料金削減を実現しており,スケーラブルで費用対効果の高いソリューションとなっている。本研究は,デジタル教育の現場における信頼と透明性を高める実践的かつセキュアなフレームワークを提供する,教育コンテンツ検証におけるブロックチェーン実装の継続的な取り組みに寄与する。 The growing digitization of education presents significant challenges in maintaining the integrity and trustworthiness of educational content. Traditional systems often fail to ensure data authenticity and prevent unauthorized alterations, particularly in the evaluation of teachers' professional activities, where demand for transparent and secure assessment mechanisms is increasing. In this context, Blockchain technology offers a novel solution to address these issues. This paper introduces a Blockchain-enhanced framework for the Electronic Platform for Expertise of Content (EPEC), a platform used for reviewing and assessing educational materials. Our approach integrates the Polygon network, a Layer-2 solution for Ethereum, to securely store and retrieve encrypted reviews, ensuring both privacy and accountability. 
By leveraging Python, Flask, and Web3.py, we interact with a Solidity-based smart contract to securely link each review to a unique identifier (UID) that connects on-chain data with real-world databases. The system, containerized using Docker, facilitates easy deployment and integration through API endpoints. Our implementation demonstrates significant cost savings, with a 98% reduction in gas fees compared to Ethereum, making it a scalable and cost-effective solution. This research contributes to the ongoing effort to implement Blockchain in educational content verification, offering a practical and secure framework that enhances trust and transparency in the digital education landscape.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 分散型ラベルなし運動計画のためのグラフニューラルネットワークの一般化可能性 Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning ( http://arxiv.org/abs/2409.19829v1 ) ライセンス: Link先を確認	Shreyas Muthusamy, Damian Owerko, Charilaos I. Kanatsoulis, Saurav Agarwal, Alejandro Ribeiro,	(参考訳) ラベルなしの運動計画では、衝突回避を確保しつつ、総移動距離を最小にすることを目的として、ロボットのセットを目標の場所に割り当てる。この問題は、探査、監視、輸送などの応用において、マルチロボットシステムにとって不可欠なビルディングブロックを形成している。我々は、各ロボットが自身の$k$近傍のロボットと$k$近傍のターゲットの位置のみを知っている分散環境でこの問題に取り組む。このシナリオは組合せ代入と連続空間運動計画の要素を組み合わせることで、従来の集中型アプローチにおいて大きなスケーラビリティ上の課題を提起する。これらの課題を克服するために,グラフニューラルネットワーク(GNN)を用いて学習した分散ポリシを提案する。 GNN は,(1) ロボットが隣人にどの情報を伝えるか,(2) 受信した情報をローカルな観察とどのように統合して意思決定するかを決定することを可能にする。我々は,集中型のハンガリアン法をエキスパートポリシーとする模倣学習を用いてGNNを訓練し,さらに強化学習を用いて微調整を行い,衝突を回避し,性能を向上させる。大規模な経験的評価は、我々のアプローチのスケーラビリティと有効性を示している。 100台のロボットで訓練されたGNNポリシーは、最大500台のロボットのシナリオに一般化し、最先端のソリューションを平均8.6%上回り、貪欲な分散手法を大幅に上回っている。この作業は、スケーラビリティが重要である設定において、マルチロボット調整問題を解決するための基盤となる。 Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance, aiming to minimize the total distance traveled. The problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation. We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets. This scenario combines elements of combinatorial assignment and continuous-space motion planning, posing significant scalability challenges for traditional centralized approaches. To overcome these challenges, we propose a decentralized policy learned via a Graph Neural Network (GNN). The GNN enables robots to determine (1) what information to communicate to neighbors and (2) how to integrate received information with local observations for decision-making. 
We train the GNN using imitation learning with the centralized Hungarian algorithm as the expert policy, and further fine-tune it using reinforcement learning to avoid collisions and enhance performance. Extensive empirical evaluations demonstrate the scalability and effectiveness of our approach. The GNN policy trained on 100 robots generalizes to scenarios with up to 500 robots, outperforming state-of-the-art solutions by 8.6% on average and significantly surpassing greedy decentralized methods. This work lays the foundation for solving multi-robot coordination problems in settings where scalability is important.	翻訳日:2024-11-05 17:29:56 公開日:2024-09-29
# 金融・電子商取引における機械アンラーニングに対するセキュリティ攻撃とデータ攻撃に関する調査 Survey of Security and Data Attacks on Machine Unlearning In Financial and E-Commerce ( http://arxiv.org/abs/2410.00055v1 ) ライセンス: Link先を確認	Carl E. J. Brodzinski,	(参考訳) 本稿では、金融・電子商取引アプリケーションを中心に、機械アンラーニングに対するセキュリティ攻撃とデータ攻撃の状況について調査する。我々は、メンバーシップ推論攻撃やデータ再構成攻撃などの重要なプライバシー上の脅威について論じる。これらの攻撃では、敵は削除されたはずのデータを推測または再構成しようとする。さらに、モデルを操作したり破損させたりするためにアンラーニングの基本的なメカニズムを標的とする、Machine Unlearning Data Poisoning、Unlearning Request Attacks、Machine Unlearning Jailbreak Attacksなどのセキュリティ攻撃についても検討する。これらのリスクを軽減するために、差分プライバシー、堅牢な暗号保証、ZKP(Zero-Knowledge Proofs)など、検証可能で改ざん耐性のあるアンラーニング機構を提供するさまざまな防御戦略を検討する。これらのアプローチは、侵害されたモデルが詐欺、データ漏洩、評判の被害につながりうる、リスクの高い金融および電子商取引の文脈におけるデータの整合性とプライバシの保護に不可欠である。この調査は、セキュアな機械アンラーニングにおける継続的な研究とイノベーションの必要性と、進化する攻撃ベクトルに対する強力な防御を開発することの重要性を強調している。 This paper surveys the landscape of security and data attacks on machine unlearning, with a focus on financial and e-commerce applications. We discuss key privacy threats such as Membership Inference Attacks and Data Reconstruction Attacks, where adversaries attempt to infer or reconstruct data that should have been removed. In addition, we explore security attacks including Machine Unlearning Data Poisoning, Unlearning Request Attacks, and Machine Unlearning Jailbreak Attacks, which target the underlying mechanisms of unlearning to manipulate or corrupt the model. To mitigate these risks, various defense strategies are examined, including differential privacy, robust cryptographic guarantees, and Zero-Knowledge Proofs (ZKPs), offering verifiable and tamper-proof unlearning mechanisms. These approaches are essential for safeguarding data integrity and privacy in high-stakes financial and e-commerce contexts, where compromised models can lead to fraud, data leaks, and reputational damage. This survey highlights the need for continued research and innovation in secure machine unlearning, as well as the importance of developing strong defenses against evolving attack vectors.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# 円偏光下での非線形振動子としての水素原子:周転円状の電子軌道 Hydrogen atom as a nonlinear oscillator under circularly polarized light: epicyclical electron orbits ( http://arxiv.org/abs/2410.00056v1 ) ライセンス: Link先を確認	Quirino Sugon Jr, Clint Dominic G. Bennett, Daniel J. McNamara,	(参考訳) 本稿ではクリフォード代数 $Cl_{2,0}$ を用いて、クーロン力と、時刻 $t = 0$ に単位ステップスイッチで印加される角周波数 $\omega$ の円偏光の摂動電場の下での水素電子の2次元軌道を求める。電子の非摂動円軌道と角周波数 $\omega_0$ で共回転する座標系を用いて、ローレンツ振動子方程式と似ているが異なる、摂動に関する複素非線形微分方程式を導出する:(1)加速度項は類似している、(2)減衰項の係数はコリオリ力のため実数ではなく虚数である、(3)ばね力に類似した項は正ではなく負である、(4)ローレンツ振動子に類似物を持たない摂動の複素共役の項があり、これが方程式を非線形にする、(5)強制項の角周波数は $\omega$ ではなく $\omega - \omega_0$ である。時刻 $t = 0$ で電子の位置と速度が連続であることを課すことにより、電子の軌道は周波数 0, $\omega_0$, $2\omega_0$, $(2\omega_0 - \omega)$, $\omega$ の5つの指数フーリエ項の和であることを示す。3つの共鳴光周波数 $0$、$\omega_0$、$2\omega_0$ では、電子の軌道は発散するが、ケプラー楕円に近似する。他の光周波数では、軌道は発散せず、周波数比 $\omega/\omega_0$ に応じて $\pi/\omega_0$ の整数倍の周期を持つ。そして $\omega/\omega_0\rightarrow \pm\infty$ のとき、軌道は電子の非摂動円軌道に近づく。 In this paper, we use Clifford algebra $Cl_{2,0}$ to find the 2D orbit of Hydrogen electron under a Coulomb force and a perturbing circularly polarized electric field of light at angular frequency $\omega$, which is turned on at time $t = 0$ via a unit step switch. Using a coordinate system co-rotating with the electron's unperturbed circular orbit at angular frequency $\omega_0$, we derive the complex nonlinear differential equation for the perturbation which is similar to but different from the Lorentz oscillator equation: (1) the acceleration terms are similar, (2) the damping term coefficient is not real but imaginary due to Coriolis force, (3) the term similar to spring force is not positive but negative, (4) there is a complex conjugate of the perturbation term which has no Lorentz analog but which makes the equation nonlinear, and (5) the angular frequency of the forcing term is not $\omega$ but $\omega - \omega_0$. 
By imposing that the position and velocity of the electron are continuous at time $t = 0$, we show that the orbit of the electron is a sum of five exponential Fourier terms with frequencies 0, $\omega_0$, $2\omega_0$, $(2\omega_0 - \omega)$, and $\omega$, which correspond to the eccentric, deferent, and three epicycles in Copernican astronomy. We show that at the three resonant light frequencies $0$, $\omega_0$, and $2\omega_0$, the electron's orbit is divergent, but approximates a Keplerian ellipse. At other light frequencies, the orbits are nondivergent with periods that are integer multiples of $\pi/\omega_0$ depending on the frequency ratio $\omega/\omega_0$. And as $\omega/\omega_0\rightarrow \pm\infty$, the orbit approaches the electron's unperturbed circular orbit.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# STTM:オンデマンドフードデリバリーにおけるリアルタイム圧力信号のための時空間Transformerとメモリネットワークに基づく新手法 STTM: A New Approach Based Spatial-Temporal Transformer And Memory Network For Real-time Pressure Signal In On-demand Food Delivery ( http://arxiv.org/abs/2410.00057v1 ) ライセンス: Link先を確認	Jiang Wang, Haibin Wei, Xiaowei Xu, Jiacheng Shi, Jian Nie, Longzhi Du, Taixu Jiang,	(参考訳) オンデマンドフードデリバリー(OFD)サービスが世界中で一般的になっている。例えば、Ele.meプラットフォームでは、ユーザーは毎日1500万件以上の食品注文を行っている。リアルタイム圧力信号(RPS)の予測は、主にロジスティクスシステムにおける圧力の現在の状態を測定するために使用されるため、OFDサービスにとって不可欠である。 RPSが上昇すると圧力が上昇し、プラットフォームはロジスティクスシステムが過負荷にならないよう迅速に措置を講じる必要がある。通常、ビジネス地区内の全ての注文に対する平均配達時間が、RPSを表すために使用される。既存のOFDサービスの研究は主に注文の配達時間予測に重点を置いているが、RPSの研究にはあまり注目されていない。従来の研究では、DeepFM、RNN、GNNといった一般的なモデルを直接適用しているが、OFDサービスの時間的・空間的特性を適切に活用できず、突然の厳しい気象条件やピーク時の感度が不十分な問題に直面している。そこで本研究では,時空間Transformerとメモリネットワーク(STTM)に基づく新しい手法を提案する。具体的には、新しい時空間Transformer構造を用いて、時間的・空間的次元にわたるロジスティクスの特徴を学習し、ビジネス地区とその周辺地域の履歴情報を符号化し、時間的・空間的情報の両方を学ぶ。さらに、異常事象に対する感度を高めるためにメモリネットワークが使用される。実世界のデータセットによる実験結果から,STTMはオフライン実験とオンラインA/Bテストの両方において従来手法よりも有意に優れており,本手法の有効性が示された。 On-demand Food Delivery (OFD) services have become very common around the world. For example, on the Ele.me platform, users place more than 15 million food orders every day. Predicting the Real-time Pressure Signal (RPS) is crucial for OFD services, as it is primarily used to measure the current status of pressure on the logistics system. When RPS rises, the pressure increases, and the platform needs to quickly take measures to prevent the logistics system from being overloaded. Usually, the average delivery time for all orders within a business district is used to represent RPS. Existing research on OFD services primarily focuses on predicting the delivery time of orders, while relatively less attention has been given to the study of the RPS. 
Previous research directly applies general models such as DeepFM, RNN, and GNN for prediction, but fails to adequately utilize the unique temporal and spatial characteristics of OFD services, and faces issues with insufficient sensitivity during sudden severe weather conditions or peak periods. To address these problems, this paper proposes a new method based on Spatio-Temporal Transformer and Memory Network (STTM). Specifically, we use a novel Spatio-Temporal Transformer structure to learn logistics features across temporal and spatial dimensions and encode the historical information of a business district and its neighbors, thereby learning both temporal and spatial information. Additionally, a Memory Network is employed to increase sensitivity to abnormal events. Experimental results on the real-world dataset show that STTM significantly outperforms previous methods in both offline experiments and the online A/B test, demonstrating the effectiveness of this method.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# IDEA: 逆ドメインエキスパート適応に基づくアクティブDNN IP保護手法 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method ( http://arxiv.org/abs/2410.00059v1 ) ライセンス: Link先を確認	Chaohui Xu, Qi Cui, Jinxin Dong, Weiyang He, Chip-Hong Chang,	(参考訳) ディープニューラルネットワーク(DNN)モデルの不正な複製、配布、派生は、経済的損失、評判の毀損、さらにはプライバシー侵害を引き起こす可能性がある。透かしや指紋認証などの受動的DNN知的財産権(IP)保護手法は、IP侵害が起きた際に所有権の証明を試みるが、IP悪用による壊滅的な被害を止めるには遅すぎることが多く、強力な敵対者に対しては脆弱すぎる。本稿では,逆ドメインエキスパート適応に基づく、アクティブな認可とソース追跡性を備えた能動的DNN IP保護手法であるIDEAを提案する。 IDEAは、ドメイン適応の逆問題としてアクティブな認可を一般化する。マルチアダプティブ最適化は、1つの本物のエキスパートと2つの偽のエキスパートからなる混合エキスパート(MoE)モデルによって解かれる。本物のエキスパートは、固有のモデルユーザキーがステガノグラフィーで埋め込まれたテスト画像を正しく分類するようにソースモデルを再最適化する。偽のエキスパートは、本物のエキスパートとの相互情報量(MI)を最小化することにより、ユーザキーが埋め込まれていない、または誤ったユーザキーが埋め込まれたテスト画像に対してランダムな予測を出力するように訓練される。 MoEモデルは、エキスパートモデルの特徴の漏洩を避けるため、多層的な注意と対照的な表現損失の最適化を追加してMIを最大化することにより、統一された保護モデルへと知識蒸留される。 IDEAは、有効なキーを持たない不正なユーザが機能モデルにアクセスすることを防ぐだけでなく、モデル所有者がデプロイされたモデルを検証し、IP侵害の発生源を追跡できるようにする。 5つのデータセットと4つのDNNモデル上でIDEAを広範囲に評価し、認可制御、犯人追跡の成功率、各種攻撃に対する堅牢性における有効性を実証した。 Illegitimate reproduction, distribution and derivation of Deep Neural Network (DNN) models can inflict economic loss, reputation damage and even privacy infringement. Passive DNN intellectual property (IP) protection methods such as watermarking and fingerprinting attempt to prove the ownership upon IP violation, but they are often too late to stop catastrophic damage of IP abuse and too feeble against strong adversaries. In this paper, we propose IDEA, an Inverse Domain Expert Adaptation based proactive DNN IP protection method featuring active authorization and source traceability. IDEA generalizes active authorization as an inverse problem of domain adaptation. The multi-adaptive optimization is solved by a mixture-of-experts model with one real and two fake experts. The real expert re-optimizes the source model to correctly classify test images with a unique model user key steganographically embedded. 
The fake experts are trained to output random predictions on test images with no user key or an incorrect user key embedded, by minimizing their mutual information (MI) with the real expert. The MoE model is knowledge distilled into a unified protected model to avoid leaking the expert model features by maximizing their MI with additional multi-layer attention and contrastive representation loss optimization. IDEA not only prevents unauthorized users without the valid key from accessing the functional model, but also enables the model owner to validate the deployed model and trace the source of IP infringement. We extensively evaluate IDEA on five datasets and four DNN models to demonstrate its effectiveness in authorization control, culprit tracing success rate, and robustness against various attacks.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# Tracr Transformerのニューラル逆コンパイル Neural Decompiling of Tracr Transformers ( http://arxiv.org/abs/2410.00061v1 ) ライセンス: Link先を確認	Hannes Thurnherr, Kaspar Riesen,	(参考訳) 近年,パターン認識や機械学習など多くの分野で,トランスフォーマーアーキテクチャが大幅な進歩を可能にしている。しかしながら、他のニューラルネットワークモデルと同様に、内部動作を説明する一般的な方法は今のところ存在しない。本論文は、この方向への第一歩を示す。 Transformer Compiler for RASP (Tracr) を用いて、トランスフォーマー重みと対応する RASP プログラムのペアの大規模なデータセットを生成する。このデータセットに基づいて、コンパイルされたモデルからRASPコードを復元することを目的として、モデルを構築し、訓練する。本稿では,Tracr でコンパイルしたトランスフォーマー重みの単純な形式が,そのようなデコンパイラモデルに対して解釈可能であることを示す。実験的な評価では,我々のモデルはテスト対象の30%以上で完全な再現を達成し,残りの70%も概ねわずかな誤りで再現できる。さらに、我々のモデルによって作成されたプログラムの70%以上は、機能的には正解と等価であり、したがってTracrでコンパイルされたトランスフォーマー重みの有効な逆コンパイルである。 Recently, the transformer architecture has enabled substantial progress in many areas of pattern recognition and machine learning. However, as with other neural network models, there is currently no general method available to explain their inner workings. The present paper represents a first step towards this direction. We utilize Transformer Compiler for RASP (Tracr) to generate a large dataset of pairs of transformer weights and corresponding RASP programs. Based on this dataset, we then build and train a model, with the aim of recovering the RASP code from the compiled model. We demonstrate that the simple form of Tracr compiled transformer weights is interpretable for such a decompiler model. In an empirical evaluation, our model achieves exact reproductions on more than 30% of the test objects, while the remaining 70% can generally be reproduced with only a few errors. Additionally, more than 70% of the programs produced by our model are functionally equivalent to the ground truth, and therefore a valid decompilation of the Tracr compiled transformer weights.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# 先進的CNNモデルを用いたカボチャ植物の病害自動診断 Automated Disease Diagnosis in Pumpkin Plants Using Advanced CNN Models ( http://arxiv.org/abs/2410.00062v1 ) ライセンス: Link先を確認	Aymane Khaldi, El Mostafa Kalmoun,	(参考訳) カボチャは世界中で栽培される重要な作物であり、その生産力は特に発展途上地域において食糧安全保障に不可欠である。カボチャ葉の病害の正確かつタイムリーな検出は、収量と品質の大幅な損失を軽減するために不可欠である。従来の病害の同定法は、農家や専門家による主観的判断に大きく依存しており、非効率性や介入機会の逸失につながる可能性がある。機械学習とディープラーニングの最近の進歩は、植物病害検出を自動化し、その精度を改善するための有望なソリューションを提供する。本稿では,カボチャ葉の病害分類のための最先端の畳み込みニューラルネットワーク(CNN)モデルについて包括的解析を行った。 2000枚の高解像度画像からなる公開データセットを用いて、健康な葉と4つの一般的な病害(べと病、うどんこ病、モザイク病、細菌性斑点病)の5クラスの認識における、ResNet、DenseNet、EfficientNetを含む複数のCNNアーキテクチャの性能を評価する。我々は、これらの事前訓練されたモデルを微調整し、ハイパーパラメータ最適化実験を行った。 ResNet-34, DenseNet-121, EfficientNet-B7が最高性能のモデルとして特定され、それぞれ異なるクラスの葉の病害で優れていた。解析の結果,精度と計算複雑性の両方を考慮するとDenseNet-121が最適なモデルであり,全体の精度は86%であった。本研究は, カボチャ植物の病害診断の自動化におけるCNNの可能性を示し, 農業生産性の向上と経済損失の最小化に寄与する貴重な知見を提供する。 Pumpkin is a vital crop cultivated globally, and its productivity is crucial for food security, especially in developing regions. Accurate and timely detection of pumpkin leaf diseases is essential to mitigate significant losses in yield and quality. Traditional methods of disease identification rely heavily on subjective judgment by farmers or experts, which can lead to inefficiencies and missed opportunities for intervention. Recent advancements in machine learning and deep learning offer promising solutions for automating and improving the accuracy of plant disease detection. This paper presents a comprehensive analysis of state-of-the-art Convolutional Neural Network (CNN) models for classifying diseases in pumpkin plant leaves. Using a publicly available dataset of 2000 high-resolution images, we evaluate the performance of several CNN architectures, including ResNet, DenseNet, and EfficientNet, in recognizing five classes: healthy leaves and four common diseases: downy mildew, powdery mildew, mosaic disease, and bacterial leaf spot. We fine-tuned these pretrained models and conducted hyperparameter optimization experiments. 
ResNet-34, DenseNet-121, and EfficientNet-B7 were identified as top-performing models, each excelling in different classes of leaf diseases. Our analysis revealed DenseNet-121 as the optimal model when considering both accuracy and computational complexity, achieving an overall accuracy of 86%. This study underscores the potential of CNNs in automating disease diagnosis for pumpkin plants, offering valuable insights that can contribute to enhancing agricultural productivity and minimizing economic losses.	翻訳日:2024-11-05 15:09:43 公開日:2024-09-29
# 極小ソフトウェア開発分野におけるCI/CD導入と適応 : 体系的文献レビュー Adoption and Adaptation of CI/CD Practices in Very Small Software Development Entities: A Systematic Literature Review ( http://arxiv.org/abs/2410.00623v1 ) ライセンス: Link先を確認	Mario Ccallo, Alex Quispe-Quispe,	(参考訳) 本研究は,ソフトウェア開発における極小エンティティ(VSE)における継続的インテグレーションと継続的デリバリ(CI/CD)の実践について,系統的な文献レビューを行った。この研究は、一般的なCI/CDプラクティスを特定し、VSEの特定の制限を特徴づけ、これらのプラクティスを小規模環境に適用するための戦略を探求する13の研究を分析している。リソース制約と複雑なツールエコシステムのため、VSEはCI/CDの実装において重大な課題に直面している。しかし、JenkinsやDockerのようなアクセス可能なツールの採用と、マイクロパイプラインのプラクティス、ISO 29110のような単純化されたフレームワークは、これらの課題に効果的に対処できる。調査では、マイクロサービスアーキテクチャの採用の増加傾向と、VSE固有のニーズに合わせてCI/CDプロセスを調整することの重要性を強調している。この研究は、限られたリソースにもかかわらず、小さなソフトウェアエンティティがCI/CDプラクティスをどのように活用して、競争力とソフトウェア品質を高めるかを理解するのに役立ちます。 This study presents a systematic literature review on the adoption of Continuous Integration and Continuous Delivery (CI/CD) practices in Very Small Entities (VSEs) in software development. The research analyzes 13 selected studies to identify common CI/CD practices, characterize the specific limitations of VSEs, and explore strategies for adapting these practices to small-scale environments. The findings reveal that VSEs face significant challenges in implementing CI/CD due to resource constraints and complex tool ecosystems. However, the adoption of accessible tools like Jenkins and Docker, coupled with micro-pipeline practices and simplified frameworks such as ISO 29110, can effectively address these challenges. The study highlights the growing trend of microservices architecture adoption and the importance of tailoring CI/CD processes to VSE-specific needs. This research contributes to the understanding of how small software entities can leverage CI/CD practices to enhance their competitiveness and software quality, despite limited resources.	翻訳日:2024-11-05 04:35:05 公開日:2024-09-29
# AIに直面した人間のバイアス:AI生成テキスト評価における人間の判断の役割 Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation ( http://arxiv.org/abs/2410.03723v1 ) ライセンス: Link先を確認	Tiffany Zhu, Iain Weissburg, Kexun Zhang, William Yang Wang,	(参考訳) AIがテキスト生成において進歩するにつれて、AIが生成したコンテンツに対する人間の信頼は、正確性の懸念を超えたバイアスによって制限されている。本研究では、バイアスがAI生成コンテンツと人間が作成したコンテンツの知覚をどう形成するかを考察する。テキストの言い換え, ニュース記事要約, 説得文の作成の3つの実験を通じて, ラベル付き・ラベルなしのコンテンツに対して, 人間の評価者がどう反応するかを検討した。評価者はブラインドテストでは2種類のテキストを区別できなかったが、「AI Generated」とラベル付けされたコンテンツよりも「Human Generated」とラベル付けされたコンテンツを30%以上の選好スコア差で圧倒的に好んだ。ラベルが意図的に交換された場合でも,同じパターンが観察された。このAIに対する人間のバイアスは、AIのパフォーマンスを過小評価するため、より広い社会的・認知的意味を持つ。この研究は、AIと対話する際の人間の判断の限界を強調し、特に創造的な分野において、人間とAIのコラボレーションを改善する基盤を提供する。 As AI advances in text generation, human trust in AI generated content remains constrained by biases that go beyond concerns of accuracy. This study explores how bias shapes the perception of AI versus human generated content. Through three experiments involving text rephrasing, news article summarization, and persuasive writing, we investigated how human raters respond to labeled and unlabeled content. While the raters could not differentiate the two types of texts in the blind test, they overwhelmingly favored content labeled as "Human Generated," over those labeled "AI Generated," by a preference score of over 30%. We observed the same pattern even when the labels were deliberately swapped. This human bias against AI has broader societal and cognitive implications, as it undervalues AI performance. This study highlights the limitations of human judgment in interacting with AI and offers a foundation for improving human-AI collaboration, especially in creative fields.	翻訳日:2024-11-02 20:28:28 公開日:2024-09-29
# 言語モデルとBoXHEDを用いたリアルタイム・マルチモーダルな侵襲的換気リスクモニタリング Realtime, multimodal invasive ventilation risk monitoring using language models and BoXHED ( http://arxiv.org/abs/2410.03725v1 ) ライセンス: Link先を確認	Arash Pakbin, Aaron Su, Donald K. K. Lee, Bobak J. Mortazavi,	(参考訳) 目的:集中治療室(ICU)における侵襲的換気(iV)のリアルタイムモニタリングは,迅速な介入の確保と患者の予後向上に重要な役割を担っている。しかし、従来の手法は、表形式のデータにのみ依存して、臨床ノートに埋め込まれた貴重な洞察を見落としていることが多い。本研究では,テキスト要約のための言語モデルを用いて,臨床ノートをモニタリングパイプラインに組み込むことにより,iVリスクモニタリングを強化する革新的なアプローチを提案する。結果:iVリスクモニタリングの最先端手法が報告するすべての指標で優れた性能を達成した。すなわち、AUROCが0.86、AUC-PRが0.35、AUCtが最大0.86である。また、我々の手法は、特定の時間バケットにおいてiVをフラグ付けする際のリードタイムをより長くできることも示す。結論:本研究は,臨床ノートと言語モデルをリアルタイムのiVリスクモニタリングに統合することの可能性を強調し,ICU環境における患者ケアの改善と情報に基づく臨床意思決定への道を開く。 Objective: realtime monitoring of invasive ventilation (iV) in intensive care units (ICUs) plays a crucial role in ensuring prompt interventions and better patient outcomes. However, conventional methods often overlook valuable insights embedded within clinical notes, relying solely on tabular data. In this study, we propose an innovative approach to enhance iV risk monitoring by incorporating clinical notes into the monitoring pipeline through using language models for text summarization. Results: We achieve superior performance in all metrics reported by the state-of-the-art in iV risk monitoring, namely: an AUROC of 0.86, an AUC-PR of 0.35, and an AUCt of up to 0.86. We also demonstrate that our methodology allows for more lead time in flagging iV for certain time buckets. Conclusion: Our study underscores the potential of integrating clinical notes and language models into realtime iV risk monitoring, paving the way for improved patient care and informed clinical decision-making in ICU settings.	翻訳日:2024-11-02 20:28:28 公開日:2024-09-29
# 人工的メンタルイメージのカオス投影による睡眠学習を意識した機械の心理測定 Psychometrics for Hypnopaedia-Aware Machinery via Chaotic Projection of Artificial Mental Imagery ( http://arxiv.org/abs/2410.05284v1 ) ライセンス: Link先を確認	Ching-Chun Chang, Kai Gao, Shuying Xu, Anastasia Kordoni, Christopher Leckie, Isao Echizen,	(参考訳) ニューラルバックドアは、学習機械を不正な操作に対して脆弱にし、破滅的な結果をもたらす人工知能の兵器化を可能にしかねない、陰湿なサイバーセキュリティの抜け穴である。バックドア攻撃とは、学習過程中にトリガーを密かに潜入させるものであり、催眠あるいは無意識の状態下で被験者の潜在意識にアイデアが埋め込まれる睡眠学習(hypnopaedia)に比喩的に類似している。感覚刺激によって起動されると、トリガーは機械に所定の応答をさせる条件反射を誘発する。本研究では,信頼できないデータソースの動的性質を踏まえ,バックドアの脅威を継続的に監視するためのサイバネティックな枠組みを提案する。バックドアトリガから機械の振る舞いを自律的に切り離すための,自己認識型アンラーニング機構を開発した。リバースエンジニアリングと統計的推論により, 欺瞞的なパターンを検出し, バックドア感染の可能性を推定する。モデルインバージョンを用いて人工的なメンタルイメージを引き出し、確率過程を用いて最適化経路を撹乱することで、収束するが潜在的に欠陥のあるパターンを回避する。続く仮説分析では、潜在的に悪意のある各パターンが真のトリガーである尤度を推定し、感染の確率を推論する。本研究の主な目的は,知識忠実度とバックドア脆弱性の間の安定した平衡状態を維持することである。 Neural backdoors represent insidious cybersecurity loopholes that render learning machinery vulnerable to unauthorised manipulations, potentially enabling the weaponisation of artificial intelligence with catastrophic consequences. A backdoor attack involves the clandestine infiltration of a trigger during the learning process, metaphorically analogous to hypnopaedia, where ideas are implanted into a subject's subconscious mind under the state of hypnosis or unconsciousness. When activated by a sensory stimulus, the trigger evokes a conditioned reflex that directs a machine to mount a predetermined response. In this study, we propose a cybernetic framework for constant surveillance of backdoor threats, driven by the dynamic nature of untrustworthy data sources. We develop a self-aware unlearning mechanism to autonomously detach a machine's behaviour from the backdoor trigger. Through reverse engineering and statistical inference, we detect deceptive patterns and estimate the likelihood of backdoor infection. 
We employ model inversion to elicit artificial mental imagery, using stochastic processes to disrupt optimisation pathways and avoid convergent but potentially flawed patterns. This is followed by hypothesis analysis, which estimates the likelihood of each potentially malicious pattern being the true trigger and infers the probability of infection. The primary objective of this study is to maintain a stable state of equilibrium between knowledge fidelity and backdoor vulnerability.	翻訳日:2024-11-01 19:47:38 公開日:2024-09-29
# 株価予測と伝統的モデル:短期・中期・長期目標達成へのアプローチ Stock Price Prediction and Traditional Models: An Approach to Achieve Short-, Medium- and Long-Term Goals ( http://arxiv.org/abs/2410.07220v1 ) ライセンス: Link先を確認	Opeyemi Sheu Alamu, Md Kamrul Siam,	(参考訳) 株価予測のためのディープラーニングモデルと従来の統計手法の比較分析を、ナイジェリア証券取引所のデータを用いて行う。日次価格や取引量を含む履歴データは、Long Short Term Memory(LSTM)ネットワーク、Gated Recurrent Units(GRU)、Autoregressive Integrated Moving Average(ARIMA)、Autoregressive Moving Average(ARMA)などのモデルを実装するために使用される。これらのモデルは、短期(1年)、中期(2.5年)、長期(5年)の3つの期間にわたって評価され、性能は平均二乗誤差(MSE)と平均絶対誤差(MAE)によって測定される。時系列の安定性はADF(Augmented Dickey-Fuller)テストを用いて検証する。その結果、深層学習モデル、特にLSTMは、データの複雑な非線形パターンを捉えることで従来の手法よりも優れており、より正確な予測結果が得られることがわかった。しかし、これらのモデルは計算資源を多く必要とし、従来の手法よりも解釈可能性が低い。この結果は、金融予測と投資戦略を改善するための深層学習の可能性を強調している。今後の研究では、ソーシャルメディアの感情や経済指標などの外部要因を取り入れ、モデルアーキテクチャを洗練し、リアルタイムアプリケーションを探索することで、予測精度とスケーラビリティを高められる可能性がある。 A comparative analysis of deep learning models and traditional statistical methods for stock price prediction uses data from the Nigerian stock exchange. Historical data, including daily prices and trading volumes, are employed to implement models such as Long Short Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Autoregressive Integrated Moving Average (ARIMA), and Autoregressive Moving Average (ARMA). These models are assessed over three time horizons: short-term (1 year), medium-term (2.5 years), and long-term (5 years), with performance measured by Mean Squared Error (MSE) and Mean Absolute Error (MAE). The stability of the time series is tested using the Augmented Dickey-Fuller (ADF) test. Results reveal that deep learning models, particularly LSTM, outperform traditional methods by capturing complex, nonlinear patterns in the data, resulting in more accurate predictions. However, these models require greater computational resources and offer less interpretability than traditional approaches. 
The findings highlight the potential of deep learning for improving financial forecasting and investment strategies. Future research could incorporate external factors such as social media sentiment and economic indicators, refine model architectures, and explore real-time applications to enhance prediction accuracy and scalability.	翻訳日:2024-10-31 21:37:02 公開日:2024-09-29
# 効率的なフェデレーションエッジ学習のためのオンラインクライアントスケジューリングとリソース割り当て Online Client Scheduling and Resource Allocation for Efficient Federated Edge Learning ( http://arxiv.org/abs/2410.10833v1 ) ライセンス: Link先を確認	Zhidong Gao, Zhenxiao Zhang, Yu Zhang, Tongnian Wang, Yanmin Gong, Yuanxiong Guo,	(参考訳) フェデレートラーニング(FL)は、エッジデバイスが生データを共有せずに、機械学習モデルを協調的にトレーニングすることを可能にする。プライバシー保護の利点のため、FLは多くの現実世界のアプリケーションにデプロイされている。しかし、電力、帯域幅、計算などの制約のあるリソースを持つ移動エッジネットワーク上にFLをデプロイすることは、特にデータとシステムの不均一性の下で、高いトレーニング遅延と低いモデルの精度に悩まされる。本稿では,モデルの精度を維持しつつ,トレーニング遅延を最小限に抑えるために,資源制約と不確実性の下で,モバイルエッジネットワーク上でのFLの最適クライアントスケジューリングとリソース割り当てについて検討する。具体的には、クライアントサンプリングがFLにおけるモデル収束に与える影響を解析し、不均一なシステムリソースと不確実なシステムリソースの下でのランニング時間とモデル性能のトレードオフを捉える確率的最適化問題を定式化する。そこで本稿では,Lyapunovをベースとしたクライアントサンプリングとリソース割り当ての最適化に基づくオンライン制御方式を,FLシステムにおける将来のダイナミクスの知識を必要とせずにさらに発展させる。大規模な実験結果から,提案手法は既存の方式と比較してトレーニングの待ち時間と資源効率を両立させることができることが示された。 Federated learning (FL) enables edge devices to collaboratively train a machine learning model without sharing their raw data. Due to its privacy-protecting benefits, FL has been deployed in many real-world applications. However, deploying FL over mobile edge networks with constrained resources such as power, bandwidth, and computation suffers from high training latency and low model accuracy, particularly under data and system heterogeneity. In this paper, we investigate the optimal client scheduling and resource allocation for FL over mobile edge networks under resource constraints and uncertainty to minimize the training latency while maintaining the model accuracy. Specifically, we first analyze the impact of client sampling on model convergence in FL and formulate a stochastic optimization problem that captures the trade-off between the running time and model performance under heterogeneous and uncertain system resources. 
To solve the formulated problem, we further develop an online control scheme based on Lyapunov-based optimization for client sampling and resource allocation without requiring the knowledge of future dynamics in the FL system. Extensive experimental results demonstrate that the proposed scheme can improve both the training latency and resource efficiency compared with the existing schemes.	翻訳日:2024-10-29 19:24:58 公開日:2024-09-29
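
Lyapunov-based online control of the kind named in the abstract above is often realized as a drift-plus-penalty rule: a virtual queue tracks how far cumulative resource use has run over a long-term budget, and each round greedily minimizes `V * latency + queue * resource_use` using only what is observable in that round. The following is a minimal sketch of that generic rule, not the paper's scheme; the option tuples, `budget`, and `V` are hypothetical values chosen for illustration.

```python
def drift_plus_penalty(rounds, budget, V):
    """Run a generic drift-plus-penalty controller.

    rounds: list of rounds; each round is a list of (latency, energy) options
            observed at that round only (no knowledge of future rounds).
    budget: long-term average energy budget per round (hypothetical).
    V:      weight trading off latency (penalty) against budget violation.
    """
    queue = 0.0  # virtual queue: backlog of energy overspend
    choices = []
    for options in rounds:
        # Greedy per-round decision: minimize V * penalty + queue * resource use.
        best = min(options, key=lambda o: V * o[0] + queue * o[1])
        choices.append(best)
        # The queue grows when the budget is overspent and drains otherwise.
        queue = max(queue + best[1] - budget, 0.0)
    return choices, queue


# Two options every round: fast-but-costly vs. slow-but-cheap.
rounds = [[(1.0, 4.0), (3.0, 1.0)]] * 6
choices, backlog = drift_plus_penalty(rounds, budget=2.0, V=1.0)
```

A larger `V` favors low latency and tolerates a longer backlog; a smaller `V` enforces the budget more tightly.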


# 定量的モラル基礎を用いたソーシャルメディアにおけるスタンス分類の強化

Enhancing Stance Classification on Social Media Using Quantified Moral Foundations ( http://arxiv.org/abs/2310.09848v2 )

ライセンス: Link先を確認

Hong Zhang, Prasanta Bhattacharya, Wei Gao, Liang Ze Wong, Brandon Siyuan Loh, Joseph J. P. Simons, Jisun An,

(参考訳) 本研究は、より深い心理的属性、特に個人の道徳的基盤を組み込むことにより、ソーシャルメディアにおけるスタンス検出を強化する。これらの理論的に派生した次元は、社会、政治、健康、環境など、様々な領域における行動と結びついている個人の道徳的関心の包括的プロファイルを提供することを目的としている。本稿では,道徳的基盤の次元が,特定の目標に対する個人の姿勢を予測するのにどのように貢献するかを検討する。具体的には、従来の機械学習モデルと大規模言語モデルの両方を用いて、テキストから抽出された道徳的基礎機能と、メッセージセマンティック機能を組み合わせて、メッセージレベルとユーザレベルの両方のスタンスを分類する。予備的な結果は、モラル基礎の符号化は、スタンス検出タスクのパフォーマンスを高め、特定のモラル基礎とターゲットトピックに対するオンラインスタンスとの関係を照らすのに役立つことを示唆している。この結果は、スタンス分析における深い心理的属性を考慮することの重要性を強調し、オンライン社会行動の指導における道徳的基礎の役割を浮き彫りにするものである。

This study enhances stance detection on social media by incorporating deeper psychological attributes, specifically individuals' moral foundations. These theoretically-derived dimensions aim to provide a comprehensive profile of an individual's moral concerns which, in recent work, has been linked to behaviour in a range of domains, including society, politics, health, and the environment. In this paper, we investigate how moral foundation dimensions can contribute to predicting an individual's stance on a given target. Specifically we incorporate moral foundation features extracted from text, along with message semantic features, to classify stances at both message- and user-levels using both traditional machine learning models and large language models. Our preliminary results suggest that encoding moral foundations can enhance the performance of stance detection tasks and help illuminate the associations between specific moral foundations and online stances on target topics. The results highlight the importance of considering deeper psychological attributes in stance analysis and underscores the role of moral foundations in guiding online social behavior.

翻訳日:2024-11-09 10:01:09 公開日:2024-09-29

# RSTAR4D: 分離4次元CNNを用いた4次元CBCTの回転ストリークアーティファクト低減

RSTAR4D: Rotational Streak Artifact Reduction in 4D CBCT using a Separable 4D CNN ( http://arxiv.org/abs/2403.16361v3 )

ライセンス: Link先を確認

Ziheng Deng, Hua Chen, Yongzheng Zhou, Haibo Hu, Zhiyong Xu, Jiayuan Sun, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao,

(参考訳) 4次元コーンビームCT(4D CBCT)は呼吸分解画像を提供し、画像誘導放射線治療に用いられる。しかし、呼吸運動を明らかにする能力は、イメージアーティファクトのコストがかかる。生のプロジェクションデータを複数の呼吸相に分類すると、コーンビームプロジェクションはより疎になり、再構成された4D CBCT画像は厳しいストリークアーティファクトで被覆される。この問題に対処するためにいくつかのディープラーニングベースの手法が提案されているが、ほとんどのアルゴリズムは2Dネットワークモデルをバックボーンとして採用しており、4D CBCT画像内の固有の構造的事前情報を無視している。本稿では,まず4次元CBCT画像におけるストリークアーティファクトの起源と外観について検討する。ストリークアーティファクトは呼吸とともに独特の回転運動を示し、時空間領域において横隔膜駆動の呼吸運動と区別できることがわかった。そこで本研究では、4次元CBCT画像内の空間情報と時間情報を統合することにより、回転ストリークアーティファクト低減(Rotational STreak Artifact Reduction)に対処する新しい4次元ニューラルネットワークモデルRSTAR4D-Netを提案する。具体的には、4Dニューラルネットワークの計算とトレーニングの難しさを克服する。特別に設計されたモデルは、4D畳み込みの効率的な実装を採用し、計算コストを削減し、4D画像全体を1パスで処理することができる。さらに,分離可能な4D畳み込みに関連するテトリストレーニング戦略を提案し,限られた4Dトレーニングサンプルを用いてモデルを効果的にトレーニングする。大規模な実験により提案手法の有効性が実証され,RSTAR4D-Netは他の手法と比較して優れた性能を示した。ソースコードと動的デモは https://github.com/ivy9092111111/RSTAR で公開されている。

Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, the cone-beam projections become much sparser and the reconstructed 4D CBCT images will be covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ 2D network models as backbones, neglecting the intrinsic structural priors within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images. We find that streak artifacts exhibit a unique rotational motion along with the patient's respiration, distinguishable from diaphragm-driven respiratory motion in the spatiotemporal domain. Therefore, we propose a novel 4D neural network model, RSTAR4D-Net, designed to address Rotational STreak Artifact Reduction by integrating the spatial and temporal information within 4D CBCT images. Specifically, we overcome the computational and training difficulties of a 4D neural network. The specially designed model adopts an efficient implementation of 4D convolutions to reduce computational costs and thus can process the whole 4D image in one pass. Additionally, a Tetris training strategy pertinent to the separable 4D convolutions is proposed to effectively train the model using limited 4D training samples. Extensive experiments substantiate the effectiveness of our proposed method, and the RSTAR4D-Net shows superior performance compared to other methods. The source code and dynamic demos are available at https://github.com/ivy9092111111/RSTAR.

翻訳日:2024-11-09 03:48:22 公開日:2024-09-29
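
To give a feel for why a separable 4D convolution reduces cost, the sketch below counts weights for a dense 4D kernel versus a factorized one. The abstract does not state how RSTAR4D-Net factorizes its kernels, so the split used here (one 3D spatial kernel followed by one 1D temporal kernel) is an assumption made only for illustration, as are the kernel size and channel count.

```python
def conv_weights(kernel_shape, c_in, c_out):
    """Number of weights (no bias) in a dense convolution layer."""
    n = c_in * c_out
    for k in kernel_shape:
        n *= k
    return n


def full_4d(k, channels):
    # One dense kernel spanning x, y, z and respiratory phase t.
    return conv_weights((k, k, k, k), channels, channels)


def separable_4d(k, channels):
    # ASSUMED factorization: a 3D spatial kernel followed by a 1D temporal kernel.
    spatial = conv_weights((k, k, k), channels, channels)
    temporal = conv_weights((k,), channels, channels)
    return spatial + temporal


print(full_4d(3, 16), separable_4d(3, 16))  # 20736 7680
```

Under this assumed split, the weight count grows with `k**3 + k` instead of `k**4`.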

# モダリティ排他的表現と非依存的表現の学習による非同期マルチモーダルビデオシーケンス融合

Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations ( http://arxiv.org/abs/2407.04955v2 )

ライセンス: Link先を確認

Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang,

(参考訳) ビデオから人間の意図(例えば感情)を理解することは、最近かなりの注目を集めている。ビデオストリームは一般的に、自然言語、表情、聴覚的手がかりなど、異なるモダリティに由来する時間データのブレンドを構成する。注意に基づくパラダイムによる先行研究の顕著な進歩にもかかわらず、本質的な時間的非同期性とモダリティの不均一性という課題はマルチモーダルシーケンスの融合に残っており、パフォーマンスのボトルネックの原因となっている。これらの課題に対処するために、モダリティ排他的(Exclusive)表現とモダリティ非依存(Agnostic)表現を学習するマルチモーダル融合手法(MEA)を提案し、マルチモーダル特徴を洗練するとともに、異なるモダリティ間の相補性を活用する。一方で、MEAは、モダリティ内の信頼性のあるコンテキストダイナミクスを捕捉し、モダリティ排他的空間上のユニークな特徴を補強する予測自己アテンションモジュールを導入している。他方で、階層的クロスモーダルアテンションモジュールは、モダリティ非依存空間上のモダリティ間の価値ある要素相関を探索するために設計されている。また、異なる表現の生成を敵対的な方法で保証するために、二重識別器戦略が提示される。最終的に、不均一なモダリティ間の知識交換を強化し、下流タスクの堅牢なマルチモーダル表現を学習する疎結合グラフ融合機構を提案する。非同期シーケンスを持つ3つのマルチモーダルデータセット上で多数の実験を行う。体系的な分析は我々のアプローチの必要性を示している。

Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality heterogeneity challenges remain in multimodal sequence fusion, causing adverse performance bottlenecks. To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities. On the one hand, MEA introduces a predictive self-attention module to capture reliable context dynamics within modalities and reinforce unique features over the modality-exclusive spaces. On the other hand, a hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities over the modality-agnostic space. Meanwhile, a double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner. Eventually, we propose a decoupled graph fusion mechanism to enhance knowledge exchange across heterogeneous modalities and learn robust multimodal representations for downstream tasks. Numerous experiments are implemented on three multimodal datasets with asynchronous sequences. Systematic analyses show the necessity of our approach.

翻訳日:2024-11-08 23:35:45 公開日:2024-09-29

# CPM:音声視覚分割のためのクラス条件プロンプティングマシン

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation ( http://arxiv.org/abs/2407.05358v3 )

ライセンス: Link先を確認

Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro,

(参考訳) オーディオ・ビジュアル・セグメンテーション (AVS) は、オーディオ・ビジュアル・キューに基づいて発音オブジェクトを正確にセグメンテーションすることを目的とした新しいタスクである。AVS学習システムの成功は、モーダル間相互作用の有効性に依存する。このような要求は、長距離依存を捉える固有の能力と異なるモダリティを柔軟に扱える性質を持つトランスフォーマーベースのセグメンテーションアーキテクチャを活用することで自然に達成できる。しかし,AVSでは,特に学習された音声クエリが明確な意味的手がかりを提供していない場合,クロスアテンションの有効性の低下や不安定な二部マッチングなどのトランスフォーマーベースの手法の固有のトレーニング問題が増幅されうる。本稿では,これら2つの問題を,CPM(Class-conditional Prompting Machine)を用いて解決する。CPMは、クラスに依存しないクエリとクラス条件のクエリを組み合わせた学習戦略により、二部マッチングを改善している。クロスモーダルアテンションの有効性は,音声・視覚・結合モダリティの新しい学習目標によって向上する。我々はAVSベンチマーク実験を行い、その手法がSOTA(State-of-the-art)セグメンテーションの精度を実現することを示す。

Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is upgraded with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.

翻訳日:2024-11-08 23:24:33 公開日:2024-09-29

# 表形式データのためのアンマスキング木

Unmasking Trees for Tabular Data ( http://arxiv.org/abs/2407.05593v2 )

ライセンス: Link先を確認

Calvin McCarter,

(参考訳) 表形式データの生成と補完のための高度なディープラーニングと生成モデリング技術に関する多くの研究にもかかわらず、従来の手法は補完ベンチマークで勝利し続けている。本稿では,個々の特徴を段階的にアンマスクするために勾配ブースティング決定木を用いる、表形式データの補完(および生成)の簡単な方法であるUnmaskingTreesを提案する。このアプローチは、補完、および欠損を含むトレーニングデータからの生成において最先端の性能を示し、通常の生成においても競争力のある性能を持つ。条件付き生成の部分問題を解決するために,ブースティング木分類器の平衡木を適合させる表形式の確率予測手法BaltoBotを提案する。従来の方法とは異なり、条件分布のパラメトリックな仮定は不要であり、マルチモーダルな分布を持つ特徴にも対応する。また、より新しい拡散法とは異なり、高速サンプリング、クローズドフォーム密度推定、離散変数の柔軟な取り扱いを提供する。最後に、我々は2つのアプローチをメタアルゴリズムとみなし、TabPFNを用いた文脈内学習に基づく生成モデリングを実証した。

Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. This approach offers state-of-the-art performance on imputation, and on generation given training data with missingness; and it has competitive performance on vanilla generation. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.

翻訳日:2024-11-08 23:24:33 公開日:2024-09-29

# 表形式データのためのアンマスキング木

Unmasking Trees for Tabular Data ( http://arxiv.org/abs/2407.05593v3 )

ライセンス: Link先を確認

Calvin McCarter,

翻訳日:2024-11-08 23:24:33 公開日:2024-09-29

# RadiomicsFill-Mammo: ラジオミクス特徴を用いた合成マンモグラム腫瘤操作

RadiomicsFill-Mammo: Synthetic Mammogram Mass Manipulation with Radiomics Features ( http://arxiv.org/abs/2407.05683v2 )

ライセンス: Link先を確認

Inye Na, Jonghun Kim, Eun Sook Ko, Hyunjin Park,

(参考訳) 「所望の属性を持つ腫瘍を生成できるか?」という質問に動機づけられたこの研究は、ラジオミクス特徴を活用して、合成腫瘍画像の作成の可能性を探る。低次元で生物学的に意味のあるマーカーによって特徴づけられるラジオミクスは、複雑な医用画像データと実行可能な臨床所見のギャップを埋める。われわれはRadiomicsFillシリーズの第1弾であるRadiomicsFill-Mammoを紹介した。これは、マスク画像と反対側の乳房画像を用いて特定のラジオミクス特性を反映したリアルなマンモグラム腫瘤画像を生成する革新的な技術であり、最近の安定拡散モデルを利用している。このアプローチはまた、BI-RADSや乳房密度などの重要な臨床変数を、腫瘤生成の条件としてラジオミクス特徴とともに組み込むことも可能である。その結果,RadiomicsFill-Mammoは様々なラジオミクス条件に基づいて,多彩で現実的な腫瘍像を効果的に生成できることが示唆された。また,RadiomicsFill-Mammoを模擬サンプル生成戦略として活用することで,腫瘤検出能力が大幅に向上することも示された。さらに、RadiomicsFill-Mammoは、医療画像研究の進展だけでなく、治療計画と腫瘍シミュレーションの強化のための新たな道を開く。私たちのコードは https://github.com/nainye/RadiomicsFill から入手可能です。

Motivated by the question, "Can we generate tumors with desired attributes?" this study leverages radiomics features to explore the feasibility of generating synthetic tumor images. Characterized by its low-dimensional yet biologically meaningful markers, radiomics bridges the gap between complex medical imaging data and actionable clinical insights. We present RadiomicsFill-Mammo, the first of the RadiomicsFill series, an innovative technique that generates realistic mammogram mass images mirroring specific radiomics attributes using masked images and opposite breast images, leveraging a recent stable diffusion model. This approach also allows for the incorporation of essential clinical variables, such as BI-RADS and breast density, alongside radiomics features as conditions for mass generation. Results indicate that RadiomicsFill-Mammo effectively generates diverse and realistic tumor images based on various radiomics conditions. Results also demonstrate a significant improvement in mass detection capabilities, leveraging RadiomicsFill-Mammo as a strategy to generate simulated samples. Furthermore, RadiomicsFill-Mammo not only advances medical imaging research but also opens new avenues for enhancing treatment planning and tumor simulation. Our code is available at https://github.com/nainye/RadiomicsFill.

翻訳日:2024-11-08 23:24:33 公開日:2024-09-29

# 大言語モデルリコール不確かさはファン効果によって変調される

Large Language Model Recall Uncertainty is Modulated by the Fan Effect ( http://arxiv.org/abs/2407.06349v2 )

ライセンス: Link先を確認

Jesse Roberts, Kyle Moore, Thao Pham, Oseremhen Ewaleifoh, Doug Fisher,

(参考訳) 本稿では,人間のテキストデータを用いて事前学習した後,大きな言語モデル(LLM)が,アンダーソンがヒトで発見したものと同様の認知ファン効果を示すか否かを評価する。ファン効果を誘発する2組のコンテキスト内リコール実験を行う。また, LLMリコールの不確実性は, トークンの確率によって測定され, ファン効果に影響されていることがわかった。以上の結果から,不確実性除去が観察効果を阻害することが明らかとなった。実験により、ファン効果は、ファン値が文脈内で誘導されるか、事前学習データ内で誘導されるかの一致が示唆された。最後に、これらの発見はファン効果と典型性が同じ現象の表現であることを示す。

This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan effect. Our results show that removing uncertainty disrupts the observed effect. The experiments suggest the fan effect is consistent whether the fan value is induced in-context or in the pre-training data. Finally, these findings provide in-silico evidence that fan effects and typicality are expressions of the same phenomena.

翻訳日:2024-11-08 23:13:33 公開日:2024-09-29
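
The abstract says recall uncertainty is "measured via token probability" without giving the exact measure. One plausible reading, sketched below, turns the logits of two candidate answer tokens into a distribution and takes its entropy. The yes/no framing, the function names, and the logit values are assumptions for illustration, not the authors' protocol.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recall_uncertainty(logit_yes, logit_no):
    """Binary entropy (in bits) of the yes/no answer distribution."""
    probs = softmax([logit_yes, logit_no])
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical logits: a confident answer vs. a near coin-flip.
confident = recall_uncertainty(5.0, 0.0)
unsure = recall_uncertainty(0.2, 0.0)
```

Entropy peaks at 1 bit when both tokens are equally likely and falls toward 0 as one token dominates.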

# 多エージェントシステムのためのモデルベース強化学習を用いたグラフニューラルネットワーク

Graph Neural Networks with Model-based Reinforcement Learning for Multi-agent Systems ( http://arxiv.org/abs/2407.09249v2 )

ライセンス: Link先を確認

Hanxiao Chen,

(参考訳) マルチエージェントシステム(MAS)は、マシンインテリジェンスと高度な応用を探索する上で重要な役割を果たしている。モデルベース強化学習を用いた状態空間グラフニューラルネットワークを用いて,MASのミッション(例えばビリヤード回避,自律走行車)に対処する。具体的には,まずGNNモデルを用いて,複数のエージェントの将来の状態や軌道を予測し,次にCEM(Cross-Entropy Method)最適化モデル予測制御を適用して,エゴエージェント計画動作を支援し,特定のMASタスクの達成に成功した。

Multi-agent systems (MAS) play a significant role in exploring machine intelligence and advanced applications. To investigate complicated interactions within MAS scenarios in depth, we propose the "GNN for MBRL" model, which uses a state-space Graph Neural Network with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomous Driving Cars). In detail, we first use a GNN model to predict future states and trajectories of multiple agents, then apply Model Predictive Control optimized with the Cross-Entropy Method (CEM) to help the ego-agent plan actions and successfully accomplish certain MAS tasks.

翻訳日:2024-11-08 22:06:29 公開日:2024-09-29
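
The Cross-Entropy Method used for planning in the abstract above can be sketched on a toy problem. In the sketch, a hand-written 1-D dynamics function stands in for the learned GNN model, and the cost function, horizon, population size, and elite count are all hypothetical choices rather than values from the paper.

```python
import random


def rollout_cost(actions, start=0.0, target=5.0):
    """Toy 1-D model: position += action; cost = final distance + effort."""
    pos = start
    effort = 0.0
    for a in actions:
        pos += a
        effort += 0.01 * a * a
    return abs(pos - target) + effort


def cem_plan(horizon=5, iters=30, pop=64, n_elite=8, seed=0):
    """Cross-Entropy Method over action sequences of length `horizon`."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [2.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # Keep the lowest-cost sequences (the elites).
        elites = sorted(samples, key=rollout_cost)[:n_elite]
        # Refit the mean and standard deviation to the elites.
        for t in range(horizon):
            vals = [e[t] for e in elites]
            mean[t] = sum(vals) / n_elite
            var = sum((v - mean[t]) ** 2 for v in vals) / n_elite
            std[t] = max(var ** 0.5, 1e-3)  # floor keeps sampling well-defined
    return mean


plan = cem_plan()
```

In a Model Predictive Control loop, only `plan[0]` would be executed before replanning from the newly observed state.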

# 視覚プロンプティングによるセンサデータを用いたマルチモーダル大言語モデルの構築

By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting ( http://arxiv.org/abs/2407.10385v2 )

ライセンス: Link先を確認

Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee,

(参考訳) 大規模言語モデル(LLM)は、様々な領域にまたがる例外的な能力を示している。しかし,LLMをユビキタスセンシングアプリケーションに利用することは,従来のテキストプロンプト手法が長いセンサデータシーケンスを扱う場合,大幅な性能劣化を示すため,依然として困難である。マルチモーダルLLM(MLLM)を用いたセンサデータに対する視覚的プロンプト手法を提案する。我々は,MLLMが対象の感覚タスクの記述と並行して可視化されたセンサデータを活用するよう指示する視覚的プロンプトを設計する。さらに、与えられた感覚タスクに合わせて最適な可視化を作成することを自動化する可視化生成装置を導入し、タスク固有の事前知識の必要性を解消する。我々は,4つのセンシングモダリティを含む9つの感覚タスクに対するアプローチを評価し,テキストベースのプロンプトよりも平均10%高い精度を実現し,トークンコストを15.8倍削減した。我々の結果は、様々な感覚タスクにおけるMLLMによる視覚的プロンプトの有効性と費用対効果を示している。ソースコードは https://github.com/diamond264/ByMyEyes で入手できる。

Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts and reducing token costs by 15.8 times. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks. The source code is available at https://github.com/diamond264/ByMyEyes.

翻訳日:2024-11-08 21:43:45 公開日:2024-09-29

# Splatfacto-W: 制約のない写真集のためのガウススプラッティングのNerfstudio実装

Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections ( http://arxiv.org/abs/2407.12306v2 )

ライセンス: Link先を確認

Congrong Xu, Justin Kerr, Angjoo Kanazawa,

(参考訳) 未制約画像からの新たなビュー合成は、正確なシーン再構成を複雑にする光度変化と過渡的オクルーダのために、重要でありながら難しい課題である。従来の手法では,Neural Radiance Fields (NeRFs) に画像単位の外観特徴を組み込むことで,これらの問題にアプローチしている。3D Gaussian Splatting (3DGS)は、高速なトレーニングとリアルタイムレンダリングを提供するが、制約のない画像コレクションに適応することは、アーキテクチャがかなり異なるため、簡単ではない。本稿では,ガウスごとのニューラルカラー特徴と画像ごとの外観埋め込みをラスタライズプロセスに組み込み、球面調和関数に基づく背景モデルを併用するアプローチであるSplatfacto-Wを紹介する。我々の重要な貢献は、潜在外観モデリング、効率的な過渡的オブジェクトハンドリング、正確な背景モデリングである。Splatfacto-Wは高品質でリアルタイムな新しいビュー合成を提供する。提案手法は,3DGSに比べてPeak Signal-to-Noise Ratio(PSNR)を平均5.3dB向上し,NeRFベースの手法に比べてトレーニング速度を150倍に高め,3DGSと同様のレンダリング速度を実現する。追加のビデオ結果とNerfstudioに統合されたコードは https://kevinxu02.github.io/splatfactow/ で公開されている。

Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance features embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it for unconstrained image collections is non-trivial due to the substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, enhances training speed by 150 times compared to NeRF-based methods, and achieves a similar rendering speed to 3DGS. Additional video results and code integrated into Nerfstudio are available at https://kevinxu02.github.io/splatfactow/.

翻訳日:2024-11-08 20:48:00 公開日:2024-09-29
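
To put the reported 5.3 dB PSNR gain in perspective, the standard definition PSNR = 10 log10(MAX^2 / MSE) implies that every additional 10 dB corresponds to a tenfold drop in mean squared error. The short sketch below applies that definition; the function names are illustrative, and the conversion is only a reading of the metric, not a result reported by the paper.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


ratio = mse_ratio_for_gain(5.3)  # roughly a 3.4x lower MSE
```

Because PSNR is logarithmic, gains averaged in dB are not the same as gains computed from averaged MSE.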

# チップ設計における強化学習に基づくマクロ細胞の非重複配置

Non-Overlapping Placement of Macro Cells based on Reinforcement Learning in Chip Design ( http://arxiv.org/abs/2407.18499v3 )

ライセンス: Link先を確認

Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan,

(参考訳) チップ設計の複雑さが増大しているため、既存の配置法では、マクロセルのカバレッジと最適化効率に多くの欠点がある。本稿では,既存のチップ設計手法におけるレイアウトの重複,性能の低下,最適化効率の低下といった問題に着目し,強化学習に基づくエンドツーエンド配置手法SRLPlacerを提案する。まず、配置問題をマクロセル間の結合関係グラフモデルを確立することによりマルコフ決定プロセスに変換し、レイアウトの最適化戦略を学ぶ。第2に、標準セルレイアウトを統合した後、配置プロセス全体を最適化する。公開ベンチマークISPD2005での評価により、提案するSRLPlacerは、配線混雑を考慮し、配線可能性を確保するために総配線長を短縮しつつ、マクロセル間の重なりの問題を効果的に解決できることが示された。コードは https://github.com/zhouyusd/SRLPlacer で公開されている。

Due to the increasing complexity of chip design, existing placement methods still have many shortcomings in dealing with macro cells coverage and optimization efficiency. Aiming at the problems of layout overlap, inferior performance, and low optimization efficiency in existing chip design methods, this paper proposes an end-to-end placement method, SRLPlacer, based on reinforcement learning. First, the placement problem is transformed into a Markov decision process by establishing the coupling relationship graph model between macro cells to learn the strategy for optimizing layouts. Secondly, the whole placement process is optimized after integrating the standard cell layout. By assessing on the public benchmark ISPD2005, the proposed SRLPlacer can effectively solve the overlap problem between macro cells while considering routing congestion and shortening the total wire length to ensure routability. Codes are available at https://github.com/zhouyusd/SRLPlacer.

翻訳日:2024-11-08 14:50:05 公開日:2024-09-29
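
The two quantities the abstract emphasizes, macro overlap and total wire length, are easy to make concrete. The sketch below checks overlap between axis-aligned rectangles and computes half-perimeter wirelength (HPWL), a widely used proxy for wire length in placement. Whether SRLPlacer uses HPWL specifically is not stated in the abstract, so treat that choice, the data layout, and the function names as assumptions.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles a and b share interior area.

    Each rectangle is (x, y, width, height) with (x, y) the lower-left corner.
    Rectangles that only touch along an edge do not count as overlapping.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def hpwl(nets):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters per net.

    nets: list of nets; each net is a list of (x, y) pin coordinates.
    """
    total = 0.0
    for pins in nets:
        xs = [p[0] for p in pins]
        ys = [p[1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


example_nets = [[(0, 0), (3, 4)], [(1, 1), (1, 5), (2, 3)]]
total_length = hpwl(example_nets)  # (3 + 4) + (1 + 4) = 12.0
```

A placement reward could, for example, penalize any pair for which `overlaps` is true and add a term proportional to `hpwl`.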

# ControlMLLM:マルチモーダル大規模言語モデルのための学習不要なビジュアルプロンプト学習

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models ( http://arxiv.org/abs/2407.21534v2 )

ライセンス: Link先を確認

Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji,

(参考訳) 本研究では,学習可能な視覚トークン最適化により,MLLM(Multimodal Large Language Models)に視覚参照を注入する学習自由手法を提案する。 MLLMにおけるテキストプロンプトトークンと視覚トークンの関係を観察する。提案手法では,推測中にMLP出力から視覚トークンを調整し,どのテキストプロンプトがどの視覚トークンに参加するかを制御する。我々は,エネルギー関数に基づいて学習可能な視覚トークンを最適化し,注目マップにおける参照領域の強度を高める。これにより、相当なトレーニングコストやモデル再トレーニングを必要とせずに、詳細な地域説明と推論が可能になる。本手法は,MLLMに参照能力を統合するための有望な方向を提供する。我々の方法は、ボックス、マスク、スクリブル、ポイントを参照することをサポートしている。その結果,本手法は制御性と解釈性を示すことがわかった。

In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlling which text prompt tokens attend to which visual tokens. We optimize a learnable visual token based on an energy function, enhancing the strength of referential regions in the attention map. This enables detailed region description and reasoning without the need for substantial training costs or model retraining. Our method offers a promising direction for integrating referential abilities into MLLMs. Our method supports referring with box, mask, scribble and point. The results demonstrate that our method exhibits controllability and interpretability.

翻訳日:2024-11-08 13:40:32 公開日:2024-09-29

# 非畳み込みグラフニューラルネットワーク

Non-convolutional Graph Neural Networks ( http://arxiv.org/abs/2408.00165v3 )

ライセンス: Link先を確認

Yuanqing Wang, Kyunghyun Cho,

(参考訳) 畳み込みベースのグラフニューラルネットワーク(GNN)を再考する -- 表現力の制限、過度なスムース化、過剰なスキャッシングが特徴であり、効率的な計算には特別なスパースカーネルが必要である。本稿では、畳み込み演算子を完全に含まない単純なグラフ学習モジュールを設計し、ランダムウォークと統一メモリ(RUM)ニューラルネットワークを合成し、RNNが各ノードで終了するランダムウォークに沿ってトポロジとセマンティックグラフの特徴をマージする。 RNNの挙動とグラフトポロジーに関する豊富な文献に関連して,RUMが上記の症状を緩和し,Weisfeiler-Lehman(WL)同型性試験よりも表現力が高いことを理論的に証明し,実験的に検証した。様々なノードレベルの分類と回帰タスクにおいて、RUMは競争性能を達成するだけでなく、最も単純な畳み込みGNNよりも堅牢で、メモリ効率が良く、スケーラブルで、高速である。

Rethink convolution-based graph neural networks (GNN) -- they characteristically suffer from limited expressiveness, over-smoothing, and over-squashing, and require specialized sparse kernels for efficient computation. Here, we design a simple graph learning module entirely free of convolution operators, coined random walk with unifying memory (RUM) neural network, where an RNN merges the topological and semantic graph features along the random walks terminating at each node. Relating the rich literature on RNN behavior and graph topology, we theoretically show and experimentally verify that RUM attenuates the aforementioned symptoms and is more expressive than the Weisfeiler-Lehman (WL) isomorphism test. On a variety of node- and graph-level classification and regression tasks, RUM not only achieves competitive performance, but is also robust, memory-efficient, scalable, and faster than the simplest convolutional GNNs.

翻訳日:2024-11-08 13:40:31 公開日:2024-09-29
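
The core idea in the abstract above, replacing neighborhood convolution with an RNN that reads random walks ending at a node, can be sketched in a few lines. This is a deliberately reduced illustration: it uses fixed scalar weights instead of learned ones and feeds only node features to the recurrence, whereas RUM also encodes the topology of each walk. The graph, features, and weights are hypothetical.

```python
import math
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def rnn_readout(walk, features, w_in=0.5, w_rec=0.5):
    """Scalar Elman-style recurrence over the node features along a walk."""
    h = 0.0
    # Read the walk in reverse so the recurrence ends at the walk's origin,
    # i.e. the walk is treated as terminating at the node being embedded.
    for node in reversed(walk):
        h = math.tanh(w_in * features[node] + w_rec * h)
    return h


def node_embedding(adj, features, node, n_walks=8, length=3, seed=0):
    """Average the RNN state over several sampled walks for one node."""
    rng = random.Random(seed)
    states = [rnn_readout(random_walk(adj, node, length, rng), features)
              for _ in range(n_walks)]
    return sum(states) / n_walks


# Hypothetical undirected triangle graph with scalar node features.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: 1.0, 1: -1.0, 2: 0.5}
embedding = node_embedding(adj, features, node=0)
```

Sampling walks needs only adjacency lookups, which fits the abstract's point that no specialized sparse convolution kernels are required.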

# オンライン線形計画法のための低頻度再求解アルゴリズム

Infrequent Resolving Algorithm for Online Linear Programming ( http://arxiv.org/abs/2408.00465v3 )

ライセンス: Link先を確認

Guokai Li, Zizhuo Wang, Jingwei Zhang,

(参考訳) オンライン線形計画法(OLP)は、オンラインオークション、ネットワーク収益管理、広告などの幅広い応用により、研究者と実践者の両方から大きな注目を集めている。既存のOLPアルゴリズムは、LPベースアルゴリズムとLPフリーアルゴリズムの2つのカテゴリに分類される。前者は一般により良い性能を保証し、定数リグレットさえ達成するが、多数のLPを解く必要があり、計算コストが高くなりうる。対照的に、LPフリーアルゴリズムは1次計算しか必要としないが、性能は劣り、定数リグレットの上界を持たない。本研究では、入力が未知の有限サポート分布から引き出される場合について検討し、時間軸$T$にわたってLPを$O(\log\log T)$回だけ解きながら定数リグレットを達成するアルゴリズムを提案して、この2つの極端の間のギャップを埋める。さらに、LPを$M$回しか解けない場合に、$O\left(T^{(1/2+\epsilon)^{M-1}}\right)$のリグレットを保証するアルゴリズムを提案する。さらに、到着確率が最初に既知である場合、我々のアルゴリズムはLPを$O(\log\log T)$回解くことで定数リグレットを、LPを$M$回だけ解くことで$O\left(T^{(1/2+\epsilon)^{M}}\right)$のリグレットを保証できる。提案アルゴリズムの効率性を示すために、数値実験を行った。

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former one typically guarantees better performance, even offering a constant regret, but requires solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithm only requires first-order computations but induces a worse performance, lacking a constant regret bound. In this work, we study the case where the inputs are drawn from an unknown finite-support distribution, and bridge the gap between these two extremes by proposing an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that can guarantee an $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.

翻訳日:2024-11-08 13:29:21 公開日:2024-09-29
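
The two bounds quoted in the abstract above are consistent with each other, which a short calculation makes visible. This is a back-of-the-envelope check rather than the paper's proof, and it assumes $0 < \epsilon < 1/2$ so that $q = 1/2 + \epsilon < 1$ (the symbol $q$ is introduced here only for brevity). The exponent $q^{M-1}$ decays geometrically in $M$, so about $\log\log T$ solves are enough to drive the $M$-solve bound down to a constant:

```latex
\[
T^{q^{M-1}} = \exp\!\left(q^{M-1}\ln T\right),
\qquad q = \tfrac12 + \epsilon \in \left(\tfrac12,\, 1\right).
\]
\[
q^{M-1} \le \frac{1}{\ln T}
\iff
M - 1 \ge \frac{\ln\ln T}{\ln(1/q)}
\quad\Longrightarrow\quad
T^{q^{M-1}} \le e = O(1).
\]
```

So choosing $M = O(\log\log T)$ in the $M$-solve bound matches the constant-regret claim for $O(\log\log T)$ solves.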

# GoNoGo: 自動車ソフトウェアのリリース意思決定を効率化する効率的なLLMベースのマルチエージェントシステム

GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making ( http://arxiv.org/abs/2408.09785v2 )

ライセンス: Link先を確認

Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt, Andris Freimanis, Patrick Andersson Rhodin, Dhasarathy Parthasarathy,

(参考訳) 自動車業界におけるソフトウェアデプロイメントの決定を行う従来の手法は、通常、表形式のソフトウェアテストデータの手動分析に頼っている。これらの手法は、労働集約性のために、ソフトウェアリリースサイクルのコストと遅延を高くすることが多い。大規模言語モデル(LLM)はこれらの課題に対して有望な解決策を提供する。しかし、そのアプリケーションは一般的に、人間主導のプロンプトエンジニアリングのラウンドを複数回必要としており、特に信頼性と効率的な結果を必要とする産業のエンドユーザーに対して、その実践的な展開を制限している。本稿では,機能要件と実用的産業制約の両方を満たしつつ,自動車ソフトウェアデプロイメントを効率化するLLMエージェントシステムであるGoNoGoを提案する。従来のシステムとは異なり、GoNoGoはドメイン固有でリスクに敏感なシステムに特化している。我々は,産業実践から得たゼロショットと少数ショットの例を用いて,GoNoGoの性能を,異なる難易度のタスクにまたがって評価した。以上の結果から,GoNoGoは3ショットの例ではレベル2の難易度までのタスクを100%成功率で達成し,さらに複雑なタスクにおいても高いパフォーマンスを維持していることがわかった。GoNoGoは、より簡単なタスクのための意思決定を効果的に自動化し、手作業による介入の必要性を大幅に低減します。要約すると、GoNoGoは、我々の産業パートナーの会社で現在採用されている効率的でユーザフレンドリなLLMベースのソリューションであり、ソフトウェアのリリース決定を支援し、リスクに敏感な車両システムのリリースプロセスにおいて、より情報に基づいたタイムリーな決定をサポートします。

Traditional methods for making software deployment decisions in the automotive industry typically rely on manual analysis of tabular software test data. These methods often lead to higher costs and delays in the software release cycle due to their labor-intensive nature. Large Language Models (LLMs) present a promising solution to these challenges. However, their application generally demands multiple rounds of human-driven prompt engineering, which limits their practical deployment, particularly for industrial end-users who need reliable and efficient results. In this paper, we propose GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to address domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Our results show that GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo represents an efficient and user-friendly LLM-based solution currently employed in our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.

翻訳日:2024-11-08 06:55:48 公開日:2024-09-29

# 古代テキストの間テクスト性分析のためのExpert-in-the-Loop LLM談話パターンの検討

Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis ( http://arxiv.org/abs/2409.01882v2 )

ライセンス: Link先を確認

Ray Umphrey, Jesse Roberts, Lindsey Roberts,

(参考訳) 本研究では,聖書のコイネー・ギリシア語テキストにおける間テクスト的関係の同定と検討に対する大規模言語モデル(LLM)の可能性について検討する。様々な間テクスト性のシナリオでLLMの性能を評価することにより、これらのモデルがテキスト間の直接引用、暗示(アリュージョン)、エコーを検出できることを示す。LLMが新たな間テクスト的観察と接続を生成できることは、新たな洞察を明らかにする可能性を浮き彫りにしている。しかし、このモデルは長いクエリ文章の扱いに苦戦し、誤った間テクスト的依存関係を含めてしまうこともあり、専門家による評価の重要性が強調される。提示するエキスパート・イン・ザ・ループの方法論は、聖書コーパスの内外に広がる複雑な間テクスト性の網を研究するためのスケーラブルなアプローチを提供する。

This study explores the potential of large language models (LLMs) for identifying and examining intertextual relationships within biblical, Koine Greek texts. By evaluating the performance of LLMs on various intertextuality scenarios, the study demonstrates that these models can detect direct quotations, allusions, and echoes between texts. The LLM's ability to generate novel intertextual observations and connections highlights its potential to uncover new insights. However, the model also struggles with long query passages and with the inclusion of false intertextual dependencies, emphasizing the importance of expert evaluation. The expert-in-the-loop methodology presented offers a scalable approach for intertextual research into the complex web of intertextuality within and beyond the biblical corpus.

翻訳日:2024-11-07 23:56:04 公開日:2024-09-29

# E2CL: 身体的エージェントの探索に基づく誤り訂正学習

E2CL: Exploration-based Error Correction Learning for Embodied Agents ( http://arxiv.org/abs/2409.03256v2 )

ライセンス: Link先を確認

Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li,

(参考訳) 言語モデルは、知識活用と推論の能力を高めつつある。しかし、身体性を持つ(embodied)環境でエージェントとして適用すると、モデル固有の知識と環境知識の不整合に悩まされることが多く、実行不可能な行動につながる。専門家軌道による教師あり学習や強化学習といった従来の環境アライメント手法は、それぞれ環境知識の網羅と効率的な収束の達成に限界がある。人間の学習に着想を得て,探索によって生じた誤りと環境フィードバックを活用し,身体性エージェントの環境アライメントを強化する新しいフレームワークである探索に基づく誤り訂正学習(E2CL)を提案する。E2CLは、教師誘導型と教師なしの探索を取り入れ、環境フィードバックを収集して誤った行動を修正する。エージェントはフィードバックの提供と自己修正を学習し、それによって対象環境への適応性を高める。VirtualHome環境での広範な実験により、E2CLで訓練したエージェントはベースライン手法で訓練したエージェントを上回り、優れた自己修正能力を示すことがわかった。

Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learning, encounter limitations in covering environmental knowledge and achieving efficient convergence, respectively. Inspired by human learning, we propose Exploration-based Error Correction Learning (E2CL), a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL incorporates teacher-guided and teacher-free explorations to gather environmental feedback and correct erroneous actions. The agent learns to provide feedback and self-correct, thereby enhancing its adaptability to target environments. Extensive experiments in the VirtualHome environment demonstrate that E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.

翻訳日:2024-11-07 23:23:02 公開日:2024-09-29

# オープンワールドダイナミックプロンプトと連続視覚表現学習

Open-World Dynamic Prompt and Continual Visual Representation Learning ( http://arxiv.org/abs/2409.05312v2 )

ライセンス: Link先を確認

Youngeun Kim, Jun Fang, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer,

(参考訳) オープンワールドは本質的に動的であり、絶えず進化する概念と分布によって特徴づけられる。この動的なオープンワールド環境における継続学習(CL)は、未知のテスト時クラスへ効果的に一般化する上で大きな課題となる。この課題に対処するために,オープンワールドの視覚表現学習に適した,新しい実用的なCL設定を導入する。この設定では、後続のデータストリームが、以前のトレーニングフェーズで見たクラスとは互いに素な新しいクラスを体系的に導入し、それらは未知のテストクラスとも異なる。これに対し,シンプルかつ効果的なPrompt-based CL (PCL) 手法である Dynamic Prompt and Representation Learner (DPaRL) を提案する。DPaRLは、従来のPCL手法のように静的なプロンプトプールに依存するのではなく、推論のための動的プロンプトの生成を学習する。さらに、従来のPCL手法が過程全体を通してプロンプト学習のみを洗練させるのに対し、DPaRLは各トレーニング段階で動的プロンプト生成と識別的表現を共同で学習する。実験結果は提案手法の優位性を示しており,確立されたオープンワールド画像検索ベンチマークにおいて,Recall@1で最先端手法を平均4.7%上回った。

The open world is inherently dynamic, characterized by ever-evolving concepts and distributions. Continual learning (CL) in this dynamic open-world environment presents a significant challenge in effectively generalizing to unseen test-time classes. To address this challenge, we introduce a new practical CL setting tailored for open-world visual representation learning. In this setting, subsequent data streams systematically introduce novel classes that are disjoint from those seen in previous training phases, while also remaining distinct from the unseen test classes. In response, we present Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based CL (PCL) method. Our DPaRL learns to generate dynamic prompts for inference, as opposed to relying on a static prompt pool in previous PCL methods. In addition, DPaRL jointly learns dynamic prompt generation and discriminative representation at each training stage whereas prior PCL methods only refine the prompt learning throughout the process. Our experimental results demonstrate the superiority of our approach, surpassing state-of-the-art methods on well-established open-world image retrieval benchmarks by an average of 4.7% improvement in Recall@1 performance.

翻訳日:2024-11-07 22:38:45 公開日:2024-09-29

# 衛星画像時系列に基づく作物分類用SITSMamba

SITSMamba for Crop Classification based on Satellite Image Time Series ( http://arxiv.org/abs/2409.09673v2 )

ライセンス: Link先を確認

Xiaolei Qin, Xin Su, Liangpei Zhang,

(参考訳) 衛星画像時系列(SITS)データは、時間を通じた連続的な観測を提供し、季節や年をまたいだ植生の変化や成長パターンの追跡を可能にする。SITSを作物分類に用いるディープラーニング(DL)アプローチが近年数多く登場しており、最新のアプローチではSITS分類にTransformerを採用している。しかし、Transformerにおける自己注意の二次の計算量は、長い時系列の分類を困難にする。最先端のMambaアーキテクチャは、リモートセンシング画像解釈を含む様々な領域で強みを示してきたが、SITSデータの時間的表現を学習する能力は未解明のままである。さらに、既存のSITS分類法は、教師信号として作物ラベルのみに依存することが多く、時間情報を十分に活用できていない。本稿では,リモートセンシング時系列データに基づく作物分類のための Satellite Image Time Series Mamba (SITSMamba) 手法を提案する。提案するSITSMambaは、畳み込みニューラルネットワーク(CNN)に基づく空間エンコーダと、Mambaに基づく時間エンコーダを含む。SITSからより豊かな時間情報を活用するために、異なるタスクに用いる2つのデコーダブランチを設計する。第1のブランチは作物分類ブランチ(CBranch)で、特徴を作物マップにデコードするConvBlockを含む。第2のブランチはSITS再構成ブランチ(RBranch)で、線形層を用いてエンコードされた特徴を変換し、元の入力値を予測する。さらに、RBranchに適用する位置重み(PW)を設計し、モデルがSITSから豊かな潜在知識を学習できるようにする。また、トレーニング中に2つのブランチのバランスを制御する2つの重み付け係数を設計する。SITSMambaのコードは https://github.com/XiaoleiQinn/SITSMamba で公開されている。

Satellite image time series (SITS) data provides continuous observations over time, allowing for the tracking of vegetation changes and growth patterns throughout the seasons and years. Numerous deep learning (DL) approaches using SITS for crop classification have emerged recently, with the latest approaches adopting Transformer for SITS classification. However, the quadratic complexity of self-attention in Transformer poses challenges for classifying long time series. While the cutting-edge Mamba architecture has demonstrated strength in various domains, including remote sensing image interpretation, its capacity to learn temporal representations in SITS data remains unexplored. Moreover, the existing SITS classification methods often depend solely on crop labels as supervision signals, and so fail to fully exploit the temporal information. In this paper, we propose a Satellite Image Time Series Mamba (SITSMamba) method for crop classification based on remote sensing time series data. The proposed SITSMamba contains a spatial encoder based on Convolutional Neural Networks (CNN) and a Mamba-based temporal encoder. To exploit richer temporal information from SITS, we design two decoder branches used for different tasks. The first branch is a crop Classification Branch (CBranch), which includes a ConvBlock to decode the feature to a crop map. The second branch is a SITS Reconstruction Branch (RBranch) that uses a Linear layer to transform the encoded feature to predict the original input values. Furthermore, we design a Positional Weight (PW) applied to the RBranch to help the model learn rich latent knowledge from SITS. We also design two weighting factors to control the balance of the two branches during training. The code of SITSMamba is available at: https://github.com/XiaoleiQinn/SITSMamba.

翻訳日:2024-11-07 20:46:36 公開日:2024-09-29

# Reinを用いた視覚基盤モデルのファインチューニングによるクロスオーガン・クロススキャナ腺癌セグメンテーション

Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models ( http://arxiv.org/abs/2409.11752v3 )

ライセンス: Link先を確認

Pengzhou Cai, Xueyuan Zhang, Libin Lan, Ze Zhao,

(参考訳) 近年,デジタル病理学の分野において腫瘍セグメンテーションは著しく進展している。しかし, 臓器, 組織標本の作製方法, 画像取得過程の違いは, デジタル病理画像間のドメイン差につながりうる。この問題に対処するため,本論文では,ファインチューニング手法であるReinを用いて,MICCAI 2024 Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS2024) 向けに様々な視覚基盤モデル(VFM)をパラメトリックかつ効率的にファインチューニングする。Reinの中核は学習可能なトークンの集合で構成され、これらはインスタンスに直接リンクされ、各層でインスタンスレベルの機能を改善する。COSAS2024 Challengeのデータ環境において、広範な実験により、ReinでファインチューニングしたVFMが良好な結果を達成することを示した。具体的には、Reinを用いてConvNeXtとDINOv2をファインチューニングした。前者はtask1の予備テストフェーズと最終テストフェーズでそれぞれ0.7719と0.7557のスコアを、後者はtask2の予備テストフェーズと最終テストフェーズでそれぞれ0.8848と0.8192のスコアを達成した。コードはGitHubで入手できる。

In recent years, significant progress has been made in tumor segmentation within the field of digital pathology. However, variations in organs, tissue preparation methods, and image acquisition processes can lead to domain discrepancies among digital pathology images. To address this problem, in this paper, we use Rein, a fine-tuning method, to parametrically and efficiently fine-tune various vision foundation models (VFMs) for MICCAI 2024 Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS2024). The core of Rein consists of a set of learnable tokens, which are directly linked to instances, improving functionality at the instance level in each layer. In the data environment of the COSAS2024 Challenge, extensive experiments demonstrate that Rein fine-tuned the VFMs to achieve satisfactory results. Specifically, we used Rein to fine-tune ConvNeXt and DINOv2. Our team used the former to achieve scores of 0.7719 and 0.7557 on the preliminary test phase and final test phase in task1, respectively, while the latter achieved scores of 0.8848 and 0.8192 on the preliminary test phase and final test phase in task2. Code is available at GitHub.

翻訳日:2024-11-07 19:50:48 公開日:2024-09-29

# SnO$_2$薄膜特性のスマートデータ駆動型GRU予測器

Smart Data-Driven GRU Predictor for SnO$_2$ Thin films Characteristics ( http://arxiv.org/abs/2409.11782v2 )

ライセンス: Link先を確認

Faiza Bouamra, Mohamed Sayah, Labib Sadek Terrissa, Noureddine Zerhouni,

(参考訳) 材料物理学では、物性、構造、エレクトロニクス、磁気、光学、誘電体、分光特性に関する材料データを取得するために、キャラクタリゼーション技術が最も重要である。しかし、多くの材料にとって、可用性と安全なアクセシビリティを確保することは必ずしも容易ではなく、完全に保証されているわけではない。さらに、モデリングとシミュレーション技術の使用には、コストのかかる計算時間と大きな複雑さを伴うことに加えて、多くの理論的知識が必要である。したがって、複数のサンプルを同時に分析する異なる手法で材料を分析することは、技術者や研究者にとって非常に困難である。非常に危険であるにもかかわらず、X線回折は結晶性1d, 2d, 3d材料の構造特性からデータを収集する、よく知られ、広く使われているキャラクタリゼーション技術である。本稿では, 酸化スズSnO$_2$(110) 薄膜の構造特性や特性を予測するための Gated Recurrent Unit モデルのためのスマート GRU を提案する。実際、薄膜サンプルは実験的に精巧に管理され、収集されたデータ辞書は、スズ酸化物SnO$_2$(110)構造特性のキャラクタリゼーションのためのAI-人工知能-GRUモデルを生成するために使用される。

In materials physics, characterization techniques are crucial for obtaining data on the physical properties of materials, as well as on their structural, electronic, magnetic, optical, dielectric, and spectroscopic characteristics. However, for many materials, availability and safe accessibility are not always easy to ensure and cannot be fully guaranteed. Moreover, modeling and simulation techniques require considerable theoretical knowledge, in addition to being associated with costly computation time and great complexity. Thus, analyzing multiple samples of materials simultaneously with different techniques is still very challenging for engineers and researchers. It is worth noting that, despite being very risky, X-ray diffraction is a well-known and widely used characterization technique that gathers data on the structural properties of crystalline 1D, 2D, or 3D materials. In this paper, we propose a smart Gated Recurrent Unit (GRU) model to forecast the structural characteristics or properties of tin oxide SnO$_2$(110) thin films. Thin-film samples are prepared and managed experimentally, and the collected data dictionary is then used to generate an artificial intelligence (AI) GRU model for characterizing the structural properties of tin oxide SnO$_2$(110) thin films.

翻訳日:2024-11-07 19:50:48 公開日:2024-09-29

# マルチレベル因子モデルの当てはめ

Fitting Multilevel Factor Models ( http://arxiv.org/abs/2409.12067v2 )

ライセンス: Link先を確認

Tetiana Parshakova, Trevor Hastie, Stephen Boyd,

(参考訳) 共分散がマルチレベル低ランク(MLR)行列~\cite{parshakova2023factor} で与えられる,マルチレベル因子モデルの特別な場合について検討する。観測データの尤度を最大化するために,マルチレベル因子モデルに特化した期待値最大化(EM)アルゴリズムの新しい高速な実装を開発する。この手法は任意の階層構造に対応し,反復ごとに線形の時間計算量と記憶容量を維持する。これは,正定値MLR行列の逆行列を計算する新しい効率的な手法によって達成される。可逆なPSD MLR行列の逆行列もまた,因子のスパース性が同じMLR行列であることを示し,逆行列の因子を得るために再帰的なSherman-Morrison-Woodbury行列恒等式を用いる。さらに,拡張行列のコレスキー分解を線形の時間・空間計算量で計算し,共分散行列をそのシューア補行列として与えるアルゴリズムを提案する。本論文には,提案手法を実装したオープンソースパッケージが付属する。

We examine a special case of the multilevel factor model, with covariance given by a multilevel low rank (MLR) matrix~\cite{parshakova2023factor}. We develop a novel, fast implementation of the expectation-maximization (EM) algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of an invertible PSD MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.
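The abstract leans on the Sherman-Morrison-Woodbury identity to invert a structured covariance cheaply. The sketch below is not the authors' MLR algorithm; it only illustrates the underlying idea on the simplest related case, a diagonal matrix plus a sum of rank-one terms, by applying the Sherman-Morrison update once per factor column. The function names (`dot`, `solve_diag_plus_low_rank`) and the input layout are assumptions made for this illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_diag_plus_low_rank(d, factors, b):
    """Solve (diag(d) + sum_k f_k f_k^T) x = b without forming the matrix.

    d       : list of n positive diagonal entries
    factors : list of r factor columns f_k, each a list of length n
    b       : right-hand side, list of length n

    Each rank-one term is absorbed with the Sherman-Morrison update
        (A + f f^T)^{-1} v = A^{-1} v - w (f . A^{-1} v) / (1 + f . w),
    where w = A^{-1} f. Cost is O(n r^2): linear in n for a fixed r.
    """
    # Start from A_0 = diag(d), whose inverse is applied elementwise.
    x = [bi / di for bi, di in zip(b, d)]
    w_cols = [[fi / di for fi, di in zip(f, d)] for f in factors]

    for k, f in enumerate(factors):
        w = w_cols[k]                 # A_k^{-1} f_k for the current A_k
        denom = 1.0 + dot(f, w)
        # Update the running solution x = A_{k+1}^{-1} b.
        coef = dot(f, x) / denom
        x = [xi - coef * wi for xi, wi in zip(x, w)]
        # Keep A_{k+1}^{-1} f_j up to date for the columns not yet absorbed.
        for j in range(k + 1, len(factors)):
            cj = dot(f, w_cols[j]) / denom
            w_cols[j] = [wj - cj * wi for wj, wi in zip(w_cols[j], w)]
    return x
```

The solve never builds the n-by-n matrix, which is the property that lets methods of this kind keep per-iteration cost linear in the number of features.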

翻訳日:2024-11-07 19:26:16 公開日:2024-09-29

# 物理インフォームドニューラルネットワーク(PINN)をより正確に、堅牢かつ高速に解くための高次ReLU-KAN(HRKAN)

Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster ( http://arxiv.org/abs/2409.14248v1 )

ライセンス: Link先を確認

Chi Chiu So, Siu Pang Yung

(参考訳) 偏微分方程式(PDE)の解を求めることは、多くの科学的・工学的発見において重要かつ不可欠な要素である。ディープラーニングによる一般的なアプローチの1つが、物理インフォームドニューラルネットワーク(PINN)である。近年,多層パーセプトロン(MLP)の代替として,学習可能な活性化関数を持つ新しい基本的なニューラルネットワークモデルであるKolmogorov-Arnold Networks (KAN) が提案されている。KANの適合精度を高めるため, 活性化関数の基底として「ReLUの2乗」を用いる,ReLU-KANと呼ばれるKANの改良版が提案されている。本研究では, 活性化関数の別の基底として高次ReLU(Higher-order-ReLU, HR)を提案する。HRは,KANで用いられる活性化関数の基底であるB-splineよりも単純で, 効率的なKAN行列演算を可能にし, 物理インフォームドニューラルネットワークに不可欠な,滑らかで非ゼロの高階微分を持つ。高次ReLU(HR)を活性化関数とするこのようなKANをHRKANと呼ぶ。線形ポアソン方程式と粘性項付き非線形バーガース方程式という,2つの有名で代表的なPDEに関する詳細な実験により,提案するHRKANが,KAN,ReLU-KAN,HRKANの中で最も高い適合精度とトレーニングの堅牢性,および有意に最短のトレーニング時間を達成することが明らかになった。実験を再現するコードは https://github.com/kelvinhkcs/HRKAN で公開されている。

Finding solutions to partial differential equations (PDEs) is an important and essential component in many scientific and engineering discoveries. One of the common approaches empowered by deep learning is Physics-informed Neural Networks (PINNs). Recently, a new type of fundamental neural network model, Kolmogorov-Arnold Networks (KANs), has been proposed as a substitute for Multilayer Perceptrons (MLPs), and possesses trainable activation functions. To enhance KANs in fitting accuracy, a modification of KANs, so-called ReLU-KANs, using "square of ReLU" as the basis of its activation functions, has been suggested. In this work, we propose another basis of activation functions, namely, Higher-order-ReLU (HR), which is simpler than the basis of activation functions used in KANs, namely, B-splines; allows efficient KAN matrix operations; and possesses smooth and non-zero higher-order derivatives, essential to physics-informed neural networks. We name such KANs with Higher-order-ReLU (HR) as their activations, HRKANs. Our detailed experiments on two famous and representative PDEs, namely, the linear Poisson equation and nonlinear Burgers' equation with viscosity, reveal that our proposed Higher-order-ReLU-KANs (HRKANs) achieve the highest fitting accuracy and training robustness and lowest training time significantly among KANs, ReLU-KANs and HRKANs. The codes to replicate our experiments are available at https://github.com/kelvinhkcs/HRKAN.
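The abstract names the basis functions but does not state their formula. As a rough illustration only, the sketch below assumes a bump of the form (ReLU(e - x) * ReLU(x - s))^m on an interval [s, e], scaled so that its peak is 1; with m = 2 this gives a "square of ReLU" style bump, and larger m gives a smoother function with more non-zero derivatives. The function `hr_basis`, its parameters, and the normalization are assumptions for this sketch and are not taken from the paper or its repository.

```python
def relu(x):
    """Rectified linear unit on a scalar."""
    return max(0.0, x)


def hr_basis(x, s, e, m):
    """Hypothetical order-m ReLU bump supported on [s, e].

    The product relu(e - x) * relu(x - s) is zero outside [s, e] and peaks
    at the midpoint with value ((e - s) / 2) ** 2, so multiplying by
    4 / (e - s) ** 2 before raising to the power m normalizes the peak to 1.
    """
    bump = relu(e - x) * relu(x - s) * 4.0 / (e - s) ** 2
    return bump ** m


# Evaluate a few orders at points inside and outside the support [0, 1].
for order in (2, 3, 4):
    values = [hr_basis(x, 0.0, 1.0, order) for x in (-0.5, 0.25, 0.5, 1.5)]
    print(order, values)
```

Because the bump is built from products and powers of ReLU, it can be evaluated with elementwise arithmetic alone, which is one plausible reading of why the abstract describes such a basis as simpler than B-splines.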

翻訳日:2024-11-06 23:26:16 公開日:2024-09-29

# 物理学インフォームドニューラルネットワーク(PINN)をより正確に、堅牢かつ高速に解くための高次ReLU-KAN(HRKAN)

Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster ( http://arxiv.org/abs/2409.14248v2 )

ライセンス: Link先を確認

Chi Chiu So, Siu Pang Yung,

翻訳日:2024-11-06 23:26:16 公開日:2024-09-29

# 開量子系におけるマルコフ-非マルコフ遷移のスペクトル信号

Spectral Signatures of the Markovian to Non-Markovian Transition in Open Quantum Systems ( http://arxiv.org/abs/2409.14661v1 )

ライセンス: Link先を確認

Zeng-Zhao Li, Cho-Tung Yip, Bo Li,

(参考訳) 本稿では, 線形吸収スペクトルの解析を通じて, 振動浴に強く結合した量子凝集体におけるマルコフ-非マルコフ遷移を研究するための新しいアプローチを提案する。周波数領域の階層的代数方程式を用いて、散逸、凝集体-浴結合、凝集体内の双極子-双極子相互作用の複雑な相互作用によって駆動されるマルコフ領域と非マルコフ領域の間の遷移を、これらのスペクトルがどのように効果的に明らかにできるかを解明する。その結果、散逸の低下はスペクトルピークの分裂を引き起こし、浴に起因する非マルコフ効果の出現を示すこと、さらにこの効果は双極子-双極子相互作用の増強によって増幅されることが示された。また、凝集体-浴結合強度が増大すると、スペクトル振幅の異なる、当初は対称または非対称なピークは、弱い双極子-双極子相互作用の下では融合しうるが、強い双極子-双極子相互作用の下ではピーク分裂が起こりやすい。これらの現象は、マルコフ的挙動から非マルコフ的挙動への移行を示す代替の指標となる。さらに、スペクトルの特徴は、凝集体の幾何学的構造を識別するための高感度なプローブとして機能しうる。この研究は、マルコフ-非マルコフ遷移の理解を深めるだけでなく、量子系を最適化・制御するための堅牢な枠組みも提供する。

We present a new approach for investigating the Markovian to non-Markovian transition in quantum aggregates strongly coupled to a vibrational bath through the analysis of linear absorption spectra. Utilizing hierarchical algebraic equations in the frequency domain, we elucidate how these spectra can effectively reveal transitions between Markovian and non-Markovian regimes, driven by the complex interplay of dissipation, aggregate-bath coupling, and intra-aggregate dipole-dipole interactions. Our results demonstrate that reduced dissipation induces spectral peak splitting, signaling the emergence of bath-induced non-Markovian effects, which are further amplified by enhanced dipole-dipole interactions. Additionally, with an increase in aggregate-bath coupling strength, initially symmetric or asymmetric peaks with varying spectral amplitudes may merge under weak dipole-dipole interactions, whereas strong dipole-dipole interactions are more likely to cause peak splitting. These phenomena serve as alternative indicators of the shift from Markovian to non-Markovian behavior. Moreover, the spectral features can act as sensitive probes for distinguishing geometric structures of the aggregates. This study not only deepens our understanding of the Markovian to non-Markovian transition but also provides a robust framework for optimizing and controlling quantum systems.

翻訳日:2024-11-06 21:34:58 公開日:2024-09-29

# 開量子系におけるマルコフ-非マルコフ遷移のスペクトルシグネチャ

Spectral signatures of the Markovian to Non-Markovian transition in open quantum systems ( http://arxiv.org/abs/2409.14661v2 )

ライセンス: Link先を確認

Zeng-Zhao Li, Cho-Tung Yip, Bo Li,

(参考訳) 本稿では, 線形吸収スペクトルの解析を通じて, 振動浴に強く結合した量子凝集体におけるマルコフ-非マルコフ遷移を研究するための新しいアプローチを提案する。周波数領域の階層的代数方程式を用いて、散逸、凝集体-浴結合、凝集体内の双極子-双極子相互作用の複雑な相互作用によって駆動されるマルコフ領域と非マルコフ領域の間の遷移を、これらのスペクトルがどのように効果的に明らかにできるかを解明する。その結果,散逸の低下はスペクトルピークの分裂を誘起し,浴に起因する非マルコフ効果の出現を示すことが明らかとなった。スペクトルピークの分裂は双極子-双極子相互作用の増強によっても引き起こされうるが、その基礎となるメカニズムは散逸に起因する分裂とは異なる。また、凝集体-浴結合強度が増大すると、スペクトル振幅の異なる、当初は対称または非対称なピークは、弱い双極子-双極子相互作用の下では融合しうるが、強い双極子-双極子相互作用の下ではピーク分裂が起こりやすい。さらに、スペクトルの特徴は凝集体の幾何学的構造を識別するための高感度な指標として機能し、非マルコフ的挙動の形成において幾何学的構造が果たす重要な役割も明らかにする。この研究は、マルコフ-非マルコフ遷移の理解を深めるだけでなく、量子系を最適化・制御するための堅牢な枠組みも提供する。

We present a new approach for investigating the Markovian to non-Markovian transition in quantum aggregates strongly coupled to a vibrational bath through the analysis of linear absorption spectra. Utilizing hierarchical algebraic equations in the frequency domain, we elucidate how these spectra can effectively reveal transitions between Markovian and non-Markovian regimes, driven by the complex interplay of dissipation, aggregate-bath coupling, and intra-aggregate dipole-dipole interactions. Our results demonstrate that reduced dissipation induces spectral peak splitting, signaling the emergence of bath-induced non-Markovian effects. The spectral peak splitting can also be driven by enhanced dipole-dipole interactions, although the underlying mechanism differs from that of dissipation-induced splitting. Additionally, with an increase in aggregate-bath coupling strength, initially symmetric or asymmetric peaks with varying spectral amplitudes may merge under weak dipole-dipole interactions, whereas strong dipole-dipole interactions are more likely to cause peak splitting. Moreover, we find that spectral features serve as highly sensitive indicators for distinguishing the geometric structures of aggregates, while also unveiling the critical role geometry plays in shaping non-Markovian behavior. This study not only deepens our understanding of the Markovian to non-Markovian transition but also provides a robust framework for optimizing and controlling quantum systems.

翻訳日:2024-11-06 21:34:58 公開日:2024-09-29

# ロケーションが鍵:Verilogの関数バグローカライゼーションのための大規模言語モデルを活用する

Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog ( http://arxiv.org/abs/2409.15186v2 )

ライセンス: Link先を確認

Bingkun Yao, Ning Wang, Jie Zhou, Xi Wang, Hong Gao, Zhe Jiang, Nan Guan,

(参考訳) Verilogコードのバグローカライゼーションは,ハードウェア設計の検証において重要かつ時間を要する作業である。大規模言語モデル(LLM)は登場以来、強力なプログラミング能力を示してきた。しかし、VerilogコードのバグローカライゼーションにLLMを用いることを検討した研究はまだない。本稿では,Verilogスニペット内の機能的エラーの位置を特定するオープンソースのLLMソリューションであるLocation-is-Key (LiK) を提案する。LiKは高いローカライゼーション精度を達成し、RTLLMに基づく我々のテストデータセットにおいて93.3%のpass@1ローカライゼーション精度を示し、GPT-4の77.9%を上回り、Claude-3.5の90.8%に匹敵する。さらに、LiKが得たバグ位置はGPT-3.5のバグ修正効率を大幅に改善し(Functional pass@1が40.39%から58.92%に向上)、LLMベースのVerilogデバッグにおけるバグローカライゼーションの重要性を浮き彫りにした。既存手法と比較して、LiKは設計仕様と誤りを含むコードスニペットのみを必要とし、テストベンチ、アサーション、その他のEDAツールを必要としない。本研究は,Verilogのエラーローカライゼーションに LLM を用いることが実現可能であることを示し,Verilogコードの自動デバッグに新たな方向性を与える。

Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since their introduction, Large Language Models (LLMs) have shown strong programming capabilities. However, no work has yet considered using LLMs for bug localization in Verilog code. This paper presents Location-is-Key (LiK), an open-source LLM solution to locate functional errors in Verilog snippets. LiK achieves high localization accuracy, with a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%. Additionally, the bug location obtained by LiK significantly improves GPT-3.5's bug repair efficiency (Functional pass@1 increased from 40.39% to 58.92%), highlighting the importance of bug localization in LLM-based Verilog debugging. Compared to existing methods, LiK only requires the design specification and the erroneous code snippet, without the need for testbenches, assertions, or any other EDA tools. This research demonstrates the feasibility of using LLMs for Verilog error localization, thus providing a new direction for automatic Verilog code debugging.
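The abstract reports pass@1 figures without saying how they were computed. For orientation, the sketch below shows the unbiased pass@k estimator that is widely used in code-generation evaluation: draw n samples per problem, count the c correct ones, and estimate the chance that at least one of k randomly chosen samples is correct. Whether the paper uses this estimator or a single greedy sample per problem is not stated, so treat the function `pass_at_k` as an assumption about the metric rather than a description of the authors' setup.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct sample.
    """
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the fraction of correct samples, c / n.
print(pass_at_k(10, 3, 1))
```

A benchmark-level score is then the mean of this value over all problems in the test set.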

翻訳日:2024-11-06 20:27:58 公開日:2024-09-29

# スペインの低リソース言語に対する多言語転移とドメイン適応

Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain ( http://arxiv.org/abs/2409.15924v2 )

ライセンス: Link先を確認

Yuanchang Luo, Zhanglin Wu, Daimeng Wei, Hengchao Shang, Zongyao Li, Jiaxin Guo, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Yuhao Xie, Jiawei Zheng Bin Wei, Hao Yang,

(参考訳) 本稿では,Huawei Translation Service Center (HW-TSC) による,WMT 2024のスペインの低リソース言語への翻訳タスクへの提出システムについて紹介する。我々は,スペイン語からアラゴン語 (es-arg) ,スペイン語からアラン語 (es-arn) ,スペイン語からアストゥリアス語 (es-ast) の3つの翻訳タスクに参加した。これら3つの翻訳タスクでは、多言語転移、正則化ドロップアウト、順翻訳と逆翻訳、LaBSEによるノイズ除去、トランスダクション・アンサンブル学習などの学習戦略を、deep Transformer-bigアーキテクチャに基づくニューラル機械翻訳(NMT)モデルのトレーニングに適用する。これらの強化戦略を用いることで,我々の提出システムは最終評価において競争力のある結果を達成した。

This article introduces the submission of Huawei Translation Service Center (HW-TSC) to the Translation into Low-Resource Languages of Spain task at WMT 2024. We participated in three translation tasks: Spanish to Aragonese (es-arg), Spanish to Aranese (es-arn), and Spanish to Asturian (es-ast). For these three translation tasks, we apply training strategies such as multilingual transfer, regularized dropout, forward translation and back translation, LaBSE denoising, and transduction ensemble learning to train neural machine translation (NMT) models based on the deep Transformer-big architecture. By using these enhancement strategies, our submission achieved a competitive result in the final evaluation.

翻訳日:2024-11-06 19:21:13 公開日:2024-09-29

# 談話レベル文学翻訳のための文脈認識とスタイル関連インクリメンタルデコーディングフレームワーク

Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation ( http://arxiv.org/abs/2409.16539v2 )

ライセンス: Link先を確認

Yuanchang Luo, Jiaxin Guo, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Hao Yang,

(参考訳) 本報告では,WMT24 Discourse-Level Literary Translation Taskに対する我々のアプローチを概説する。Constrained Trackの中国語-英語の言語対に焦点を当てる。文学テキストの翻訳は、こうした作品に固有の微妙な意味、慣用表現、複雑な物語構造のために、大きな課題となる。これらの課題に対処するために,継続事前学習(Continual Pre-training, CPT)と教師ありファインチューニング(SFT)の組み合わせによってこのタスク向けに強化したChinese-Llama2モデルを利用した。提案手法には,各文がより広い文脈を考慮して翻訳されることを保証し,テキスト全体の結束性と一貫性を維持する新しいインクリメンタル・デコーディング・フレームワークが含まれる。このアプローチにより、モデルは長距離の依存関係と文体的要素を捉え、原文の文学的品質を忠実に保つ翻訳を生成できる。実験では,文レベルと文書レベルの両方のBLEUスコアで有意な改善が見られ,文書レベルの文学翻訳の複雑さに対処する上での提案フレームワークの有効性が裏付けられた。

This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT). Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text. This approach allows the model to capture long-range dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality. Our experiments demonstrate significant improvements in both sentence-level and document-level BLEU scores, underscoring the effectiveness of our proposed framework in addressing the complexities of document-level literary translation.
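The abstract describes incremental decoding only at a high level: each sentence is translated with its broader context in view. The sketch below shows one simple way such a loop could look, feeding each new sentence to the translator together with a sliding window of earlier source sentences and their translations. It is an illustration under stated assumptions and not the authors' framework: `incremental_translate`, the `translate_fn` callback, and the `window` parameter are all hypothetical names, and a real system would call a fine-tuned model where the stub is used here.

```python
def incremental_translate(sentences, translate_fn, window=3):
    """Translate sentences one at a time, passing recent context along.

    sentences    : list of source sentences in document order
    translate_fn : hypothetical callable (source, context) -> translation,
                   where context is a list of (source, translation) pairs
    window       : how many preceding sentence pairs to include as context
    """
    translations = []
    for i, src in enumerate(sentences):
        start = max(0, i - window)
        # Pair each earlier source sentence with the translation already produced.
        context = list(zip(sentences[start:i], translations[start:i]))
        translations.append(translate_fn(src, context))
    return translations


def stub_translate(src, context):
    """Stand-in for a model call; tags the output with the context size."""
    return f"<{src}> (context pairs: {len(context)})"


for line in incremental_translate(["s1", "s2", "s3", "s4"], stub_translate, window=2):
    print(line)
```

Reusing the translations already produced, rather than only the source sentences, is what lets earlier wording and style choices carry forward through the document.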

翻訳日:2024-11-06 17:30:16 公開日:2024-09-29

# SynTQA: Text-to-SQLとE2E TQAの混合によるSynergistic Table-based Question Answering

SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA ( http://arxiv.org/abs/2409.16682v2 )

ライセンス: Link先を確認

Siyue Zhang, Anh Tuan Luu, Chen Zhao,

(参考訳) Text-to-SQL解析とエンドツーエンド質問応答(E2E TQA)は、テーブルベース質問応答タスクの2つの主要なアプローチである。複数のベンチマークで成功を収めているものの、両者はまだ比較されておらず、その相乗効果は未解明のままである。本稿では,ベンチマークデータセット上で最先端モデルを評価することで,それぞれの長所と短所を明らかにする。Text-to-SQLは算術演算や長いテーブルを含む質問の扱いに優れ、E2E TQAは曖昧な質問、非標準のテーブルスキーマ、複雑なテーブル内容への対処に優れている。両者の長所を組み合わせるために,モデルの種類に依存しない回答選択を通じて異なるモデルを統合するSynergistic Table-based Question Answering手法を提案する。さらなる実験により,特徴ベースまたはLLMベースの回答セレクタによるモデルのアンサンブルが,個々のモデルよりも性能を大幅に向上させることを検証した。

Text-to-SQL parsing and end-to-end question answering (E2E TQA) are two main approaches for the Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. In this paper, we identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets: Text-to-SQL demonstrates superiority in handling questions involving arithmetic operations and long tables; E2E TQA excels in addressing ambiguous questions, non-standard table schema, and complex table contents. To combine both strengths, we propose a Synergistic Table-based Question Answering approach that integrates different models via answer selection, which is agnostic to any model types. Further experiments validate that ensembling models by either feature-based or LLM-based answer selector significantly improves the performance over individual models.

翻訳日:2024-11-06 17:20:02 公開日:2024-09-29

# 時間依存型質問応答のための時間的感度と推論の強化

Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering ( http://arxiv.org/abs/2409.16909v2 )

ライセンス: Link先を確認

Wanqi Yang, Yanda Li, Meng Fang, Ling Chen,

(参考訳) Time-Sensitive Question Answering (TSQA)は、時間に敏感な質問に対処するために、複数の時間的事実を含む特定の時間的文脈を効果的に活用することを要求する。このことは、質問の中の時間情報のパーシングだけでなく、正確な答えを生成するために、時間進化する事実の識別と理解も必要である。しかし、現在の大規模言語モデルは、時間的情報に対する感度が限られており、その時間的推論能力が不十分である。本稿では,時間的認知と推論を時間的情報認識の埋め込みとグラニュラコントラスト強化学習を通じて促進する新しい枠組みを提案する。 4つのTSQAデータセットによる実験結果から、我々のフレームワークは、TSQAタスクにおける既存のLLMよりも大幅に優れており、マシンと人間の時間的理解と推論のパフォーマンスギャップを埋める上での一歩であることが示された。

Time-Sensitive Question Answering (TSQA) demands the effective utilization of specific temporal contexts, encompassing multiple time-evolving facts, to address time-sensitive questions. This necessitates not only the parsing of temporal information within questions but also the identification and understanding of time-evolving facts to generate accurate answers. However, current large language models still have limited sensitivity to temporal information and inadequate temporal reasoning capabilities. In this paper, we propose a novel framework that enhances temporal awareness and reasoning through Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning. Experimental results on four TSQA datasets demonstrate that our framework significantly outperforms existing LLMs in TSQA tasks, marking a step forward in bridging the performance gap between machine and human temporal understanding and reasoning.

翻訳日:2024-11-06 17:10:14 公開日:2024-09-29

# 次トークン予測のためのTransformer訓練の非漸近的収束

Non-asymptotic Convergence of Training Transformers for Next-token Prediction ( http://arxiv.org/abs/2409.17335v2 )

ライセンス: Link先を確認

Ruiquan Huang, Yingbin Liang, Jing Yang,

(参考訳) Transformerは、特に次トークン予測(NTP)タスクにおいて、系列データを扱う優れた能力により、現代の機械学習で驚異的な成功を収めている。しかし、NTPにおける性能の理論的理解は限られており、既存の研究は主に漸近的な性能に焦点を当てている。本稿では, 自己注意モジュールとそれに続くフィードフォワード層からなる1層Transformerのトレーニングダイナミクスについて,きめ細かい非漸近解析を与える。まず,半順序に基づく数学的枠組みを用いて,NTPのトレーニングデータセットの本質的な構造的性質を特徴づける。次に,2段階のトレーニングアルゴリズムを設計する。フィードフォワード層を訓練する前処理段階と,注意層を訓練する主段階は,いずれも高速な収束性能を示す。具体的には、両方の層は、対応する最大マージン解の方向へ劣線形に収束する。また,クロスエントロピー損失が線形収束率を持つことを示す。さらに、訓練されたTransformerがデータセットシフトの下でも非自明な予測能力を示すことを示し,Transformerの顕著な汎化性能に光を当てる。我々の解析手法は,注意勾配に関する新しい性質の導出と,これらの性質がトレーニング過程の収束にどう寄与するかについての詳細な解析を含む。実験は理論的な結果をさらに裏付ける。

Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data, especially in next-token prediction (NTP) tasks. However, the theoretical understanding of their performance in NTP is limited, with existing studies focusing mainly on asymptotic performance. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer consisting of a self-attention module followed by a feed-forward layer. We first characterize the essential structural properties of training datasets for NTP using a mathematical framework based on partial orders. Then, we design a two-stage training algorithm, where the pre-processing stage for training the feed-forward layer and the main stage for training the attention layer exhibit fast convergence performance. Specifically, both layers converge sub-linearly to the direction of their corresponding max-margin solutions. We also show that the cross-entropy loss enjoys a linear convergence rate. Furthermore, we show that the trained transformer presents non-trivial prediction ability with dataset shift, which sheds light on the remarkable generalization performance of transformers. Our analysis technique involves the development of novel properties on the attention gradient and further in-depth analysis of how these properties contribute to the convergence of the training process. Our experiments further validate our theoretical findings.

翻訳日:2024-11-06 16:30:51 公開日:2024-09-29

# BEATS: BackVerifyとAdaptive Disambiguateに基づく効率的な木探索によるLLM数学能力の最適化

BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search ( http://arxiv.org/abs/2409.17972v2 )

ライセンス: Link先を確認

Linzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Conghui He, Zenan Zhou, Wentao Zhang,

(参考訳) 大規模言語モデル(LLM)は、幅広いタスクやドメインで卓越した性能を示している。しかし、数学は厳密で論理的な性質を持つため、数学の問題を解くことには依然として苦労している。従来の研究では、LLMの数学的問題解決能力を改善するために、教師付き微調整(SFT)、プロンプトエンジニアリング、探索に基づく手法などが用いられてきた。こうした努力にもかかわらず、その性能は依然として最適とは言えず、かなりの計算資源も必要とする。この問題に対処するため、数学的問題解決能力を高める新しい手法BEATSを提案する。提案手法では、問題の書き直し、1ステップの前進、それまでのステップに基づく回答生成を反復的に行うようモデルを導く、新たに設計したプロンプトを利用する。さらに、生成された回答の正しさをLLMで検証する新たな逆検証(back-verification)手法を導入する。加えて、高い性能を保ちながら探索時間を最適化するために、枝刈り付きの木探索を用いる。特に、本手法はMATHベンチマークにおいてQwen2-7b-Instructのスコアを36.94から61.52に改善し、GPT4の42.5を上回った。

Large Language Models (LLMs) have exhibited exceptional performance across a broad range of tasks and domains. However, they still encounter difficulties in solving mathematical problems due to the rigorous and logical nature of mathematics. Previous studies have employed techniques such as supervised fine-tuning (SFT), prompt engineering, and search-based methods to improve the mathematical problem-solving abilities of LLMs. Despite these efforts, their performance remains suboptimal and demands substantial computational resources. To address this issue, we propose a novel approach, BEATS, to enhance mathematical problem-solving abilities. Our method leverages newly designed prompts that guide the model to iteratively rewrite, advance by one step, and generate answers based on previous steps. Additionally, we introduce a new back-verification technique that uses LLMs to validate the correctness of the generated answers. Furthermore, we employ a pruning tree search to optimize search time while achieving strong performance. Notably, our method improves Qwen2-7b-Instruct's score from 36.94 to 61.52, outperforming GPT4's 42.5 on the MATH benchmark.

翻訳日:2024-11-06 16:00:56 公開日:2024-09-29

# Dicke Superradiance の厳密解

Exact solution for Dicke superradiance ( http://arxiv.org/abs/2409.19040v1 )

ライセンス: Link先を確認

Raphael Holzinger, Claudiu Genes,

(参考訳) 初期状態で反転した$N$個の同一な2準位量子エミッタの集団の時間発展、すなわちDicke超放射問題に対する厳密な解析解を与える。系は対称な崩壊演算子による集団的減衰を受けると仮定し、これにより密度演算子の時間発展は$N+1$個のDicke状態が張る完全対称部分空間に帰着される。この集団基底において、系はDicke状態の完全な混合状態として、完全励起状態から励起数ゼロの基底状態へと緩和する。ここで導いた解は、任意の$N$の値と、時間発展中の任意の時刻$t$に対して厳密である。

We provide an exact analytical solution for the time evolution of an initially inverted ensemble of $N$ identical two-level quantum emitters, i.e. to the Dicke superradiance problem. It is assumed that the system undergoes collective decay with a symmetric collapse operator, thus reducing the time evolution of the density operator to the fully symmetric subspace spanned by $N+1$ Dicke states. In this collective basis, the system relaxes from the fully excited state to the zero excitation ground state as a full mixture of Dicke states. The solution derived here is exact for any value of $N$ and any time $t$ during the evolution.

翻訳日:2024-11-06 04:40:55 公開日:2024-09-29
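
For orientation, the collective decay described in the abstract above can be sketched numerically. Starting from the fully excited state, the populations of the $N+1$ Dicke states follow a ladder of rate equations in which the state with $m$ excitations decays to $m-1$ at the standard collective rate $\Gamma\, m (N-m+1)$. The sketch below integrates these equations with a plain RK4 stepper. It is not the paper's closed-form solution, and the function names, step count, and `gamma` parameter are illustrative assumptions.

```python
import math

def dicke_rates(n, gamma=1.0):
    """Decay rate out of the Dicke state with m excitations, for m = 0..n."""
    return [gamma * m * (n - m + 1) for m in range(n + 1)]

def derivative(p, rates):
    """Right-hand side of the population rate equations on the Dicke ladder."""
    n = len(p) - 1
    dp = []
    for m in range(n + 1):
        # Population flows in from m + 1 excitations and out towards m - 1.
        gain = rates[m + 1] * p[m + 1] if m < n else 0.0
        dp.append(gain - rates[m] * p[m])
    return dp

def evolve(n, t_final, steps=20000, gamma=1.0):
    """Populations p[0..n] at time t_final, starting from the fully excited state."""
    rates = dicke_rates(n, gamma)
    p = [0.0] * n + [1.0]
    dt = t_final / steps
    for _ in range(steps):
        k1 = derivative(p, rates)
        k2 = derivative([x + 0.5 * dt * k for x, k in zip(p, k1)], rates)
        k3 = derivative([x + 0.5 * dt * k for x, k in zip(p, k2)], rates)
        k4 = derivative([x + dt * k for x, k in zip(p, k3)], rates)
        p = [x + dt * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

# Sanity check: for n = 1 the ladder reduces to single-emitter decay exp(-gamma * t).
single = evolve(1, 1.0)
single_emitter_error = abs(single[1] - math.exp(-1.0))
```

For larger `n` the same call returns the full mixture of Dicke-state populations at a given time, which is the quantity the exact solution describes analytically.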

# 高純度ダイヤモンドにおける5 %の$^{13}$C核スピン超偏極を室温・低磁場で達成する

Achieving 5 % $^{13}$C nuclear spin hyperpolarization in high-purity diamond at room temperature and low field ( http://arxiv.org/abs/2409.19489v1 )

ライセンス: Link先を確認

Vladimir V. Kavtanyuk, Changjae Lee, Keunhong Jeong, Jeong Hyun Shim,

(参考訳) ダイヤモンド中の光学的に偏極可能な窒素空孔(NV)中心は、低磁場・室温での$^{13}$C核スピンの超偏極を可能にする。しかし、従来の動的核偏極に匹敵する高い偏極率を達成することは依然として困難であった。本研究では、10 mT以下の磁場で5 %の$^{13}$C偏極が得られることを示す。これは$7 \times 10^6$を超える増強比に相当する。初期窒素濃度の低い($<$ 1 ppm)高純度ダイヤモンドを用いており、これにより100分を超える長い保存時間も得られる。磁場を[100]方向に揃えることで、偏極移行に関与するNVスピンの数は4倍になる。この磁場方位について、磁場強度とマイクロ波(MW)周波数掃引パラメータを包括的に最適化した。最適なMW掃引幅は、偏極移行が主に、統合固体効果とそれに続く核スピン拡散を通じてバルクの$^{13}$Cスピンへ起こることを示唆している。

Optically polarizable nitrogen-vacancy (NV) center in diamond enables the hyperpolarization of $^{13}$C nuclear spins at low magnetic field and room temperature. However, achieving a high level of polarization comparable to conventional dynamic nuclear polarization has remained challenging. Here we demonstrate that, at below 10 mT, a $^{13}$C polarization of 5 % can be obtained, equivalent to an enhancement ratio over $7 \times 10^6$. We used high-purity diamond with a low initial nitrogen concentration ($<$ 1 ppm), which also results in a long storage time exceeding 100 minutes. By aligning the magnetic field along [100], the number of NV spins participating in polarization transfer increases fourfold. We conducted a comprehensive optimization of field intensity and microwave (MW) frequency-sweep parameters for this field orientation. The optimum MW sweep width suggests that polarization transfer occurs primarily to bulk $^{13}$C spins through the integrated solid effect followed by nuclear spin diffusion.

翻訳日:2024-11-05 22:57:44 公開日:2024-09-29

# KineDepth:オンライン計量深度推定にロボットキネマティクスを活用する

KineDepth: Utilizing Robot Kinematics for Online Metric Depth Estimation ( http://arxiv.org/abs/2409.19490v1 )

ライセンス: Link先を確認

Soofiyan Atar, Yuheng Zhi, Florian Richter, Michael Yip,

(参考訳) 深度知覚は、ロボットが環境を空間的・幾何学的に理解するうえで不可欠であり、多くのタスクは従来、RGB-Dカメラやステレオカメラなどのハードウェアベースの深度センサーに依存してきた。しかし、これらのセンサーには、透明物体や反射物体での問題、高コスト、キャリブレーションの複雑さ、空間的・エネルギー的制約、複合システムにおける故障率の増加といった実用上の限界がある。単眼深度推定法は低コストでより単純な代替手段となるが、出力がロボット応用に不可欠な計量深度ではなく相対深度であるため、ロボット工学での採用は限られている。本稿では、キャリブレーション済みの単一カメラを用い、ロボット自身を「物差し」として機能させることで、タスクの実行中に相対深度推定をリアルタイムで計量深度に変換する手法を提案する。提案手法は、オンラインで学習し確率的フィルタリングで洗練するLSTMベースの計量深度回帰器を用いて、単眼深度マップ全体、特にロボットの動作に近い領域で計量深度を正確に復元する。実ロボットを用いた実験により、本手法は現在の最先端の単眼計量深度推定手法を大きく上回り、深度誤差を22.1%削減し、下流タスクの成功率を52%向上させることを示した。

Depth perception is essential for a robot's spatial and geometric understanding of its environment, with many tasks traditionally relying on hardware-based depth sensors like RGB-D or stereo cameras. However, these sensors face practical limitations, including issues with transparent and reflective objects, high costs, calibration complexity, spatial and energy constraints, and increased failure rates in compound systems. While monocular depth estimation methods offer a cost-effective and simpler alternative, their adoption in robotics is limited due to their output of relative rather than metric depth, which is crucial for robotics applications. In this paper, we propose a method that utilizes a single calibrated camera, enabling the robot to act as a "measuring stick" to convert relative depth estimates into metric depth in real-time as tasks are performed. Our approach employs an LSTM-based metric depth regressor, trained online and refined through probabilistic filtering, to accurately restore the metric depth across the monocular depth map, particularly in areas proximal to the robot's motion. Experiments with real robots demonstrate that our method significantly outperforms current state-of-the-art monocular metric depth estimation techniques, achieving a 22.1% reduction in depth error and a 52% increase in success rate for a downstream task.

翻訳日:2024-11-05 22:57:44 公開日:2024-09-29
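
To make the "measuring stick" idea in the abstract above concrete: pixels that fall on the robot have a metric depth known from its kinematics, so they can anchor a relative depth map. The sketch below deliberately swaps the paper's online LSTM regressor and probabilistic filter for a much simpler stand-in, a closed-form least-squares fit of one global scale and shift. All function names and sample values are hypothetical.

```python
def fit_scale_shift(relative, metric):
    """Least-squares fit of metric ~ scale * relative + shift."""
    n = len(relative)
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0.0:
        raise ValueError("relative depths must not all be equal")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift

def to_metric(relative_map, scale, shift):
    """Apply the fitted scale and shift to every pixel of a relative depth map."""
    return [[scale * d + shift for d in row] for row in relative_map]

# Hypothetical samples: relative depth predicted at pixels on the robot arm,
# and the metric depth (in metres) of the same points from forward kinematics.
robot_relative = [0.10, 0.25, 0.40, 0.55]
robot_metric = [0.35, 0.65, 0.95, 1.25]

scale, shift = fit_scale_shift(robot_relative, robot_metric)
metric_map = to_metric([[0.10, 0.30], [0.50, 0.70]], scale, shift)
```

A single global scale and shift cannot correct spatially varying errors, which is one reason a learned, online-refined regressor as described in the abstract may be preferable in practice.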

# MedHalu:大規模言語モデルによるヘルスケアクエリに対する幻覚

MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models ( http://arxiv.org/abs/2409.19492v1 )

ライセンス: Link先を確認

Vibhor Agarwal, Yiqiao Jin, Mohit Chandra, Munmun De Choudhury, Srijan Kumar, Nishanth Sastry,

(参考訳) 大規模言語モデル(LLM)は言語理解と生成において卓越した能力を持つが、幻覚(ハルシネーション)を免れてはいない。LLMは、もっともらしく聞こえるが事実としては誤った、あるいは捏造された情報を生成しうる。LLMを搭載したチャットボットが普及するにつれ、一般の人々が健康に関する質問を頻繁に行い、こうしたLLMの幻覚の被害に遭うおそれがあり、社会や医療にさまざまな影響が生じる。本研究では、患者による実世界の医療クエリに対してLLMが生成した応答に含まれる幻覚について、先駆的な調査を行う。健康に関する多様なトピックと、それに対応するLLMの幻覚応答を収録し、幻覚の種類と幻覚を含むテキストスパンをラベル付けした、入念に構築したこの種で初の医療幻覚データセットMedHaluを提案する。また、さまざまなLLMの幻覚検出能力を評価するMedHaluDetectフレームワークを導入する。さらに、医療専門家、LLM、一般人という3グループの評価者を用いて、誰がこうした医療幻覚に対してより脆弱かを調べる。その結果、LLMは専門家よりはるかに劣ることがわかった。幻覚の検出において、LLMは一般人より優れておらず、少数のケースではむしろ劣る。このギャップを埋めるため、専門家の推論を注入することでLLMによる幻覚検出を改善するエキスパート・イン・ザ・ループ手法を提案する。すべてのLLMで大幅な性能向上が見られ、GPT-4ではマクロF1が平均6.3ポイント改善した。

The remarkable capabilities of large language models (LLMs) in language understanding and generation have not rendered them immune to hallucinations. LLMs can still generate plausible-sounding but factually incorrect or fabricated information. As LLM-empowered chatbots become popular, laypeople may frequently ask health-related queries and risk falling victim to these LLM hallucinations, resulting in various societal and healthcare implications. In this work, we conduct a pioneering study of hallucinations in LLM-generated responses to real-world healthcare queries from patients. We propose MedHalu, a carefully crafted first-of-its-kind medical hallucination dataset with a diverse range of health-related topics and the corresponding hallucinated responses from LLMs with labeled hallucination types and hallucinated text spans. We also introduce the MedHaluDetect framework to evaluate the capabilities of various LLMs in detecting hallucinations. We also employ three groups of evaluators -- medical experts, LLMs, and laypeople -- to study who is more vulnerable to these medical hallucinations. We find that LLMs are much worse than the experts. They also perform no better than laypeople, and even worse in a few cases, in detecting hallucinations. To fill this gap, we propose an expert-in-the-loop approach to improve hallucination detection through LLMs by infusing expert reasoning. We observe significant performance gains for all the LLMs with an average macro-F1 improvement of 6.3 percentage points for GPT-4.

翻訳日:2024-11-05 22:57:44 公開日:2024-09-29

# OptiGrasp:倉庫ピッキングロボットのためのRGB画像を用いた最適化把持姿勢検出

OptiGrasp: Optimized Grasp Pose Detection Using RGB Images for Warehouse Picking Robots ( http://arxiv.org/abs/2409.19494v1 )

ライセンス: Link先を確認

Soofiyan Atar, Yi Li, Markus Grotz, Michael Wolf, Dieter Fox, Joshua Smith,

(参考訳) 倉庫環境では、ロボットは多種多様な物体を扱うための堅牢なピッキング能力を必要とする。効果的に導入するには、最小限のハードウェア、新製品への高い汎化性能、多様な環境での頑健さが求められる。現在の手法は構造情報の取得に深度センサーを利用することが多いが、深度センサーには高コスト、複雑なセットアップ、技術的制約といった問題がある。コンピュータビジョンの最近の進歩に着想を得て、基盤モデルを活用し、RGB画像のみを用いて吸着把持を改善する革新的な手法を提案する。本手法は合成データセットのみで学習され、その把持予測能力は実世界のロボットや、学習セットに含まれない多様な新規物体へと汎化する。我々のネットワークは実世界の応用において82.3%の成功率を達成した。コードとデータを備えたプロジェクトWebサイトはhttp://optigrasp.github.io で公開予定である。

In warehouse environments, robots require robust picking capabilities to manage a wide variety of objects. Effective deployment demands minimal hardware, strong generalization to new products, and resilience in diverse settings. Current methods often rely on depth sensors for structural information, which suffer from high costs, complex setups, and technical limitations. Inspired by recent advancements in computer vision, we propose an innovative approach that leverages foundation models to enhance suction grasping using only RGB images. Trained solely on a synthetic dataset, our method generalizes its grasp prediction capabilities to real-world robots and a diverse range of novel objects not included in the training set. Our network achieves an 82.3% success rate in real-world applications. The project website with code and data will be available at http://optigrasp.github.io.

翻訳日:2024-11-05 22:57:44 公開日:2024-09-29

# 量子符号化のための量子重ね合わせアルゴリズム

Quantum superposing algorithm for quantum encoding ( http://arxiv.org/abs/2409.19496v1 )

ライセンス: Link先を確認

Jaehee Kim, Taewan Kim, Kyunghyun Baek, Yongsoo Hwang, Joonsuk Huh, Jeongho Bang,

(参考訳) 古典データを量子状態へ効率的に符号化すること(現在、量子符号化と呼ばれる)は、量子計算において極めて重要である。有限サイズのデータベースと量子ビットレジスタに対する量子符号化の一般的な戦略は、機械が認識可能なデータアドレスと、後に重ね合わされる量子ビットインデックスとを対応付ける古典的な写像を確立することである。ここで最も重要なのは、任意の個数の量子ビットインデックスの重ね合わせを生成するアルゴリズムを構成することである。このアルゴリズムは正式には量子重ね合わせアルゴリズム(quantum superposing algorithm)と呼ばれる。本研究では、効率的な量子重ね合わせアルゴリズムを提示し、実用的な量子符号化シナリオにおけるその有効性と優れた計算性能を確認する。理論的および数値的な解析により、既存のアルゴリズムと比べて計算効率が大幅に向上することを示す。特に、本アルゴリズムの制御NOT(CNOT)ゲート数は最大で2n-3であり、これまでで最も最適化された結果である。

Efficient encoding of classical data into quantum states -- currently referred to as quantum encoding -- holds crucial significance in quantum computation. For finite-size databases and qubit registers, a common strategy for quantum encoding entails establishing a classical mapping that correlates machine-recognizable data addresses with qubit indices that are subsequently superposed. Here, the most important task is devising an algorithm for generating the superposition of any given number of qubit indices. This algorithm is formally known as a quantum superposing algorithm. In this work, we present an efficient quantum superposing algorithm, affirming its effectiveness and superior computational performance in a practical quantum encoding scenario. Our theoretical and numerical analyses demonstrate a substantial enhancement in computational efficiency compared to existing algorithms. Notably, our algorithm has a maximum controlled-NOT (CNOT) count of 2n-3, representing the most optimized result to date.

翻訳日:2024-11-05 22:48:00 公開日:2024-09-29

# 音声駆動型トーキングヘッド生成のためのフレームワイズ感情強度の学習

Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation ( http://arxiv.org/abs/2409.19501v1 )

ライセンス: Link先を確認

Jingyi Xu, Hieu Le, Zhixin Shu, Yang Wang, Yi-Hsuan Tsai, Dimitris Samaras,

(参考訳) 人間の感情表現は本質的に動的で複雑かつ流動的であり、発話を通じて強度が滑らかに推移するという特徴を持つ。しかし、こうした強度変動のモデル化は、従来の音声駆動型トーキングヘッド生成手法ではほとんど見過ごされており、その結果、感情表現が静的な出力になりがちである。本稿では、発話中に感情の強度がどのように変動するかを調べ、こうした微妙な変化を捉えて生成する手法をトーキングヘッド生成向けに提案する。具体的には、強度レベルを精密に制御しながらさまざまな感情を生成できるトーキングヘッドフレームワークを開発する。これは、感情の種類を潜在ベクトルの方向に、感情の強度を潜在ベクトルのノルムに対応させた、連続的な感情潜在空間を学習することで実現する。さらに、動的な強度変動を捉えるため、強度を反映する話し方のトーンを考慮した、音声から強度への予測器を採用する。この予測器の学習信号は、フレーム単位の強度ラベル付けを必要としない、感情に依存しない強度擬似ラベリング手法によって得る。広範な実験と分析により、提案手法がトーキングヘッド生成において感情強度の変動を正確に捉えて再現し、生成結果の表現力とリアリズムを大きく高めることを検証した。

Human emotional expression is inherently dynamic, complex, and fluid, characterized by smooth transitions in intensity throughout verbal communication. However, the modeling of such intensity fluctuations has been largely overlooked by previous audio-driven talking-head generation methods, which often results in static emotional outputs. In this paper, we explore how emotion intensity fluctuates during speech, proposing a method for capturing and generating these subtle shifts for talking-head generation. Specifically, we develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels. This is achieved by learning a continuous emotion latent space, where emotion types are encoded within latent orientations and emotion intensity is reflected in latent norms. In addition, to capture the dynamic intensity fluctuations, we adopt an audio-to-intensity predictor by considering the speaking tone that reflects the intensity. The training signals for this predictor are obtained through our emotion-agnostic intensity pseudo-labeling method without the need of frame-wise intensity labeling. Extensive experiments and analyses validate the effectiveness of our proposed method in accurately capturing and reproducing emotion intensity fluctuations in talking-head generation, thereby significantly enhancing the expressiveness and realism of the generated outputs.

翻訳日:2024-11-05 22:48:00 公開日:2024-09-29
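
The latent-space design in the abstract above (emotion type as direction, intensity as norm) can be illustrated with plain vector arithmetic. This is a minimal sketch of that decomposition only, not the paper's learned model. The function names and the linear intensity ramp are assumptions made for illustration.

```python
import math

def split_latent(z):
    """Return (unit direction, norm) of a latent vector z."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm == 0.0:
        raise ValueError("a zero latent has no direction")
    return [x / norm for x in z], norm

def with_intensity(z, intensity):
    """Keep the emotion type (direction) of z and set its intensity (norm)."""
    direction, _ = split_latent(z)
    return [intensity * x for x in direction]

def intensity_ramp(z, start, end, frames):
    """Frame-wise latents whose norm moves linearly from start to end."""
    if frames < 2:
        raise ValueError("need at least two frames")
    return [with_intensity(z, start + (end - start) * i / (frames - 1))
            for i in range(frames)]

# Hypothetical 2-D latent for one emotion type.
direction, norm = split_latent([3.0, 4.0])
frames = intensity_ramp([3.0, 4.0], 0.0, 2.0, 5)
```

In a full system the per-frame intensities would come from an audio-to-intensity predictor rather than a fixed ramp, but the separation of direction and norm works the same way.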

# NLPの性質:NLP論文における貢献の分析

The Nature of NLP: Analyzing Contributions in NLP Papers ( http://arxiv.org/abs/2409.19505v1 )

ライセンス: Link先を確認

Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych,

(参考訳) 自然言語処理(NLP)は、計算機科学、言語学、社会科学などの知的伝統を統合する、動的で学際的な分野である。分野として確立されているにもかかわらず、何がNLP研究を構成するのかという定義についてはいまだ議論がある。本研究では、研究論文を調べることで、NLPを構成するものを定量的に調査する。そのために分類体系を提案し、NLPContributionsを導入する。これは約$2k$件の研究論文アブストラクトからなるデータセットで、科学的貢献を特定し、この分類体系に従ってその種類を分類するよう専門家がアノテーションを付与している。また、これらの要素を自動的に特定する新しいタスクを提案し、このデータセットで強力なベースラインを学習する。このタスクの実験結果を示すとともに、モデルを$\sim$$29k$件のNLP研究論文に適用してその貢献を分析し、NLP研究の性質の理解に役立てる。分析の結果、NLPにおける機械学習の関与は90年代初頭から高まる一方、言語や人に関する知識を加えることへの関心は低下してきたこと、そして2020年以降は言語と人への関心が再び高まっていることが明らかになった。本研究がコミュニティの規範に関する議論を喚起し、未来を意識的に形作る取り組みを促すことを期待する。

Natural Language Processing (NLP) is a dynamic, interdisciplinary field that integrates intellectual traditions from computer science, linguistics, social science, and more. Despite its established presence, the definition of what constitutes NLP research remains debated. In this work, we quantitatively investigate what constitutes NLP by examining research papers. For this purpose, we propose a taxonomy and introduce NLPContributions, a dataset of nearly $2k$ research paper abstracts, expertly annotated to identify scientific contributions and classify their types according to this taxonomy. We also propose a novel task to automatically identify these elements, for which we train a strong baseline on our dataset. We present experimental results from this task and apply our model to $\sim$$29k$ NLP research papers to analyze their contributions, aiding in the understanding of the nature of NLP research. Our findings reveal a rising involvement of machine learning in NLP since the early nineties, alongside a declining focus on adding knowledge about language or people; again, in post-2020, there has been a resurgence of focus on language and people. We hope this work will spark discussions on our community norms and inspire efforts to consciously shape the future.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# IWN:Idempotencyに基づく画像透かし

IWN: Image Watermarking Based on Idempotency ( http://arxiv.org/abs/2409.19506v1 )

ライセンス: Link先を確認

Kaixin Deng,

(参考訳) 拡大を続けるデジタルメディアの分野では、透かし技術の強度と完全性を維持することがますます困難になっている。本稿では、Idempotent Generative Network (IGN)に着想を得て、画像透かし処理に冪等性(idempotency)を導入する可能性を探り、革新的なニューラルネットワークモデルであるIdempotent Watermarking Network (IWN)を提案する。提案モデルはカラー画像透かしの復元品質の向上に焦点を当て、冪等性を活用して優れた画像可逆性を確保する。この特性により、カラー画像の透かしが攻撃や損傷を受けても、元の状態へ効果的に射影・写像して戻すことができる。したがって、抽出される透かしの品質は確実に向上する。IWNモデルは埋め込み容量とロバスト性のバランスを実現し、従来の透かし技術やステガノグラフィ手法においてこの2つの要素の間に内在する矛盾をある程度緩和する。

In the expanding field of digital media, maintaining the strength and integrity of watermarking technology is becoming increasingly challenging. This paper, inspired by the Idempotent Generative Network (IGN), explores the prospects of introducing idempotency into image watermark processing and proposes an innovative neural network model - the Idempotent Watermarking Network (IWN). The proposed model, which focuses on enhancing the recovery quality of color image watermarks, leverages idempotency to ensure superior image reversibility. This feature ensures that, even if color image watermarks are attacked or damaged, they can be effectively projected and mapped back to their original state. Therefore, the extracted watermarks have unquestionably increased quality. The IWN model achieves a balance between embedding capacity and robustness, alleviating to some extent the inherent contradiction between these two factors in traditional watermarking techniques and steganography methods.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# 要約評価指標のメタ評価に対する批判的考察

A Critical Look at Meta-evaluating Summarisation Evaluation Metrics ( http://arxiv.org/abs/2409.19507v1 )

ライセンス: Link先を確認

Xiang Dai, Sarvnaz Karimi, Biaoyan Fang,

(参考訳) 効果的な要約評価指標があれば、研究者や実務者は異なる要約システムを効率的に比較できる。自動評価指標の有効性を推定すること、すなわちメタ評価は、極めて重要な研究課題である。本ポジションペーパーでは、要約評価指標に対する最近のメタ評価の実践を概観し、(1)評価指標は主にニュース要約データセットの事例からなるデータセットでメタ評価されていること、(2)研究の焦点が生成要約の忠実性の評価へと顕著に移っていることを見いだした。我々は、より頑健な評価指標の開発と、既存の評価指標の汎化能力の分析を可能にする、より多様なベンチマークを構築する機が熟していると主張する。さらに、生成された要約のコミュニケーション上の目的と、ワークフローにおける要約の役割を考慮した、ユーザ中心の品質側面に焦点を当てる研究を呼びかける。

Effective summarisation evaluation metrics enable researchers and practitioners to compare different summarisation systems efficiently. Estimating the effectiveness of an automatic evaluation metric, termed meta-evaluation, is a critically important research question. In this position paper, we review recent meta-evaluation practices for summarisation evaluation metrics and find that (1) evaluation metrics are primarily meta-evaluated on datasets consisting of examples from news summarisation datasets, and (2) there has been a noticeable shift in research focus towards evaluating the faithfulness of generated summaries. We argue that the time is ripe to build more diverse benchmarks that enable the development of more robust evaluation metrics and analyze the generalization ability of existing evaluation metrics. In addition, we call for research focusing on user-centric quality dimensions that consider the generated summary's communicative goal and the role of summarisation in the workflow.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# ランドスケープの変容 : 大規模言語モデルがコンピュータ科学以外の学術分野に与える影響

Transforming Scholarly Landscapes: Influence of Large Language Models on Academic Fields beyond Computer Science ( http://arxiv.org/abs/2409.19508v1 )

ライセンス: Link先を確認

Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych,

(参考訳) 大規模言語モデル(LLM)は、自然言語処理(NLP)に変革の時代をもたらし、研究のあり方を変え、NLPの影響を他の研究分野にまで広げてきた。しかし、LLMが他の研究分野にどの程度影響を与えているかを調べた研究はほとんど存在しない。本研究は、NLP以外の分野におけるLLMの影響と利用を、実証的かつ体系的に調べる。$106$個のLLMを選定し、LLMを引用する$\sim$$148k$件の論文を分析して、その影響を定量化し、利用パターンの傾向を明らかにする。分析の結果、非CS分野でLLMの普及が進んでいることに加え、利用状況に分野間の格差があることが明らかになった。2018年以降、一部の分野は他の分野より頻繁にLLMを利用しており、特に言語学と工学は合わせてLLM引用の$\sim$$45\%$を占める。さらに、これらの分野の多くは、ドメイン固有の問題に対処するために、追加の微調整を必要とせずゼロショットまたは少数ショット学習に長けた、タスク非依存のLLMを主に用いていることが示された。本研究は、LLMを通じたNLPの分野横断的な影響に光を当て、その機会と課題のより良い理解を提供する。

Large Language Models (LLMs) have ushered in a transformative era in Natural Language Processing (NLP), reshaping research and extending NLP's influence to other fields of study. However, there is little to no work examining the degree to which LLMs influence other research fields. This work empirically and systematically examines the influence and use of LLMs in fields beyond NLP. We curate $106$ LLMs and analyze $\sim$$148k$ papers citing LLMs to quantify their influence and reveal trends in their usage patterns. Our analysis reveals not only the increasing prevalence of LLMs in non-CS fields but also the disparities in their usage, with some fields utilizing them more frequently than others since 2018, notably Linguistics and Engineering together accounting for $\sim$$45\%$ of LLM citations. Our findings further indicate that most of these fields predominantly employ task-agnostic LLMs, proficient in zero or few-shot learning without requiring further fine-tuning, to address their domain-specific problems. This study sheds light on the cross-disciplinary impact of NLP through LLMs, providing a better understanding of the opportunities and challenges.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# 階層型連合エッジ学習のための不均一性を考慮した資源配分とトポロジー設計

Heterogeneity-Aware Resource Allocation and Topology Design for Hierarchical Federated Edge Learning ( http://arxiv.org/abs/2409.19509v1 )

ライセンス: Link先を確認

Zhidong Gao, Yu Zhang, Yanmin Gong, Yuanxiong Guo,

(参考訳) 連合学習(Federated Learning, FL)は、モバイルエッジデバイス上で機械学習モデルを学習するための、プライバシーを保護する枠組みを提供する。FedAvgなどの従来のFLアルゴリズムは、これらのデバイスに重い通信負荷を課す。この問題を緩和するため、エッジサーバをモデル集約の仲介役として活用する階層型連合エッジ学習(HFEL)が提案されている。HFELは有効である一方、特にシステムとデータの不均一性がある場合に、収束の遅さや資源消費の大きさといった課題に直面する。しかし、既存の研究は主に従来のFLの学習効率の改善に注力しており、HFELの効率はほとんど検討されていない。本稿では、エッジデバイスがエッジサーバに接続され、エッジサーバ同士がピアツーピア(P2P)のエッジバックホールで相互接続される2層HFELシステムを考える。我々の目標は、戦略的な資源配分とトポロジー設計によってHFELシステムの学習効率を高めることである。具体的には、計算資源と通信資源の割り当て、およびP2P接続の調整によって、学習全体の遅延を最小化する最適化問題を定式化する。動的トポロジーの下で収束を保証するため、収束誤差の上界を解析し、最適化問題にモデル合意(コンセンサス)制約を導入する。提案した問題はいくつかの部分問題に分解され、それらを交互にオンラインで解くことができる。本手法は、データとシステムが不均一なエッジネットワークにおいて、大規模FLの効率的な実装を容易にする。ベンチマークデータセットでの包括的な実験評価により提案手法の有効性を検証し、各種ベースラインと比べて、モデル精度を維持しつつ学習遅延を大幅に削減できることを示した。

Federated Learning (FL) provides a privacy-preserving framework for training machine learning models on mobile edge devices. Traditional FL algorithms, e.g., FedAvg, impose a heavy communication workload on these devices. To mitigate this issue, Hierarchical Federated Edge Learning (HFEL) has been proposed, leveraging edge servers as intermediaries for model aggregation. Despite its effectiveness, HFEL encounters challenges such as a slow convergence rate and high resource consumption, particularly in the presence of system and data heterogeneity. However, existing works are mainly focused on improving training efficiency for traditional FL, leaving the efficiency of HFEL largely unexplored. In this paper, we consider a two-tier HFEL system, where edge devices are connected to edge servers and edge servers are interconnected through peer-to-peer (P2P) edge backhauls. Our goal is to enhance the training efficiency of the HFEL system through strategic resource allocation and topology design. Specifically, we formulate an optimization problem to minimize the total training latency by allocating the computation and communication resources, as well as adjusting the P2P connections. To ensure convergence under dynamic topologies, we analyze the convergence error bound and introduce a model consensus constraint into the optimization problem. The proposed problem is then decomposed into several subproblems, enabling us to alternatively solve it online. Our method facilitates the efficient implementation of large-scale FL at edge networks under data and system heterogeneity. Comprehensive experiment evaluation on benchmark datasets validates the effectiveness of the proposed method, demonstrating significant reductions in training latency while maintaining the model accuracy compared to various baselines.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
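
The two-tier structure in the abstract above can be sketched as two aggregation steps: each edge server averages the models of its devices, then edge servers mix their models with peers over the P2P backhaul. The sketch shows only this aggregation pattern, not the paper's resource allocation or topology optimization. The sample-count weighting, the mixing matrix, and all names are illustrative assumptions.

```python
def weighted_average(models, weights):
    """FedAvg-style aggregation of device models at one edge server."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]

def gossip_step(edge_models, mixing):
    """One P2P consensus step: each edge server takes a convex combination of peers.

    mixing[a][b] is the weight server a gives to server b; a zero entry
    means there is no P2P link between a and b.
    """
    count = len(edge_models)
    dim = len(edge_models[0])
    return [[sum(mixing[a][b] * edge_models[b][i] for b in range(count))
             for i in range(dim)]
            for a in range(count)]

# Hypothetical example: two devices with 1 and 3 samples under one edge server.
edge_model = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# Three edge servers with a symmetric, doubly stochastic mixing matrix.
mixing = [[0.50, 0.25, 0.25],
          [0.25, 0.50, 0.25],
          [0.25, 0.25, 0.50]]
mixed = gossip_step([[0.0], [3.0], [6.0]], mixing)
```

With a doubly stochastic mixing matrix, repeated gossip steps keep the network-wide average fixed while pulling the edge models towards it, which is the kind of model consensus the abstract's constraint refers to.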

# CoT-ST:マルチモーダル思考連鎖(Chain-of-Thought)によるLLMベース音声翻訳の強化

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought ( http://arxiv.org/abs/2409.19510v1 )

ライセンス: Link先を確認

Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin,

(参考訳) 音声言語モデル(SLM)は、音声翻訳タスクで優れた性能を示している。しかし、既存の研究は主に直接的な指示微調整に焦点を当てており、SLMが本来持つ推論能力を見落としがちである。本稿では、SLMの思考連鎖(Chain-of-Thought, CoT)能力を引き出すよう設計した3段階の学習フレームワークを導入する。マルチモーダルCoTを用いて、音声翻訳を音声認識と翻訳という逐次的なステップに分解する音声翻訳モデルCoT-STを提案する。CoVoST-2とMuST-Cの2つのデータセットで提案手法の有効性を検証した。実験の結果、CoT-STは従来の最先端手法を上回り、より高いBLEUスコアを達成した(CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2)。本研究はhttps://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2 でオープンソースとして公開されている。

Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that utilizes multimodal CoT to decompose speech translation into sequential steps of speech recognition and translation. We validated the effectiveness of our method on two datasets: the CoVoST-2 dataset and MuST-C dataset. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2). This work is open sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2 .

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# ユーザごとに1ノード: グラフニューラルネットワークのためのノードレベル連合学習

One Node Per User: Node-Level Federated Learning for Graph Neural Networks ( http://arxiv.org/abs/2409.19513v1 )

ライセンス: Link先を確認

Zhidong Gao, Yuanxiong Guo, Yanmin Gong,

(参考訳) グラフニューラルネットワーク(GNN)の学習では、生のユーザデータを中央サーバに集める必要がある場合が多く、重大なプライバシー上の懸念が生じる。連合学習はその解決策として登場し、ユーザが生データを直接共有することなく協調的にモデルを学習できるようにする。しかし、連合学習とGNNの統合には特有の課題があり、とりわけ各クライアントが1つのグラフノードに対応し、単一の特徴ベクトルしか持たない場合に顕著である。本稿では、ノードレベルの連合グラフ学習のための新しいフレームワークを提案する。具体的には、最初のGNN層のメッセージパッシング処理と特徴ベクトル変換処理を分離し、ユーザデバイスとクラウドサーバに分けて実行できるようにする。さらに、特徴ベクトルの潜在表現に基づくグラフラプラシアン項を導入し、ユーザ側のモデル更新を調整する。複数のデータセットでの実験結果から、本手法がベースラインより優れた性能を達成することが示された。

Graph Neural Networks (GNNs) training often necessitates gathering raw user data on a central server, which raises significant privacy concerns. Federated learning emerges as a solution, enabling collaborative model training without users directly sharing their raw data. However, integrating federated learning with GNNs presents unique challenges, especially when a client represents a graph node and holds merely a single feature vector. In this paper, we propose a novel framework for node-level federated graph learning. Specifically, we decouple the message-passing and feature vector transformation processes of the first GNN layer, allowing them to be executed separately on the user devices and the cloud server. Moreover, we introduce a graph Laplacian term based on the feature vector's latent representation to regulate the user-side model updates. The experiment results on multiple datasets show that our approach achieves better performance compared with baselines.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29
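
The abstract above does not spell out its graph Laplacian term, so the following is only the standard smoothness regularizer such a term usually refers to: the sum of squared differences between latent vectors of connected nodes, which equals $\mathrm{tr}(H^\top L H)$ for the Laplacian $L = D - A$. The sketch computes it both ways on a hypothetical three-node path graph. All names and values are assumptions.

```python
def laplacian(num_nodes, edges):
    """Unnormalized graph Laplacian L = D - A for an undirected edge list."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap

def smoothness_from_edges(h, edges):
    """Sum over edges of the squared distance between endpoint latents."""
    return sum(sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
               for i, j in edges)

def smoothness_from_laplacian(h, lap):
    """tr(H^T L H), written as the sum of L[i][j] * <h_i, h_j>."""
    n = len(h)
    dim = len(h[0])
    return sum(lap[i][j] * sum(h[i][k] * h[j][k] for k in range(dim))
               for i in range(n) for j in range(n))

# Hypothetical latent representations for three users on a path graph 0-1-2.
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
penalty_edges = smoothness_from_edges(latents, edges)
penalty_lap = smoothness_from_laplacian(latents, laplacian(3, edges))
```

Adding a penalty of this form to a user-side loss discourages the latent representations of neighbouring nodes from drifting apart during local updates.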

# KODA:Koopman演算子を用いた時系列予測とデータ同化のためのデータ駆動再帰モデル

KODA: A Data-Driven Recursive Model for Time Series Forecasting and Data Assimilation using Koopman Operators ( http://arxiv.org/abs/2409.19518v1 )

ライセンス: Link先を確認

Ashutosh Singh, Ashish Singh, Tales Imbiriba, Deniz Erdogmus, Ricardo Borsoi,

(参考訳) クープマン作用素に基づく手法は、複雑な非線形力学系(NLDS)が生成する時系列データの予測において大きな可能性を示してきた。こうした手法はNLDSの潜在状態表現を捉えられるが、実世界のデータに適用すると長期予測が依然として難しい。具体的には、実世界のNLDSの多くは時間変化する挙動を示し、そうしたモデルでは捉えにくい非定常性が生じる。さらに、データ同化、すなわち予測タスクにおいてノイズを含む測定値をその場で(オンザフライで)活用するための体系的なデータ駆動型手法を欠いている。これらの問題を緩和するため、NLDSにおける予測とデータ同化を統合する、クープマン作用素に基づく手法(KODA: Koopman Operator with Data Assimilation)を提案する。具体的には、フーリエ領域のフィルタを用いてデータを、クープマン作用素でダイナミクスを正確に表現できる物理成分と、局所的または時間変化する挙動を表す残差ダイナミクスとに分離し、後者は柔軟で学習可能な再帰モデルで捉える。この分解が安定した長期予測につながるよう、アーキテクチャと学習基準を注意深く設計する。さらに、推論時に新しい測定値でデータ同化を行うための軌道修正(course correction)戦略を導入する。提案手法は完全にデータ駆動であり、エンドツーエンドで学習できる。広範な実験比較により、KODAが電力、気温、気象、ローレンツ63、ダフィング振動子といった複数の時系列ベンチマークで既存の最先端手法を上回り、a) 予測、b) データ同化、c) 状態予測の3つのタスクにわたって優れた性能と有効性を持つことを示す。

Approaches based on Koopman operators have shown great promise in forecasting time series data generated by complex nonlinear dynamical systems (NLDS). Although such approaches are able to capture the latent state representation of an NLDS, they still face difficulty in long-term forecasting when applied to real-world data. Specifically, many real-world NLDS exhibit time-varying behavior, leading to nonstationarity that is hard to capture with such models. Furthermore, they lack a systematic data-driven approach to perform data assimilation, that is, exploiting noisy measurements on the fly in the forecasting task. To alleviate the above issues, we propose a Koopman operator-based approach (named KODA - Koopman Operator with Data Assimilation) that integrates forecasting and data assimilation in NLDS. In particular, we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator, and residual dynamics that represent the local or time-varying behavior and are captured by a flexible and learnable recursive model. We carefully design an architecture and training criterion that ensure this decomposition leads to stable and long-term forecasts. Moreover, we introduce a course correction strategy to perform data assimilation with new measurements at inference time. The proposed approach is completely data-driven and can be learned end-to-end. Through extensive experimental comparisons we show that KODA outperforms existing state-of-the-art methods on multiple time series benchmarks such as electricity, temperature, weather, Lorenz 63 and Duffing oscillator, demonstrating its superior performance and efficacy across the three tasks: a) forecasting, b) data assimilation and c) state prediction.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# GenTel-Safe: プロンプトインジェクション攻撃に対する防御のための統一ベンチマークとシールドフレームワーク

GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks ( http://arxiv.org/abs/2409.19521v1 )

ライセンス: Link先を確認

Rongchang Li, Minjie Chen, Chang Hu, Han Chen, Wenpeng Xing, Meng Han,

(参考訳) GPT-4、LLaMA、Qwenなどの大規模言語モデル(LLM)は、幅広い応用で目覚ましい成功を収めている。しかし、これらのモデルは依然としてプロンプトインジェクション攻撃に対して本質的に脆弱であり、この攻撃は既存の安全機構を回避しうる。このことは、より堅牢な攻撃検出手法と包括的な評価ベンチマークが緊急に必要であることを浮き彫りにしている。これらの課題に対処するため、新しいプロンプトインジェクション攻撃検出手法であるGenTel-Shieldと、3つの主要カテゴリと28のセキュリティシナリオにわたる84812件のプロンプトインジェクション攻撃からなる包括的な評価ベンチマークGenTel-Benchを含む統一フレームワーク、GenTel-Safeを導入する。GenTel-Shieldの有効性を示すため、GenTel-Benchデータセット上で、既存の標準的な安全ガードレールとともに評価する。実験的に、GenTel-Shieldは最先端の攻撃検出成功率を達成でき、有害なプロンプトに対する既存の保護技術の重大な弱点が明らかになった。再現性のため、コードとベンチマークデータセットをプロジェクトページ https://gentellab.github.io/gentel-safe.github.io/ で公開している。

Large Language Models (LLMs) like GPT-4, LLaMA, and Qwen have demonstrated remarkable success across a wide range of applications. However, these models remain inherently vulnerable to prompt injection attacks, which can bypass existing safety mechanisms, highlighting the urgent need for more robust attack detection methods and comprehensive evaluation benchmarks. To address these challenges, we introduce GenTel-Safe, a unified framework that includes a novel prompt injection attack detection method, GenTel-Shield, along with a comprehensive evaluation benchmark, GenTel-Bench, which comprises 84812 prompt injection attacks, spanning 3 major categories and 28 security scenarios. To prove the effectiveness of GenTel-Shield, we evaluate it together with vanilla safety guardrails against the GenTel-Bench dataset. Empirically, GenTel-Shield can achieve state-of-the-art attack detection success rates, which reveals the critical weakness of existing safeguarding techniques against harmful prompts. For reproducibility, we have made the code and benchmarking dataset available on the project page at https://gentellab.github.io/gentel-safe.github.io/.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# LANDeRMT:LLMを機械翻訳に選択的に微調整するための言語対応ニューロンの検出とルーティング

LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation ( http://arxiv.org/abs/2409.19523v1 )

ライセンス: Link先を確認

Shaolin Zhu, Leiyu Pan, Bo Li, Deyi Xiong,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は,バイリンガルの監督が限定された場合でも,多言語翻訳において有望な結果を示している。主な課題は、並列トレーニングデータを提供する際に、微調整LDMに対する破滅的な忘れとパラメータ干渉である。これらの課題に対処するために,LANDeRMT, a \textbf{L}anguage-\textbf{A}ware \textbf{N}euron \textbf{De}tectingおよび \textbf{R}outing frameworkを提案する。 LANDeRMTでは、MTタスクに対するニューロンの認識を評価し、それらを言語一般ニューロンと言語固有ニューロンに分類する。この分類は、微調整、パラメータ干渉の緩和、破滅的な忘れの問題の間の選択的なパラメータ更新を可能にする。検出されたニューロンに対しては,LLM内の言語一般および言語固有能力を動的に調整し,翻訳信号で誘導する条件付き認識に基づくルーティング機構を提案する。実験の結果,提案するLANDeRMTは翻訳知識の学習に非常に有効であることが確認された。

Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision. The major challenges are catastrophic forgetting and parameter interference for finetuning LLMs when provided parallel training data. To address these challenges, we propose LANDeRMT, a Language-Aware Neuron Detecting and Routing framework that selectively finetunes LLMs to Machine Translation with diverse translation training data. In LANDeRMT, we evaluate the awareness of neurons to MT tasks and categorize them into language-general and language-specific neurons. This categorization enables selective parameter updates during finetuning, mitigating parameter interference and catastrophic forgetting issues. For the detected neurons, we further propose a conditional awareness-based routing mechanism to dynamically adjust language-general and language-specific capacity within LLMs, guided by translation signals. Experimental results demonstrate that the proposed LANDeRMT is very effective in learning translation knowledge, significantly improving translation quality over various strong baselines for multiple language pairs.
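
The idea of selective parameter updates mentioned above can be sketched as a masked gradient step. This is a generic illustration, not the LANDeRMT procedure: the function `masked_sgd_step` and the mask `language_specific` are hypothetical, and the abstract does not say how neurons are scored or routed.

```python
def masked_sgd_step(params, grads, mask, lr=0.1):
    """Apply one SGD step only where mask is True; other parameters stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


# Hypothetical example: four parameters, two of them flagged as language-specific.
params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
language_specific = [False, True, False, True]  # assumed output of a detection step
updated = masked_sgd_step(params, grads, language_specific)
```

Freezing the unflagged parameters is what limits interference: gradients from one language pair cannot overwrite weights that were not selected for it.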

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# マルチモーダル・コントラスト学習における効果的なバックドア・ディフェンス:脅威の軽減のためのトーケンレベル・アンラーニング手法

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats ( http://arxiv.org/abs/2409.19526v1 )

ライセンス: Link先を確認

Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao,

(参考訳) マルチモーダルコントラスト学習は高品質な特徴を生み出すために様々なデータモダリティを使用するが、インターネット上の広範囲なデータソースに依存しているため、バックドア攻撃に弱い。これらの攻撃は、推論中に特定のトリガーによって起動されるトレーニング中に悪意のある振る舞いを挿入し、重大なセキュリティリスクを生じさせる。このような攻撃による悪意のある影響を減らすための微調整による既存の対策にもかかわらず、これらの防御は大規模な訓練時間を必要とし、クリーンな精度を低下させる。本研究では,マシン・アンラーニングという概念を用いて,バックドア・脅威に対する効果的な防御機構を提案する。これは、Unlearn Backdoor Threats(UBT)として知られる、モデルによるバックドア脆弱性の迅速な未学習を支援するために、小さな毒のサンプルを戦略的に作成することを必要とする。具体的には、バックドアショートカットの改善と、潜在的中毒データセットにおける疑わしいサンプルの正確な検出に、オーバーフィットトレーニングを使用します。そして, バックドア効果を排除し, バックドア防御効率を向上させるため, 不審な試料から, 急激な忘れがちな試料を選別する。バックドア・アンラーニング・プロセスでは,新しいトークン・ベースの非ラーニング・トレーニング・システムを提案する。このテクニックは、モデル全体の完全性を維持しながら、バックドアの相関関係を解離する、モデルの妥協された要素に焦点を当てる。実験結果から,CLIPモデルのバックドア攻撃手法を効果的に防御できることが示唆された。 SoTAのバックドア防御法と比較して、UBTはモデルのクリーンな精度を保ちながら最小の攻撃成功率を達成する(攻撃成功率はSOTAに比べて19%減少し、クリーンな精度は2.57%上昇する)。

Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training, which are activated by specific triggers during inference, posing significant security risks. Despite existing countermeasures through fine-tuning that reduce the malicious impacts of such attacks, these defenses frequently necessitate extensive training time and degrade clean accuracy. In this study, we propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities, known as Unlearn Backdoor Threats (UBT). We specifically use overfit training to improve backdoor shortcuts and accurately detect suspicious samples in the potential poisoning data set. Then, we select fewer unlearned samples from suspicious samples for rapid forgetting in order to eliminate the backdoor effect and thus improve backdoor defense efficiency. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime. This technique focuses on the model's compromised elements, dissociating backdoor correlations while maintaining the model's overall integrity. Extensive experimental results show that our method effectively defends against various backdoor attack methods in the CLIP model. Compared to SoTA backdoor defense methods, UBT achieves the lowest attack success rate while maintaining a high clean accuracy of the model (attack success rate decreases by 19% compared to SOTA, while clean accuracy increases by 2.57%).

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# BuildingView:ストリートビュー画像とマルチモーダル大言語モードによる都市ビルの外部データベースの構築

BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode ( http://arxiv.org/abs/2409.19527v1 )

ライセンス: Link先を確認

Zongrong Li, Yunlei Su, Chenyuan Zhu, Wufan Zhao,

(参考訳) 都市ビルの外観は、ストリートビュー画像の進歩と都市研究との統合によって、都市分析においてますます重要になっている。マルチモーダル大言語モデル(LLM)は都市アノテーションのための強力なツールを提供し、都市環境に対する深い洞察を可能にする。しかし、正確な都市ビルの外装データベースの作成、エネルギー効率、環境の持続可能性、人間中心の設計の重要指標の特定、これらの指標の体系的な整理といった課題が残されている。これらの課題に対処するために,Googleストリートビューの高解像度視覚データをOpenStreetMapの空間情報とOverpass APIを介して統合する,新しいアプローチであるBuildingViewを提案する。本研究は,都市の建築外装データの精度を向上し,キーサステナビリティと設計指標を特定し,その抽出と分類のための枠組みを開発する。本手法は,ChatGPT-4O APIを用いた文献の体系的レビュー,ビルディングとストリートビューのサンプリング,アノテーションを含む。結果として得られたデータベースは、ニューヨーク市、アムステルダム、シンガポールからのデータで検証され、都市計画、建築設計、環境政策における情報的意思決定をサポートする都市研究のための総合的なツールを提供する。 BuildingViewのコードはhttps://github.com/Jasper0122/BuildingViewで入手できる。

Urban Building Exteriors are increasingly important in urban analytics, driven by advancements in Street View Imagery and its integration with urban research. Multimodal Large Language Models (LLMs) offer powerful tools for urban annotation, enabling deeper insights into urban environments. However, challenges remain in creating accurate and detailed urban building exterior databases, identifying critical indicators for energy efficiency, environmental sustainability, and human-centric design, and systematically organizing these indicators. To address these challenges, we propose BuildingView, a novel approach that integrates high-resolution visual data from Google Street View with spatial information from OpenStreetMap via the Overpass API. This research improves the accuracy of urban building exterior data, identifies key sustainability and design indicators, and develops a framework for their extraction and categorization. Our methodology includes a systematic literature review, building and Street View sampling, and annotation using the ChatGPT-4O API. The resulting database, validated with data from New York City, Amsterdam, and Singapore, provides a comprehensive tool for urban studies, supporting informed decision-making in urban planning, architectural design, and environmental policy. The code for BuildingView is available at https://github.com/Jasper0122/BuildingView.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# 従来の東アジア医学におけるディメンダリティ・リダクションによる臨床的意思決定の理解--実証的研究

Understanding Clinical Decision-Making in Traditional East Asian Medicine through Dimensionality Reduction: An Empirical Investigation ( http://arxiv.org/abs/2409.19531v1 )

ライセンス: Link先を確認

Hyojin Bae, Bongsu Kang, Chang-Eop Kim,

(参考訳) 本研究では,従来の東アジア医学(TEAM)における臨床意思決定過程について,次元減少のレンズを通してパターン識別(PI)を再解釈することにより検討した。 8原則パターン同定(EPPI)システムに着目し,Shang-Han-Lunの実証データを活用することにより,診断と治療選択における外部パターンの優先順位付けの必要性と意義を検討する。 Ext-Intパターンが患者の症状に関する情報を最も多く含んでいるか,最も抽象的で一般化可能な症状情報を示し,適切な処方薬の選択を容易にするか,という3つの仮説を検証した。解析指標,クロスコンディショナライゼーション性能,決定木回帰などの定量的指標を用いて,Exterior-Interiorパターンは最も抽象的で一般化可能な症状情報を表現し,症状と草本処方薬の効率的なマッピングに寄与することを示した。本研究は、TEAMの基礎となる認知過程を理解するための客観的な枠組みを提供し、現代の計算手法で伝統的な医療実践をブリッジする。この発見は、TEAMおよび従来の医学におけるAI駆動診断ツールの開発に関する洞察を与え、臨床実践、教育、研究を進展させる可能性がある。

This study examines the clinical decision-making processes in Traditional East Asian Medicine (TEAM) by reinterpreting pattern identification (PI) through the lens of dimensionality reduction. Focusing on the Eight Principle Pattern Identification (EPPI) system and utilizing empirical data from the Shang-Han-Lun, we explore the necessity and significance of prioritizing the Exterior-Interior pattern in diagnosis and treatment selection. We test three hypotheses: whether the Ext-Int pattern contains the most information about patient symptoms, represents the most abstract and generalizable symptom information, and facilitates the selection of appropriate herbal prescriptions. Employing quantitative measures such as the abstraction index, cross-conditional generalization performance, and decision tree regression, our results demonstrate that the Exterior-Interior pattern represents the most abstract and generalizable symptom information, contributing to the efficient mapping between symptom and herbal prescription spaces. This research provides an objective framework for understanding the cognitive processes underlying TEAM, bridging traditional medical practices with modern computational approaches. The findings offer insights into the development of AI-driven diagnostic tools in TEAM and conventional medicine, with the potential to advance clinical practice, education, and research.

翻訳日:2024-11-05 22:47:59 公開日:2024-09-29

# Video DataFlywheel:ビデオ言語理解における不可能なデータのトリニティを解決する

Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding ( http://arxiv.org/abs/2409.19532v1 )

ライセンス: Link先を確認

Xiao Wang, Jianlong Wu, Zijia Lin, Fuzheng Zhang, Di Zhang, Liqiang Nie,

(参考訳) 近年,ビデオ言語理解は大規模事前学習によって大きな成功を収めている。しかし、データの不足は依然として大きな課題だ。本研究では,事前学習データセットにおけるデータ量,多様性,品質の「不可能トリニティ」を定量的に明らかにする。近年の取り組みは、合成アノテーションによって低品質で妥協された大規模で多様なASRデータセットを改良することを目指している。これらの手法は、オリジナルのアノテーションを洗練させるために、マルチモーダルなビデオコンテンツ(フレーム、タグ、ASR transcriptsなど)で有用な情報を活用することに成功した。それでも彼らは、合成アノテーション内のノイズを軽減し、データセットのサイズが拡大するにつれてスケーラビリティを欠いている。これらの問題に対処するために,ビデオアノテーションを改良されたノイズコントロール手法で反復的に洗練するVideo DataFlywheelフレームワークを導入する。反復的改良のために、まずビデオ言語モデルを用いて合成アノテーションを生成し、洗練されたデータセットを生成する。そして,それを事前訓練し,より強力なモデルのための人間の洗練例を微調整する。これらのプロセスは継続的改善のために繰り返されます。ノイズ制御のための新しいノイズ制御手法であるAda TaiLrを提案する。反復リファインメントとAdaTaiLrを組み合わせることで、ビデオ言語理解のスケーラビリティが向上する。大規模な実験により、我々のフレームワークは既存のデータ改善ベースラインよりも優れており、3%のパフォーマンス向上と、多様性の損失を最小限に抑えてデータセットの品質の向上を実現している。さらに、改良されたデータセットは、ビデオ質問応答やテキストビデオ検索など、様々なビデオ言語理解タスクの大幅な改善を促進する。

Recently, video-language understanding has achieved great success through large-scale pre-training. However, data scarcity remains a prevailing challenge. This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets. Recent efforts seek to refine large-scale, diverse ASR datasets compromised by low quality through synthetic annotations. These methods successfully leverage useful information in multimodal video content (frames, tags, ASR transcripts, etc.) to refine the original annotations. Nevertheless, they struggle to mitigate noise within synthetic annotations and lack scalability as the dataset size expands. To address these issues, we introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods. For iterative refinement, we first leverage a video-language model to generate synthetic annotations, resulting in a refined dataset. Then, we pre-train on it and fine-tune on human refinement examples for a stronger model. These processes are repeated for continuous improvement. For noise control, we present AdaTaiLr, a novel noise control method that requires weaker assumptions on noise distribution, thereby proving more effective in large datasets with theoretical guarantees. The combination of iterative refinement and AdaTaiLr can achieve better scalability in video-language understanding. Extensive experiments show that our framework outperforms existing data refinement baselines, delivering a 3% performance boost and improving dataset quality with minimal diversity loss. Furthermore, our refined dataset facilitates significant improvements in various video-language understanding tasks, including video question answering and text-video retrieval.

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 感情支援型チャットボットのための混合型心理療法

Mixed Chain-of-Psychotherapies for Emotional Support Chatbot ( http://arxiv.org/abs/2409.19533v1 )

ライセンス: Link先を確認

Siyuan Chen, Cong Ming, Zhiling Zhang, Yanyi Chen, Kenny Q. Zhu, Mengyue Wu,

(参考訳) メンタルヘルス支援チャットボットの領域では、共感を示し、適切なソリューションを提供するための自己探索を促進することが不可欠である。しかし、現在のアプローチは、ヘルプ・シーカーの状況を完全に理解することなく、一般的な洞察や解決策を提供する傾向にある。そこで我々は, 心理療法(Chain-of-Psychotherapies, CoP) の観点から, 探索者の状態分析を統合したチャットボットPsyMixを提案する。包括的評価により,PsyMixはChatGPTベースラインを上回り,ヒトカウンセラーに対する共感度は同等であった。

In the realm of mental health support chatbots, it is vital to show empathy and encourage self-exploration to provide tailored solutions. However, current approaches tend to provide general insights or solutions without fully understanding the help-seeker's situation. Therefore, we propose PsyMix, a chatbot that integrates the analyses of the seeker's state from the perspective of a psychotherapy approach (Chain-of-Psychotherapies, CoP) before generating the response, and learns to incorporate the strength of various psychotherapies by fine-tuning on a mixture of CoPs. Through comprehensive evaluation, we found that PsyMix can outperform the ChatGPT baseline, and demonstrate a comparable level of empathy in its responses to that of human counselors.

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 非局所クラマース-モヤル式に基づく非ガウス確率力学系発見への進化的アプローチ

An evolutionary approach for discovering non-Gaussian stochastic dynamical systems based on nonlocal Kramers-Moyal formulas ( http://arxiv.org/abs/2409.19534v1 )

ライセンス: Link先を確認

Yang Li, Shengyuan Xu, Jinqiao Duan,

(参考訳) 統計力学系の(ガウス的)ブラウンノイズと(ガウス的でない)L''evyノイズの両方を持つ明示的な支配方程式をデータから発見することは、複雑な機能形式とL'evy運動の本質的な複雑さによって、変化している。本研究では,非局所クラマース・モラル式,遺伝的プログラミング,スパース回帰に基づいて,非ガウス確率力学系をサンプルパスデータから抽出する進化的シンボルスパース回帰(ESSR)手法を提案する。より具体的には、遺伝的プログラミングは多様な候補関数を生成するために使用され、スパース回帰法はこれらの候補に関連する係数を学習することを目的としており、非局所クラマース・モヤル式は、スパース回帰における適合度尺度と損失関数を構築する基盤となる。このアプローチの有効性と能力は、いくつかのイラストレーターモデルに適用することで示される。このアプローチは、利用可能なデータセットから非ガウス確率力学を解読するための強力な手段であり、様々な分野にまたがる幅広い応用を示す。

Discovering explicit governing equations of stochastic dynamical systems with both (Gaussian) Brownian noise and (non-Gaussian) Lévy noise from data is challenging due to possible intricate functional forms and the inherent complexity of Lévy motion. This research develops an evolutionary symbol sparse regression (ESSR) approach to extract non-Gaussian stochastic dynamical systems from sample path data, based on nonlocal Kramers-Moyal formulas, genetic programming, and sparse regression. More specifically, genetic programming is employed to generate a diverse array of candidate functions, the sparse regression technique aims at learning the coefficients associated with these candidates, and the nonlocal Kramers-Moyal formulas serve as the foundation for constructing the fitness measure in genetic programming and the loss function in sparse regression. The efficacy and capabilities of this approach are showcased through its application to several illustrative models. This approach stands out as a potent instrument for deciphering non-Gaussian stochastic dynamics from available datasets, indicating a wide range of applications across different fields.
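
For orientation, the classical (local) Kramers-Moyal formulas for a scalar diffusion driven by Brownian noise alone are given below. The nonlocal formulas used for Lévy noise are not stated in the abstract and are not reproduced here. The symbols $b$, $\sigma$, and $B_t$ are notation assumed for this sketch.

```latex
% Scalar SDE with Brownian noise only (assumed notation)
dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t

% Drift and squared diffusion recovered from short-time conditional moments
b(x) = \lim_{h \to 0} \frac{1}{h}\,\mathbb{E}\left[ X_h - x \mid X_0 = x \right]

\sigma^2(x) = \lim_{h \to 0} \frac{1}{h}\,\mathbb{E}\left[ (X_h - x)^2 \mid X_0 = x \right]
```

These identities turn sample-path increments into regression targets for the drift and diffusion terms, which is the role the abstract assigns to the nonlocal versions in the fitness measure and the loss function.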

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 量子鍵分布における位相誤差推定のポストセレクションセキュリティ解析の改善

Improved postselection security analysis of phase error estimation in quantum key distribution ( http://arxiv.org/abs/2409.19538v1 )

ライセンス: Link先を確認

Yang-Guang Shan, Zhen-Qiang Yin, Shuang Wang, Wei Chen, De-Yong He, Guang-Can Guo, Zheng-Fu Han,

(参考訳) 量子鍵分散(QKD)は、2つの離れたユーザ間でセキュアな鍵を生成する。一般的なコヒーレントな攻撃に対するQKDのセキュリティ証明は難しいが、集団攻撃に対する攻撃はずっと容易である。ポストセレクション法は,効果的で汎用的な手法として,コヒーレント攻撃に対する集団攻撃のセキュリティ解析を拡張しようとするものである。しかし、パフォーマンスは悪い。この欠点を克服するために、ポストセレクション法でキーレートを直接計算するのではなく、集合的およびコヒーレント攻撃に対する位相誤差推定の失敗確率を関連づける手法を提案し、コヒーレント攻撃に対するパラメータ推定において、独立的および同一に分布した仮定を用いることを可能にした。すると、キーレートはエントロピーの不確実性関係によって得られる。提案手法は様々なQKDプロトコルに適用可能であり,従来のポストセレクション法と比較して性能が向上する。例えば、サイドチャネルセキュア(SCS)QKDとノーフェーズポストセレクション(NPP)ツインフィールド(TF)QKDの有限鍵解析を行い、提案手法による性能改善を示す。

Quantum key distribution (QKD) enables the generation of secure keys between two distant users. Proving the security of QKD against general coherent attacks is challenging, while proving it against collective attacks is much easier. As an effective and general solution, the postselection method extends security analyses against collective attacks to coherent attacks. However, it performs poorly. To overcome this drawback, instead of directly calculating the key rate by the postselection method, we propose a method correlating the failure probabilities of phase error estimation against collective and coherent attacks, enabling the use of the independent and identically distributed assumption in parameter estimation against coherent attacks. The key rate can then be obtained from the entropic uncertainty relation. Our method can be applied to various QKD protocols, providing better performance compared with the traditional postselection method. For instance, we give the finite-key analyses of the side-channel-secure (SCS) QKD and the no-phase-postselection (NPP) twin-field (TF) QKD to show their performance improvements with the proposed method.
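
To show why the phase error estimate matters for the key rate, here is a minimal sketch of the textbook asymptotic rate 1 - h(e_bit) - h(e_phase), where h is the binary entropy. It assumes ideal error correction and infinitely long keys, so it is not the finite-key bound derived in the paper. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def asymptotic_key_rate(bit_error, phase_error):
    """Textbook asymptotic rate 1 - h(e_bit) - h(e_phase), floored at zero."""
    return max(0.0, 1.0 - binary_entropy(bit_error) - binary_entropy(phase_error))


rate_clean = asymptotic_key_rate(0.0, 0.0)
rate_noisy = asymptotic_key_rate(0.05, 0.05)
rate_near_threshold = asymptotic_key_rate(0.11, 0.11)
```

A looser upper bound on the phase error rate lowers this value directly, which is why a tighter estimate against coherent attacks translates into a better key rate.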

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# LoRKD:医療ファウンデーションモデルのための低レベル知識分割

LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models ( http://arxiv.org/abs/2409.19540v1 )

ライセンス: Link先を確認

Haolin Li, Yuhang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya Zhang, Yanfeng Wang,

(参考訳) 大規模プレトレーニング技術の普及により、医療基盤モデルの開発が大幅に進展し、幅広い医療タスクにおいて汎用的なツールとして機能することができるようになった。しかし、その強力な一般化能力にもかかわらず、大規模なデータセットで事前訓練された医療基礎モデルは、異種データ間のドメインギャップに悩まされがちであり、以前の研究で証明されたように、専門的なモデルと比較して特定のタスクに対する準最適性能をもたらす。本稿では, 特定の医療課題における「知識分解」と呼ばれる新たな視点を探求し, 基礎モデルを複数の軽量専門家モデルに分解し, それぞれが特定の解剖学的領域に特化して, 専門性を高め, 資源消費を同時に低減することを目的としている。この目的を達成するために,ローランク知識分解(LoRKD)と呼ばれる新しいフレームワークを提案する。低ランクの専門家モジュールは、異なる解剖学的領域の異種データ間の勾配の衝突を解消し、低コストで強力な特殊化を提供する。効率的な知識分離畳み込みは、単一の前方伝播における知識分離を達成することにより、アルゴリズム効率を著しく向上させる。セグメンテーションおよび分類タスクに関する大規模な実験結果から, 分割されたモデルが最先端の性能を達成するだけでなく, 下流タスクに優れた伝達性を示すことが示され, タスク固有の評価において, 元の基礎モデルを上回る結果が得られた。コードはここにある。

The widespread adoption of large-scale pre-training techniques has significantly advanced the development of medical foundation models, enabling them to serve as versatile tools across a broad range of medical tasks. However, despite their strong generalization capabilities, medical foundation models pre-trained on large-scale datasets tend to suffer from domain gaps between heterogeneous data, leading to suboptimal performance on specific tasks compared to specialist models, as evidenced by previous studies. In this paper, we explore a new perspective called "Knowledge Decomposition" to improve the performance on specific medical tasks, which deconstructs the foundation model into multiple lightweight expert models, each dedicated to a particular anatomical region, with the aim of enhancing specialization and simultaneously reducing resource consumption. To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution. The low-rank expert modules resolve gradient conflicts between heterogeneous data from different anatomical regions, providing strong specialization at lower costs. The efficient knowledge separation convolution significantly improves algorithm efficiency by achieving knowledge separation within a single forward propagation. Extensive experimental results on segmentation and classification tasks demonstrate that our decomposed models not only achieve state-of-the-art performance but also exhibit superior transferability on downstream tasks, even surpassing the original foundation models in task-specific evaluations. The code is available at here.
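
The low-rank expert idea above can be sketched as a shared weight matrix plus one small rank-r correction per region. This is a generic low-rank adapter sketch under stated assumptions: the region names, matrix values, and the function `low_rank_expert_forward` are hypothetical, and the paper's knowledge separation convolution is not reproduced.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def low_rank_expert_forward(W, A, B, x):
    """Compute W x + B (A x): shared weights plus one expert's rank-r correction."""
    base = matvec(W, x)
    correction = matvec(B, matvec(A, x))
    return [b + c for b, c in zip(base, correction)]


# Shared 3x3 weight with one rank-1 expert per (hypothetical) region.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
experts = {
    "region_a": {"A": [[1.0, 1.0, 1.0]], "B": [[0.5], [0.0], [0.0]]},
    "region_b": {"A": [[1.0, 0.0, 0.0]], "B": [[0.0], [0.0], [2.0]]},
}
x = [1.0, 2.0, 3.0]
out_a = low_rank_expert_forward(W, experts["region_a"]["A"], experts["region_a"]["B"], x)
out_b = low_rank_expert_forward(W, experts["region_b"]["A"], experts["region_b"]["B"], x)
```

Each expert here stores 6 numbers instead of the 9 in the full matrix; the saving grows with layer width, which is the usual motivation for low-rank modules.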

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# BiPC: 教師なしドメイン適応のための双方向確率校正

BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption ( http://arxiv.org/abs/2409.19542v1 )

ライセンス: Link先を確認

Wenlve Zhou, Zhiheng Zhou, Junyuan Shang, Chang Niu, Mingyue Zhang, Xiyuan Tao, Tianlei Wang,

(参考訳) Unsupervised Domain Adaptation (UDA)はラベル付きソースドメインを利用してラベルなしのターゲットドメインのタスクを解決する。 Transformer ベースの手法は UDA の有望性を示しているが、その応用は Convolutional Neural Networks (CNN) と階層型 Transformer を除いてプレーンな Transformer に限られている。この問題に対処するため,確率空間の観点からBidirectional Probability Calibration (BiPC)を提案する。本研究では,事前学習した頭部からの確率出力が領域ギャップに対して頑健であり,タスクヘッドの確率分布を調整できることを実証する。さらに、タスクヘッドは、適応訓練中に事前訓練されたヘッドを強化することができ、双方向補完によるモデル性能を向上させることができる。技術的には、ImageNet-1kプリトレーニングされた分類器などの事前学習された頭部の確率を調整するために、校正確率アライメント(CPA)を導入する。さらに,事前学習した分類器から学習した校正係数を用いて,タスクヘッドを改良するキャリブレーションギニ不純物(CGI)損失を設計する。 BiPCは、CNNやTransformerなど、さまざまなネットワークに適用可能な、シンプルで効果的な方法である。実験の結果、複数のUDAタスクにまたがる顕著なパフォーマンスが示された。私たちのコードは、https://github.com/Wenlve-Zhou/BiPC.comで公開されます。

Unsupervised Domain Adaptation (UDA) leverages a labeled source domain to solve tasks in an unlabeled target domain. While Transformer-based methods have shown promise in UDA, their application is limited to plain Transformers, excluding Convolutional Neural Networks (CNNs) and hierarchical Transformers. To address this issue, we propose Bidirectional Probability Calibration (BiPC) from a probability space perspective. We demonstrate that the probability outputs from a pre-trained head, after extensive pre-training, are robust against domain gaps and can adjust the probability distribution of the task head. Moreover, the task head can enhance the pre-trained head during adaptation training, improving model performance through bidirectional complementation. Technically, we introduce Calibrated Probability Alignment (CPA) to adjust the pre-trained head's probabilities, such as those from an ImageNet-1k pre-trained classifier. Additionally, we design a Calibrated Gini Impurity (CGI) loss to refine the task head, with calibrated coefficients learned from the pre-trained classifier. BiPC is a simple yet effective method applicable to various networks, including CNNs and Transformers. Experimental results demonstrate its remarkable performance across multiple UDA tasks. Our code will be available at: https://github.com/Wenlve-Zhou/BiPC.
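
As background for the CGI loss named above, the following sketch computes the standard, uncalibrated Gini impurity of a predicted class distribution and averages it over a batch. The calibrated coefficients learned from the pre-trained classifier are not described in the abstract and are not modelled here. The names `gini_impurity` and `batch_gini_loss` are hypothetical.

```python
def gini_impurity(probs):
    """Standard Gini impurity 1 - sum(p_i^2) of a probability vector."""
    return 1.0 - sum(p * p for p in probs)


def batch_gini_loss(batch):
    """Mean Gini impurity over a batch of predicted class distributions."""
    return sum(gini_impurity(probs) for probs in batch) / len(batch)


confident = [1.0, 0.0, 0.0, 0.0]
uncertain = [0.25, 0.25, 0.25, 0.25]
loss = batch_gini_loss([confident, uncertain])
```

Minimising this quantity on unlabeled target samples pushes predictions toward confident, near one-hot outputs, which is the usual reason for using an impurity term in domain adaptation.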

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 協調型企業間労働市場予測のための収束型クラスタ化グラフ学習フレームワーク

Convergence-aware Clustered Federated Graph Learning Framework for Collaborative Inter-company Labor Market Forecasting ( http://arxiv.org/abs/2409.19545v1 )

ライセンス: Link先を確認

Zhuoning Guo, Hao Liu, Le Zhang, Qi Zhang, Hengshu Zhu, Hui Xiong,

(参考訳) 人材需要と供給を予測する労働市場は、経営管理と経済発展に不可欠である。正確でタイムリーな予測では、雇用主は採用戦略を進化する労働市場に合わせて調整することができ、雇用者は将来の需要と供給に応じて積極的なキャリアパス計画を行うことができる。しかし、従来の研究では、異なる企業間の需要供給シーケンスと変動予測の立場の相互関係は無視されていた。さらに企業は、競争上の優位性やセキュリティ上の脅威、倫理的または法的違反を危険にさらす懸念から、グローバルな労働市場分析のためにプライベートな人的資源データを共有することに消極的だ。そこで本稿では,FedLMF(Federated Labor Market Forecasting)の問題を定式化し,MPCAC-FL(Meta-personalized Convergence-aware Clustered Federated Learning)フレームワークを提案する。まず、需要と供給の順序と企業配置のペアの間に固有の相関関係を捉えるグラフベースのシーケンシャルモデルを設計する。第2に,企業間で共有可能な効果的な初期モデルパラメータを学習するためにメタラーニング手法を採用し,異種データを持つ企業であっても,企業固有の需要と供給を予測するためにパーソナライズされたモデルを最適化する。第3に,モデル類似性に応じて企業を動的にグループに分割し,各グループにフェデレーションアグリゲーションを適用するコンバージェンス対応クラスタリングアルゴリズムを考案する。不均一性はより安定した収束とより良い性能のために緩和することができる。大規模な実験では、MPCAC-FLは3つの実世界のデータセットのベースラインを比較し、DH-GEMという最先端モデルの97%以上を非公開企業データを公開せずに達成している。

Labor market forecasting on talent demand and supply is essential for business management and economic development. With accurate and timely forecasts, employers can adapt their recruitment strategies to align with the evolving labor market, and employees can have proactive career path planning according to future demand and supply. However, previous studies ignore the interconnection between demand-supply sequences among different companies and positions for predicting variations. Moreover, companies are reluctant to share their private human resource data for global labor market analysis due to concerns over jeopardizing competitive advantage, security threats, and potential ethical or legal violations. To this end, in this paper, we formulate the Federated Labor Market Forecasting (FedLMF) problem and propose a Meta-personalized Convergence-aware Clustered Federated Learning (MPCAC-FL) framework to provide accurate and timely collaborative talent demand and supply prediction in a privacy-preserving way. First, we design a graph-based sequential model to capture the inherent correlation between demand and supply sequences and company-position pairs. Second, we adopt meta-learning techniques to learn effective initial model parameters that can be shared across companies, allowing personalized models to be optimized for forecasting company-specific demand and supply, even when companies have heterogeneous data. Third, we devise a Convergence-aware Clustering algorithm to dynamically divide companies into groups according to model similarity and apply federated aggregation in each group. The heterogeneity can be alleviated for more stable convergence and better performance. Extensive experiments demonstrate that MPCAC-FL outperforms compared baselines on three real-world datasets and achieves over 97% of the state-of-the-art model, i.e., DH-GEM, without exposing private company data.
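
The third step above, grouping companies by model similarity and aggregating within each group, can be sketched as follows. The abstract does not specify the convergence-aware criterion, so this sketch substitutes cosine similarity with a fixed threshold and a greedy assignment; the company names, the threshold, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_clients(models, threshold):
    """Greedy grouping: a client joins the first cluster whose first member
    has cosine similarity >= threshold with it, else starts a new cluster."""
    clusters = []
    for name, weights in models.items():
        for cluster in clusters:
            if cosine_similarity(weights, models[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def federated_average(models, members):
    """Element-wise mean of the parameter vectors of the given clients."""
    size = len(members)
    length = len(models[members[0]])
    return [sum(models[m][i] for m in members) / size for i in range(length)]


models = {
    "company_a": [1.0, 0.0],
    "company_b": [0.9, 0.1],
    "company_c": [0.0, 1.0],
    "company_d": [0.1, 0.9],
}
clusters = cluster_clients(models, threshold=0.9)
aggregated = {tuple(c): federated_average(models, c) for c in clusters}
```

Only parameter vectors are exchanged in this sketch, never raw records, which mirrors the privacy-preserving setting the abstract describes.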

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 類似行列補完のためのテーラー低ランク行列分解法

Tailed Low-Rank Matrix Factorization for Similarity Matrix Completion ( http://arxiv.org/abs/2409.19550v1 )

ライセンス: Link先を確認

Changyi Ma, Runsheng Yu, Xiao Chen, Youzhi Zhang,

(参考訳) 類似度行列は、多くの下流機械学習タスクの中核にある基本的なツールとして機能する。しかし、欠落したデータは避けられず、しばしば不正確な類似性行列をもたらす。この問題に対処するため, 類似行列補完法(SMC)が提案されているが, Singular Value Decomposition (SVD) 演算による計算の複雑さに悩まされている。計算複雑性を低減するため、行列因子化(MF)技術はより明示的で、低ランクなソリューションを提供するために頻繁に適用されるが、非凸構造に苦しむため、正確な低ランクの最適解を保証することはできない。本稿では,より信頼性が高く効率的なソリューションを提供する新しいSMCフレームワークを提案する。具体的には,PSD(Positive Semi-Definiteness)特性を利用して完成過程を導出するだけでなく,最適かつ低ランクな解を実現するために,慎重に設計されたランク最小化正規化器をさらに補完する。基礎となるPSD特性と低ランク特性がSMC性能を改善するというキーインサイトに基づいて、PSD特性を探索し、非凸低ランク正規化器を組み込んで低ランク解を確実にする2つの新しい、スケーラブルで効果的なアルゴリズムSMCNNとSMCNmFを提案する。理論的解析により、より良い推定性能と収束速度が保証される。実世界のデータセットにおける実験結果から,提案手法が様々なベースライン手法よりも優れていることを示す。

The similarity matrix serves as a fundamental tool at the core of numerous downstream machine-learning tasks. However, missing data is inevitable and often results in an inaccurate similarity matrix. To address this issue, Similarity Matrix Completion (SMC) methods have been proposed, but they suffer from high computation complexity due to the Singular Value Decomposition (SVD) operation. To reduce the computation complexity, Matrix Factorization (MF) techniques are more explicit and frequently applied to provide a low-rank solution, but the exact low-rank optimal solution cannot be guaranteed since it suffers from a non-convex structure. In this paper, we introduce a novel SMC framework that offers a more reliable and efficient solution. Specifically, beyond simply utilizing the unique Positive Semi-definiteness (PSD) property to guide the completion process, our approach further complements a carefully designed rank-minimization regularizer, aiming to achieve an optimal and low-rank solution. Based on the key insights that the underlying PSD property and Low-Rank property improve the SMC performance, we present two novel, scalable, and effective algorithms, SMCNN and SMCNmF, which investigate the PSD property to guide the estimation process and incorporate a nonconvex low-rank regularizer to ensure the low-rank solution. Theoretical analysis ensures better estimation performance and convergence speed. Empirical results on real-world datasets demonstrate the superiority and efficiency of our proposed methods compared to various baseline methods.
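
The link between factorization and the PSD property can be seen in a small sketch: any matrix of the form V Vᵀ is symmetric and positive semi-definite, so fitting V to the observed entries yields a PSD completion by construction. This is plain gradient descent on a squared error, not the SMCNN or SMCNmF algorithms, and it omits the rank-minimization regularizer. The function name and the toy data are hypothetical.

```python
def complete_similarity(observed, initial_factors, lr=0.05, steps=2000):
    """Fit M ~ V V^T to the observed entries by gradient descent.

    observed maps (i, j) index pairs (each unordered pair listed once) to
    known similarity values. V V^T is symmetric and PSD for any V.
    """
    V = [row[:] for row in initial_factors]
    size, rank = len(V), len(V[0])
    for _ in range(steps):
        grad = [[0.0] * rank for _ in range(size)]
        for (i, j), target in observed.items():
            residual = sum(a * b for a, b in zip(V[i], V[j])) - target
            for k in range(rank):
                # For i == j both lines add up to the derivative of the squared term.
                grad[i][k] += 2 * residual * V[j][k]
                grad[j][k] += 2 * residual * V[i][k]
        for i in range(size):
            for k in range(rank):
                V[i][k] -= lr * grad[i][k]
    return [[sum(a * b for a, b in zip(V[i], V[j])) for j in range(size)]
            for i in range(size)]


# Rank-1 ground truth v = [1.0, 0.5, -0.5]; the (1, 2) entry is treated as missing.
observed = {(0, 0): 1.0, (1, 1): 0.25, (2, 2): 0.25, (0, 1): 0.5, (0, 2): -0.5}
completed = complete_similarity(observed, initial_factors=[[0.1], [0.1], [0.1]])
missing_estimate = completed[1][2]
```

The starting factors are passed in by the caller because the result of a non-convex fit depends on initialization, which is the weakness of plain matrix factorization that the abstract points out.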

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 擬リーマン計量--量子領域の新しい視点

Pseudo-Riemannian metric: a new perspective on the quantum realm ( http://arxiv.org/abs/2409.19551v1 )

ライセンス: Link先を確認

Miaomiao Wei, Longjun Xiang, Fuming Xu, Baigeng Wang, Jian Wang,

(参考訳) 凝縮物質物理学の基本的な概念として、リーマン計量の量子幾何学はベリー曲率と量子計量によって駆動されるホール効果を含む様々なエキゾチックな現象を解明する。本研究では,量子物質の特異な性質を探求するために,擬リーマン的枠組み内での新しい量子幾何学を提案する。擬リーマン多様体上の異なる距離を定義し、スピン次数の自由を導入することにより、パウリ量子幾何テンソルを導入する。このテンソルの想像上の部分は、パウリ・ベリー曲率に対応し、新しい量子相: PT対称系におけるパウリ半金属の発見につながる。この位相は、位相的パウリ・チャーン数によって特徴づけられ、ヘリカルエッジ状態を持つ2次元のパウリ・チャーン絶縁体として現れる。これらの位相位相は、パウリ・リーマン計量によって一意に明らかにされ、リーマン計量を超越し、ベリー曲率はPT対称性により消える。パウリ・チャーン数(Pauli Chern number)は、ヘリカルトポロジカル絶縁体を時間反転対称性で分類することができる。擬リーマン計量は、量子材料に対する新たな洞察を与え、量子幾何学の範囲を広げる。

As a fundamental concept in condensed matter physics, quantum geometry within the Riemannian metric elucidates various exotic phenomena, including the Hall effects driven by Berry curvature and quantum metric. In this work, we propose novel quantum geometries within a pseudo-Riemannian framework to explore unique characteristics of quantum matter. By defining distinct distances on pseudo-Riemannian manifolds and incorporating the spin degree of freedom, we introduce the Pauli quantum geometric tensor. The imaginary part of this tensor corresponds to the Pauli Berry curvature, leading to the discovery of a novel quantum phase: the Pauli semimetal in PT-symmetric systems. This phase, characterized by the topological Pauli Chern number, manifests as a two-dimensional Pauli Chern insulator with helical edge states. These topological phases, uniquely revealed by the Pauli-Riemannian metric, go beyond the familiar Riemannian metric, where Berry curvature vanishes due to PT-symmetry. The Pauli Chern number can classify helical topological insulators with or without time reversal symmetry. Pseudo-Riemannian metrics offer new insights into quantum materials and extend the scope of quantum geometry.
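
For reference, the conventional (Riemannian) quantum geometric tensor that the abstract takes as its starting point is commonly written as below. The Pauli variant is not defined in the abstract and is not reproduced here. The band state $|u\rangle$, parameters $k_\mu$, and the sign convention for the Berry curvature $\Omega_{\mu\nu}$ are assumptions of this sketch, and conventions differ between references.

```latex
% Quantum geometric tensor of a normalized state |u(k)> (assumed notation)
Q_{\mu\nu} = \langle \partial_\mu u |\, \bigl( 1 - | u \rangle \langle u | \bigr) \,| \partial_\nu u \rangle

% Real part: quantum metric; imaginary part: Berry curvature
Q_{\mu\nu} = g_{\mu\nu} - \frac{i}{2}\, \Omega_{\mu\nu}
```

In this convention the real part $g_{\mu\nu}$ is the quantum metric and the imaginary part carries the Berry curvature, matching the statement that the imaginary part of the Pauli tensor corresponds to the Pauli Berry curvature.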

翻訳日:2024-11-05 22:38:15 公開日:2024-09-29

# 材料X線吸収スペクトルの普遍的深層学習フレームワーク

A Universal Deep Learning Framework for Materials X-ray Absorption Spectra ( http://arxiv.org/abs/2409.19552v1 )

ライセンス: Link先を確認

Shubha R. Kharel, Fanchen Meng, Xiaohui Qu, Matthew R. Carbone, Deyu Lu,

(参考訳) X線吸収分光法(XAS)は、吸収する原子の局所的な化学的環境を調べるための強力な特徴付け技術である。しかしながら、XASデータの解析には重大な課題が伴い、多くの場合、広範囲で計算集約的なシミュレーションと重要なドメインの専門知識が必要である。これらの制限は、高速で堅牢なXAS分析パイプラインの開発を妨げる。 8個の3d遷移金属(Ti-Cu)をカバーするK-edge Spectraデータベースに示すように,これらの課題をXAS予測のための伝達学習アプローチを用いて解決し,それぞれが精度と効率の向上に一意に寄与する。私たちのフレームワークは3つの異なる戦略に基づいて構築されています。まず,M3GNetを用いて,吸収部位の局所化学環境の潜在的表現をXAS予測の入力として導出し,従来の工法よりも高次化を達成している。第二に、我々は階層的な伝達学習戦略を採用し、要素ごとの予測を微調整する前に、要素間で普遍的なマルチタスクモデルを訓練する。要素ワイド・ファインターン後のこのケースケードアプローチでは、要素固有モデルを最大31.5%上回るモデルが得られる。第3に、計算コストがはるかに高い異なるフィリティのシミュレーションによって生成されるスペクトルを予測するために、普遍モデルを適用し、クロスフィデリティ変換学習を実装した。このアプローチは、ターゲット忠実度だけで訓練されたモデルよりも最大24倍の精度で予測精度を向上させる。我々のアプローチは、幅広い要素に対してXAS予測に拡張可能であり、物質科学における他の深層学習モデルを強化するための一般化可能な伝達学習フレームワークを提供する。

X-ray absorption spectroscopy (XAS) is a powerful characterization technique for probing the local chemical environment of absorbing atoms. However, analyzing XAS data presents significant challenges, often requiring extensive, computationally intensive simulations, as well as significant domain expertise. These limitations hinder the development of fast, robust XAS analysis pipelines that are essential in high-throughput studies and for autonomous experimentation. We address these challenges with a suite of transfer learning approaches for XAS prediction, each uniquely contributing to improved accuracy and efficiency, as demonstrated on a K-edge spectra database covering eight 3d transition metals (Ti-Cu). Our framework is built upon three distinct strategies. First, we use M3GNet to derive latent representations of the local chemical environment of absorption sites as input for XAS prediction, achieving up to order-of-magnitude improvements over conventional featurization techniques. Second, we employ a hierarchical transfer learning strategy, training a universal multi-task model across elements before fine-tuning for element-specific predictions. After element-wise fine-tuning, this cascaded approach yields models that outperform element-specific models by up to 31%. Third, we implement cross-fidelity transfer learning, adapting a universal model to predict spectra generated by simulation of a different fidelity with a much higher computational cost. This approach improves prediction accuracy by up to 24% over models trained on the target fidelity alone. Our approach is extendable to XAS prediction for a broader range of elements and offers a generalizable transfer learning framework to enhance other deep-learning models in materials science.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 自律運転における高速収束とコミュニケーションの両立した不均一な階層的フェデレーション学習

Fast-Convergent and Communication-Alleviated Heterogeneous Hierarchical Federated Learning in Autonomous Driving ( http://arxiv.org/abs/2409.19560v1 )

ライセンス: Link先を確認

Wei-Bin Kou, Qingfeng Lin, Ming Tang, Rongguang Ye, Shuai Wang, Guangxu Zhu, Yik-Chung Wu,

(参考訳) ストリートシーンセマンティック理解(Street Scene Semantic Understanding、TriSU)は、自動運転(AD)の複雑なタスクである。しかし、特定の地理的領域のデータから訓練された推論モデルは、都市間データドメインシフトによって他の領域に適用された場合、一般化が不十分である。 Hierarchical Federated Learning (HFL)は、異なる都市の分散データセット上での協調的なプライバシ保存トレーニングによって、TriSUモデルの一般化を改善する潜在的なソリューションを提供する。残念なことに、異なる都市のデータは異なる統計特性を持つため、収束が遅い。既存のHFL法を超えて,都市間データの不均一性に対処し,収束を加速するガウス異質HFLアルゴリズム(FedGau)を提案する。提案したFedGauアルゴリズムでは、単一のRGB画像とRGBデータセットの両方をガウス分布としてモデル化し、集約重み付け設計を行う。このアプローチは、各RGB画像を統計分布で区別するだけでなく、従来検討されていたデータ量に加えて、各都市からのデータセットの統計を利用する。提案手法では既存のSOTA法と比較して35.5 %-40.6 %の収束が加速される。一方,通信資源の削減のため,新たなアダプティブ・アダプティブ・リソース・スケジューリング(AdapRS)ポリシーを導入する。隣接する2つのアグリゲーション間で一定の数のモデルを交換する従来の静的リソーススケジューリングポリシーとは異なり、AdapRSは不必要な通信を最小限に抑えるために異なるレベルのHFLのモデルアグリゲーション数を調整している。大規模な実験では、AdapRSは従来の静的リソーススケジューリングポリシーと比べて29.65 %の通信オーバーヘッドを節約し、ほぼ同じ性能を維持している。

Street Scene Semantic Understanding (denoted as TriSU) is a complex task for autonomous driving (AD). However, an inference model trained on data from a particular geographical region generalizes poorly when applied in other regions due to inter-city data domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization through collaborative privacy-preserving training over distributed datasets from different cities. Unfortunately, it suffers from slow convergence because data from different cities have disparate statistical properties. Going beyond existing HFL methods, we propose a Gaussian heterogeneous HFL algorithm (FedGau) to address inter-city data heterogeneity so that convergence can be accelerated. In the proposed FedGau algorithm, both a single RGB image and an RGB dataset are modelled as Gaussian distributions for aggregation weight design. This approach not only differentiates each RGB image by its statistical distribution, but also exploits the statistics of the dataset from each city in addition to the conventionally considered data volume. With the proposed approach, convergence is accelerated by 35.5%-40.6% compared to existing state-of-the-art (SOTA) HFL methods. In addition, to reduce the communication resources involved, we further introduce a novel performance-aware adaptive resource scheduling (AdapRS) policy. Unlike the traditional static resource scheduling policy that exchanges a fixed number of models between two adjacent aggregations, AdapRS adjusts the number of model aggregations at different levels of HFL so that unnecessary communications are minimized. Extensive experiments demonstrate that AdapRS saves 29.65% of communication overhead compared to the conventional static resource scheduling policy while maintaining almost the same performance.
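
As a rough illustration of what modelling images and datasets as Gaussians can look like, the sketch below summarizes each image by the mean and variance of its pixel intensities and pools those per-image summaries into one dataset-level Gaussian using the law of total variance. This is a sketch under stated assumptions, not the FedGau algorithm: the function names are hypothetical, an image is treated as a flat list of numbers, and the abstract does not say how aggregation weights are derived from the Gaussians, so that step is deliberately left out.

```python
from statistics import fmean, pvariance


def image_gaussian(pixels):
    """Summarize one image (flat list of intensities) as (mean, variance)."""
    return fmean(pixels), pvariance(pixels)


def dataset_gaussian(image_stats, sizes):
    """Pool per-image (mean, variance) pairs into one dataset-level Gaussian.

    Uses the law of total variance, weighting each image by its pixel count.
    Hypothetical helper for illustration; not taken from the paper.
    """
    total = sum(sizes)
    mean = sum(n * m for (m, _), n in zip(image_stats, sizes)) / total
    var = sum(n * (v + (m - mean) ** 2)
              for (m, v), n in zip(image_stats, sizes)) / total
    return mean, var


# Two tiny "images" standing in for one city's dataset.
images = [[0.0, 2.0], [4.0, 6.0]]
stats = [image_gaussian(img) for img in images]
city_mean, city_var = dataset_gaussian(stats, [len(img) for img in images])
```

Pooling in this way yields the same mean and variance as computing them over all pixels at once, so a dataset could in principle be described by a handful of numbers rather than by its raw images.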

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# モデル予測制御によるバックプロパゲーションとフォワードフォワードアルゴリズムの統合

Unifying back-propagation and forward-forward algorithms through model predictive control ( http://arxiv.org/abs/2409.19561v1 )

ライセンス: Link先を確認

Lianhai Ren, Qianxiao Li,

(参考訳) 本稿では,深いニューラルネットワークをトレーニングするためのモデル予測制御(MPC)フレームワークを導入し,バックプロパゲーション(BP)アルゴリズムとフォワードフォワード(FF)アルゴリズムを体系的に統一する。同時に、様々なルックフォワードの水平線を持つ様々な中間トレーニングアルゴリズムが生まれ、パフォーマンス効率のトレードオフにつながります。定性的な結論が一般的なネットワークに渡される深層線形ネットワーク上で、このトレードオフを正確に解析する。そこで本研究では,目的とモデル仕様に基づいて最適化地平線を選択するための原理的手法を提案する。各種モデルおよびタスクの数値計算結果から,本手法の汎用性を示す。

We introduce a Model Predictive Control (MPC) framework for training deep neural networks, systematically unifying the Back-Propagation (BP) and Forward-Forward (FF) algorithms. At the same time, it gives rise to a range of intermediate training algorithms with varying look-forward horizons, leading to a performance-efficiency trade-off. We perform a precise analysis of this trade-off on a deep linear network, where the qualitative conclusions carry over to general networks. Based on our analysis, we propose a principled method to choose the optimization horizon based on given objectives and model specifications. Numerical results on various models and tasks demonstrate the versatility of our method.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# カメラ内人物再同定のためのCLIPに基づくカメラ非依存の特徴学習

CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification ( http://arxiv.org/abs/2409.19563v1 )

ライセンス: Link先を確認

Xuan Tan, Xun Gong, Yang Xiang,

(参考訳) Contrastive Language-Image Pre-Training(CLIP)モデルは、歩行者画像のテキスト記述を生成できるという固有の利点により、従来の人物再識別(ReID)タスクで優れた性能を示す。しかし、CLIPをカメラ内教師あり人物再識別(ICS ReID)に直接適用することには課題がある。ICS ReIDでは、カメラ間の対応付けなしに、各カメラ内で独立にIDラベルを付与する必要があり、これがテキストベースの拡張の有効性を制限する。そこで我々は、ICS ReIDのためのCLIPベースのカメラ非依存特徴学習(CCAFL)という新しいフレームワークを提案する。カメラ非依存の歩行者特徴をモデルに能動的に学習させるため、カメラ内識別学習(ICDL)とカメラ間敵対的学習(ICAL)という2つの専用モジュールを設計する。具体的には、まずカメラ内歩行者画像に対する学習可能なテキストプロンプトを構築し、後続のカメラ内およびカメラ間学習に不可欠な意味的教師信号を得る。次に、各カメラ内のハードポジティブおよびハードネガティブサンプルを考慮することでクラス間変動を増大させ、カメラ内でよりきめ細かい歩行者特徴を学習するICDLを設計する。さらに、歩行者画像がどのカメラで撮影されたかを予測するモデルの能力にペナルティを与えることでカメラ間の歩行者特徴の差異を低減し、異なる視点から歩行者を認識する能力を高めるICALを提案する。一般的なReIDデータセットでの広範な実験により、我々のアプローチの有効性が実証された。特に、難易度の高いMSMT17データセットでは、mAPで58.9%に達し、最先端の手法を7.6%上回る。コードは https://github.com/Trangle12/CCAFL で公開予定である。

The Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning finer-grained intra-camera pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model's ability to predict the camera from which a pedestrian image originates, thus enhancing the model's capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. In particular, on the challenging MSMT17 dataset, we reach 58.9% in terms of mAP accuracy, surpassing state-of-the-art methods by 7.6%. Code will be available at: https://github.com/Trangle12/CCAFL.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 多言語変換器を用いた低資源ネパール語の抽象要約

Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers ( http://arxiv.org/abs/2409.19566v1 )

ライセンス: Link先を確認

Prakash Dhakal, Daya Sagar Baral,

(参考訳) ネパール語におけるテキストの自動要約は、自然言語処理(NLP)における未探索領域である。抽出的な要約を専門とする研究が盛んに行われているが、抽象的な要約の領域、特にネパール語のような低リソース言語については、ほとんど探索されていない。本研究では,多言語トランスフォーマーモデル,特にmBARTとmT5を用いて,抽象要約によるネパールのニュース記事の見出しを生成する。この研究は、ネパールの様々なニュースポータルからのWebスクレイピングを通じて、まず要約データセットを作成することで、ネパールのテキストの要約に関連する重要な課題に対処する。これらの多言語モデルは異なる戦略を用いて微調整された。次に、ROUGEスコアと人的評価を用いて微調整モデルの性能を評価し、生成した要約が一致していることを確認し、本来の意味を伝達した。被験者は, 妥当性, 流布度, 簡潔さ, 情報性, 事実的正確性, カバレッジなどの基準に基づいて, モデルが生成したモデルの中から, 最高の要約を選択するよう依頼された。 ROUGEスコアを用いた評価では、LoRAモデルを用いた4ビット量子化mBARTは、他のモデルと比較してネパールのニュースの見出しを生成するのに有効であることが判明した。

Automatic text summarization in the Nepali language is an unexplored area in natural language processing (NLP). Although considerable research has been dedicated to extractive summarization, the area of abstractive summarization, especially for low-resource languages such as Nepali, remains largely unexplored. This study explores the use of multilingual transformer models, specifically mBART and mT5, for generating headlines for Nepali news articles through abstractive summarization. The research addresses key challenges associated with summarizing texts in Nepali by first creating a summarization dataset through web scraping from various Nepali news portals. These multilingual models were then fine-tuned using different strategies. The performance of the fine-tuned models was then assessed using ROUGE scores and human evaluation to ensure the generated summaries were coherent and conveyed the original meaning. During the human evaluation, the participants were asked to select the best summary among those generated by the models, based on criteria such as relevance, fluency, conciseness, informativeness, factual accuracy, and coverage. During the evaluation with ROUGE scores, the 4-bit quantized mBART with LoRA model was found to be effective in generating better Nepali news headlines in comparison to other models. It was also selected 34.05% of the time during the human evaluation, outperforming all other fine-tuned models created for Nepali news headline generation.
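
To make the ROUGE-based evaluation concrete, here is a minimal sketch of a ROUGE-1 F1 score, which measures unigram overlap between a generated headline and a reference headline. It assumes plain whitespace tokenization and no stemming; the function name is hypothetical, and the abstract does not say which ROUGE variant or tooling the authors used.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens; real inputs would be Nepali headlines.
score = rouge1_f1("a b c", "a b d")
```

Whitespace splitting is only a starting point; an actual evaluation would normally rely on an established ROUGE implementation with a tokenizer suited to the language.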

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 画像分割参照のための完全アライメントネットワーク

Fully Aligned Network for Referring Image Segmentation ( http://arxiv.org/abs/2409.19569v1 )

ライセンス: Link先を確認

Yong Liu, Ruihao Xu, Yansong Tang,

(参考訳) 本稿では、与えられた言語記述に基づいて画像からオブジェクトをセグメント化することを目的とした参照イメージセグメンテーション(RIS)タスクに焦点を当てる。 RISの重要な問題は、ターゲットオブジェクトを認識し、セグメント化するために、異なるモダリティ間のきめ細かいアライメントを達成することである。近年,モーダル間相互作用におけるアテンション機構の進歩は大きな進歩を遂げている。しかしながら、現在の手法は、ガイドラインとして相互作用設計の明確な原則を欠く傾向にあり、モダル間の理解が不十分になる。さらに、以前のほとんどの作品では、予測に単一モードマスクデコーダを使用しており、完全なクロスモーダルアライメントの利点を失っている。これらの課題に対処するために,4つのモード間相互作用の原則に従うフルアラインドネットワーク(FAN)を提案する。合理的なルールのガイダンスにより、我々のFANは、一般的なRISベンチマーク(RefCOCO、RefCOCO+、G-Ref)の最先端のパフォーマンスをシンプルなアーキテクチャで達成する。

This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description. The critical problem of RIS is achieving fine-grained alignment between different modalities to recognize and segment the target object. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Under the guidance of reasonable rules, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 会話クエリ生成におけるオーバーアソシエーションの負の影響について

Mitigating the Negative Impact of Over-association for Conversational Query Production ( http://arxiv.org/abs/2409.19572v1 )

ライセンス: Link先を確認

Ante Wang, Linfeng Song, Zijun Min, Ge Xu, Xiaoli Wang, Junfeng Yao, Jinsong Su,

(参考訳) 会話クエリ生成は、対話履歴から検索クエリを生成することを目的としており、このクエリは、知識に基づく対話システムを支援するために、検索エンジンから関連する知識を取得するために使用される。金のクエリの可能性を最大化するために訓練された以前のモデルは、データ飢餓の問題に悩まされ、対話履歴から重要な概念を落とし、推論時に無関係な概念を生成する傾向がある。これらの問題は、多くのゴールドクエリが対話トピックと間接的に関連しているオーバー・アソシエーション現象によるもので、アノテータは、これらのゴールドクエリを生成する際に、その背景知識で無意識に推論を行う可能性があるためである。この現象が事前訓練したSeq2seqクエリー生成者に与える影響を慎重に分析し、これらの問題を複数の視点から緩和するための効果的なインスタンスレベルの重み付け戦略を提案する。 Wizard-of-InternetとDuSincという2つのベンチマークの実験は、私たちの戦略が負の効果を効果的に軽減し、パフォーマンスが大幅に向上することを示しています。さらに,本モデルでは,対話履歴からより良い概念を選択し,ベースラインの10倍のデータ効率を示す。コードはhttps://github.com/DeepLearnXMU/QG-OverAssoで公開されている。

Conversational query generation aims at producing search queries from dialogue histories, which are then used to retrieve relevant knowledge from a search engine to help knowledge-based dialogue systems. Trained to maximize the likelihood of gold queries, previous models suffer from the data hunger issue, and they tend to both drop important concepts from dialogue histories and generate irrelevant concepts at inference time. We attribute these issues to the over-association phenomenon where a large number of gold queries are indirectly related to the dialogue topics, because annotators may unconsciously perform reasoning with their background knowledge when generating these gold queries. We carefully analyze the negative effects of this phenomenon on pretrained Seq2seq query producers and then propose effective instance-level weighting strategies for training to mitigate these issues from multiple perspectives. Experiments on two benchmarks, Wizard-of-Internet and DuSinc, show that our strategies effectively alleviate the negative effects and lead to significant performance gains (2%-5% across automatic metrics and human evaluation). Further analysis shows that our model selects better concepts from dialogue histories and is 10 times more data efficient than the baseline. The code is available at https://github.com/DeepLearnXMU/QG-OverAsso.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 視覚的接地による鍵情報抽出の強化

See then Tell: Enhancing Key Information Extraction with Vision Grounding ( http://arxiv.org/abs/2409.19573v1 )

ライセンス: Link先を確認

Shuhang Liu, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Qing Wang, Jianshu Zhang, Chenyu Liu,

(参考訳) デジタル時代には、テキスト、複雑なレイアウト、画像を統合する視覚的にリッチな文書を理解する能力が不可欠である。従来のキー情報抽出(KIE)手法は主に光学文字認識(OCR)に依存しており、大きなレイテンシ、計算オーバーヘッド、エラーをもたらすことが多い。現在の高度な画像からテキストへのアプローチは、OCRをバイパスし、通常、対応する視覚的接地を伴わない平易なテキスト出力を出力する。本稿では,視覚基盤の正確な答えを提供するために設計された,新しいエンドツーエンドモデルSTNet(See then Tell Net)を紹介する。直感的には、STNetは固有の<see>トークンを使用して、関連する画像領域を観察し、このトークンにリンクされた物理座標を解釈するデコーダによって支援される。応答テキストの先頭に配置された<see>トークンは、まず入力された質問に関連する画像の領域を保存し、次に、指示されたテキスト応答を提供する。モデルの可視性を高めるため、広範囲に構造化されたテーブル認識データセットを収集する。 GPT-4の高度なテキスト処理技術を生かしたTVG(TableQA with Vision Grounding)データセットを開発した。提案手法は, CORD, SROIE, DocVQAなどの公開データセットに対して, 最先端の成果を達成し, KIE性能の大幅な向上を示す。コードは一般公開される予定だ。

In the digital era, the ability to understand visually rich documents that integrate text, complex layouts, and imagery is critical. Traditional Key Information Extraction (KIE) methods primarily rely on Optical Character Recognition (OCR), which often introduces significant latency, computational overhead, and errors. Current advanced image-to-text approaches, which bypass OCR, typically yield plain text outputs without corresponding vision grounding. In this paper, we introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding. Distinctively, STNet utilizes a unique <see> token to observe pertinent image areas, aided by a decoder that interprets physical coordinates linked to this token. Positioned at the outset of the answer text, the <see> token allows the model to first see--observing the regions of the image related to the input question--and then tell--providing articulated textual responses. To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets. Leveraging the advanced text processing prowess of GPT-4, we develop the TVG (TableQA with Vision Grounding) dataset, which not only provides text-based Question Answering (QA) pairs but also incorporates precise vision grounding for these pairs. Our approach demonstrates substantial advancements in KIE performance, achieving state-of-the-art results on publicly available datasets such as CORD, SROIE, and DocVQA. The code will also be made publicly available.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 音声視覚課題の定量的分析 : 情報理論の視点から

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective ( http://arxiv.org/abs/2409.19575v1 )

ライセンス: Link先を確認

Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang,

(参考訳) 音声言語処理の分野では、音声・視覚音声処理が研究の注目を集めている。本研究の主な構成要素は, 唇読解, 音声・視覚音声認識, 音声合成などである。かなりの成功を収めたものの、理論的解析は未だに音声・視覚のタスクには不十分である。本稿では,異なるモーダル間の情報交差に着目し,情報理論に基づく定量的解析を行う。この分析は,音声・視覚処理タスクの難易度や,モダリティ統合によって得られるメリットを理解する上で有用であることを示す。

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis. Although significant success has been achieved, theoretical analysis is still insufficient for audio-visual tasks. This paper presents a quantitative analysis based on information theory, focusing on information intersection between different modalities. Our results show that this analysis is valuable for understanding the difficulties of audio-visual processing tasks as well as the benefits that could be obtained by modality integration.
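
One way to picture the information shared between two modalities is mutual information. The sketch below computes it from a small joint probability table over an audio variable and a visual variable. Treating "information intersection" as mutual information is an assumption made here for illustration, as are the function name and the toy distributions; the abstract does not specify the exact quantities analyzed.

```python
import math


def mutual_information(joint):
    """Return I(A;V) in bits, where joint[i][j] = P(A=i, V=j)."""
    p_audio = [sum(row) for row in joint]         # marginal over V
    p_visual = [sum(col) for col in zip(*joint)]  # marginal over A
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_audio[i] * p_visual[j]))
    return mi


independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
fully_shared = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

In the first table the modalities carry no common information; in the second, knowing one determines the other, giving one full bit of shared information.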

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 地域スーパービジョンとモーションブラー条件を用いた高品質な人体画像アニメーション

High Quality Human Image Animation using Regional Supervision and Motion Blur Condition ( http://arxiv.org/abs/2409.19580v1 )

ライセンス: Link先を確認

Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou,

(参考訳) 近年,映像拡散モデルの進歩により,時間的コヒーレンスを伴う現実的で制御可能な人間の画像アニメーションが実現されている。合理的な結果を生み出すが、既存の手法は、顔や手などの重要な領域における地域監督の必要性を無視し、動きのぼやけを明示的にモデル化することを無視し、非現実的な低品質合成に繋がる。これらの制限に対処するために、我々はまず、顔と手の忠実度を高めるために、詳細領域の地域監督を活用する。第二に、動作のぼかしを明示的にモデル化し、外観の質をさらに向上させる。第3に,高精細な人体アニメーションのための新しいトレーニング戦略を探求し,全体の忠実度を向上する。実験の結果,提案手法は最先端の手法よりも優れており,HumanDanceデータセットの再現精度 (L1) と知覚品質 (FVD) において,最強のベースラインを21.0%以上,57.4%以上向上した。コードとモデルは利用可能になる。

Recent advances in video diffusion models have enabled realistic and controllable human image animation with temporal coherence. Although they generate reasonable results, existing methods often overlook the need for regional supervision in crucial areas such as the face and hands, and neglect explicit modeling of motion blur, leading to unrealistic low-quality synthesis. To address these limitations, we first leverage regional supervision for detailed regions to enhance face and hand faithfulness. Second, we model the motion blur explicitly to further improve the appearance quality. Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity. Experimental results demonstrate that our proposed method outperforms state-of-the-art approaches, improving upon the strongest baseline by more than 21.0% and 57.4% in terms of reconstruction precision (L1) and perceptual quality (FVD), respectively, on the HumanDance dataset. Code and model will be made available.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# DiMB-RE: ダイエットマイクロバイオーム協会のための科学文献のマイニング

DiMB-RE: Mining the Scientific Literature for Diet-Microbiome Associations ( http://arxiv.org/abs/2409.19581v1 )

ライセンス: Link先を確認

Gibong Hong, Veronica Hindle, Nadine M. Veasley, Hannah D. Holscher, Halil Kilicoglu,

(参考訳) 動機: 腸内細菌叢は近年、食事と人間の健康との特定の関連を支える重要な要因として注目されている。食事、ヒトの代謝、マイクロバイオームに関する実験研究から膨大な知識が蓄積されてきた。しかし、こうしたエビデンスの大部分は科学論文に埋もれたままであり、この領域における生物医学文献マイニングはまだ少ない。我々は、食事とマイクロバイオームの関連を捉える15のエンティティ型(例: Nutrient, Microorganism)と13の関係型(例: increases, improves)でアノテーションした包括的なコーパス DiMB-RE を開発した。また、DiMB-REを用いて、固有表現抽出、トリガ抽出、関係抽出、および事実性検出のための最先端の自然言語処理(NLP)モデルを訓練・評価した。結果: DiMB-REは165本の論文から得た14,450のエンティティと4,206の関係で構成される。NLPモデルは固有表現認識では比較的良好に機能した(0.760 F$_{1}$)が、エンドツーエンドの関係抽出性能は控えめであった(0.356 F$_{1}$)。これは、エンティティやトリガの見落とし、および文をまたぐ関係が一因である。結論: 我々の知る限り、DiMB-REは食事とマイクロバイオームの相互作用に焦点を当てた最大かつ最も多様なデータセットであり、生物医学文献マイニングのベンチマークコーパスとして利用できる。入手方法: DiMB-REとNLPモデルは https://github.com/ScienceNLP-Lab/DiMB-RE で入手できる。

Motivation: The gut microbiota has recently emerged as a key factor that underpins certain connections between diet and human health. A tremendous amount of knowledge has been amassed from experimental studies on diet, human metabolism and microbiome. However, this evidence remains mostly buried in scientific publications, and biomedical literature mining in this domain remains scarce. We developed DiMB-RE, a comprehensive corpus annotated with 15 entity types (e.g., Nutrient, Microorganism) and 13 relation types (e.g., increases, improves) capturing diet-microbiome associations. We also trained and evaluated state-of-the-art natural language processing (NLP) models for named entity, trigger, and relation extraction as well as factuality detection using DiMB-RE. Results: DiMB-RE consists of 14,450 entities and 4,206 relationships from 165 articles. While NLP models performed reasonably well for named entity recognition (0.760 F$_{1}$), end-to-end relation extraction performance was modest (0.356 F$_{1}$), partly due to missed entities and triggers as well as cross-sentence relations. Conclusions: To our knowledge, DiMB-RE is the largest and most diverse dataset focusing on diet-microbiome interactions. It can serve as a benchmark corpus for biomedical literature mining. Availability: DiMB-RE and the NLP models are available at https://github.com/ScienceNLP-Lab/DiMB-RE.
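
The F$_{1}$ figures above can be read with a small sketch of span-level scoring: a predicted entity counts as correct only if its offsets and type both match a gold annotation. Exact-match scoring, the tuple layout `(start, end, type)`, and the function name are assumptions for illustration; the abstract does not state the matching criterion used for DiMB-RE.

```python
def span_prf(gold: set, predicted: set):
    """Exact-match precision, recall and F1 over (start, end, type) spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 5, "Nutrient"), (10, 20, "Microorganism"), (30, 35, "Nutrient")}
pred = {(0, 5, "Nutrient"), (10, 20, "Nutrient"), (40, 45, "Microorganism")}
precision, recall, f1 = span_prf(gold, pred)
```

Here only one of three predictions matches exactly: the second has the right offsets but the wrong type, which exact matching counts as both a false positive and a missed gold entity.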

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# テクスチャとモデルに基づくハイブリッドロバストのための自己教師付き補助学習と顔分析における公正な特徴

Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis ( http://arxiv.org/abs/2409.19582v1 )

ライセンス: Link先を確認

Shukesh Reddy, Nishit Poddar, Srijan Das, Abhijit Das,

(参考訳) 本研究では、テクスチャベースの局所記述子を特徴モデリングに取り込み、効率的な顔分析を行うための補助タスクとして、自己教師あり学習(SSL)を検討する。主タスクと自己教師ありの補助タスクを組み合わせることは、頑健な表現の獲得に有益である。そこで我々は、頑健で偏りのない顔分析のための主タスクと併せて、局所パターンなどのテクスチャ特徴を再構成する補助タスクとして、マスクオートエンコーダ(MAE)のSSLタスクを用いた。この仮説を、顔属性分析、顔に基づく感情分析、ディープフェイク検出という顔分析の3つの主要なパラダイムで検証した。実験結果は、提案モデルによって、公平で偏りのない顔分析のためのより優れた特徴表現が得られることを示している。

In this work, we explore Self-supervised Learning (SSL) as an auxiliary task to blend texture-based local descriptors into feature modelling for efficient face analysis. Combining a primary task and a self-supervised auxiliary task is beneficial for robust representation. Therefore, we used the SSL task of a masked auto-encoder (MAE) as an auxiliary task to reconstruct texture features such as local patterns, alongside the primary task, for robust and unbiased face analysis. We tested our hypothesis on three major paradigms of face analysis: face attribute analysis, face-based emotion analysis, and deepfake detection. Our experimental results show that better feature representations can be gleaned from our proposed model for fair and unbiased face analysis.

翻訳日:2024-11-05 22:28:30 公開日:2024-09-29

# 分子マーカーを用いたMRIの脳腫瘍分類

Brain Tumor Classification on MRI in Light of Molecular Markers ( http://arxiv.org/abs/2409.19583v1 )

ライセンス: Link先を確認

Jun Liu, Geng Yuan, Weihao Zeng, Hao Tang, Wenbin Zhang, Xue Lin, XiaoLin Xu, Dong Huang, Yanzhi Wang,

(参考訳) これまでの研究によれば、1p/19q遺伝子の共欠失は低悪性度グリオーマの臨床転帰と関連している。1p/19qの状態を予測できることは、治療計画と患者の経過観察にとって重要である。本研究の目的は、MRIに基づいて特別に設計した畳み込みニューラルネットワークを脳腫瘍の検出に活用することである。ResNetやAlexNetのような公開ネットワークは転移学習を用いて脳腫瘍を効果的に診断できるが、そのモデルには医用画像とは無関係な重みが数多く含まれている。その結果、転移学習モデルによる診断結果は信頼性に欠ける。この信頼性の問題に対処するため、事前訓練されたモデルに依存せず、ゼロからモデルを構築する。柔軟性を確保するため、畳み込みの積み重ねにドロップアウトと全結合演算を組み合わせ、過学習を抑えることで性能を向上させた。モデルの訓練時には、与えられたデータセットを拡張し、ガウスノイズを注入する。最良のモデルを選択・訓練するために3分割交差検証を用いる。事前訓練モデルからファインチューニングしたInceptionV3、VGG16、MobileNetV2と比較して、我々のモデルはより良い結果を示す。共欠失あり125枚、共欠失なし31枚の画像からなる検証セットにおいて、提案ネットワークは1p/19qの共欠失の有無の分類で、F1スコア96.37%、適合率97.46%、再現率96.34%を達成した。

Research findings indicate that co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. The ability to predict 1p/19q status is critical for treatment planning and patient follow-up. This study aims to utilize a specially designed MRI-based convolutional neural network for brain cancer detection. Although public networks such as ResNet and AlexNet can effectively diagnose brain cancers using transfer learning, such a model includes quite a few weights that have nothing to do with medical images. As a result, the diagnostic results of the transfer learning model are unreliable. To deal with the problem of trustworthiness, we create the model from the ground up, rather than depending on a pre-trained model. To enable flexibility, we combined convolution stacking with dropout and fully connected operations, which improved performance by reducing overfitting. During model training, we also augment the given dataset and inject Gaussian noise. We use three-fold cross-validation to train and select the best model. Compared with InceptionV3, VGG16, and MobileNetV2 fine-tuned from pre-trained models, our model produces better results. On a validation set of 125 codeletion vs. 31 non-codeletion images, the proposed network achieves a 96.37% F1-score, 97.46% precision, and 96.34% recall when classifying 1p/19q codeletion and non-codeletion images.
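
Two of the training ingredients named above, Gaussian noise injection and three-fold cross-validation, can be sketched with the standard library alone. The noise level `sigma=0.05`, the representation of an image as a flat list of intensities in [0, 1], and both function names are assumptions for illustration; the abstract gives neither the noise parameters nor the exact split procedure.

```python
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Return a noisy copy of the pixel list, clipped back into [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def k_fold_indices(n_samples, k, rng):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
splits = list(k_fold_indices(n_samples=9, k=3, rng=rng))
noisy = add_gaussian_noise([0.0, 0.5, 1.0], sigma=0.05, rng=rng)
```

Each sample lands in exactly one validation fold, so every image is used for validation once and for training in the remaining folds.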

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# ターゲット話者抽出を用いたロバスト音声認識のための2段階フレームワーク

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions ( http://arxiv.org/abs/2409.19585v1 )

ライセンス: Link先を確認

Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda,

(参考訳) 雑音条件下での頑健な音声感情認識(SER)システムの開発は、異なる雑音特性によって生じる課題に直面している。従来の研究は人間の音声雑音の影響を考慮していないため、SERの適用範囲は制限されている。本稿では,ターゲット話者抽出法(TSE)とSERを用いて,この問題に対する新たな2段階の枠組みを提案する。まず、TSEモデルを訓練し、混合からターゲット話者の音声を抽出する。そして第2段階で,抽出した音声をSER訓練に用いる。さらに,第2段階におけるTSEモデルとSERモデルの共同トレーニングについて検討する。提案手法は,TSE法を使わずにベースラインと比較して14.33%の精度向上を実現し,人間の音声雑音の影響を緩和する枠組みの有効性を示した。さらに, 話者の性別を考慮した実験を行い, 異なるジェンダーの混合において, フレームワークが特に良好に機能することを示した。

Developing a robust speech emotion recognition (SER) system in noisy conditions faces challenges posed by different noise properties. Most previous studies have not considered the impact of human speech noise, thus limiting the application scope of SER. In this paper, we propose a novel two-stage framework for this problem by cascading a target speaker extraction (TSE) method and SER. We first train a TSE model to extract the speech of the target speaker from a mixture. Then, in the second stage, we utilize the extracted speech for SER training. Additionally, we explore joint training of the TSE and SER models in the second stage. Our system achieves a 14.33% improvement in unweighted accuracy (UA) compared to a baseline that does not use the TSE method, demonstrating the effectiveness of our framework in mitigating the impact of human speech noise. Moreover, we conduct experiments considering speaker gender, showing that our framework performs particularly well on different-gender mixtures.
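
Unweighted accuracy (UA), the metric quoted above, is conventionally the average of per-class recall, so rare emotions count as much as frequent ones. The sketch below shows that computation; the function name and the toy labels are illustrative and not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    if not total:
        raise ValueError("y_true must not be empty")
    return sum(correct[c] / total[c] for c in total) / len(total)


y_true = ["happy"] * 4 + ["sad"]
y_pred = ["happy"] * 5  # the classifier never predicts "sad"
ua = unweighted_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case plain accuracy is 0.8 while UA is 0.5, which shows why UA is preferred when emotion classes are imbalanced.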

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# ヒト・イン・ザ・ループトレーニングによる全スライド画像の品質管理

Efficient Quality Control of Whole Slide Pathology Images with Human-in-the-loop Training ( http://arxiv.org/abs/2409.19587v1 )

ライセンス: Link先を確認

Abhijeet Patil, Harsh Diwakar, Jay Sawant, Nikhil Cherian Kurian, Subhash Yadav, Swapnil Rane, Tripti Bameta, Amit Sethi,

(参考訳) 病理組織の全スライド画像(WSI)は、特に精密腫瘍学において、深層学習に基づく診断ソリューションの開発に広く利用されている。こうした診断ソフトウェアの多くは、訓練データやテストデータに含まれるバイアスや不純物の影響を受けやすく、不正確な診断につながるおそれがある。例えば、WSIには複数の種類の組織領域が含まれ、その少なくとも一部は診断に関係しない可能性がある。我々は、WSIを上皮、間質、リンパ球、脂肪、アーティファクト、その他の6つの組織領域に大別する、頑健かつ軽量な深層学習ベースの分類器HistoROIを紹介する。HistoROIは、ラベル付け効率の高い汎化のために訓練データの多様性を確保する、新しいヒューマン・イン・ザ・ループ型の能動学習パラダイムで訓練されている。HistoROIは、単一のデータセットのみで訓練されているにもかかわらず、複数の臓器で一貫して良好に機能し、高い汎化性能を示す。さらに、CAMELYON乳がんリンパ節データセットとTCGA肺がんデータセットを用いて、下流の深層学習タスクの性能向上におけるHistoROIの有用性を検討した。前者のデータセットでは、HistoROIでデータをフィルタリングすることにより、弱教師あり学習で訓練したニューラルネットワークの転移組織対正常組織の受信者動作特性曲線下面積(AUC)が0.88から0.92に向上した。同様に、肺がんデータセットにおける腺癌と扁平上皮癌の分類では、AUCが0.88から0.93に向上した。また、アノテーション付きWSI 93枚からなるテストデータセットにおいて、アーティファクト検出でHistoROIがHistoQCを上回る性能を示すことも確認した。提案モデルの限界を分析し、今後の拡張の可能性についても論じる。

Histopathology whole slide images (WSIs) are being widely used to develop deep learning-based diagnostic solutions, especially for precision oncology. Most of these diagnostic software tools are vulnerable to biases and impurities in the training and test data, which can lead to inaccurate diagnoses. For instance, WSIs contain multiple types of tissue regions, at least some of which might not be relevant to the diagnosis. We introduce HistoROI, a robust yet lightweight deep learning-based classifier to segregate WSIs into six broad tissue regions: epithelium, stroma, lymphocytes, adipose, artifacts, and miscellaneous. HistoROI is trained using a novel human-in-the-loop and active learning paradigm that ensures variations in training data for labeling-efficient generalization. HistoROI consistently performs well across multiple organs, despite being trained on only a single dataset, demonstrating strong generalization. Further, we have examined the utility of HistoROI in improving the performance of downstream deep learning-based tasks using the CAMELYON breast cancer lymph node and TCGA lung cancer datasets. For the former dataset, the area under the receiver operating characteristic curve (AUC) for metastasis versus normal tissue of a neural network trained using weakly supervised learning increased from 0.88 to 0.92 by filtering the data using HistoROI. Similarly, the AUC increased from 0.88 to 0.93 for the classification between adenocarcinoma and squamous cell carcinoma on the lung cancer dataset. We also found that HistoROI improves upon HistoQC for artifact detection on a test dataset of 93 annotated WSIs. The limitations of the proposed model are analyzed, and potential extensions are also discussed.
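
The AUC values reported above can be interpreted as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that definition, counting ties as half; the function name and the toy scores are illustrative, and the pairwise loop is meant for clarity rather than for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# 1 = metastasis, 0 = normal; scores are made-up model outputs.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Three of the four positive/negative pairs are ordered correctly in this toy example, so the AUC is 0.75.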

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# 画像超解像のための効果的な拡散変換器アーキテクチャ

Effective Diffusion Transformer Architecture for Image Super-Resolution ( http://arxiv.org/abs/2409.19589v1 )

ライセンス: Link先を確認

Kun Cheng, Lei Yu, Zhijun Tu, Xiao He, Liyu Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu,

(参考訳) 近年の進歩は、拡散モデルが画像超解像において大きな可能性を秘めていることを示している。最新の手法は主に畳み込みニューラルネットワークを用いた潜時拡散モデルに基づいているが、画像生成において顕著な性能を示すトランスフォーマーを探索する試みはほとんどない。本研究では,画像超解像(DiT-SR)のための効果的な拡散変換器を設計する。実際には、DiT-SRは全体のU字型アーキテクチャを活用し、異なるステージにわたるすべての変圧器ブロックに対して均一な等方性設計を採用する。前者はマルチスケールの階層的特徴抽出を促進し、後者は計算資源を重要な層に再配置して性能をさらに向上させる。さらに、広く使われているAdaLNの制限を徹底的に分析し、異なる時間ステップで異なる周波数情報を処理するために、周波数適応型時間ステップ条件付けモジュールを提案する。広汎な実験により、DiT-SRは既存のスクラッチ拡散に基づくSR法よりも優れており、画像超解像における拡散変圧器の優越性を証明し、事前訓練された安定拡散法に先立ついくつかの手法よりも優れていることが証明された。

Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there are few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods while being trained from scratch. In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates the computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module, enhancing the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing diffusion-based SR methods trained from scratch, and even beats some of the prior-based methods built on pretrained Stable Diffusion, proving the superiority of the diffusion transformer in image super-resolution.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# DiffCP:拡散モデルによる超低ビット協調知覚

DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model ( http://arxiv.org/abs/2409.19592v1 )

ライセンス: Link先を確認

Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu,

(参考訳) 協調知覚(CP)は、スタンドアローン・インテリジェンスの本質的な限界に対する有望な解決策として浮上している。しかし、現在の無線通信システムは、膨大な帯域幅要求のため、特徴レベルおよび生レベルの協調アルゴリズムをサポートできない。本稿では, DiffCPを提案する。 DiffCPは, 特殊な拡散モデルを用いて協調者の知覚情報を効率的に圧縮する新しいCPパラダイムである。幾何条件と意味条件の両方を生成モデルに組み込むことで、DiffCPは超低通信コストで特徴レベルの協調を可能にし、CPシステムの実践的実装を前進させる。このパラダイムは既存のCPアルゴリズムにシームレスに統合して、幅広い下流タスクを強化することができる。広範な実験を通じて,通信,計算,性能のトレードオフについて検討する。 DiffCPは,最先端のアルゴリズムと同じ性能を維持しつつ,通信コストを14.5分の1に削減できることを示す。

Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration with an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experimentation, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP can significantly reduce communication costs by 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# MASKDROID: マスケグラフ表現を用いたロバストAndroidマルウェア検出

MASKDROID: Robust Android Malware Detection with Masked Graph Representations ( http://arxiv.org/abs/2409.19594v1 )

ライセンス: Link先を確認

Jingnan Zheng, Jiaohao Liu, An Zhang, Jun Zeng, Ziqi Yang, Zhenkai Liang, Tat-Seng Chua,

(参考訳) Androidのマルウェア攻撃は、モバイルユーザーに深刻な脅威を与えており、自動検出システムに対する大きな需要を必要としている。マルウェア検出に使用されるさまざまなツールの中で、グラフ表現(例えば関数呼び出しグラフ)は、Androidアプリの振る舞いを特徴づける上で重要な役割を担っている。しかし、現在最先端のグラフベースのマルウェア検出装置は、マルウェア検出において優れた性能を発揮するが、敵の例には弱い。これらの敵対的な例は、正常な悪意のある入力に特定の摂動を導入することで、細心の注意を払って作られている。敵の攻撃から守るために、既存の防御機構は、通常、検出器に補足的な追加であり、しばしば敵の事例の事前の知識に依存し、目に見えない種類の攻撃に対して効果的に防御することができない、重大な制限を示す。本稿では,マルウェアを識別する強力な識別能力と,敵攻撃に対する顕著な堅牢性を備えた強力な検出器MASKDROIDを提案する。具体的には、グラフニューラルネットワーク(GNN)ベースのフレームワークにマスキング機構を導入し、ランダムに選択されたノードの小さな部分(例えば20%)を使って、MASKDROIDに入力グラフ全体を復元させる。グラフ構造内の依存関係の形で安定な悪意的セマンティクスをキャプチャする一方で,MASKDROIDがよりコンパクトな表現を学習し,良質なアプリからマルウェアを検出するための識別力を高めるために,さらにコントラスト的なモジュールを用いている。

Android malware attacks pose a severe threat to mobile users, creating significant demand for automated detection systems. Among the various tools employed in malware detection, graph representations (e.g., function call graphs) have played a pivotal role in characterizing the behaviors of Android apps. However, although they achieve impressive performance in malware detection, current state-of-the-art graph-based malware detectors are vulnerable to adversarial examples. These adversarial examples are meticulously crafted by introducing specific perturbations to normal malicious inputs. To defend against adversarial attacks, existing defensive mechanisms are typically supplementary additions to detectors and exhibit significant limitations, often relying on prior knowledge of adversarial examples and failing to defend against unseen types of attacks effectively. In this paper, we propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware and remarkable robustness against adversarial attacks. Specifically, we introduce a masking mechanism into the Graph Neural Network (GNN) based framework, forcing MASKDROID to recover the whole input graph using a small portion (e.g., 20%) of randomly selected nodes. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks. While capturing stable malicious semantics in the form of dependencies inside the graph structures, we further employ a contrastive module to encourage MASKDROID to learn more compact representations for both the benign and malicious classes to boost its discriminative power in detecting malware from benign apps and adversarial examples.
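The masking step described above can be illustrated with a toy sketch. It assumes the function call graph is a plain adjacency dictionary; the helper name `mask_nodes`, the `keep_ratio` parameter, and the ten-node example graph are hypothetical and only show how a small visible subset (here 20%) might be drawn. The reconstruction itself, which the abstract assigns to a GNN, is not modeled here.

```python
import random

def mask_nodes(graph, keep_ratio=0.2, seed=0):
    """Split the nodes of `graph` into a visible set and a masked set.

    `graph` maps each node to the list of nodes it calls.
    Only the visible nodes would be shown to a model that must
    then reconstruct the full graph (not implemented here).
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    n_keep = max(1, round(keep_ratio * len(nodes)))
    visible = set(rng.sample(nodes, n_keep))
    masked = set(nodes) - visible
    return visible, masked

# Toy function call graph with ten nodes (hypothetical app).
call_graph = {f"f{i}": [f"f{(i + 1) % 10}"] for i in range(10)}
visible, masked = mask_nodes(call_graph)
```

With ten nodes and a keep ratio of 0.2, two nodes stay visible and eight are masked; the fixed seed only makes the toy example repeatable.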

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# ECCV第2受入テストチャレンジ2024における時間音像定位課題の解法

Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024 ( http://arxiv.org/abs/2409.19595v1 )

ライセンス: Link先を確認

Haowei Gu, Weihao Zhu, Yang Yang,

(参考訳) 本報告では,ビデオ中に発生する音のイベントを,予め定義された音の集合に従って局所化し,分類するTSLタスクの改良手法を提案する。昨年の第1回大会のチャンピオンソリューションは、同じ重さでオーディオとビデオのモダリティを融合させることで、TSLを探索した。 TSLタスクは音事象の局所化を目的としており、音特徴の優越性を実証する関連実験を行っている(第3部)。この結果をもとに,InterVideo, CaVMAE, VideoMAEモデルなどの音声特徴を抽出するために,様々なモデルを用いた。私たちのアプローチは最終テストで最初に0.4925のスコアでランク付けします。

This report proposes an improved method for the Temporal Sound Localisation (TSL) task, which localizes and classifies the sound events occurring in a video according to a predefined set of sound classes. The champion solution from last year's first competition explored TSL by fusing audio and video modalities with equal weight. Considering that the TSL task aims to localize sound events, we conduct experiments that demonstrate the superiority of sound features (Section 3). Based on our findings, to enhance audio modality features, we employ various models to extract audio features, such as InterVideo, CaVMAE, and VideoMAE models. Our approach ranks first in the final test with a score of 0.4925.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# グラディエントは必要なもの:赤外小ターゲット検出のためのグラディエントベースアテンションフュージョン

Gradient is All You Need: Gradient-Based Attention Fusion for Infrared Small Target Detection ( http://arxiv.org/abs/2409.19599v1 )

ライセンス: Link先を確認

Chen Hu, Yian Huang, Kexuan Li, Luping Zhang, Yiming Zhu, Yufei Peng, Tian Pu, Zhenming Peng,

(参考訳) 赤外線小目標検出(IRSTD)は、民間や軍事用途で広く用いられている。しかし、IRSTDは、小さなターゲットや薄暗いターゲットが複雑な背景によって隠蔽される傾向など、いくつかの課題に直面している。この問題に対処するために,小ターゲットのエッジや勾配情報を抽出し,保存することを目的としたGradient Network(GaNet)を提案する。 GaNetはGradient Transformer(GradFormer)モジュールを採用し、中心差分畳み込み(CDC)をシミュレートして、より深い機能で勾配機能を抽出し統合している。さらに,背景情報を無視しながら,ネットワークが詳細のみに集中しないように包括的視点を提供するグローバル特徴抽出モデル(GFEM)を提案する。ネットワークと最先端技術(SOTA)のアプローチを比較し,本手法が有効であることを示す。ソースコードは https://github.com/greekinRoma/Gradient-Transformer で公開されています。

Infrared small target detection (IRSTD) is widely used in civilian and military applications. However, IRSTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Gradient Network (GaNet), which aims to extract and preserve edge and gradient information of small targets. GaNet employs the Gradient Transformer (GradFormer) module, simulating central difference convolutions (CDC) to extract and integrate gradient features with deeper features. Furthermore, we propose a global feature extraction model (GFEM) that offers a comprehensive perspective to prevent the network from focusing solely on details while neglecting the background information. We compare the network with state-of-the-art (SOTA) approaches, and the results demonstrate that our method performs effectively. Our source code is available at https://github.com/greekinRoma/Gradient-Transformer.
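The abstract says GradFormer simulates central difference convolutions (CDC). As background only, here is a minimal sketch of a plain 3x3 CDC as it is commonly described, where each kernel weight multiplies the difference between a neighbor and the center pixel. It is not the authors' GradFormer module; the function name and the toy inputs are assumptions for illustration.

```python
def central_difference_conv(image, kernel):
    """Apply a 3x3 central difference convolution (no padding).

    Each output value is sum_n kernel[n] * (neighbor_n - center),
    so perfectly flat regions produce zero response.
    """
    h, w = len(image), len(image[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            center = image[i][j]
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += kernel[di + 1][dj + 1] * (image[i + di][j + dj] - center)
            row.append(acc)
        out.append(row)
    return out

ones = [[1.0] * 3 for _ in range(3)]
flat = [[5.0] * 4 for _ in range(4)]                          # uniform background
spot = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]    # single bright pixel

flat_response = central_difference_conv(flat, ones)
spot_response = central_difference_conv(spot, ones)
```

The flat background yields an all-zero response, while the isolated bright pixel yields a strong one, which is the gradient-sensitive behavior that makes this operator attractive for small, dim targets.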

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# 拡張クラスを用いた部分的ラベル学習のための非偏りリスク推定器

An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes ( http://arxiv.org/abs/2409.19600v1 )

ライセンス: Link先を確認

Jiayu Hu, Senlin Shu, Beibei Li, Tao Xiang, Zhongshi He,

(参考訳) 部分ラベル学習(Partial Label Learning, PLL)は、各トレーニングインスタンスが接頭辞を含む候補ラベルのセットでアノテートされていると仮定する、典型的な弱教師付き学習タスクである。近年のPLL法では、偽陽性ラベルの影響を緩和し、有望な性能を達成するために識別に基づく曖昧さが採用されている。しかし、それらはテストセットのすべてのクラスがトレーニングセットに現れることを要求し、新しいクラスが実際のアプリケーションで出現し続けるという事実を無視します。本稿では,1つ以上の拡張クラスがトレーニング段階では見えず,推論段階で現れるPLLAC(Partial Label Learning with Augmented Class)の問題に焦点をあてる。具体的には、既知のクラスをラベルなしデータと区別することにより、拡張クラスの分布を推定し、任意のPLL損失関数を組み込むことができるPLLACの理論的保証付き非バイアスリスク推定器を提案する。さらに,実験的リスク最小化器の真のリスク最小化器への収束を保証するため,推定器の誤差境界の理論的解析を行う。さらに、ネガティブな経験的リスクに起因する過度に適合する問題の影響を軽減するため、最適化目標にリスク対価正規化の項を付加する。ベンチマーク、UCI、実世界のデータセットに関する大規模な実験は、提案手法の有効性を実証している。

Partial Label Learning (PLL) is a typical weakly supervised learning task, which assumes each training instance is annotated with a set of candidate labels containing the ground-truth label. Recent PLL methods adopt identification-based disambiguation to alleviate the influence of false positive labels and achieve promising performance. However, they require all classes in the test set to have appeared in the training set, ignoring the fact that new classes will keep emerging in real applications. To address this issue, in this paper, we focus on the problem of Partial Label Learning with Augmented Class (PLLAC), where one or more augmented classes are not visible in the training stage but appear in the inference stage. Specifically, we propose an unbiased risk estimator with theoretical guarantees for PLLAC, which estimates the distribution of augmented classes by differentiating the distribution of known classes from unlabeled data and can be equipped with arbitrary PLL loss functions. Besides, we provide a theoretical analysis of the estimation error bound of the estimator, which guarantees the convergence of the empirical risk minimizer to the true risk minimizer as the number of training data tends to infinity. Furthermore, we add a risk-penalty regularization term in the optimization objective to alleviate the influence of the over-fitting issue caused by negative empirical risk. Extensive experiments on benchmark, UCI and real-world datasets demonstrate the effectiveness of the proposed approach.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# 闇の中でのインファイティング:フェデレートラーニングにおけるマルチラベルバックドアアタック

Infighting in the Dark: Multi-Labels Backdoor Attack in Federated Learning ( http://arxiv.org/abs/2409.19601v1 )

ライセンス: Link先を確認

Ye Li, Yanchao Zhao, Chengcheng Zhu, Jiale Zhang,

(参考訳) フェデレートラーニング(FL)は、バックドア攻撃に弱いことが示されている。分散機械学習フレームワークとして、ほとんどの研究はSBA(Single-Label Backdoor Attack)に焦点を当てている。残念なことに、MBAに事前の作業を適用することは、効果がないだけでなく、お互いを緩和する可能性がある。本稿では,MBAに先行研究を適用する際の限界について検討する。続いて, 裏口トリガを逆順に適応させて, 裏口サンプルをグローバルモデルにおけるクリーンターゲットとして処理する, 新たな多ラベルバックドア攻撃であるM2Mを提案する。我々の重要な直感は、トリガーパターンとターゲットクラスの分布との接続を確立することであり、異なるトリガーが潜在的な緩和を心配することなく、ターゲットクラスのクリーンなアクティベーションパスに沿ってバックドアをアクティベートできるようにする。広範囲な評価により、M2Mは様々な最先端の攻撃方法より優れていることが示された。この研究は、研究者や開発者にこの潜在的な脅威を警告し、効果的な検出方法の設計を促すことを目的としている。私たちのコードは後で利用可能になります。

Federated Learning (FL), a decentralized machine learning framework, has been demonstrated to be vulnerable to backdoor attacks. Most research focuses on the Single-Label Backdoor Attack (SBA), where adversaries share the same target, but neglects the fact that adversaries may be unaware of each other's existence and hold different targets, i.e., the Multi-Label Backdoor Attack (MBA). Unfortunately, directly applying prior work to the MBA would not only be ineffective but could also cause the attacks to mitigate each other. In this paper, we first investigate the limitations of applying previous work to the MBA. Subsequently, we propose M2M, a novel multi-label backdoor attack in FL, which adversarially adapts the backdoor trigger to ensure that backdoored samples are processed as clean target samples in the global model. Our key intuition is to establish a connection between the trigger pattern and the target class distribution, allowing different triggers to activate backdoors along clean activation paths of the target class without concerns about potential mitigation. Extensive evaluations demonstrate that M2M outperforms various state-of-the-art attack methods. This work aims to alert researchers and developers to this potential threat and to inspire the design of effective detection methods. Our code will be made available later.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# ビデオのセグメンテーションを指示した言語を、すべてセグメンテーションする1つの方法

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ( http://arxiv.org/abs/2409.19603v1 )

ライセンス: Link先を確認

Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou,

(参考訳) 本稿では,ビデオにおける言語指示による推論セグメンテーションの問題に対処するために,ビデオベースマルチモーダルな大規模言語モデルであるVideoLISAを紹介する。大規模言語モデルの推論能力と世界知識を活用し、Segment Anything Modelによって強化されたVideoLISAは、言語命令に基づいてビデオ内の時間的に一貫したセグメンテーションマスクを生成する。 LISAのような既存の画像ベース手法は、時間的ダイナミックな理解とフレーム間の一貫したセグメンテーションを必要とする、追加の時間的次元のために、ビデオタスクと競合する。 VideoLISAはこれらの課題に対処するため、Sparse Dense Sampling戦略をビデオLLMに統合し、計算制約の中で時間的コンテキストと空間的詳細をバランスさせる。さらに, 特別に設計された<TRK>トークンを用いて, 複数のフレームにまたがるオブジェクトのセグメンテーションと追跡を可能にするOne-Token-Seg-Allアプローチを提案する。新しいReasonVOSベンチマークを含む多種多様なベンチマークの広範な評価は、複雑な推論、時間的理解、オブジェクト追跡を含むビデオオブジェクトセグメンテーションタスクにおいて、VideoLISAの優れたパフォーマンスを示す。 VideoLISAはビデオに最適化されているが、画像セグメンテーションへの有望な一般化を示し、言語で指示されたオブジェクトセグメンテーションの統一基盤モデルとしての可能性を明らかにしている。コードとモデルは https://github.com/showlab/VideoLISA で利用可能になる。

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions. Existing image-based methods, such as LISA, struggle with video tasks due to the additional temporal dimension, which requires temporal dynamic understanding and consistent segmentation across frames. VideoLISA addresses these challenges by integrating a Sparse Dense Sampling strategy into the video-LLM, which balances temporal context and spatial detail within computational constraints. Additionally, we propose a One-Token-Seg-All approach using a specially designed <TRK> token, enabling the model to segment and track objects across multiple frames. Extensive evaluations on diverse benchmarks, including our newly introduced ReasonVOS benchmark, demonstrate VideoLISA's superior performance in video object segmentation tasks involving complex reasoning, temporal understanding, and object tracking. While optimized for videos, VideoLISA also shows promising generalization to image segmentation, revealing its potential as a unified foundation model for language-instructed object segmentation. Code and model will be available at: https://github.com/showlab/VideoLISA.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# ハイパーコネクション

Hyper-Connections ( http://arxiv.org/abs/2409.19606v1 )

ライセンス: Link先を確認

Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou,

(参考訳) 残余接続の代替として機能する,単純かつ効果的な方法であるハイパーコネクションを提案する。このアプローチは、勾配消滅と表現崩壊の間のシーソー効果のような、残差接続変種で観測される共通の欠点に特に対処する。理論的には、ハイパーコネクションにより、ネットワークは異なる深さと動的に再配列する層における特徴間の接続の強度を調整できる。我々は,高接続が残接続よりも顕著な性能向上を示すような高密度およびスパースモデルを含む,大規模言語モデルの事前学習に焦点を当てた実験を行う。視覚タスクに関する追加の実験でも同様の改善が示された。我々は、この手法が幅広いAI問題に広く適用され、有益なものになることを期待する。

We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.

翻訳日:2024-11-05 22:18:46 公開日:2024-09-29

# 視覚言語基礎モデルからのフェデレーション学習:理論的解析と方法

Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method ( http://arxiv.org/abs/2409.19610v1 )

ライセンス: Link先を確認

Bikang Pan, Wei Huang, Ye Shi,

(参考訳) CLIPのような事前学習された視覚言語基礎モデルをフェデレートラーニングに統合することは、様々なタスクにおける一般化を促進する上で大きな注目を集めている。一般的に、視覚言語モデルの連合学習は、即時学習を用いてコミュニケーションと計算コストを削減し、即時学習(即時学習)である。しかし、素早い学習におけるフェデレート学習の性能を理解するための理論的分析は限られている。本研究では,特徴学習理論を用いた素早いフェデレーション学習のための理論的分析フレームワークを構築した。具体的には,課題関連係数と課題関連係数の比率で評価できることを示す。さらに,ポートフォリオ最適化における収益とリスクの類似点と,特徴学習におけるタスク関連用語とタスク関連用語の類似点を抽出する。 2つの独立した資産を組み合わせることで、リスクを低減しつつ収入を維持するというポートフォリオ最適化からのインスピレーションを生かして、グローバル・プロンプトとローカル・プロンプトという2つのプロンプトを導入し、一般化とパーソナライゼーションのバランスをとるための迅速なポートフォリオを構築する。その結果,プロンプトポートフォリオの性能上の利点を示し,最適混合係数を導出した。これらの理論的な主張は実証実験によってさらに支持されている。

Integrating pretrained vision-language foundation models like CLIP into federated learning has attracted significant attention for enhancing generalization across diverse tasks. Typically, federated learning of vision-language models employs prompt learning to reduce communication and computational costs, i.e., prompt-based federated learning. However, there is limited theoretical analysis to understand the performance of prompt-based federated learning. In this work, we construct a theoretical analysis framework for prompt-based federated learning via feature learning theory. Specifically, we monitor the evolution of signal learning and noise memorization in prompt-based federated learning, demonstrating that performance can be assessed by the ratio of task-relevant to task-irrelevant coefficients. Furthermore, we draw an analogy between income and risk in portfolio optimization and the task-relevant and task-irrelevant terms in feature learning. Drawing inspiration from portfolio optimization, where combining two independent assets maintains the income while reducing the risk, we introduce two prompts, a global prompt and a local prompt, to construct a prompt portfolio that balances generalization and personalization. Consequently, we show the performance advantage of the prompt portfolio and derive the optimal mixing coefficient. These theoretical claims are further supported by empirical experiments.
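The portfolio analogy can be made concrete with the textbook two-asset case. The sketch below assumes two independent assets with variances `var_a` and `var_b` and computes the mixing weight that minimizes the variance of the blend. This is the classical result, not the optimal mixing coefficient derived in the paper, and the function names and numbers are illustrative assumptions.

```python
def mixed_variance(w, var_a, var_b):
    """Variance of w * A + (1 - w) * B for independent assets A and B."""
    return w ** 2 * var_a + (1 - w) ** 2 * var_b

def min_variance_weight(var_a, var_b):
    """Weight on asset A that minimizes the variance of the blend."""
    return var_b / (var_a + var_b)

# Hypothetical risks: asset A is noisier than asset B.
var_a, var_b = 4.0, 1.0
w_star = min_variance_weight(var_a, var_b)
best = mixed_variance(w_star, var_a, var_b)
```

Here the blend's variance falls below that of either asset alone, which is the "reduced risk" half of the analogy; how the paper maps this onto global and local prompts is specified in the paper itself rather than in this sketch.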

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# 言語モデル連続学習のためのLORAの意図的混合学習

Learning Attentional Mixture of LoRAs for Language Model Continual Learning ( http://arxiv.org/abs/2409.19611v1 )

ライセンス: Link先を確認

Jialin Liu, Jianhua Wu, Jie Liu, Yutai Duan,

(参考訳) Low-Rank Adaption (LoRA) を用いた細調整型大規模言語モデル (LLM) は,新しいタスクに対する継続的な学習に有効なアプローチとして広く認められている。しかし、複数のタスクを逐次処理する場合は、悲惨な忘れがちであることが多い。この目的のために,LLM に適した連続学習手法である AM-LoRA (Attentional Mixture of LoRAs) を提案する。具体的には、AM-LoRAは一連のタスクに対するLoRAのシーケンスを学習し、異なるタスクからの知識を継続的に学習する。アプローチの鍵となるのは、各LoRAからの情報を適応的に統合する知識混合モジュールとして注意機構を考案することである。注意機構により、AM-LoRAはそれぞれのLoRAの特有な貢献を効果的に活用でき、一方、破滅的な忘れを招く可能性のある相互に負の相互作用のリスクを軽減できる。さらに、注意ベクトルをよりスパースにするために、学習プロセスに$L1$ normを導入します。スパース制約により、モデルはすべてのLoRAをまとめて重み付けするのではなく、いくつかの非常に関係の深いLoRAを選択することができるため、相互干渉による影響をさらに軽減することができる。連続学習ベンチマークの実験結果は,提案手法の優位性を示している。

Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach for continual learning on new tasks. However, it often suffers from catastrophic forgetting when dealing with multiple tasks sequentially. To this end, we propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs. Specifically, AM-LoRA learns a sequence of LoRAs for a series of tasks to continually learn knowledge from different tasks. The key to our approach is an attention mechanism that serves as a knowledge mixture module to adaptively integrate information from each LoRA. With the attention mechanism, AM-LoRA can efficiently leverage the distinctive contributions of each LoRA, while mitigating the risk of mutually negative interactions among them that may lead to catastrophic forgetting. Moreover, we introduce an $L_1$ norm into the learning process to make the attention vector sparser. The sparsity constraint enables the model to lean towards selecting a few highly relevant LoRAs, rather than aggregating and weighting all LoRAs collectively, which can further reduce the impact of mutual interference. Experimental results on continual learning benchmarks indicate the superiority of our proposed method.
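A rough sketch of the two ideas in this abstract, mixing per-task LoRA outputs with an attention vector and penalizing that vector's $L_1$ norm, might look as follows. It assumes the attention vector is a set of unnormalized per-LoRA gates; the function names, the `lam` coefficient, and the toy numbers are hypothetical, and the paper's actual attention module is not reproduced here.

```python
def mix_lora_outputs(outputs, attention):
    """Weighted sum of per-task LoRA outputs.

    `outputs` is a list of equal-length vectors, one per LoRA;
    `attention` holds one (assumed unnormalized) gate per LoRA.
    """
    dim = len(outputs[0])
    return [sum(a * out[k] for a, out in zip(attention, outputs)) for k in range(dim)]

def l1_penalty(attention, lam=0.01):
    """L1 regularizer that pushes gates towards exact zeros."""
    return lam * sum(abs(a) for a in attention)

# Three hypothetical LoRA outputs; the third is gated off entirely.
outputs = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
attention = [0.9, 0.1, 0.0]
mixed = mix_lora_outputs(outputs, attention)
penalty = l1_penalty(attention)
```

A gate of exactly zero removes a LoRA's contribution, which is the kind of selective behavior the sparsity term is meant to encourage.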

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# Few-Shotセグメンテーション用ハイブリッドマンバ

Hybrid Mamba for Few-Shot Segmentation ( http://arxiv.org/abs/2409.19613v1 )

ライセンス: Link先を確認

Qianxiong Xu, Xuanyi Liu, Lanyun Zhu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao,

(参考訳) 多くの小ショットセグメンテーション(FSS)メソッドは、2次複雑さに関係なく、FG(Fusion Support Foreground)をクエリ機能に利用している。最近の進歩であるMambaは、シーケンス内依存関係をうまくキャプチャできるが、複雑さは線形のみである。したがって、FSSのシーケンス間の依存関係をキャプチャするために、クロス(アテンションのような)Mambaを考案することを目指している。単純なアイデアは、サポート機能をスキャンして、それを隠された状態に選択的に圧縮し、クエリ機能をシーケンシャルにスキャンする初期隠れ状態として使用する、というものだ。クエリ FG は FG をサポートするよりも本質的にはそれ自身に似ており、すなわち、クエリはサポート機能をフューズするのではなく、隠れた状態から独自のものを使うのが好まれるが、FSS の成功はサポート情報の有効利用に依存している。そこで本研究では,(1) 検索時のサポート機能を定期的に再起動するMambaのハイブリッドネットワーク(HMNet)を設計し,隠れた状態が常にリッチなサポート情報を含むようにし,(2) クエリインターセプトされたMambaは,クエリピクセル間の相互通信を禁止し,隠れた状態からより多くのサポート機能を融合させる。これにより、サポート情報がより活用され、パフォーマンスが向上する。 2つの公開ベンチマークで大規模な実験が行われ、HMNetの優位性を示している。コードはhttps://github.com/Sam1224/HMNetで公開されている。

Many few-shot segmentation (FSS) methods use cross attention to fuse support foreground (FG) into query features, despite its quadratic complexity. Mamba, a recent advance, can also capture intra-sequence dependencies well, yet its complexity is only linear. Hence, we aim to devise a cross (attention-like) Mamba to capture inter-sequence dependencies for FSS. A simple idea is to scan the support features to selectively compress them into the hidden state, which is then used as the initial hidden state to sequentially scan the query features. Nevertheless, this approach suffers from (1) a support forgetting issue: query features are also gradually compressed as they are scanned, so the support features in the hidden state keep diminishing, and many query pixels cannot fuse sufficient support features; (2) an intra-class gap issue: query FG is essentially more similar to itself than to support FG, i.e., the query may prefer to fuse its own features from the hidden state rather than support features, yet the success of FSS relies on the effective use of support information. To tackle them, we design a hybrid Mamba network (HMNet), including (1) a support recapped Mamba that periodically recaps the support features when scanning the query, so the hidden state can always contain rich support information; (2) a query intercepted Mamba that forbids mutual interactions among query pixels and encourages them to fuse more support features from the hidden state. Consequently, the support information is better utilized, leading to better performance. Extensive experiments have been conducted on two public benchmarks, showing the superiority of HMNet. The code is available at https://github.com/Sam1224/HMNet.

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# カオスの識別:意図しない雑音から意図を遠ざけながら対向的摂動を検出する

Discerning the Chaos: Detecting Adversarial Perturbations while Disentangling Intentional from Unintentional Noises ( http://arxiv.org/abs/2409.19619v1 )

ライセンス: Link先を確認

Anubhooti Jain, Susim Roy, Kwanit Gupta, Mayank Vatsa, Richa Singh,

(参考訳) 顔認識や属性予測に使用される深層学習モデルは、敵対的ノイズや、ガウスノイズやインパルスノイズなどの意図しないノイズといった操作の影響を受けやすい。本稿では, 検出層を備えた改良型視覚変換器上に構築した、クラス非依存の敵対的意図検出ネットワークCIAI(Class-Independent Adversarial Intent detection network)を紹介する。 CIAIは、画像クラスに関係なく、意図的(敵の攻撃)と意図しないノイズの両方を検出するために、最大平均不一致(Maximum Mean Discrepancy)とセンターロスを組み合わせた新しい損失関数を採用している。学習は複数ステップで行われる。また、追加のセキュリティ層として機能し得る、検出時の意図という観点も導入する。 CelebA, CelebA-HQ, LFW, AgeDB, CIFAR-10データセット上で提案した検出器の性能を示す。我々の検出器は、意図的な摂動(FGSM、PGD、DeepFoolなど)と意図しない摂動(ガウスノイズやソルト・アンド・ペッパーノイズなど)の両方を検出することができる。

Deep learning models, such as those used for face recognition and attribute prediction, are susceptible to manipulations like adversarial noise and unintentional noise, including Gaussian and impulse noise. This paper introduces CIAI, a Class-Independent Adversarial Intent detection network built on a modified vision transformer with detection layers. CIAI employs a novel loss function that combines Maximum Mean Discrepancy and Center Loss to detect both intentional (adversarial attacks) and unintentional noise, regardless of the image class. It is trained in a multi-step fashion. We also introduce the aspect of intent during detection that can act as an added layer of security. We further showcase the performance of our proposed detector on CelebA, CelebA-HQ, LFW, AgeDB, and CIFAR-10 datasets. Our detector is able to detect both intentional (like FGSM, PGD, and DeepFool) and unintentional (like Gaussian and Salt & Pepper noises) perturbations.
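For readers unfamiliar with Maximum Mean Discrepancy, one of the two ingredients the abstract names for the loss, the following sketch computes the standard biased MMD² estimate with an RBF kernel between two small feature sets. It is a generic illustration only: the kernel choice, the `gamma` value, and the toy feature vectors are assumptions, and the combination with Center Loss used by CIAI is not shown.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(us, vs):
        return sum(rbf(u, v, gamma) for u in us for v in vs) / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Hypothetical 2-D features for clean and perturbed inputs.
clean = [[0.0, 0.0], [0.1, 0.0]]
noisy = [[3.0, 3.0], [3.1, 3.0]]

same = mmd_squared(clean, clean)
different = mmd_squared(clean, noisy)
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why it can serve as a separation term between clean and perturbed inputs.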

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# Bitcoinのプログラミング: Bitcoinエコシステムにおけるレイヤ1とレイヤ2技術の調査

Programming on Bitcoin: A Survey of Layer 1 and Layer 2 Technologies in Bitcoin Ecosystem ( http://arxiv.org/abs/2409.19622v1 )

ライセンス: Link先を確認

Guofu Liao, Taotao Wang, Qing Yang, Yihan Xia, Long Shi, Xiang Zhao, Xiaoxiao Wu, Shengli Zhang, Anthony Chan, Richard Yuen,

(参考訳) 本稿では「Bitcoinエコシステム」の重要な部分であるビットコインブロックチェーンのプログラミング機能を強化する革新的なプロトコルについて調査する。 Bitcoinは、Unspent Transaction Output(UTXO)モデルとスタックベースのスクリプト言語を使用して、効率的なピアツーピア支払いを実現しているが、プログラミング能力とスループットの制限に直面している。 2021年のTaprootアップグレードはSchnorrシグネチャアルゴリズムとP2TRトランザクションタイプを導入し、Bitcoinのプライバシとプログラミング能力を大幅に改善した。このアップグレードにより、Ordinals、Atomicals、BitVMなどのプロトコルが開発され、Bitcoinのプログラミング機能を強化し、そのエコシステムが強化された。 Taprootのアップグレードの技術的側面について検討し、Taprootの機能を活用してOrdinalsやAtomicalsを含むトランザクションにNFT(non-fungible tokens)をプログラムするBitcoin Layer 1プロトコルと、fungible token標準であるBRC-20とARC-20について検討する。さらに、一部のBitcoinエコシステムプロトコルをEthereumに似たレイヤ2ソリューションに分類し、Bitcoinのパフォーマンスへの影響を分析します。 Bitcoinブロックチェーンのデータを分析することで、ブロック容量、マイナ手数料、Taprootトランザクションの成長に関するメトリクスを収集します。これらのプロトコルがBitcoinのメインネットに与える影響を確認し、Bitcoinのプログラミング能力とエコシステムプロトコルに関する文献のギャップを埋め、実践者や研究者に貴重な洞察を提供する。

This paper surveys innovative protocols that enhance the programming functionality of the Bitcoin blockchain, a key part of the "Bitcoin Ecosystem." Bitcoin utilizes the Unspent Transaction Output (UTXO) model and a stack-based script language for efficient peer-to-peer payments, but it faces limitations in programming capability and throughput. The 2021 Taproot upgrade introduced the Schnorr signature algorithm and P2TR transaction type, significantly improving Bitcoin's privacy and programming capabilities. This upgrade has led to the development of protocols like Ordinals, Atomicals, and BitVM, which enhance Bitcoin's programming functionality and enrich its ecosystem. We explore the technical aspects of the Taproot upgrade and examine Bitcoin Layer 1 protocols that leverage Taproot's features to program non-fungible tokens (NFTs) into transactions, including Ordinals and Atomicals, along with the fungible token standards BRC-20 and ARC-20. Additionally, we categorize certain Bitcoin ecosystem protocols as Layer 2 solutions similar to Ethereum's, analyzing their impact on Bitcoin's performance. By analyzing data from the Bitcoin blockchain, we gather metrics on block capacity, miner fees, and the growth of Taproot transactions. Our findings confirm the positive effects of these protocols on Bitcoin's mainnet, bridging gaps in the literature regarding Bitcoin's programming capabilities and ecosystem protocols and providing valuable insights for practitioners and researchers.
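To give a feel for what a "stack-based script language" means, here is a toy stack machine. The opcode names `OP_ADD`, `OP_EQUAL`, and `OP_DUP` exist in Bitcoin Script, but this interpreter is a deliberately simplified illustration: it ignores signatures, byte encodings, and consensus rules, and `run_script` is a hypothetical helper rather than part of any Bitcoin library.

```python
def run_script(script):
    """Evaluate a list of tokens on a stack, Bitcoin Script style (toy version)."""
    stack = []
    for token in script:
        if token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        elif token == "OP_DUP":
            stack.append(stack[-1])
        else:
            stack.append(token)  # anything else is pushed as data
    return stack

# "Does 2 + 3 equal 5?" leaves a single truthy value on the stack.
result = run_script([2, 3, "OP_ADD", 5, "OP_EQUAL"])
```

A script of this kind has no loops or persistent state, which hints at why the abstract describes Bitcoin's native programming capability as limited.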

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# MCDDPM:脳MRIにおける教師なし異常検出のためのマルチチャンネル条件付き拡散モデル

MCDDPM: Multichannel Conditional Denoising Diffusion Model for Unsupervised Anomaly Detection in Brain MRI ( http://arxiv.org/abs/2409.19623v1 )

ライセンス: Link先を確認

Vivek Kumar Trivedi, Bheeshm Sharma, P. Balamurugan,

(参考訳) 教師付きディープラーニング法を用いた脳MRIスキャンの異常検出は、解剖学的多様性とピクセルレベルのアノテーションの労働集約的要求による課題を提示する。 Denoising Diffusion Probabilistic Model (DDPM)のような生成モデルと、pDDPM、mDDPM、cDDPMのような変異は、最近、脳MRIスキャンで教師なしの異常検出を行うための強力な代替手段として現れている。これらの方法は、健康な脳のフレームレベルラベルを利用して、脳MRIスキャンで健康な組織を生成する。推論中、異常(または不健康)スキャン画像が入力として提示されると、これらのモデルが入力異常スキャンに対応する健全なスキャン画像を生成し、生成された健康スキャン画像と元の異常スキャン画像との差マップは、異常組織の画素レベル同定に必要となる。しかし、DDPM、pDDPM、mDDPMモデルから生成された健康画像は、忠実度の問題に悩まされ、医学的な意味を持たないアーティファクトを含んでいる。 cDDPMは若干の忠実さとアーチファクトの抑制を実現するが、メモリフットプリントが大幅に必要であり、他のDDPMベースモデルよりも計算コストがかかる。本研究では,脳MRIスキャンにおける異常検出のためのMCDDPM(Multi channel Conditional Denoising Diffusion Probabilistic Model)と呼ばれるDDPMの改良版を提案する。提案モデルでは, DDPM, pDDPM, mDDPMモデルと同等の計算コストとメモリ要求を伴って, DDPMモデルの表現力を向上し, トレーニング過程における健康画像からの付加情報を活用することにより, 高忠実度を実現する。複数のデータセット(例えば BraTS20, BraTS21)の実験結果から,提案手法の有望な性能が示された。コードはhttps://github.com/vivekkumartri/MCDDPMで公開されている。

Detecting anomalies in brain MRI scans using supervised deep learning methods presents challenges due to anatomical diversity and the labor-intensive requirement of pixel-level annotations. Generative models like the Denoising Diffusion Probabilistic Model (DDPM) and its variants, such as pDDPM, mDDPM, and cDDPM, have recently emerged as powerful alternatives for unsupervised anomaly detection in brain MRI scans. These methods leverage frame-level labels of healthy brains to generate healthy tissues in brain MRI scans. During inference, when an anomalous (or unhealthy) scan image is presented as an input, these models generate a healthy scan image corresponding to the input anomalous scan, and the difference map between the generated healthy scan image and the original anomalous scan image provides the necessary pixel-level identification of abnormal tissues. However, the healthy images generated by the DDPM, pDDPM, and mDDPM models suffer from fidelity issues and contain artifacts that have no medical significance. While cDDPM achieves slightly better fidelity and artifact suppression, it requires a huge memory footprint and is more computationally expensive than the other DDPM-based models. In this work, we propose an improved version of DDPM called Multichannel Conditional Denoising Diffusion Probabilistic Model (MCDDPM) for unsupervised anomaly detection in brain MRI scans. Our proposed model achieves high fidelity by making use of additional information from the healthy images during the training process, enriching the representation power of DDPM models, with computational cost and memory requirements on par with the DDPM, pDDPM, and mDDPM models. Experimental results on multiple datasets (e.g., BraTS20, BraTS21) demonstrate the promising performance of the proposed method. The code is available at https://github.com/vivekkumartri/MCDDPM.
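The inference step described above, comparing the input scan with its generated healthy counterpart, reduces to a per-pixel difference map. The sketch below shows that step on a tiny 2x2 example; the function name, the `threshold` value, and the pixel intensities are assumptions for illustration, and the diffusion model that would produce the reconstruction is not included.

```python
def anomaly_map(scan, reconstruction, threshold=0.5):
    """Return the absolute difference map and a binary anomaly mask.

    `scan` and `reconstruction` are equally sized 2-D lists of intensities;
    pixels whose difference exceeds `threshold` are flagged as anomalous.
    """
    diff = [
        [abs(s - r) for s, r in zip(scan_row, recon_row)]
        for scan_row, recon_row in zip(scan, reconstruction)
    ]
    mask = [[1 if d > threshold else 0 for d in row] for row in diff]
    return diff, mask

# Hypothetical intensities: one pixel differs strongly from the healthy version.
scan = [[0.1, 0.2], [0.9, 0.2]]
healthy = [[0.1, 0.2], [0.2, 0.2]]
diff, mask = anomaly_map(scan, healthy)
```

Because the mask depends entirely on how faithful the generated healthy image is, reconstruction artifacts translate directly into false detections, which is the fidelity concern the abstract raises.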

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# Storynizor: フレーム間同期およびシャッフルID注入による一貫性のあるストーリー生成

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection ( http://arxiv.org/abs/2409.19624v1 )

ライセンス: Link先を確認

Yuhang Ma, Wenting Xu, Chaoyi Zhao, Keqiang Sun, Qinfeng Jin, Zeng Zhao, Changjie Fan, Zhipeng Hu,

(参考訳) テキスト・画像拡散モデルの最近の進歩は、連続したストーリー画像生成に大きな関心を喚起している。本稿では,フレーム間キャラクタの一貫性の強いコヒーレントなストーリーを生成可能なモデルであるStorynizorについて紹介する。 Storynizorの中核となるイノベーションは、主要なモジュールであるID-SynchronizerとID-Injectorにある。 ID-Synchronizerは、オートマスクの自己認識モジュールと、フレーム間のイメージ間の知覚的損失を利用して、キャラクター生成の一貫性を改善し、姿勢と背景を鮮明に表現する。 IDインジェクタは、Shuffling Reference Strategy(SRS)を使用して、ID機能を特定の場所に統合し、IDベースの一貫した文字生成を強化する。さらに、Storynizorのトレーニングを容易にするために、100,000の画像からなるStoryDBと呼ばれる新しいデータセットをキュレートした。このデータセットには、さまざまな環境、レイアウト、詳細な記述を伴うジェスチャーの単一および複数文字セットが含まれている。実験結果から,Storynizorは,他のキャラクタ固有の手法と比較して,高忠実度なキャラクタ一貫性,フレキシブルな姿勢,鮮明な背景を有する優れたコヒーレントなストーリー生成を示すことが示された。

Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation, vividly representing their postures and backgrounds. The ID-Injector utilizes a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images. This dataset contains single and multiple-character sets in diverse environments, layouts, and gestures with detailed descriptions. Experimental results indicate that Storynizor demonstrates superior coherent story generation with high-fidelity character consistency, flexible postures, and vivid backgrounds compared to other character-specific methods.

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# 抽象的議論フレームワークのアクション言語に基づく形式化

An action language-based formalisation of an abstract argumentation framework ( http://arxiv.org/abs/2409.19625v1 )

ライセンス: Link先を確認

Yann Munro, Camilo Sarmiento, Isabelle Bloch, Gauvain Bourgne, Catherine Pelachaud, Marie-Jeanne Lesot,

(参考訳) 抽象的議論フレームワークは、対話の静的表現を提供するために一般的に使用される形式主義である。しかし、議論的対話における議論の列挙は非常に重要であり、この対話の結果に影響を与える可能性がある。本稿では,抽象的議論グラフをモデル化するための新しいフレームワークを提案する。この順序を考慮に入れれば、拡張(extension)と呼ばれる対話毎にユニークな結果を導出する手段が得られます。また、終端や正しさといったいくつかの性質を確立し、完全性の2つの概念について議論する。特に、「最後の最終更新」戦略に基づく前回の変換の修正を提案し、完全性の第2形態を検証した。

An abstract argumentation framework is a commonly used formalism to provide a static representation of a dialogue. However, the order of enunciation of the arguments in an argumentative dialogue is very important and can affect the outcome of this dialogue. In this paper, we propose a new framework for modelling abstract argumentation graphs, a model that incorporates the order of enunciation of arguments. By taking this order into account, we have the means to deduce a unique outcome for each dialogue, called an extension. We also establish several properties, such as termination and correctness, and discuss two notions of completeness. In particular, we propose a modification of the previous transformation based on a "last enunciated last updated" strategy, which verifies the second form of completeness.

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29
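
As background for the abstract above, the static notion of an extension can be illustrated with the standard grounded semantics of an abstract argumentation framework. The sketch below is only that classical, order-independent computation; it is not the action-language formalisation proposed in the paper, and the function name and data layout are assumptions made for illustration.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of an abstract argumentation framework.

    arguments: iterable of hashable argument names.
    attacks:   set of (attacker, target) pairs.
    """
    arguments = set(arguments)
    # For each argument, collect the arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments - accepted - rejected:
            # Accept an argument once all of its attackers are rejected.
            if attackers[a] <= rejected:
                accepted.add(a)
                changed = True
        for a in arguments - accepted - rejected:
            # Reject an argument once an accepted argument attacks it.
            if attackers[a] & accepted:
                rejected.add(a)
                changed = True
    return accepted


# Example: a attacks b, and b attacks c, so a defends c.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```

This computation ignores the order in which arguments are stated, which is the information the paper's framework is designed to take into account.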

# IDEAW: 可逆的デュアル埋め込みによるロバストなニューラルオーディオ透かし

IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding ( http://arxiv.org/abs/2409.19627v1 )

ライセンス: Link先を確認

Pengcheng Li, Xulong Zhang, Jing Xiao, Jianzong Wang,

(参考訳) 音声透かし技術は、メッセージをオーディオに埋め込んで、透かし付きオーディオからメッセージを正確に抽出する。従来の手法では、専門的な経験に基づくアルゴリズムを開発し、透かしを信号の時間領域や変換領域に埋め込む。ディープニューラルネットワークの開発に伴い、ディープラーニングベースのニューラルオーディオ透かしが登場している。従来のアルゴリズムと比較して、ニューラルオーディオの透かしは、トレーニング中に様々な攻撃を考慮することにより、より堅牢性を達成する。しかし、現在のニューラルウォーターマーキング法は、低容量で不満足な非受容性に悩まされている。さらに、ニューラルオーディオの透かしにおいて非常に重要であり、さらに顕著である透かし位置の問題は十分に研究されていない。本稿では,効率的な位置決めのための二重埋め込み型透かしモデルの設計を行う。また、ロバストネストレーニングにおいて、攻撃層が非可逆ニューラルネットワークに与える影響についても検討し、その妥当性と安定性の両方を高めるためにモデルを改善した。実験の結果,提案モデルであるIDEAWは,既存の手法と比較して,高いキャパシティと効率的な位置決め能力を備えた様々な攻撃に耐えることができることがわかった。

The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional methods develop algorithms based on expert experience to embed watermarks into the time-domain or transform-domain of signals. With the development of deep neural networks, deep learning-based neural audio watermarking has emerged. Compared to traditional algorithms, neural audio watermarking achieves better robustness by considering various attacks during training. However, current neural watermarking methods suffer from low capacity and unsatisfactory imperceptibility. Additionally, the issue of watermark locating, which is extremely important and even more pronounced in neural audio watermarking, has not been adequately studied. In this paper, we design a dual-embedding watermarking model for efficient locating. We also consider the impact of the attack layer on the invertible neural network in robustness training, improving the model to enhance both its reasonableness and stability. Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient locating ability compared to existing methods.

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# 生活予測を継続するグラフニューラルネットワークに関する調査研究:方法論,評価,今後の展望

A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends ( http://arxiv.org/abs/2409.19629v1 )

ライセンス: Link先を確認

Yucheng Wang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen,

(参考訳) RUL(Remaining Useful Life)予測は、PHM(Prognostics and Health Management)の重要な側面であり、システムの将来の状態を予測することで、タイムリーなメンテナンスを可能にし、予期しない故障を防ぐことを目的としている。既存のディープラーニング手法は将来性を示しているが、複雑なシステムに固有の空間情報の活用に苦慮し、RUL予測の有効性を制限していることが多い。この課題に対処するため、最近の研究では、より正確なRUL予測のために空間情報をモデル化するためのグラフニューラルネットワーク(GNN)の利用について検討している。本稿では、RUL予測に適用されたGNN手法の総合的なレビューを行い、既存の手法を要約し、今後の研究のためのガイダンスを提供する。まず、GNNをRUL予測に適用する段階に基づいて、グラフ構築、グラフモデリング、グラフ情報処理、グラフ読み出しの4つの重要な段階に体系的にアプローチを分類する新しい分類法を提案する。このようにしてフィールドを整理することで、GNNパイプラインの各ステージにおけるユニークな課題と考慮点を強調します。さらに, 各種SOTA(State-of-the-art) GNN法を徹底的に評価し, 公正比較のための一貫した実験的設定を確実にする。この厳密な分析は、異なるアプローチの強みと弱みに関する貴重な洞察を与え、この分野で働く研究者や実践者の実験的ガイドとなる。最後に、GNNがRUL予測に革命をもたらし、PHM戦略の有効性を高める可能性を強調し、フィールドをさらに前進させるいくつかの有望な研究方向を特定し、議論する。ベンチマークコードはGitHubで公開されている。

Remaining Useful Life (RUL) prediction is a critical aspect of Prognostics and Health Management (PHM), aimed at predicting the future state of a system to enable timely maintenance and prevent unexpected failures. While existing deep learning methods have shown promise, they often struggle to fully leverage the spatial information inherent in complex systems, limiting their effectiveness in RUL prediction. To address this challenge, recent research has explored the use of Graph Neural Networks (GNNs) to model spatial information for more accurate RUL prediction. This paper presents a comprehensive review of GNN techniques applied to RUL prediction, summarizing existing methods and offering guidance for future research. We first propose a novel taxonomy based on the stages of adapting GNNs to RUL prediction, systematically categorizing approaches into four key stages: graph construction, graph modeling, graph information processing, and graph readout. By organizing the field in this way, we highlight the unique challenges and considerations at each stage of the GNN pipeline. Additionally, we conduct a thorough evaluation of various state-of-the-art (SOTA) GNN methods, ensuring consistent experimental settings for fair comparisons. This rigorous analysis yields valuable insights into the strengths and weaknesses of different approaches, serving as an experimental guide for researchers and practitioners working in this area. Finally, we identify and discuss several promising research directions that could further advance the field, emphasizing the potential for GNNs to revolutionize RUL prediction and enhance the effectiveness of PHM strategies. The benchmarking codes are available on GitHub: https://github.com/Frank-Wang-oss/GNN_RUL_Benchmarking.

翻訳日:2024-11-05 22:09:00 公開日:2024-09-29

# 古典的浅波における量子粒子統計

Quantum Particle Statistics in Classical Shallow Water Waves ( http://arxiv.org/abs/2409.19632v1 )

ライセンス: Link先を確認

Idan Ceausu, Yuval Dagan,

(参考訳) 我々は、ポテンシャル井戸における非相対論的量子粒子の新しい流体力学的類似性を示す。 Schrödinger方程式の実際の変種と重力キャピラリー浅瀬波の類似性を報告し、解析した。実波勾配によって局所的に振動する粒子が導かれると、粒子は波動ポテンシャルを増大させながら周期的あるいはカオス的ダイナミクスの軌跡を示す可能性がある。このアナログの粒子確率分布関数はSchrödinger方程式の標準解の量子統計を明らかにし、したがってボルン則の古典的決定論的解釈として表す。最後に、準定常状態間の遷移の古典的なメカニズムを提案する。

We present a new hydrodynamic analogy of nonrelativistic quantum particles in potential wells. Similarities between a real variant of the Schrödinger equation and gravity-capillary shallow water waves are reported and analyzed. We show that when locally oscillating particles are guided by real wave gradients, particles may exhibit trajectories of alternating periodic or chaotic dynamics while increasing the wave potential. The particle probability distribution function of this analogy reveals the quantum statistics of the standard solutions of the Schrödinger equation and thus manifests as a classical deterministic interpretation of Born's rule. Finally, a classical mechanism for the transition between quasi-stationary states is proposed.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# 時系列非教師なし領域適応のための時間的ソース復元

Temporal Source Recovery for Time-Series Source-Free Unsupervised Domain Adaptation ( http://arxiv.org/abs/2409.19635v1 )

ライセンス: Link先を確認

Yucheng Wang, Peiliang Gong, Min Wu, Felix Ott, Xiaoli Li, Lihua Xie, Zhenghua Chen,

(参考訳) Source-Free Unsupervised Domain Adaptation (SFUDA)は、ソースドメインにアクセスすることなく、トレーニング済みモデルをターゲットドメインに適応し、ソースデータのプライバシを確保する能力で人気を集めている。 SFUDAは視覚的なタスクでよく開発されているが、ドメイン間で重要な時間的依存関係を転送することの難しさから、時系列SFUDA(TS-SFUDA)への応用は制限されている。少数の研究者がこの領域を探求し始めたが、特定のソースドメイン設計に依存しており、ソースデータ所有者は特定の事前学習プロトコルに従うことが期待できないため、現実的ではない。そこで本稿では,効率的なTS-SFUDAの時間依存性をソース固有の設計を必要とせずに転送するフレームワークであるTemSRを提案する。 TemSRは、マスキング、リカバリ、最適化を活用して、ソース時間依存性を回復したソース風のディストリビューションを生成するリカバリプロセスを備えている。効率的な回復を確保するため,局所的な依存関係を復元するためのセグメントベース正規化と,ソースライクな分布の多様性を高めるためにアンカーベースリカバリの多様性の最大化を更に設計する。ソースライクな分布は、従来のUDA技術を使用してターゲットドメインに適合する。複数のTSタスクにわたる大規模な実験は、ソースドメイン設計を必要とする既存のTS-SFUDAメソッドを超越したTemSRの有効性を示す。コードはhttps://github.com/Frank-Wang-oss/TemSRで入手できる。

Source-Free Unsupervised Domain Adaptation (SFUDA) has gained popularity for its ability to adapt pretrained models to target domains without accessing source domains, ensuring source data privacy. While SFUDA is well-developed in visual tasks, its application to Time-Series SFUDA (TS-SFUDA) remains limited due to the challenge of transferring crucial temporal dependencies across domains. Although a few researchers have begun to explore this area, they rely on specific source domain designs, which are impractical as source data owners cannot be expected to follow particular pretraining protocols. To solve this, we propose Temporal Source Recovery (TemSR), a framework that transfers temporal dependencies for effective TS-SFUDA without requiring source-specific designs. TemSR features a recovery process that leverages masking, recovery, and optimization to generate a source-like distribution with recovered source temporal dependencies. To ensure effective recovery, we further design segment-based regularization to restore local dependencies and anchor-based recovery diversity maximization to enhance the diversity of the source-like distribution. The source-like distribution is then adapted to the target domain using traditional UDA techniques. Extensive experiments across multiple TS tasks demonstrate the effectiveness of TemSR, which even surpasses existing TS-SFUDA methods that require source domain designs. Code is available at https://github.com/Frank-Wang-oss/TemSR.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# BadHMP:人間の動き予測に対するバックドア攻撃

BadHMP: Backdoor Attack against Human Motion Prediction ( http://arxiv.org/abs/2409.19638v1 )

ライセンス: Link先を確認

Chaohui Xu, Si Wang, Chip-Hong Chang,

(参考訳) 過去の観測から得られたサブ秒間地平線上での人体運動の正確な予測は、様々な安全クリティカルな応用に不可欠である。これまで、回避攻撃に対する人間の動き予測の脆弱性を調べる研究は1つしかなかった。本稿では,人間の動作予測を対象とする最初のバックドアアタックであるBadHMPを提案する。我々のアプローチは、骨格の片腕に局所的なバックドアトリガーを埋め込むことで、有毒なトレーニングサンプルを生成し、選択された関節が比較的静止状態に留まったり、過去のステップで事前に定義された動きに追従したりすることである。その後、将来のシーケンスをターゲットシーケンスにグローバルに修正し、トレーニングデータセット全体をトラバースして、最も適した毒素サンプルを選択する。筆者らが設計したバックドアトリガとターゲットは, 有毒試料の滑らかさと自然さを保証し, モデルトレーナーによる検出を回避できるほどステルス性が高く, また, 有毒モデルが不確定な配列に対する予測忠実度を保ちつつも, その検出を回避できる。ターゲットシーケンスは、低毒性試料注入比であっても、設計された入力シーケンスによって正常に活性化することができる。 Human3.6MとCMU-Mocapの2つのデータセットと2つのネットワークアーキテクチャ(LTDとHRI)の実験結果は、BadHMPの高忠実性、有効性、ステルス性を示している。微調整防御に対する我々の攻撃のロバスト性も検証された。

Precise future human motion prediction over subsecond horizons from past observations is crucial for various safety-critical applications. To date, only one study has examined the vulnerability of human motion prediction to evasion attacks. In this paper, we propose BadHMP, the first backdoor attack that targets specifically human motion prediction. Our approach involves generating poisoned training samples by embedding a localized backdoor trigger in one arm of the skeleton, causing selected joints to remain relatively still or follow predefined motion in historical time steps. Subsequently, the future sequences are globally modified to the target sequences, and the entire training dataset is traversed to select the most suitable samples for poisoning. Our carefully designed backdoor triggers and targets guarantee the smoothness and naturalness of the poisoned samples, making them stealthy enough to evade detection by the model trainer while keeping the poisoned model unobtrusive in terms of prediction fidelity to untainted sequences. The target sequences can be successfully activated by the designed input sequences even with a low poisoned sample injection ratio. Experimental results on two datasets (Human3.6M and CMU-Mocap) and two network architectures (LTD and HRI) demonstrate the high-fidelity, effectiveness, and stealthiness of BadHMP. Robustness of our attack against fine-tuning defense is also verified.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# fCOP:カテゴリーレベルのオブジェクトからの焦点長推定

fCOP: Focal Length Estimation from Category-level Object Priors ( http://arxiv.org/abs/2409.19641v1 )

ライセンス: Link先を確認

Xinyue Zhang, Jiaqi Yang, Xiangting Meng, Abdelrahman Mohamed, Laurent Kneip,

(参考訳) コンピュータビジョンの領域では、視覚信号による3D世界の認識と再構築は、長い間コミュニティ内で激しい研究の対象となっていたカメラ固有のパラメータに大きく依存している。現実的な応用では、マンハッタン・ワールドの仮定や特別な人工キャリブレーションパターンのような強いシーン幾何学がなければ、単眼焦点距離推定は難しい課題となる。本稿では,カテゴリレベルの対象先行値を用いた単眼焦点距離推定手法を提案する。単眼深度推定とカテゴリーレベルの対象標準表現学習という,既存の2つの課題に基づいて,対象を含む画像から奥行き先と対象形状先を抽出し,クローズド形式の三重項から焦点長を推定する。シミュレーションおよび実世界データを用いた実験により,提案手法は現状よりも優れており,長期の単分子焦点長推定問題に対する有望な解であることが示された。

In the realm of computer vision, the perception and reconstruction of the 3D world through vision signals heavily rely on camera intrinsic parameters, which have long been a subject of intense research within the community. In practical applications, without a strong scene geometry prior like the Manhattan World assumption or special artificial calibration patterns, monocular focal length estimation becomes a challenging task. In this paper, we propose a method for monocular focal length estimation using category-level object priors. Based on two well-studied existing tasks: monocular depth estimation and category-level object canonical representation learning, our focal solver takes depth priors and object shape priors from images containing objects and estimates the focal length from triplets of correspondences in closed form. Our experiments on simulated and real world data demonstrate that the proposed method outperforms the current state-of-the-art, offering a promising solution to the long-standing monocular focal length estimation problem.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# 自動車ダイナミクスモデル推定のための微調整ハイブリッド物理インフォームドニューラルネットワーク

Fine-Tuning Hybrid Physics-Informed Neural Networks for Vehicle Dynamics Model Estimation ( http://arxiv.org/abs/2409.19647v1 )

ライセンス: Link先を確認

Shiming Fang, Kaiyan Yu,

(参考訳) 正確なダイナミックモデリングは、特に安全のために正確な動き予測が不可欠である高速かつアジャイルな操作において、自動運転車にとって重要なものである。従来のパラメータ推定手法では、初期推定への依存、労働集約的な適合手順、複雑なテスト設定などの制限に直面している。一方、純粋にデータ駆動機械学習手法は、固有の物理的制約を捉えるのに苦労し、通常、最適なパフォーマンスのために大きなデータセットを必要とする。これらの課題に対処するために,物理に基づくモデリングとデータ駆動技術を組み合わせた,教師付きおよび教師なしの物理情報ニューラルネットワーク(PINN)を統合したFTHD(Fine-Tuning Hybrid Dynamics)手法を提案する。 FTHDは、より小さなトレーニングデータセットを使用して、トレーニング済みのDeep Dynamics Model(DDM)を微調整し、Deep Pacejka Model(DPM)のような最先端の手法よりも優れたパフォーマンスを提供し、オリジナルのDDMよりも優れたパフォーマンスを提供する。さらに、拡張カルマンフィルタ(EKF)をFTHD(EKF-FTHD)内に埋め込んで、ノイズの多い実世界のデータを効果的に管理し、車両の本質的な物理的特性を保ちながら正確な復調を保証する。提案するFTHDフレームワークは,BayesRace Physics-based Simulator を用いた大規模シミュレーションと,Indy Autonomous Challenge による実世界の実環境実験により検証された。その結果, パラメータ推定精度は従来のモデルより大幅に向上し, 既存のモデルよりも優れていた。 EKF-FTHDは、物理的洞察を維持しながら現実世界のデータをノイズ化することでロバスト性を高める。

Accurate dynamic modeling is critical for autonomous racing vehicles, especially during high-speed and agile maneuvers where precise motion prediction is essential for safety. Traditional parameter estimation methods face limitations such as reliance on initial guesses, labor-intensive fitting procedures, and complex testing setups. On the other hand, purely data-driven machine learning methods struggle to capture inherent physical constraints and typically require large datasets for optimal performance. To address these challenges, this paper introduces the Fine-Tuning Hybrid Dynamics (FTHD) method, which integrates supervised and unsupervised Physics-Informed Neural Networks (PINNs), combining physics-based modeling with data-driven techniques. FTHD fine-tunes a pre-trained Deep Dynamics Model (DDM) using a smaller training dataset, delivering superior performance compared to state-of-the-art methods such as the Deep Pacejka Model (DPM) and outperforming the original DDM. Furthermore, an Extended Kalman Filter (EKF) is embedded within FTHD (EKF-FTHD) to effectively manage noisy real-world data, ensuring accurate denoising while preserving the vehicle's essential physical characteristics. The proposed FTHD framework is validated through scaled simulations using the BayesRace Physics-based Simulator and full-scale real-world experiments from the Indy Autonomous Challenge. Results demonstrate that the hybrid approach significantly improves parameter estimation accuracy, even with reduced data, and outperforms existing models. EKF-FTHD enhances robustness by denoising real-world data while maintaining physical insights, representing a notable advancement in vehicle dynamics modeling for high-speed autonomous racing.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# OrientedFormer: リモートセンシング画像におけるEnd-to-End変換器に基づくオブジェクト指向物体検出器

OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images ( http://arxiv.org/abs/2409.19648v1 )

ライセンス: Link先を確認

Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wen-Liang Du, Rui Yao, Abdulmotaleb El Saddik,

(参考訳) リモートセンシング画像におけるオブジェクト指向物体検出は、複数方向のオブジェクトが分散しているため、難しい課題である。近年,従来のCNN方式と比較して,後処理演算子の必要性を排除して,エンドツーエンドトランスフォーマーベースの手法が成功を収めている。しかし、変換器を直接オブジェクト指向オブジェクト検出に拡張することは、次の3つの主要な問題をもたらす。 1) 物体は任意に回転し,位置及び大きさとともに角度の符号化を必要とする。 2 配向対象物の幾何学的関係は、内容と位置関係の相互作用が欠如しているため、自己注意が欠如している。 3) 対象物は, 主に位置関係における値と位置関係の相違を生じ, 正確な分類と局所化を困難にしている。本稿では,これらの問題に対処する3つの専用モジュールからなる,エンドツーエンドのトランスフォーマーに基づくオブジェクト指向検出器を提案する。まず、ガウス分布を用いた配向箱の角度、位置、大きさを符号化するガウス位置符号化を提案する。第二に、ワッサーシュタインの自己注意は幾何学的関係を導入し、ガウス的ワッサーシュタイン距離スコアを利用して、内容と位置的クエリ間の相互作用を促進する。第3に、位置問合せの周囲のサンプリング点を角度に応じて回転させることにより、値と位置問合せを整列させる指向的相互注意を提案する。 DOTA,HRSC2016, ICDAR2015のシリーズであるDIOR-Rによる6つのデータセットの実験は、我々のアプローチの有効性を示している。従来のエンドツーエンド検出器と比較して、OrientedFormerはDIOR-RとDOTA-v1.0でそれぞれ1.16および1.21 AP$_{50}$を獲得し、トレーニングエポックを3$\times$から1$\times$に下げる。コードはhttps://github.com/wokaikaixinxin/OrientedFormerで入手できる。

Oriented object detection in remote sensing images is a challenging task due to objects being distributed in multi-orientation. Recently, end-to-end transformer-based methods have achieved success by eliminating the need for post-processing operators compared to traditional CNN-based methods. However, directly extending transformers to oriented object detection presents three main issues: 1) objects rotate arbitrarily, necessitating the encoding of angles along with position and size; 2) the geometric relations of oriented objects are lacking in self-attention, due to the absence of interaction between content and positional queries; and 3) oriented objects cause misalignment, mainly between values and positional queries in cross-attention, making accurate classification and localization difficult. In this paper, we propose an end-to-end transformer-based oriented object detector, consisting of three dedicated modules to address these issues. First, Gaussian positional encoding is proposed to encode the angle, position, and size of oriented boxes using Gaussian distributions. Second, Wasserstein self-attention is proposed to introduce geometric relations and facilitate interaction between content and positional queries by utilizing Gaussian Wasserstein distance scores. Third, oriented cross-attention is proposed to align values and positional queries by rotating sampling points around the positional query according to their angles. Experiments on six datasets DIOR-R, a series of DOTA, HRSC2016 and ICDAR2015 show the effectiveness of our approach. Compared with previous end-to-end detectors, the OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$\times$ to 1$\times$. The codes are available at https://github.com/wokaikaixinxin/OrientedFormer.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29
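
The abstract above describes encoding oriented boxes as Gaussian distributions and scoring pairs of boxes with a Gaussian Wasserstein distance. A minimal sketch of that idea follows. The box convention (centre, width, height, angle in radians), the covariance choice R diag(w²/4, h²/4) Rᵀ, and both function names are assumptions for illustration; how OrientedFormer turns such distances into attention scores is not specified in the abstract and is not reproduced here.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta):
    """Map an oriented box to the mean and covariance of a 2-D Gaussian.

    Assumed convention: covariance = R diag(w^2/4, h^2/4) R^T,
    returned as the triple (sxx, sxy, syy).
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), (sxx, sxy, syy)


def gaussian_wasserstein_sq(box1, box2):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    m1, (a1, b1, d1) = box_to_gaussian(*box1)
    m2, (a2, b2, d2) = box_to_gaussian(*box2)
    center_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    # For 2x2 covariances, Tr((S1^1/2 S2 S1^1/2)^1/2) has the closed form
    # sqrt(Tr(S1 S2) + 2 sqrt(det(S1) det(S2))).
    trace_prod = a1 * a2 + 2.0 * b1 * b2 + d1 * d2
    det_prod = (a1 * d1 - b1 * b1) * (a2 * d2 - b2 * b2)
    cross = math.sqrt(max(trace_prod + 2.0 * math.sqrt(max(det_prod, 0.0)), 0.0))
    return center_term + (a1 + d1) + (a2 + d2) - 2.0 * cross


# A 4x2 box and a 2x2 box with the same centre and no rotation.
print(gaussian_wasserstein_sq((0, 0, 4, 2, 0.0), (0, 0, 2, 2, 0.0)))
```

One consequence of this representation is that a box with width 4 and height 2 at angle 0 and a box with width 2 and height 4 at angle π/2 map to the same Gaussian, so their distance is zero up to rounding.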

# エゴセントリック相互作用による3次元シーンのグラウンディング

Grounding 3D Scene Affordance From Egocentric Interactions ( http://arxiv.org/abs/2409.19650v1 )

ライセンス: Link先を確認

Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha,

(参考訳) 3Dシーンの空き地は、3D環境における対話的な領域を見つけることを目的としており、エージェントが周囲と知的に対話することが重要である。既存のほとんどのアプローチは、静的な幾何学的構造と視覚的外観に基づいてセマンティクスを3Dインスタンスにマッピングすることでこれを達成している。この受動的戦略は、エージェントが環境を積極的に知覚し、関与する能力を制限し、事前に定義された意味的指示に依存する。対照的に、人間は周囲との相互作用を観察し模倣することで複雑な相互作用のスキルを発達させる。このような能力でモデルを強化するために,エゴセントリックなインタラクションから3Dシーンのアベイランスを基盤として,インタラクションのエゴセントリックなビデオに基づいて,対応する3Dシーンのアベイランス領域を特定するという,新しいタスクを導入する。このタスクは、複数のソースにわたる空間的複雑さとアライメント複雑性の課題に直面する。これらの課題に対処するために,インタラクション関連サブリージョンに着目し,双方向クエリデコーダ機構を通じて異なるソースからのアプライアンス機能を調整することを目的とした,インタラクションインテンションを利用したEgocentric Interaction-driven 3D Scene Affordance Grounding(Ego-SAG)フレームワークを提案する。さらに,エゴセントリックなビデオ3D Scene Affordance Dataset (VSAD)を導入し,多種多様なインタラクションタイプと多種多様な3D環境をカバーした。 VSADにおける広範囲な実験により,提案課題の実現可能性と提案手法の有効性が検証された。

Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the environment, making it reliant on predefined semantic instructions. In contrast, humans develop complex interaction skills by observing and imitating how others interact with their surroundings. To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction. This task faces the challenges of spatial complexity and alignment complexity across multiple sources. To address these challenges, we propose the Egocentric Interaction-driven 3D Scene Affordance Grounding (Ego-SAG) framework, which utilizes interaction intent to guide the model in focusing on interaction-relevant sub-regions and aligns affordance features from different sources through a bidirectional query decoder mechanism. Furthermore, we introduce the Egocentric Video-3D Scene Affordance Dataset (VSAD), covering a wide range of common interaction types and diverse 3D environments to support this task. Extensive experiments on VSAD validate both the feasibility of the proposed task and the effectiveness of our approach.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# 心理測定尺度を用いた事前学習言語モデルにおける潜時構造の評価と操作

Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales ( http://arxiv.org/abs/2409.19655v1 )

ライセンス: Link先を確認

Maor Reuben, Ortal Slobodin, Aviad Elyshar, Idan-Chaim Cohen, Orna Braun-Lewensohn, Odeya Cohen, Rami Puzis,

(参考訳) 人間のような性格特性は、最近、大きな言語モデルで発見され、その(未知の)バイアスが人間の潜伏した心理的構造に一致するという仮説を提起した。大きな会話モデルは心理測定のアンケートに答えるのに騙されるかもしれないが、他のタスクのために訓練された何千もの単純なトランスフォーマーの潜在的な心理的構成は、現在適切な心理測定方法が欠如しているため評価できない。本稿では,標準的な心理アンケートを自然言語推論のプロンプトに再構成する方法を示し,任意のモデルの心理指標評価を支援するためのコードライブラリを提供する。我々は、88の公開モデルを用いて、人間の心理学における標準的な理論に準拠し、類似の相関関係と緩和戦略を示す、人間に似た精神保健関連構造(不安、抑うつ、一貫性の感覚を含む)の存在を実証する。心理的ツールを使用して言語モデルのパフォーマンスを解釈し、修正する能力は、より説明可能な、制御可能な、信頼できるモデルの開発を促進することができる。

Human-like personality traits have recently been discovered in large language models, raising the hypothesis that their (known and as yet undiscovered) biases conform with human latent psychological constructs. While large conversational models may be tricked into answering psychometric questionnaires, the latent psychological constructs of thousands of simpler transformers, trained for other tasks, cannot be assessed because appropriate psychometric methods are currently lacking. Here, we show how standard psychological questionnaires can be reformulated into natural language inference prompts, and we provide a code library to support the psychometric assessment of arbitrary models. We demonstrate, using a sample of 88 publicly available models, the existence of human-like mental health-related constructs (including anxiety, depression, and Sense of Coherence) which conform with standard theories in human psychology and show similar correlations and mitigation strategies. The ability to interpret and rectify the performance of language models by using psychological tools can boost the development of more explainable, controllable, and trustworthy models.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# マルチモーダルLLMを用いた合成データからの学習によるマルチモーダル誤情報検出

Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs ( http://arxiv.org/abs/2409.19656v1 )

ライセンス: Link先を確認

Fengzhu Zeng, Wenqian Li, Wei Gao, Yan Pang,

(参考訳) マルチモーダルな誤情報の検出,特に画像とテキストのペアによる検出が重要である。大規模で高品質な実世界のファクトチェックデータセットをトレーニングするには、コストがかかるため、研究者はAI技術によって生成された合成データセットを使用することができる。しかし、合成データに基づいて訓練された検出器の現実シナリオへの一般化性は、分布ギャップのため不明である。そこで本研究では,合成データと実世界のデータ分布を一致させる2つのモデルに依存しないデータ選択手法を用いて,実世界のマルチモーダル誤情報を検出するための合成データからの学習を提案する。実験の結果、提案手法は実世界のファクトチェックデータセットにおいて小規模なMLLM (13B) の性能を向上させ、GPT-4Vをも上回ることを示した。

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# 関節分割と変形性医用画像登録のためのマルチスケールデュアルアテンション周波数フュージョン

Dual-Attention Frequency Fusion at Multi-Scale for Joint Segmentation and Deformable Medical Image Registration ( http://arxiv.org/abs/2409.19658v1 )

ライセンス: Link先を確認

Hongchao Zhou, Shunbo Hu,

(参考訳) 変形可能な医用画像登録は、医用画像解析の重要な側面である。近年, 医用画像登録における複雑な変形問題に対処するため, 補助的タスク(教師付きセグメンテーションなど)を活用して, 一次登録タスクの解剖学的構造情報の提供を開始している。本研究では,マルチスケールデュアルアテンション周波数融合(DAFF-Net)に基づくマルチタスク学習フレームワークを提案する。 DAFF-Netはグローバルエンコーダ、セグメンテーションデコーダ、粗大なピラミッド登録デコーダで構成される。登録復号処理中、我々は2つのタスク間の相関関係を完全に活用し、異なるスケールでの登録とセグメント化機能を融合するためのデュアルアテンション周波数特徴融合(DAFF)モジュールを設計する。 DAFFモジュールはグローバルおよびローカルな重み付け機構を通じて機能を最適化する。局所重み付けでは、高周波情報と低周波情報の両方を組み込んで、登録作業に不可欠な特徴をさらに捉えている。セグメンテーションの助けを借りて、登録はより正確な解剖学的構造情報を学び、これにより登録後の歪んだ画像の解剖学的整合性を高める。さらに,DAFFモジュールが効果的な特徴情報を抽出する能力に優れており,その応用を教師なし登録に拡張する。 3つのパブリック3次元脳磁気共鳴画像(MRI)データセットの広範囲な実験により,提案したDAFF-Netとその教師なし変種は,いくつかの評価指標において,最先端の登録方法よりも優れており,変形可能な医用画像登録におけるアプローチの有効性が実証されている。

Deformable medical image registration is a crucial aspect of medical image analysis. In recent years, researchers have begun leveraging auxiliary tasks (such as supervised segmentation) to provide anatomical structure information for the primary registration task, addressing complex deformation challenges in medical image registration. In this work, we propose a multi-task learning framework based on multi-scale dual attention frequency fusion (DAFF-Net), which simultaneously achieves the segmentation masks and dense deformation fields in a single-step estimation. DAFF-Net consists of a global encoder, a segmentation decoder, and a coarse-to-fine pyramid registration decoder. During the registration decoding process, we design the dual attention frequency feature fusion (DAFF) module to fuse registration and segmentation features at different scales, fully leveraging the correlation between the two tasks. The DAFF module optimizes the features through global and local weighting mechanisms. During local weighting, it incorporates both high-frequency and low-frequency information to further capture the features that are critical for the registration task. With the aid of segmentation, the registration learns more precise anatomical structure information, thereby enhancing the anatomical consistency of the warped images after registration. Additionally, due to the DAFF module's outstanding ability to extract effective feature information, we extend its application to unsupervised registration. Extensive experiments on three public 3D brain magnetic resonance imaging (MRI) datasets demonstrate that the proposed DAFF-Net and its unsupervised variant outperform state-of-the-art registration methods across several evaluation metrics, demonstrating the effectiveness of our approach in deformable medical image registration.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# マルチパスアグリゲーションを用いたヒト・マシーン共同視覚のためのオールインワン画像符号化

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation ( http://arxiv.org/abs/2409.19660v1 )

ライセンス: Link先を確認

Xu Zhang, Peiyao Guo, Ming Lu, Zhan Ma,

(参考訳) マルチタスクアプリケーションのための画像符号化は、人間の知覚とマシンビジョンの両方に対応し、広範囲に研究されている。既存の手法は複数のタスク固有のエンコーダとデコーダのペアに依存しており、パラメータとビットレートの使用のオーバーヘッドが高くなり、あるいは統一された表現の下で多目的最適化の課題に直面し、性能と効率の両方を達成できなかった。そこで本研究では,マルチパスアグリゲーション(MPA, Multi-Path Aggregation)を協調型ヒューマンマシンビジョンのための既存の符号化モデルに統合し,特徴表現をオールインワンアーキテクチャで統一する。 MPAはタスクごとに異なる特徴量に基づいてタスク固有のパスに潜時的特徴を割り当てる予測器を使用し、タスク固有の特徴を後続の改善のために保存しながら、共有機能の有用性を最大化する。特徴相関を利用して、マルチタスク性能劣化を軽減するための2段階最適化戦略を開発する。共有機能の再利用では、1.89%のパラメータが特定のタスクのためにさらに拡張され、微調整され、モデル全体の広範囲な最適化が完全に回避される。実験結果から,MPAは人間の視界と機械分析タスクをまたいだタスク固有および多目的の最適化において,最先端の手法に匹敵する性能を達成していることがわかった。さらに、我々のオールインワン設計は、人間と機械指向の再構築間のシームレスな遷移をサポートし、統一モデルを変更することなくタスク制御可能な解釈を可能にする。コードはhttps://github.com/NJUVISION/MPA で入手できる。

Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high overhead of parameter and bitrate usage, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA) integrated into existing coding models for joint human-machine vision, unifying the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance varied across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. Upon the reuse of shared features, as low as 1.89% parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://github.com/NJUVISION/MPA.

翻訳日:2024-11-05 21:58:59 公開日:2024-09-29

# 整数二次計画法の局所探索

Local Search for Integer Quadratic Programming ( http://arxiv.org/abs/2409.19668v1 )

ライセンス: Link先を確認

Xiang He, Peng Lin, Shaowei Cai,

(参考訳) Integer Quadratic Programming (IQP) はオペレーション研究において重要な問題である。局所探索は難しい問題を解くための強力な手法であるが、IQP解決のための局所探索アルゴリズムの研究はまだ初期段階にある。本稿では、LS-IQCQPと呼ばれる一般IQPを解くための効率的な局所探索解法を開発する。目的関数,制約,あるいはその両方で2次項を扱えるIQP用の新しい局所探索演算子を4つ提案する。さらに、2モードの局所探索アルゴリズムを導入し、新たに設計されたスコアリング機能を利用して探索プロセスを強化する。標準IQPベンチマークQPLIBとMINLPLIBで実験を行い、LS-IQCQPと最先端IQPソルバを比較した。実験の結果、LS-IQCQPは最も強力な商用解法であるGurobiと競合し、他の最先端解法よりも優れていることが示された。さらに、LS-IQCQPはQPLIBとMINLPLIBのオープンインスタンスに6つの新しいレコードを樹立した。

Integer Quadratic Programming (IQP) is an important problem in operations research. Local search is a powerful method for solving hard problems, but research on local search algorithms for IQP is still at an early stage. This paper develops an efficient local search solver for general IQP, called LS-IQCQP. We propose four new local search operators for IQP that can handle quadratic terms in the objective function, the constraints, or both. Furthermore, a two-mode local search algorithm is introduced, utilizing newly designed scoring functions to enhance the search process. Experiments are conducted on the standard IQP benchmarks QPLIB and MINLPLIB, comparing LS-IQCQP with several state-of-the-art IQP solvers. Experimental results demonstrate that LS-IQCQP is competitive with the most powerful commercial solver, Gurobi, and outperforms other state-of-the-art solvers. Moreover, LS-IQCQP has established 6 new records for QPLIB and MINLPLIB open instances.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 非理想性を考慮したトレーニングは、敵対的攻撃に対してメモリスティブネットワークをより堅牢にする

Nonideality-aware training makes memristive networks more robust to adversarial attacks ( http://arxiv.org/abs/2409.19671v1 )

ライセンス: Link先を確認

Dovydas Joksas, Luis Muñoz-González, Emil Lupu, Adnan Mehonic,

(参考訳) ニューラルネットワークは現在、オブジェクト分類から自然言語システムまで、幅広い領域に展開されている。メモリスタのようなアナログデバイスを使った実装は、より優れた電力効率を約束し、これらのアプリケーションをより多くの環境にもたらす可能性がある。しかし、このようなシステムではデバイス障害がより頻繁に発生し、また全体として、敵対的攻撃に対する脆弱性は十分に研究されていない。本研究では、物理的な非理想性に対処する一般的な手法である非理想性を考慮したトレーニングが、敵対的堅牢性にどのように影響するかを検討する。テスト時にどのような非理想性に遭遇するかについての知識が限られている場合でも、敵対的堅牢性が著しく改善されることが分かった。

Neural networks are now deployed in a wide number of areas from object classification to natural language systems. Implementations using analog devices like memristors promise better power efficiency, potentially bringing these applications to a greater number of environments. However, such systems suffer from more frequent device faults and overall, their exposure to adversarial attacks has not been studied extensively. In this work, we investigate how nonideality-aware training - a common technique to deal with physical nonidealities - affects adversarial robustness. We find that adversarial robustness is significantly improved, even with limited knowledge of what nonidealities will be encountered during test time.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 視覚豊かな文書理解のための順序関係としてのレイアウト読解順序のモデル化

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding ( http://arxiv.org/abs/2409.19672v1 )

ライセンス: Link先を確認

Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, Yue Zhang, Mingxu Chai, Ya Guo, Huijia Zhu, Qi Zhang, Tao Gui,

(参考訳) 視覚的にリッチなドキュメント(VrD)におけるレイアウト読み込み順序のモデル化と活用は、ドキュメント内のリッチな構造セマンティクスを捉えるため、ドキュメントインテリジェンスにおいて重要である。以前の作業は通常、レイアウト要素の置換、すなわちすべてのレイアウト要素を含むシーケンスとしてレイアウト読み込み順序を定式化した。しかし、この定式化はレイアウトの完全な読み出し順序情報を適切に伝達しないため、下流のVrDタスクの性能低下につながる可能性がある。この問題に対処するために、レイアウト要素の集合上の順序関係としてレイアウト読み出し順序をモデル化し、完全な読み出し順序情報に十分な表現能力を有することを提案する。改良型読み順序予測(ROP)に向けた手法の実証評価を可能にするため,レイアウト要素上の関係として読み順序アノテーションを含む包括的なベンチマークデータセットと,従来手法よりも優れた関係抽出に基づく手法を構築した。そこで本研究では,任意のVrDタスク上でのモデル性能向上のために,読み出し順序関係入力を導入することで,読み出し順序対応型パイプラインを提案する。総合的な結果から,パイプラインは一般的に下流VrDタスクに有効であることが示された。(1)読み出し順序関係情報を活用することにより,対象データセットの2つのタスク設定でSOTA結果が得られること,(2)提案したROPモデルによって生成された擬似読み出し順序情報を活用することにより,拡張モデルの性能は3つのモデルすべてと8つのクロスドメインVrD-IE/QAタスク設定で目標最適化なしで改善されている。

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e. a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream VrD tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information. To enable empirical evaluation of methods for the improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the reading order annotation as relations over layout elements, together with a relation-extraction-based method that outperforms previous methods. Moreover, to highlight the practical benefits of introducing the improved form of layout reading order, we propose a reading-order-relation-enhancing pipeline to improve model performance on arbitrary VrD tasks by introducing additional reading order relation inputs. Comprehensive results demonstrate that the pipeline generally benefits downstream VrD tasks: (1) by utilizing the reading order relation information, the enhanced downstream models achieve SOTA results on both task settings of the targeted dataset; (2) by utilizing the pseudo reading order information generated by the proposed ROP model, the performance of the enhanced models improves across all three models and eight cross-domain VrD-IE/QA task settings without targeted optimization.
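
The difference between a permutation and a set of ordering relations can be illustrated with a small Python sketch. The abstract does not give the exact relation definition, so this example assumes a relation `(a, b)` simply means "a is read before b"; the function names and the element labels are hypothetical.

```python
def permutation_to_relations(order):
    """Immediate-successor pairs implied by a single permutation."""
    return {(a, b) for a, b in zip(order, order[1:])}


def is_consistent(order, relations):
    """True if the permutation respects every (before, after) relation."""
    pos = {element: i for i, element in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in relations)


# Assumed layout: a title followed by a two-paragraph column and an
# independent figure. The relations leave the figure unordered with
# respect to the paragraphs.
relations = {("title", "para1"), ("para1", "para2"), ("title", "figure")}

order_a = ["title", "para1", "para2", "figure"]
order_b = ["title", "figure", "para1", "para2"]
```

Both `order_a` and `order_b` satisfy the relations, while a permutation-based annotation has to commit to one of them. That is one way to read the abstract's claim that a single sequence does not convey the complete reading order information.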

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 計算生物学におけるシミュレーションに基づく推論の包括的ガイド

A Comprehensive Guide to Simulation-based Inference in Computational Biology ( http://arxiv.org/abs/2409.19675v1 )

ライセンス: Link先を確認

Xiaoyu Wang, Ryan P. Kelly, Adrianne L. Jenner, David J. Warne, Christopher Drovandi,

(参考訳) 計算モデルは現実世界の生物学的プロセスの複雑さを捉えるのに有用である。しかし、特に実世界の観測データを扱う際には、推論タスクに適したアルゴリズムの選択は困難で未探索の領域のままである。このギャップは、特にSBI(Simulation-Based Inference)の領域において、ニューラルネットワークや統計的SBI法のような様々なパラメータ推定アルゴリズムの開発を加速させた。実世界のデータに直面する場合、SBIの方法に対して情報的選択を行う方法に関する限定的な研究があり、しばしばある種のモデルの誤特定をもたらす。本稿では,複雑な生物モデルに対するSBIアプローチを決定するための包括的ガイドラインを提供する。本ガイドラインは,実世界のデータを用いた細胞動態を記述する2つのエージェントベースモデルに適用する。我々の研究では、ニューラルネットワークのSBI手法は推論結果のシミュレーションをはるかに少なく要求する一方で、偏りのある推定を産み出す傾向があります。一方,SBI法の精度はシミュレーションの数が増えるにつれて著しく向上する。この結果から, 計算予算が十分であれば, 統計SBIがニューラルSBIを超えることが示唆された。実世界のシナリオで異なるSBI手法が有効であるだけでなく、神経性SBIアプローチを強化するための潜在的な道筋も示唆している。本研究は,生物モデルにおけるSBIの複雑な景観をナビゲートする計算生物学者にとって有用な資源であると考えられる。

Computational models are invaluable in capturing the complexities of real-world biological processes. Yet, the selection of appropriate algorithms for inference tasks, especially when dealing with real-world observational data, remains a challenging and underexplored area. This gap has spurred the development of various parameter estimation algorithms, particularly within the realm of Simulation-Based Inference (SBI), such as neural and statistical SBI methods. Limited research exists on how to make informed choices on SBI methods when faced with real-world data, which often results in some form of model misspecification. In this paper, we provide comprehensive guidelines for deciding between SBI approaches for complex biological models. We apply the guidelines to two agent-based models that describe cellular dynamics using real-world data. Our study unveils a critical insight: while neural SBI methods demand significantly fewer simulations for inference results, they tend to yield biased estimations, a trend persistent even with robust variants of these algorithms. On the other hand, the accuracy of statistical SBI methods enhances substantially as the number of simulations increases. This finding suggests that, given a sufficient computational budget, statistical SBI can surpass neural SBI in performance. Our results not only shed light on the efficacy of different SBI methodologies in real-world scenarios but also suggest potential avenues for enhancing neural SBI approaches. This study is poised to be a useful resource for computational biologists navigating the intricate landscape of SBI in biological modeling.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# SemiDDM-Weather:オールインワン逆気象除去のための半教師付き学習フレームワーク

SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal ( http://arxiv.org/abs/2409.19679v1 )

ライセンス: Link先を確認

Fang Long, Wenkang Su, Zixuan Li, Lei Cai, Mingjie Li, Yuan-Gen Wang, Xiaochun Cao,

(参考訳) 逆天候除去は、悪天候下で透明な視界を回復することを目的としている。既存の方法は、主に特定の気象タイプに合わせて調整されており、広範囲のラベル付きデータに大きく依存している。この2つの制約に対処するために,教師学生ネットワーク上に構築された半教師付きオールインワン悪天候除去フレームワークを,セミDDMウェザー(SemiDDM-Weather)と呼ぶバックボーンとしてデノイング拡散モデル(DDM)を用いて提案する。筆者らは,SemiDDM-WeatherにおけるDDMバックボーンの設計について,限定ラベルデータを用いた効率的なオールインワン悪天候除去のための多対一マッピング分布の学習を容易にすることを目的とした,カスタマイズされた入力と損失関数を備えたSOTAウェーブレット拡散モデル-Wavediffを採用している。半教師学習において教師ネットワークが生み出す潜在的に不正確な擬似ラベルによる誤学習のリスクを軽減するため,教師ネットワークからの「最適」出力を擬似ラベルとして表示する品質評価と内容整合性制約を導入し,未ラベルデータによる学生ネットワークトレーニングをより効果的に指導する。実験結果から,SemiDDM-Weatherは,総合的および実世界の両方のデータセットにおいて,完全に監督された競合相手と比較して,常に高い視覚的品質と優れた悪天候の除去を提供することが示された。私たちのコードと事前訓練されたモデルは、このリポジトリで利用可能です。

Adverse weather removal aims to restore clear vision under adverse weather conditions. Existing methods are mostly tailored for specific weather types and rely heavily on extensive labeled data. In dealing with these two limitations, this paper presents a pioneering semi-supervised all-in-one adverse weather removal framework built on the teacher-student network with a Denoising Diffusion Model (DDM) as the backbone, termed SemiDDM-Weather. As for the design of DDM backbone in our SemiDDM-Weather, we adopt the SOTA Wavelet Diffusion Model-Wavediff with customized inputs and loss functions, devoted to facilitating the learning of many-to-one mapping distributions for efficient all-in-one adverse weather removal with limited label data. To mitigate the risk of misleading model training due to potentially inaccurate pseudo-labels generated by the teacher network in semi-supervised learning, we introduce quality assessment and content consistency constraints to screen the "optimal" outputs from the teacher network as the pseudo-labels, thus more effectively guiding the student network training with unlabeled data. Experimental results show that on both synthetic and real-world datasets, our SemiDDM-Weather consistently delivers high visual quality and superior adverse weather removal, even when compared to fully supervised competitors. Our code and pre-trained model are available at this repository.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# インストラクション埋め込み:タスク識別に向けてのインストラクションの潜在表現

Instruction Embedding: Latent Representations of Instructions Towards Task Identification ( http://arxiv.org/abs/2409.19680v1 )

ライセンス: Link先を確認

Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li,

(参考訳) インストラクションデータは、人間レベルのパフォーマンスと整合する大規模言語モデル(LLM)の能力を改善するために不可欠である。近年のLIMAは、アライメントは、モデルが様々なタスクを解決し、事前訓練された知識とスキルを活用するために、命令のインタラクションスタイルやフォーマットを適応させるプロセスであることを示した。したがって、インストラクションデータにとって最も重要な側面は、特定の意味論や知識情報ではなく、それが表すタスクである。命令の潜在表現は、データ選択やデモンストレーション検索のような、命令関連のタスクで役割を果たす。しかし、それらは常にテキスト埋め込みから派生しており、タスクカテゴリの表現に影響を与える全体的な意味情報を含んでいる。本研究では,新しい概念である命令埋め込みを導入し,そのトレーニングと評価のための命令埋め込みベンチマーク(IEB)を構築する。そこで本研究では,表現がタスクにより注目するようにするベースライン手法として,PIE(Prompt-based Instruction Embedding)法を提案する。設計した2つのタスクを用いたIEB上で、他の埋め込み手法とともにPIEを評価した結果、タスクカテゴリを正確に識別する上で優れた性能を示した。さらに、4つの下流タスクへの命令埋め込みの適用は、命令関連タスクに対する有効性と適合性を示している。

Instruction data is crucial for improving the capability of Large Language Models (LLMs) to align with human-level performance. The recent LIMA study demonstrates that alignment is essentially a process in which the model adapts the interaction style or format of instructions to solve various tasks, leveraging pre-trained knowledge and skills. Therefore, for instruction data, the most important aspect is the task it represents, rather than the specific semantics and knowledge information. The latent representations of instructions play a role in some instruction-related tasks such as data selection and demonstration retrieval. However, they are always derived from text embeddings, which encompass overall semantic information that influences the representation of task categories. In this work, we introduce a new concept, instruction embedding, and construct the Instruction Embedding Benchmark (IEB) for its training and evaluation. Then, we propose a baseline Prompt-based Instruction Embedding (PIE) method to make the representations pay more attention to tasks. The evaluation of PIE, alongside other embedding methods on IEB with two designed tasks, demonstrates its superior performance in accurately identifying task categories. Moreover, the application of instruction embeddings in four downstream tasks showcases their effectiveness and suitability for instruction-related tasks.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 拡散モデルの簡易・高速蒸留

Simple and Fast Distillation of Diffusion Models ( http://arxiv.org/abs/2409.19681v1 )

ライセンス: Link先を確認

Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu,

(参考訳) 拡散に基づく生成モデルは、様々なタスクにまたがって強力な性能を示すが、これにはサンプリング速度が遅いというコストが伴う。効率的かつ高品質な合成を実現するため, 近年, 蒸留法に基づく加速サンプリング法が開発されている。しかし、一般に、特定の関数評価回数(NFE)において満足な性能を達成するために、精巧な設計による時間のかかる微調整を必要とするため、実際の使用は困難である。この問題に対処するため,拡散モデルの簡易・高速蒸留(SFD)を提案し,既存の手法で用いられるパラダイムを単純化し,微調整時間を最大1000$\times$短縮する。本研究は,バニラ蒸留法に基づくサンプリング法から始まり,合成効率と品質に影響を及ぼすいくつかの小さいが重要な要因を特定し,対処することにより,その性能を最先端に向上する。また, 単一蒸留モデルを用いて, 可変NFEを用いたサンプリングも行うことができる。大規模な実験により、SFDは、数ステップの画像生成タスクにおいて、サンプル品質と微調整コストのバランスが良好であることを実証した。例えば、SFDは単一のNVIDIA A100 GPUでわずか0.64時間の微調整により、CIFAR-10上で4.53 FID(NFE=2)を達成する。私たちのコードはhttps://github.com/zju-pi/diff-samplerから入手可能です。

Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000$\times$. We begin with a vanilla distillation-based sampling method and boost its performance to the state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning costs in few-step image generation tasks. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU. Our code is available at https://github.com/zju-pi/diff-sampler.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 半導体二重量子ドットHeisenbergスピントリマーから導出される真のデコヒーレンス自由部分空間

True decoherence-free-subspace derived from a semiconductor double quantum dot Heisenberg spin-trimer ( http://arxiv.org/abs/2409.19683v1 )

ライセンス: Link先を確認

Wonjin Jang, Jehyun Kim, Jaemin Park, Min-Kyun Cho, Hyeongyu Jang, Sangwoo Sim, Hwanchul Jung, Vladimir Umansky, Dohun Kim,

(参考訳) 固体系のスピンは本質的に量子シミュレーションや量子情報処理の量子ビットとして機能する。スピン量子ビットは通常、環境磁場のゆらぎに起因するが、デコヒーレンスフリーサブスペース(DFS)に符号化されたスピン量子ビットは、DFSの特定の構造に依存する特定の環境ノイズから保護することができる。ここでは、反強磁性ハイゼンベルクスピン-1/2トリマーから「真の」DFSを導出し、短波長および長波長の磁場変動に対して量子状態を保護する。 3つの電子をゲート定義のGaAs二重量子ドット(DQD)に閉じ込めたスピントリマーを定義し、量子ドットの1つでウィグナー分子化を利用する。まず、DQD内において、大きな磁場差である$\Delta B_\mathrm{z}$を生成する動的核偏極(DNP)をトリマーとして利用する。大型の$\Delta B_\mathrm{z}$はトリマーの固有スペクトルを著しく変化させ、DQDの「真の」DFSをもたらすことを示す。 DFSエネルギーギャップのリアルタイムベイズ推定は、長波長のものに加えて短波長の磁場変動に対するDFSの保護を明示的に示している。我々の研究は、交換結合型量子ドットスピン鎖のコンパクトDFS構造への道を開いた。

Spins in solid systems can inherently serve as qubits for quantum simulation or quantum information processing. Spin qubits are usually prone to environmental magnetic field fluctuations; however, a spin qubit encoded in a decoherence-free-subspace (DFS) can be protected from certain degrees of environmental noise depending on the specific structure of the DFS. Here, we derive the "true" DFS from an antiferromagnetic Heisenberg spin-1/2 trimer, which protects the qubit states against both short- and long-wavelength magnetic field fluctuations. We define the spin trimer with three electrons confined in a gate-defined GaAs double quantum dot (DQD) where we exploit Wigner-molecularization in one of the quantum dots. We first utilize the trimer for dynamic nuclear polarization (DNP), which generates a sizable magnetic field difference, $\Delta B_\mathrm{z}$, within the DQD. We show that large $\Delta B_\mathrm{z}$ significantly alters the eigenspectrum of the trimer and results in the "true" DFS in the DQD. Real-time Bayesian estimation of the DFS energy gap explicitly demonstrates protection of the DFS against short-wavelength magnetic field fluctuations in addition to long-wavelength ones. Our findings pave the way toward compact DFS structures for exchange-coupled quantum dot spin chains, the internal structure of which can be coherently controlled completely decoupled from environmental magnetic fields.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# MedViLaM:医療データ理解と生成のための高度な一般化性と説明可能性を備えた多モード大言語モデル

MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation ( http://arxiv.org/abs/2409.19684v1 )

ライセンス: Link先を確認

Lijian Xu, Hao Sun, Ziyu Ni, Hongsheng Li, Shaoting Zhang,

(参考訳) 医学は本質的にマルチモーダルでマルチタスクであり、テキストや画像にまたがる多様なデータモダリティがある。しかし、医療分野のほとんどのモデルは単一モダリティ・単一タスクであり、優れた一般化性と説明性に欠ける。本研究では,医療データに対する汎用モデルに向けた統一視覚言語モデルであるMedViLaMを紹介する。MedViLaMは、臨床言語や画像を含む様々な形式の医療データを、すべて同じモデル重みを用いて柔軟に符号化し解釈できる。このようなマルチタスクモデルの作成を容易にするため,我々は,連続質問応答,マルチラベル疾患分類,疾患の局所化,放射線診断レポートの生成と要約という,いくつかの異なるタスクからなる総合的な事前学習データセットとベンチマークであるMultiMedBenchをキュレートした。MedViLaMは、すべてのMultiMedBenchタスクで強力なパフォーマンスを示し、他のジェネラリストモデルをしばしば大幅に上回る。さらに,新たな医療概念やタスクへのゼロショット一般化,タスク間の効果的な伝達学習,ゼロショット医学推論の出現について紹介する。

Medicine is inherently multimodal and multitask, with diverse data modalities spanning text and imaging. However, most models in the medical field are unimodal and single-task, and lack good generalizability and explainability. In this study, we introduce MedViLaM, a unified vision-language model towards a generalist model for medical data that can flexibly encode and interpret various forms of medical data, including clinical language and imaging, all using the same set of model weights. To facilitate the creation of such a multi-task model, we have curated MultiMedBench, a comprehensive pretraining dataset and benchmark consisting of several distinct tasks, i.e., continuous question-answering, multi-label disease classification, disease localization, generation and summarization of radiology reports. MedViLaM demonstrates strong performance across all MultiMedBench tasks, frequently outpacing other generalist models by a significant margin. Additionally, we present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# カラーコード分解・適応・補間による水中生物色強調

Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation ( http://arxiv.org/abs/2409.19685v1 )

ライセンス: Link先を確認

Xiaofeng Cong, Jing Zhang, Yeying Jin, Junming Hou, Yu Zhao, Jie Gui, James Tin-Yau Kwok, Yuan Yan Tang,

(参考訳) 水中画像は、しばしば吸収と散乱効果による品質劣化に悩まされる。既存の水中画像強調アルゴリズムは、単一の固定色画像を生成し、ユーザの柔軟性と応用を制限している。この制限に対処するため,制御可能な色出力の範囲を提供しながら,水中画像の強調を行う「ColorCode」という手法を提案する。提案手法では、教師付きトレーニングにより水中画像を基準強調画像に復元し、自己再構成とクロスコンストラクションにより色と内容コードに分解する。カラーコードはガウス分布に従うように明示的に制約され、推論中に効率的なサンプリングと補間を可能にする。 ColorCodeには3つの重要な機能がある。 1) 色強調,固定色による強調画像の作成 2) 誘導画像を用いた長波長成分の制御可能な調整を可能にする色適応,及び 3)カラーコードの連続サンプリングにより複数色をスムーズに生成できるカラー補間。人気のある、挑戦的なベンチマークデータセットに対する定量的かつ視覚的な評価は、多様な、制御可能な、色現実的な拡張結果を提供することにおいて、既存のメソッドよりもColorCodeの方が優れていることを示している。ソースコードはhttps://github.com/Xiaofeng-life/ColorCodeで入手できる。

Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called ColorCode, which enhances underwater images while offering a range of controllable color outputs. Our approach involves recovering an underwater image to a reference enhanced image through supervised training and decomposing it into color and content codes via self-reconstruction and cross-reconstruction. The color code is explicitly constrained to follow a Gaussian distribution, allowing for efficient sampling and interpolation during inference. ColorCode offers three key features: 1) color enhancement, producing an enhanced image with a fixed color; 2) color adaptation, enabling controllable adjustments of long-wavelength color components using guidance images; and 3) color interpolation, allowing for the smooth generation of multiple colors through continuous sampling of the color code. Quantitative and visual evaluations on popular and challenging benchmark datasets demonstrate the superiority of ColorCode over existing methods in providing diverse, controllable, and color-realistic enhancement results. The source code is available at https://github.com/Xiaofeng-life/ColorCode.
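
The sampling and interpolation of a Gaussian-constrained color code can be sketched in a few lines of Python. This is an illustration only and not the ColorCode implementation: the code dimension, the standard-normal prior, and the linear interpolation scheme are assumptions, and the decoder that would turn a code into an image is omitted. The function names are hypothetical.

```python
import random


def sample_color_code(dim, rng):
    """Draw a color code from an assumed standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def interpolate_codes(code_a, code_b, steps):
    """Return `steps` (>= 2) evenly spaced codes from code_a to code_b."""
    out = []
    for k in range(steps):
        t = k / (steps - 1)
        out.append([(1 - t) * a + t * b for a, b in zip(code_a, code_b)])
    return out


rng = random.Random(0)          # fixed seed so the sketch is repeatable
start = sample_color_code(8, rng)
end = sample_color_code(8, rng)
path = interpolate_codes(start, end, 5)  # 5 codes, endpoints included
```

Feeding each code in `path` to a decoder together with a fixed content code would, under these assumptions, give a smooth sequence of color variants of the same scene.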

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# 運動マスク付き拡散モデルを用いたテキスト駆動型人体運動生成

Text-driven Human Motion Generation with Motion Masked Diffusion Model ( http://arxiv.org/abs/2409.19686v1 )

ライセンス: Link先を確認

Xingyu Chen,

(参考訳) テキスト駆動型ヒューマンモーション生成は、自然言語で条件付けられた人間のモーションシーケンスを合成するマルチモーダルタスクである。多様な条件入力の下でテキスト記述を満足すると同時に、多様で現実的な人間の行動を生成する必要がある。既存の拡散モデルに基づくアプローチは、生成の多様性と多モード性において優れた性能を持つ。しかし, 自己回帰法と比較して, 拡散法は不満足なFIDスコアにつながる人間の動作特徴の分布に適合しない。 1つの洞察は、拡散モデルは文脈的推論を通して時空間意味論の運動関係を学習する能力が欠けていることである。そこで本研究では,移動列間の文脈的関節から時空間的関係を学習する能力を明確に向上させる,新しい人体運動マスク機構である運動マスク拡散モデル(MMDM)を提案する。また、動的時間特性と空間構造を持つ人間の動作データの複雑さを考慮し、2つのマスクモデリング戦略を設計した: time frames mask と body parts mask。トレーニング中、MMDMはモーション埋め込み空間内の特定のトークンをマスクする。そして、拡散復号器は、各サンプリングステップのマスク埋め込みから全動作シーケンスを学習するように設計され、不完全表現から完全シーケンスを復元することができる。 HumanML3DとKIT-MLデータセットの実験では、動作品質とテキスト-モーションの一貫性のバランスをとることで、マスク戦略が効果的であることが示された。

Text-driven human motion generation is a multimodal task that synthesizes human motion sequences conditioned on natural language. It requires the model to satisfy textual descriptions under varying conditional inputs, while generating plausible and realistic human actions with high diversity. Existing diffusion model-based approaches have outstanding performance in the diversity and multimodality of generation. However, compared to autoregressive methods that train motion encoders before inference, diffusion methods fall short in fitting the distribution of human motion features, which leads to an unsatisfactory FID score. One insight is that the diffusion model lacks the ability to learn the motion relations among spatio-temporal semantics through contextual reasoning. To solve this issue, in this paper, we propose the Motion Masked Diffusion Model (MMDM), a novel human motion masking mechanism for diffusion models that explicitly enhances their ability to learn the spatio-temporal relationships from contextual joints among motion sequences. Besides, considering the complexity of human motion data with dynamic temporal characteristics and spatial structure, we design two mask modeling strategies: time frames mask and body parts mask. During training, MMDM masks certain tokens in the motion embedding space. Then, the diffusion decoder is designed to learn the whole motion sequence from the masked embedding in each sampling step, which allows the model to recover a complete sequence from incomplete representations. Experiments on the HumanML3D and KIT-ML datasets demonstrate that our mask strategy is effective by balancing motion quality and text-motion consistency.
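
The two masking strategies named in the abstract can be sketched on a toy motion tensor. This is a simplified illustration under stated assumptions: the motion is a nested list indexed as `[frame][joint][feature]`, masking replaces values with a constant, and the masks act on raw features rather than on the learned embedding tokens that MMDM masks. The function names are hypothetical.

```python
def mask_time_frames(motion, frames, mask_value=0.0):
    """Mask every joint in the selected frames (temporal masking)."""
    masked = set(frames)
    return [
        [[mask_value] * len(joint) for joint in frame] if t in masked
        else [list(joint) for joint in frame]
        for t, frame in enumerate(motion)
    ]


def mask_body_parts(motion, joints, mask_value=0.0):
    """Mask the selected joints in every frame (spatial masking)."""
    masked = set(joints)
    return [
        [[mask_value] * len(joint) if k in masked else list(joint)
         for k, joint in enumerate(frame)]
        for frame in motion
    ]


# Toy motion: 2 frames, 2 joints, 2 features per joint.
motion = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
```

Masking frame 1 hides a whole pose, so a model has to rely on temporal context to fill it in. Masking joint 0 hides one body part across the clip, so the model has to rely on the remaining joints.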

翻訳日:2024-11-05 21:49:14 公開日:2024-09-29

# ラマン分光法による魚介類生化学組成分析のための機械学習

Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis ( http://arxiv.org/abs/2409.19688v1 )

ライセンス: Link先を確認

Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen,

(参考訳) 魚の生化学成分の迅速かつ正確な検出は,魚介類産業における高付加価値製品の最適利用と抽出を容易にする重要な実世界の課題である。ラマン分光法は、機械学習回帰モデルを用いて、ラマンスペクトルと生化学参照データとを関連付けることにより、魚の生化学組成を迅速かつ非破壊的に分析するための有望なソリューションを提供する。本稿では, この課題に対処するさまざまな回帰モデルについて検討し, 水, タンパク質, 脂質の収量を予測するために, 畳み込みニューラルネットワーク(CNN)の新たな設計を提案する。我々の知る限りでは、非常に小さなラマン分光データセットに基づいて魚の生化学的組成を分析するために、CNNを用いて成功した研究を最初に行った。当社のアプローチでは,CNNアーキテクチャと包括的データ準備手順を組み合わせることで,極端なデータ不足による課題を効果的に軽減する。その結果、我々のCNNは2つの最先端のCNNモデルと複数の従来の機械学習モデルを大きく上回り、魚の生化学組成の正確かつ自動分析の道を開くことができた。

The rapid and accurate detection of biochemical compositions in fish is a crucial real-world task that facilitates optimal utilization and extraction of high-value products in the seafood industry. Raman spectroscopy provides a promising solution for quickly and non-destructively analyzing the biochemical composition of fish by associating Raman spectra with biochemical reference data using machine learning regression models. This paper investigates different regression models to address this task and proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipids yield. To the best of our knowledge, we are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset. Our approach combines a tailored CNN architecture with the comprehensive data preparation procedure, effectively mitigating the challenges posed by extreme data scarcity. The results demonstrate that our CNN can significantly outperform two state-of-the-art CNN models and multiple traditional machine learning models, paving the way for accurate and automated analysis of fish biochemical composition.

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# InfantCryNet: 幼児の泣き声をインテリジェントに分析するためのデータ駆動フレームワーク

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries ( http://arxiv.org/abs/2409.19689v1 )

ライセンス: Link先を確認

Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang,

(参考訳) 幼児の泣き声の意味を理解することは、新生児の世話をする若い親にとって重要な課題である。背景雑音の存在とラベル付きデータの欠如は、泣き声を検知し、その根本原因を分析するシステム開発における実践的な課題である。本稿では,これらのタスクを実現するための新しいデータ駆動フレームワーク"InfantCryNet"を提案する。データ不足の問題に対処するために、事前学習された音声モデルを用いて、事前知識をモデルに組み込む。本稿では,より効率的に特徴を抽出するために,統計的プーリングとマルチヘッドアテンションプーリング手法を提案する。さらに、知識蒸留とモデル量子化を適用して、モデル効率を高め、モデルサイズを小さくし、モバイルデバイスの産業展開をより良く支援する。実生活データセットの実験では、提案フレームワークの優れた性能を示し、分類精度が4.4%向上した。モデル圧縮は、性能を損なうことなくモデルサイズを7%、精度を8%低下させるだけで最大28%削減し、モデル選択とシステム設計の実践的な洞察を提供する。

Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.
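
Statistical pooling, one of the two pooling techniques mentioned above, is commonly implemented by concatenating the per-dimension mean and standard deviation of frame-level features. The sketch below follows that common definition as an assumption, since the abstract does not specify the exact variant (for example, population versus sample standard deviation). The function name is hypothetical.

```python
from statistics import fmean, pstdev


def statistical_pooling(frames):
    """Pool T frame-level vectors of size D into one vector of size 2*D.

    The output holds the D per-dimension means followed by the D
    per-dimension population standard deviations.
    """
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims] + [pstdev(d) for d in dims]


# Two frames with two features each.
pooled = statistical_pooling([[1.0, 2.0], [3.0, 2.0]])
```

The pooled vector has a fixed length regardless of how many frames the audio clip contains, which is what lets a classifier head consume clips of varying duration.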

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# Neural-Polyptych: 多様なジャンルに対するコンテンツ制御可能な絵画の再創作

Neural-Polyptych: Content Controllable Painting Recreation for Diverse Genres ( http://arxiv.org/abs/2409.19690v1 )

ライセンス: Link先を確認

Yiming Zhao, Dewen Guo, Zhouhui Lian, Yue Gao, Jianhong Han, Jie Feng, Guoping Wang, Bingfeng Zhou, Sheng Li,

(参考訳) アーティストと非スペシャリストのギャップを埋めるため,原画の断片にインタラクティブな手描きスケッチをシームレスに組み込むことで,広範かつ高解像度な絵画の作成を容易にする統一的な枠組みであるNeural-Polyptychを提案する。我々は、生成プロセスを2つの部分に分割し、グローバルな特徴とローカルな特徴を識別するマルチスケールのGANアーキテクチャを設計した。ユーザによるスケッチアウトラインから生成されたセマンティックディテールの忠実性を高めるため,我々の参照銀行戦略を利用した対応注意モジュールを提案する。これにより、高品質で複雑な要素をアートワーク内で作成することができる。最終的な結果は、これらの局所的な要素を慎重にブレンドし、一貫性のあるグローバルな一貫性を保つことで達成される。これにより、メガピクセルスケールでデジタル絵画を制作し、多様な芸術表現を収容し、ユーザーが制御された方法でコンテンツを再現することができる。我々は東洋絵画と西洋絵画の両方の多様なジャンルへのアプローチを検証する。大規模塗装, テクスチャシャッフル, ジャンル変更, 壁画の復元, 再構成などの応用は, 我々の枠組みに基づいて実現可能である。

To bridge the gap between artists and non-specialists, we present a unified framework, Neural-Polyptych, to facilitate the creation of expansive, high-resolution paintings by seamlessly incorporating interactive hand-drawn sketches with fragments from original paintings. We have designed a multi-scale GAN-based architecture to decompose the generation process into two parts, each responsible for identifying global and local features. To enhance the fidelity of semantic details generated from users' sketched outlines, we introduce a Correspondence Attention module utilizing our Reference Bank strategy. This ensures the creation of high-quality, intricately detailed elements within the artwork. The final result is achieved by carefully blending these local elements while preserving coherent global consistency. Consequently, this methodology enables the production of digital paintings at megapixel scale, accommodating diverse artistic expressions and enabling users to recreate content in a controlled manner. We validate our approach on diverse genres of both Eastern and Western paintings. Applications such as large painting extension, texture shuffling, genre switching, mural art restoration, and recomposition can be successfully built on our framework.

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# CERD:エッセイにおける修辞的理解と生成のための総合的な中国の修辞的データセット

CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays ( http://arxiv.org/abs/2409.19691v1 )

ライセンス: Link先を確認

Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia,

(参考訳) 既存の修辞的理解と生成データセットやコーパスは、主に単一の粗いカテゴリまたは細かなカテゴリに焦点を当て、独立したサブタスクとして扱うことで、異なる修辞的装置間の共通的な相互関係を無視している。本稿では, 比喩, 人格化, ハイパーボレ, 並列性を含む4つの大まかなカテゴリと, 形と内容の双方で23の細かなカテゴリから構成される中国のエッセイ・レトリック・データセット(CERD)を提案する。 CERDは、手動で注釈付きで包括的な中国の修辞的データセットで、5つの相互関連サブタスクがある。過去の研究と異なり,我々のデータセットは,様々な修辞装置を理解し,対応する修辞要素を認識し,与えられた条件下で修辞文を生成することを支援する。 CERDにおける複数のタスク間の相互関係を実証し、将来の修辞学研究のためのベンチマークを確立するために、広範囲な実験を行った。実験結果から,大規模言語モデルがほとんどのタスクで最高のパフォーマンスを達成し,複数のタスクを共同で微調整することで,パフォーマンスがさらに向上することが示唆された。

Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories including metaphor, personification, hyperbole and parallelism and 23 fine-grained categories across both form and content levels. CERD is a manually annotated and comprehensive Chinese rhetoric dataset with five interrelated sub-tasks. Unlike previous work, our dataset aids in understanding various rhetorical devices, recognizing corresponding rhetorical components, and generating rhetorical sentences under given conditions, thereby improving the author's writing proficiency and language usage skills. Extensive experiments are conducted to demonstrate the interrelations between multiple tasks in CERD, as well as to establish a benchmark for future research on rhetoric. The experimental results indicate that Large Language Models achieve the best performance across most tasks, and jointly fine-tuning with multiple tasks further enhances performance.

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# 強雑音ラベル検出器としての視覚言語モデル

Vision-Language Models are Strong Noisy Label Detectors ( http://arxiv.org/abs/2409.19696v1 )

ライセンス: Link先を確認

Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang,

(参考訳) 微調整型視覚言語モデルに関する最近の研究は、様々な下流タスクにおいて印象的な性能を示している。しかし、実世界のアプリケーションで正確にラベル付けされたデータを得るという課題は、微調整の過程で大きな障害となる。この課題に対処するために、視覚言語モデルに適応するためのDeFTと呼ばれるDenoising Fine-Tuningフレームワークを提案する。 DeFTは、何百万もの補助的な画像テキストペアで事前訓練されたテキストと視覚的特徴のロバストなアライメントを利用して、ノイズの多いラベルを抽出する。提案フレームワークは,各クラスに対して正および負のテキストプロンプトを学習することにより,ノイズのあるラベル検出を行う。正のプロンプトはクラスの特徴を明らかにしようとするが、負のプロンプトはクリーンでノイズの多いサンプルを分離するための学習可能なしきい値となる。我々は、学習したテキストプロンプトとのアライメントを促進するために、事前学習されたビジュアルエンコーダの適応にパラメータ効率の微調整を用いる。一般的なフレームワークとして、DeFTは慎重に選択されたクリーンサンプルを利用して、多くの事前訓練されたモデルを下流タスクにシームレスに微調整することができる。 7つの合成および実世界のノイズデータセットの実験結果から,ノイズラベル検出と画像分類の両方においてDeFTの有効性が検証された。

Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples. We employ parameter-efficient fine-tuning for the adaptation of a pre-trained visual encoder to promote its alignment with the learned textual prompts. As a general framework, DeFT can seamlessly fine-tune many pre-trained models to downstream tasks by utilizing carefully selected clean samples. Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.
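
The role of the negative prompt as a per-class threshold can be sketched as a simple comparison of similarities. This is an illustrative reading of the abstract and not the DeFT code: the feature vectors are toy values, cosine similarity is assumed as the scoring function, and the learned prompts are replaced by fixed vectors. The names `cosine` and `is_clean` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def is_clean(image_feat, label, pos_prompts, neg_prompts):
    """Treat a sample as clean if it matches the positive prompt of its
    given label better than that label's negative (threshold) prompt."""
    return cosine(image_feat, pos_prompts[label]) > cosine(image_feat, neg_prompts[label])


# Toy prompt features for a single class (assumed values).
pos_prompts = {"cat": [1.0, 0.0]}
neg_prompts = {"cat": [1.0, 1.0]}

likely_clean = is_clean([1.0, 0.1], "cat", pos_prompts, neg_prompts)
likely_noisy = is_clean([0.0, 1.0], "cat", pos_prompts, neg_prompts)
```

Because the threshold is itself a prompt feature, it can differ per class and be learned jointly with the positive prompt, rather than being a single hand-tuned scalar.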

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# フォック状態格子におけるダークステートエンジニアリング

Dark-state engineering in Fock-state lattices ( http://arxiv.org/abs/2409.19697v1 )

ライセンス: Link先を確認

Xuan Zhao, Yi Xu, Le-Man Kuang, Jie-Qiao Liao,

(参考訳) フォック状態格子(FSL)は、量子物理学において新たなホットスポットになりつつある。これは、FSLが原子-場相互作用の研究の新しい視点を提供するだけでなく、量子光学と凝縮物質物理学の接続を構築するためでもある。格子内の複数の遷移経路のため、これらの系には固有の量子干渉効果が存在し、新しい量子コヒーレント現象を発見し、それらの応用を利用する方法がこの分野で重要かつ望ましい課題となっている。本研究では,マルチモードJaynes-Cummings(JC)モデルにより生成されたFSLの暗黒状態効果について検討する。ある種の励起数部分空間におけるFSLを考慮し、矢頭行列法を用いて原子励起状態に関連する状態に関する暗黒状態について検討する。直交ダークステートの数によって決定される次元を持つダークステート部分空間が存在することが分かる。次元が 1 より大きいとき、これらのダークステート基底の形式はユニークではない。さらに,2モード,3モード,4モードのJCモデルにおいて,直交暗黒状態の数と形状を求める。さらに、一般的な$N$-mode JCモデルに対して、$n$-励起部分空間に$C_{N+n-2}^{N-2}$直交ダーク状態が存在することが分かる。ダークモードとダークステートの関係も構築しています。我々の研究は、FSLに基づく量子光学効果と量子情報処理の探求の道を開く。

Fock-state lattices (FSLs) are becoming a research hotspot in quantum physics, not only because FSLs provide a new perspective for studying atom-field interactions, but also because they build a connection between quantum optics and condensed matter physics. Owing to the multiple transition paths in the lattices, inherent quantum interference effects exist in these systems, and hence finding new quantum coherent phenomena and exploiting their applications has become a significant and desirable task in this field. In this work, we study the dark-state effect in the FSLs generated by the multimode Jaynes-Cummings (JC) models. By considering the FSLs in certain-excitation-number subspaces, we study the dark states with respect to the states associated with the atomic excited state using the arrowhead-matrix method. We find that there exist dark-state subspaces with the dimensions determined by the number of orthogonal dark states. When the dimension is larger than one, the forms of these dark-state bases are not unique. Further, we obtain the number and form of the orthogonal dark states in the two-, three-, and four-mode JC models. In addition, we find that for a general $N$-mode JC model, there are $C_{N+n-2}^{N-2}$ orthogonal dark states in the $n$-excitation subspace. We also build the relationship between the dark modes and dark states. Our work will pave the way for exploring quantum optical effects and quantum information processing based on the FSLs.
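The count $C_{N+n-2}^{N-2}$ quoted above can be cross-checked with a simple dimension argument. The sketch below assumes that the dark states are the atomic-ground-state configurations annihilated by the atom-field coupling, and that this coupling has full rank, so their number is the difference between the sizes of the ground and excited sectors. This is only a consistency check under those assumptions, not the arrowhead-matrix derivation used in the paper; the function names are hypothetical.

```python
from math import comb


def ground_sector_dim(num_modes, n):
    """Number of states |g; n photons spread over num_modes modes>."""
    return comb(n + num_modes - 1, num_modes - 1)


def excited_sector_dim(num_modes, n):
    """Number of states |e; n-1 photons spread over num_modes modes>."""
    return comb(n + num_modes - 2, num_modes - 1)


def dark_state_count(num_modes, n):
    """Count quoted in the abstract: C(N+n-2, N-2)."""
    return comb(num_modes + n - 2, num_modes - 2)


# The two counts agree by Pascal's rule for every N >= 2 and n >= 1.
for N in range(2, 6):
    for n in range(1, 7):
        diff = ground_sector_dim(N, n) - excited_sector_dim(N, n)
        assert diff == dark_state_count(N, n)
```

For the two-mode model this gives a single dark state in every excitation subspace, consistent with the statement that the dark-state basis is only non-unique once the dimension exceeds one.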

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# 局所測定による安定化器符号部分空間の有効検証

Efficient Verification of Stabilizer Code Subspaces with Local Measurements ( http://arxiv.org/abs/2409.19699v1 )

ライセンス: Link先を確認

Congcong Zheng, Xutao Yu, Zaichen Zhang, Ping Xu, Kun Wang,

(参考訳) 本稿では、特定の安定化器符号で保護されるように設計された量子コンピュータが、対応する論理量子ビットを正しく符号化するかどうかを検証するタスクに対処する。これを実現するために,サブスペース検証のための汎用フレームワークを開発し,実用上重要な安定化符号部分空間を探索する。まず、安定化器コード部分空間に対する2つの効率的な検証方法を提案し、安定化器生成器と安定化器群の測定をそれぞれ利用した。次に,その部分空間が特定の構造特性を示すとき,ある試験を並列に行うことができるという観測に基づいて,グラフコード部分空間に適した着色戦略と,Calderbank-Shor-Steane (CSS)コード部分空間に適したXZ戦略を提案する。安定化器ベースの戦略と比較して、これらの新しい戦略は測定設定を著しく減らし、状態コピーを減らし、準グローバルな最適性に近づいた。特に、全ての戦略は限られた数のパウリ測度を使用し、非適応的であり、混合状態に作用し、雑音量子コンピュータにおける論理量子ビットと論理演算の両方の効率的な実験的な証明を可能にする。この研究は、局所測定による安定化符号部分空間の効率的な検証に関する最初の体系的研究に寄与する。

We address the task of verifying whether a quantum computer, designed to be protected by a specific stabilizer code, correctly encodes the corresponding logical qubits. To achieve this, we develop a general framework for subspace verification and explore several stabilizer code subspaces of practical significance. First, we present two efficient verification strategies for general stabilizer code subspaces, utilizing measurements of their stabilizer generators and stabilizer groups, respectively. Then, building on the observation that certain tests can be conducted in parallel when the subspace exhibits specific structural properties, we propose a coloring strategy tailored to graph code subspaces and an XZ strategy tailored to Calderbank-Shor-Steane (CSS) code subspaces. Compared to stabilizer-based strategies, these new strategies require significantly fewer measurement settings and consume fewer state copies, approaching near-global optimality. Notably, all the strategies employ a limited number of Pauli measurements, are non-adaptive, and work on mixed states, enabling efficient experimental certification of both logical qubits and logical operations in noisy quantum computers. This work contributes to the first systematic study of efficient verification of stabilizer code subspaces with local measurements.
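The idea behind a coloring strategy can be sketched for a plain graph state, where the stabilizer generator of vertex v is X on v and Z on its neighbors. Vertices that share a color are non-adjacent, so one setting that measures X on that color class and Z everywhere else determines all generators of the class at once, and the number of settings equals the number of colors. The greedy coloring and the helper names below are illustrative assumptions; the paper's strategy for graph code subspaces may differ in detail.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by its neighbors."""
    colors = {}
    for v in sorted(adjacency):
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def measurement_settings(adjacency):
    """One Pauli string per color: X on that color class, Z elsewhere."""
    colors = greedy_coloring(adjacency)
    vertices = sorted(adjacency)
    settings = []
    for c in range(max(colors.values()) + 1):
        settings.append("".join("X" if colors[v] == c else "Z" for v in vertices))
    return settings


# Four-qubit path graph 0-1-2-3: two colors, hence two settings.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(measurement_settings(path))
```

Greedy coloring is not guaranteed to use the minimum number of colors, so this gives an upper bound on the number of settings rather than the optimum.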

翻訳日:2024-11-05 21:39:30 公開日:2024-09-29

# 適応型U-Netアーキテクチャを用いたUAVからの農業画像のハイパースペクトルアンミックス

Hyperspectral Unmixing of Agricultural Images taken from UAV Using Adapted U-Net Architecture ( http://arxiv.org/abs/2409.19701v1 )

ライセンス: Link先を確認

Vytautas Paura, Virginijus Marcinkevičius,

(参考訳) ハイパースペクトルアンミキシング法は、ハイパースペクトルデータキューブの各画素から物質(通常はエンドメンバーと呼ばれる)のデータとその存在量を抽出するアルゴリズムである。ハイパースペクトルセンサは空間分解能が低いため、各画素のデータには複数のエンドメンバーの情報が混在しうる。本稿では、UAVに搭載したハイパースペクトルカメラで収集したブルーベリー畑のデータから、ハイパースペクトルアンミキシング用データセットを作成する。また、U-Netネットワークアーキテクチャに基づくハイパースペクトルアンミキシングアルゴリズムを提案し、既存のデータセットと新たに作成したデータセットの両方で、より正確なアンミキシング結果を得る。

The hyperspectral unmixing method is an algorithm that extracts material (usually called endmember) data from hyperspectral data cube pixels along with their abundances. Due to the lower spatial resolution of hyperspectral sensors, the data in each pixel may contain mixed information from multiple endmembers. In this paper, we create a hyperspectral unmixing dataset from blueberry field data gathered by a hyperspectral camera mounted on a UAV. We also propose a hyperspectral unmixing algorithm based on the U-Net network architecture to achieve more accurate unmixing results on existing and newly created hyperspectral unmixing datasets.

翻訳日:2024-11-05 21:29:26 公開日:2024-09-29

# 大規模言語モデルのための認証されたロバストな透かし

A Certified Robust Watermark For Large Language Models ( http://arxiv.org/abs/2409.19708v1 )

ライセンス: Link先を確認

Xianheng Feng, Jian Liu, Kui Ren, Chun Chen,

(参考訳) AI生成テキストの識別における透かしアルゴリズムの有効性は大きな注目を集めている。同時に、様々な透かし攻撃に対するロバスト性を高めるため、数多くの透かしアルゴリズムが提案されている。しかし、これらの透かしアルゴリズムは、適応的な攻撃や未知の攻撃に対して依然として脆弱である。この問題に対処するため、我々の知る限り初めて、ランダム化平滑化に基づく大規模言語モデル向けの認証付きロバスト透かしアルゴリズムを提案し、透かし入りテキストに対して証明可能な保証を与える。具体的には、透かしの生成と検出にそれぞれ異なる2つのモデルを用い、透かし検出器の訓練段階と推論段階において、埋め込み空間にガウスノイズを、置換空間に一様ノイズをそれぞれ加えることで、検出器の認証付きロバスト性を高め、認証半径を導出する。透かしアルゴリズムの経験的ロバスト性と認証付きロバスト性を評価するため、包括的な実験を行った。その結果、本アルゴリズムはベースラインアルゴリズムと同等の性能を示しつつ、十分な認証付きロバスト性を導出できることが示された。これは、テキストが大きく改変されても透かしを除去できないことを意味する。

The effectiveness of watermark algorithms in AI-generated text identification has garnered significant attention. Concurrently, an increasing number of watermark algorithms have been proposed to enhance the robustness against various watermark attacks. However, these watermark algorithms remain susceptible to adaptive or unseen attacks. To address this issue, we propose what is, to the best of our knowledge, the first certified robust watermark algorithm for large language models based on randomized smoothing, which can provide provable guarantees for watermarked text. Specifically, we use two different models for watermark generation and detection, respectively, and add Gaussian and uniform noise to the embedding and permutation spaces, respectively, during the training and inference stages of the watermark detector, to enhance the certified robustness of the detector and derive a certified radius. To evaluate the empirical robustness and certified robustness of our watermark algorithm, we conducted comprehensive experiments. The results indicate that our watermark algorithm shows comparable performance to baseline algorithms while achieving substantial certified robustness, which means that our watermark cannot be removed even under significant alterations.

翻訳日:2024-11-05 21:29:26 公開日:2024-09-29

# 脳記録からの音声テキストの非侵襲復号のためのマルチモーダルLLM

A multimodal LLM for the non-invasive decoding of spoken text from brain recordings ( http://arxiv.org/abs/2409.19710v1 )

ライセンス: Link先を確認

Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni,

(参考訳) 人工知能における脳関連の研究トピックは、特にマルチモーダルアーキテクチャの適用範囲がコンピュータビジョンから自然言語処理へと拡大したことにより、近年人気が高まっている。本研究の主な目的は、非侵襲的なfMRI記録からの音声テキスト復号におけるこれらのアーキテクチャの可能性と限界を探ることである。視覚データやテキストデータとは対照的に、fMRIデータは脳スキャナーの多様性のために複雑なモダリティであり、このことは (i) 記録される信号形式の多様性、(ii) 生信号の低分解能とノイズ、(iii) 生成学習の基盤モデルとして活用できる事前学習モデルの不足を意味する。これらの点により、fMRI記録からテキストを非侵襲的に復号する問題は非常に困難になっている。本稿では、fMRI信号から音声テキストを復号するエンドツーエンドのマルチモーダルLLMを提案する。提案するアーキテクチャは、(i) エンコーダ用の拡張埋め込み層と、最先端のものよりも適切に調整された注意機構を組み込んだ特定のトランスフォーマーから派生したエンコーダ、および (ii) 入力テキストの埋め込みと脳活動の符号化された埋め込みを整合させて出力テキストを復号するように適応させた、凍結された大規模言語モデルに基づいている。ベンチマークは、fMRIと会話信号が同期して記録された、人間同士および人間とロボットの対話からなるコーパス上で実施した。提案手法は評価対象のモデルを上回り、正解に含まれる意味をより正確に捉えたテキストを生成できることから、得られた結果は非常に有望である。実装コードはhttps://github.com/Hmamouche/brain_decodeで提供されている。

Brain-related research topics in artificial intelligence have recently gained popularity, particularly due to the expansion of what multimodal architectures can do from computer vision to natural language processing. Our main goal in this work is to explore the possibilities and limitations of these architectures in spoken text decoding from non-invasive fMRI recordings. Contrary to vision and textual data, fMRI data represent a complex modality due to the variety of brain scanners, which implies (i) the variety of the recorded signal formats, (ii) the low resolution and noise of the raw signals, and (iii) the scarcity of pretrained models that can be leveraged as foundation models for generative learning. These points make the problem of the non-invasive decoding of text from fMRI recordings very challenging. In this paper, we propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals. The proposed architecture is founded on (i) an encoder derived from a specific transformer incorporating an augmented embedding layer for the encoder and a better-adjusted attention mechanism than that present in the state of the art, and (ii) a frozen large language model adapted to align the embedding of the input text and the encoded embedding of brain activity to decode the output text. A benchmark is performed on a corpus consisting of a set of human-human and human-robot interactions where fMRI and conversational signals are recorded synchronously. The obtained results are very promising, as our proposal outperforms the evaluated models and is able to generate text capturing more accurate semantics present in the ground truth. The implementation code is provided at https://github.com/Hmamouche/brain_decode.

翻訳日:2024-11-05 21:29:26 公開日:2024-09-29

# 事後共形予測

Posterior Conformal Prediction ( http://arxiv.org/abs/2409.19712v1 )

ライセンス: Link先を確認

Yao Zhang, Emmanuel J. Candès,

(参考訳) 共形予測は、分布に依存しないカバレッジ保証を備えた予測区間を構築するための一般的な手法である。このカバレッジは周辺的なものであり、母集団全体の平均としてのみ成り立ち、特定のサブグループについて成り立つとは限らない。本稿では、データから自然に見出されるクラスタ(あるいはサブグループ)に対して、周辺妥当性と近似的な条件付き妥当性の両方を備えた予測区間を生成する新しい手法、事後共形予測(PCP)を提案する。PCPは、条件付き適合度スコア分布をクラスタ分布の混合としてモデル化することで、これらの保証を達成する。近似的な条件付き妥当性を持つ他の手法と比べ、本手法は、特に検証データで十分に代表されているクラスタからテストデータが得られる場合に、より狭い区間を与える。PCPは、ユーザが指定したサブグループに対する条件付きカバレッジの保証にも適用でき、その場合は指定されたサブグループ内のより小さなサブグループに対しても頑健なカバレッジを達成する。分類においては、PCPの基礎となる理論により、分類器の信頼度に基づいてカバレッジレベルを調整でき、標準的な共形予測集合よりも大幅に小さい集合が得られる。社会経済、科学、医療分野の多様なデータセットでPCPの性能を評価する。

Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, meaning it only holds on average over the entire population but not necessarily for any specific subgroup. This article introduces a new method, posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional validity for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional conformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional validity, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it achieves robust coverage on smaller subgroups within the specified subgroups. In classification, the theory underlying PCP allows for adjusting the coverage level based on the classifier's confidence, achieving significantly smaller sets than standard conformal prediction sets. We evaluate the performance of PCP on diverse datasets from socio-economic, scientific and healthcare applications.
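For context, the marginal guarantee that the abstract contrasts PCP with can be illustrated by standard split conformal prediction. The sketch below is that baseline only, not PCP itself: it uses absolute residuals on a held-out calibration set as conformity scores, and the names `conformal_quantile` and `split_conformal_interval`, as well as the `predict` callable, are assumptions introduced for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Interval [f(x) - q, f(x) + q] built from absolute-residual scores."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q
```

Because a single quantile `q` is shared by all test points, the interval width is the same everywhere, which is why coverage holds only on average over the population. Methods with conditional validity, such as the one described above, adapt the interval to the cluster or subgroup of the test point.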

翻訳日:2024-11-05 21:29:26 公開日:2024-09-29

# 配電系統運用者のメタデータを用いた低電圧フィーダのピーク考慮型擬似測定値の生成

Generating peak-aware pseudo-measurements for low-voltage feeders using metadata of distribution system operators ( http://arxiv.org/abs/2409.19713v1 )

ライセンス: Link先を確認

Manuel Treutlein, Marc Schmidt, Roman Hahn, Matthias Hertel, Benedikt Heidrich, Ralf Mikut, Veit Hagenmeyer,

(参考訳) 配電系統運用者(DSO)は、気候中立への道筋に沿った配電網の再構築や、系統内の消費と発電を管理・制御する能力の確保といった新たな課題に対処しなければならない。これらの課題に対応するうえで、配電網内の測定値はしばしばDSOの基礎となる。したがって、多くの低電圧(LV)系統に測定装置が設置されていないことは喫緊の問題である。この問題を克服するため、回帰モデルを用いて、各フィーダのメタデータに基づき、測定されていないLVフィーダの擬似測定値を推定する手法を提案する。フィーダのメタデータは、系統連系点の数、需要家および発電者の設備容量、下流のLV系統における料金請求データからなる。さらに、気象データ、カレンダーデータ、タイムスタンプ情報をモデルの特徴量として用いる。既存の測定値はモデルの目的変数として用いる。消費とフィードインの両方を特徴とする2,323本のLVフィーダからなる大規模な実世界データセット上で、推定した擬似測定値を広範に評価する。この目的のため、BigDEALチャレンジに着想を得て、消費とフィードインの双方について、ピークの大きさ、タイミング、形状に関するピーク指標を導入する。回帰モデルとしては、XGBoost、多層パーセプトロン(MLP)、線形回帰(LR)を用いる。XGBoostとMLPはLRを上回った。さらに、本手法は異なる気象、カレンダー、タイムスタンプの条件に適応し、フィーダのメタデータに基づいて現実的な負荷曲線を生成することが示された。将来的には、本手法は変電所変圧器など他の系統レベルにも適用でき、負荷モデリング、状態推定、LV負荷予測といった研究分野を補完しうる。

Distribution system operators (DSOs) must cope with new challenges such as the reconstruction of distribution grids along climate neutrality pathways or the ability to manage and control consumption and generation in the grid. In order to meet the challenges, measurements within the distribution grid often form the basis for DSOs. Hence, it is an urgent problem that measurement devices are not installed in many low-voltage (LV) grids. In order to overcome this problem, we present an approach to estimate pseudo-measurements for non-measured LV feeders based on the metadata of the respective feeder using regression models. The feeder metadata comprise information about the number of grid connection points, the installed power of consumers and producers, and billing data in the downstream LV grid. Additionally, we use weather data, calendar data and timestamp information as model features. The existing measurements are used as model target. We extensively evaluate the estimated pseudo-measurements on a large real-world dataset with 2,323 LV feeders characterized by both consumption and feed-in. For this purpose, we introduce peak metrics inspired by the BigDEAL challenge for the peak magnitude, timing and shape for both consumption and feed-in. As regression models, we use XGBoost, a multilayer perceptron (MLP) and a linear regression (LR). We observe that XGBoost and MLP outperform the LR. Furthermore, the results show that the approach adapts to different weather, calendar and timestamp conditions and produces realistic load curves based on the feeder metadata. In the future, the approach can be adapted to other grid levels like substation transformers and can supplement research fields like load modeling, state estimation and LV load forecasting.

翻訳日:2024-11-05 21:29:26 公開日:2024-09-29

# 安全なヒートポンプ制御のための制約付き強化学習

Constrained Reinforcement Learning for Safe Heat Pump Control ( http://arxiv.org/abs/2409.19716v1 )

ライセンス: Link先を確認

Baohe Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker,

(参考訳) 制約強化学習(RL:Constrained Reinforcement Learning)は、様々な制御タスクにおける安全性とパフォーマンスを高めるために、報酬への制約の統合が不可欠であるRL内の重要な研究領域として登場した。建物内の暖房システムの文脈では、住民の熱快適性を保ちながらエネルギー効率を最適化することは、制約付き最適化問題として直感的に定式化することができる。しかし、それをRLで解くには大量のデータが必要になるかもしれない。そのため、正確で多用途なシミュレータが好まれる。本稿では,異なる用途のインタフェースを提供する新しいビルディングシミュレータI4Bを提案するとともに,線形平滑ログバリア関数(CSAC-LB)を用いた制約付きソフトアクター・クリティカルというモデルレス制約付きRLアルゴリズムを加熱最適化問題に適用する。ベースラインアルゴリズムに対するベンチマークは、CSAC-LBのデータ探索、制約満足度、性能における効率を示す。

Constrained Reinforcement Learning (RL) has emerged as a significant research area within RL, where integrating constraints with rewards is crucial for enhancing safety and performance across diverse control tasks. In the context of heating systems in buildings, optimizing energy efficiency while maintaining the residents' thermal comfort can be intuitively formulated as a constrained optimization problem. However, solving it with RL may require a large amount of data. Therefore, an accurate and versatile simulator is favored. In this paper, we propose a novel building simulator, I4B, which provides interfaces for different usages, and apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem. Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction and performance.
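The abstract names a linear smoothed log barrier but does not give its form. A minimal sketch, assuming the commonly used log-barrier extension for a constraint written as z <= 0: the ordinary barrier is kept well inside the feasible region and continued as a straight line with matching value and slope near and beyond the boundary, so the penalty stays finite and differentiable when the constraint is violated. The function name and the sharpness parameter `t` are hypothetical.

```python
import math


def smoothed_log_barrier(z, t=5.0):
    """Log barrier for the constraint z <= 0, extended linearly near 0.

    For z <= -1/t**2 this is the usual barrier -(1/t) * log(-z); above
    that point it continues as a line with the same value and slope t.
    """
    threshold = -1.0 / t**2
    if z <= threshold:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t
```

In a constrained actor-critic setting, a term like this would typically be applied to the estimated constraint cost minus its allowed budget and added to the policy loss. A plain log barrier would be undefined as soon as the policy violates the constraint, which is common early in training.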

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# 分布シフト下の時系列予測のための進化的マルチスケール正規化

Evolving Multi-Scale Normalization for Time Series Forecasting under Distribution Shifts ( http://arxiv.org/abs/2409.19718v1 )

ライセンス: Link先を確認

Dalin Qin, Yehui Li, Weiqi Chen, Zhaoyang Zhu, Qingsong Wen, Liang Sun, Pierre Pinson, Yi Wang,

(参考訳) 複雑な分布シフトは、正確な長期時系列予測を達成するうえでの主要な障害である。分布特性を捉え、分布シフトの影響を軽減するための適応的な正規化手法を提案する試みがいくつかなされてきた。しかし、これらの手法は、様々なスケールで観測される複雑な分布ダイナミクスや、分布ダイナミクスと正規化された写像関係の時間とともに変化する関数を無視している。そこで本研究では、分布シフト問題に対処する、モデル非依存の新しい進化的マルチスケール正規化(EvoMSN)フレームワークを提案する。マルチスケール統計量予測モジュールと適応的アンサンブルに基づく、柔軟な正規化と逆正規化を提案する。変化していく分布を追跡するため、予測モデルと統計量予測モジュールを協調的に更新する進化的最適化戦略を設計する。ベンチマークデータセット上で5つの主流の予測手法の性能向上におけるEvoMSNの有効性を評価し、既存の高度な正規化手法やオンライン学習手法に対する優位性も示す。コードはhttps://github.com/qindalin/EvoMSNで公開されている。

Complex distribution shifts are the main obstacle to achieving accurate long-term time series forecasting. Several efforts have been conducted to capture the distribution characteristics and propose adaptive normalization techniques to alleviate the influence of distribution shifts. However, these methods neglect the intricate distribution dynamics observed from various scales and the evolving functions of distribution dynamics and normalized mapping relationships. To this end, we propose a novel model-agnostic Evolving Multi-Scale Normalization (EvoMSN) framework to tackle the distribution shift problem. Flexible normalization and denormalization are proposed based on the multi-scale statistics prediction module and adaptive ensembling. An evolving optimization strategy is designed to update the forecasting model and statistics prediction module collaboratively to track the shifting distributions. We evaluate the effectiveness of EvoMSN in improving the performance of five mainstream forecasting methods on benchmark datasets and also show its superiority compared to existing advanced normalization and online learning approaches. The code is publicly available at https://github.com/qindalin/EvoMSN.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# FAST:全スライド画像分類のための2層Few-Shot学習パラダイム

FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification ( http://arxiv.org/abs/2409.19720v1 )

ライセンス: Link先を確認

Kexue Fu, Xiaoyuan Luo, Linhao Qu, Shuo Wang, Ying Xiong, Ilias Maglogiannis, Longxiang Gao, Manning Wang,

(参考訳) 高コストなきめ細かいアノテーションとデータ不足は、深層学習ベースの全スライド画像(WSI)分類アルゴリズムを臨床現場で広く採用するうえでの主要な障害となっている。各画像のラベルを活用できる自然画像の少数ショット学習法とは異なり、既存の少数ショットWSI分類法は、高コストなきめ細かいアノテーションを避けるため、少数のきめ細かいラベルまたは弱教師付きのスライドラベルのみを訓練に用いる。これらの手法は利用可能なWSIを十分に活用できておらず、WSI分類性能を著しく制限している。上記の課題に対処するため、WSI分類のための新規かつ効率的な2層少数ショット学習パラダイムFASTを提案する。FASTは、2レベルのアノテーション戦略と2ブランチの分類フレームワークからなる。まず、高コストなきめ細かいアノテーションを避けるため、スライドレベルでごく少数のWSIを収集し、そのうち極めて少数のパッチにアノテーションを付ける。次に、利用可能なWSIを十分に活用するため、すべてのパッチと利用可能なパッチラベルを用いてキャッシュブランチを構築する。キャッシュブランチは、ラベル付きパッチを用いてラベルなしパッチのラベルを学習し、知識検索によってパッチ分類を行う。キャッシュブランチに加えて、学習可能なプロンプトベクトルを含む事前知識ブランチも構築し、視覚言語モデルのテキストエンコーダを用いてパッチ分類を行う。最後に、両ブランチの結果を統合してWSI分類を行う。2クラスおよび多クラスのデータセットでの広範な実験により、提案手法は既存の少数ショット分類法を大きく上回り、わずか0.22$\%$のアノテーションコストで完全教師あり手法の精度に迫ることが示された。すべてのコードとモデルはhttps://github.com/fukexue/FASTで公開される予定である。

The expensive fine-grained annotation and data scarcity have become the primary obstacles for the widespread adoption of deep learning-based Whole Slide Image (WSI) classification algorithms in clinical practice. Unlike few-shot learning methods in natural images that can leverage the labels of each image, existing few-shot WSI classification methods only utilize a small number of fine-grained labels or weakly supervised slide labels for training in order to avoid expensive fine-grained annotation. They lack sufficient mining of available WSIs, severely limiting WSI classification performance. To address the above issues, we propose a novel and efficient dual-tier few-shot learning paradigm for WSI classification, named FAST. FAST consists of a dual-level annotation strategy and a dual-branch classification framework. Firstly, to avoid expensive fine-grained annotation, we collect a very small number of WSIs at the slide level, and annotate an extremely small number of patches. Then, to fully mine the available WSIs, we use all the patches and available patch labels to build a cache branch, which utilizes the labeled patches to learn the labels of unlabeled patches and performs patch classification through knowledge retrieval. In addition to the cache branch, we also construct a prior branch that includes learnable prompt vectors, using the text encoder of visual-language models for patch classification. Finally, we integrate the results from both branches to achieve WSI classification. Extensive experiments on binary and multi-class datasets demonstrate that our proposed method significantly surpasses existing few-shot classification methods and approaches the accuracy of fully supervised methods with only 0.22$\%$ annotation costs. All code and models will be publicly available at https://github.com/fukexue/FAST.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# パーソナリティ特性を明らかにする:対話における説明可能なパーソナリティ認識のための新しいベンチマークデータセット

Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues ( http://arxiv.org/abs/2409.19723v1 )

ライセンス: Link先を確認

Lei Sun, Jinming Zhao, Qin Jin,

(参考訳) パーソナリティ認識は、対話やソーシャルメディア投稿などのユーザデータに表れる性格特性を識別することを目的とする。現在の研究は主にパーソナリティ認識を分類タスクとして扱っており、認識されたパーソナリティを裏付ける証拠を明らかにしていない。本稿では、パーソナリティ特性の裏付けとなる証拠として推論過程を明らかにすることを目的とした、説明可能なパーソナリティ認識という新しいタスクを提案する。パーソナリティ理論によれば、パーソナリティ特性はパーソナリティ状態の安定したパターンからなり、パーソナリティ状態とは、特定の時点の具体的な状況における思考、感情、行動の短期的な特徴的パターンである。我々は、特定の文脈から短期的なパーソナリティ状態へ、さらに長期的なパーソナリティ特性へと至る推論過程を含む、Chain-of-Personality-Evidence(CoPE)と呼ばれる説明可能なパーソナリティ認識フレームワークを提案する。さらに、CoPEフレームワークに基づき、対話から説明可能なパーソナリティ認識データセットPersonalityEvdを構築する。説明可能なパーソナリティ状態認識と説明可能なパーソナリティ特性認識という2つのタスクを導入し、モデルにはパーソナリティの状態ラベルと特性ラベル、およびそれらを裏付ける証拠の認識を求める。この2つのタスクについて大規模言語モデルに基づく広範な実験を行った結果、パーソナリティ特性を明らかにすることは非常に難しいことが示され、今後の研究に向けたいくつかの知見を提示する。データとコードはhttps://github.com/Lei-Sun-RUC/PersonalityEvdで公開されている。

Personality recognition aims to identify the personality traits implied in user data such as dialogues and social media posts. Current research predominantly treats personality recognition as a classification task, failing to reveal the supporting evidence for the recognized personality. In this paper, we propose a novel task named Explainable Personality Recognition, aiming to reveal the reasoning process as supporting evidence of the personality trait. According to personality theories, personality traits are made up of stable patterns of personality states, where the states are short-term characteristic patterns of thoughts, feelings, and behaviors in a concrete situation at a specific moment in time. We propose an explainable personality recognition framework called Chain-of-Personality-Evidence (CoPE), which involves a reasoning process from specific contexts to short-term personality states to long-term personality traits. Furthermore, based on the CoPE framework, we construct an explainable personality recognition dataset from dialogues, PersonalityEvd. We introduce two tasks, explainable personality state recognition and explainable personality trait recognition, which require models to recognize the personality state and trait labels and their corresponding supporting evidence. Our extensive experiments based on Large Language Models on the two tasks show that revealing personality traits is very challenging, and we present some insights for future research. Our data and code are available at https://github.com/Lei-Sun-RUC/PersonalityEvd.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# DataDRILL:掘削リグのための地層圧予測とキック検出

DataDRILL: Formation Pressure Prediction and Kick Detection for Drilling Rigs ( http://arxiv.org/abs/2409.19724v1 )

ライセンス: Link先を確認

Murshedul Arifeen, Andrei Petrovski, Md Junayed Hasan, Igor Kotenko, Maksim Sletov, Phil Hassard,

(参考訳) 地層圧の正確なリアルタイム予測とキック検出は、意思決定とプロセスの費用対効果を大幅に改善できるため、掘削作業において極めて重要である。地層圧の予測とキックの検出によって掘削作業を自動化する手段として、データ駆動型モデルが普及しつつある。しかし、現在の文献では、掘削リグ分野の研究を進めるための裏付けとなるデータセットが公開されておらず、この領域の技術的進歩を妨げている。本稿では、石油・ガス井の掘削研究を高度化するインテリジェントなアルゴリズムの開発を支援するため、2つの新しいデータセットを紹介する。データセットには、28の掘削変数と2000を超えるデータサンプルからなる、地層圧予測用およびキック検出用のデータが含まれる。データセットの技術的検証として、地層圧の予測には主成分回帰を、キックの識別には主成分分析を用いる。主成分回帰のR2と残差予測偏差(Residual Predictive Deviation)のスコアは、それぞれ0.78と0.922である。

Accurate real-time prediction of formation pressure and kick detection is crucial for drilling operations, as it can significantly improve decision-making and the cost-effectiveness of the process. Data-driven models have gained popularity for automating drilling operations by predicting formation pressure and detecting kicks. However, the current literature does not make supporting datasets publicly available to advance research in the field of drilling rigs, thus impeding technological progress in this domain. This paper introduces two new datasets to support researchers in developing intelligent algorithms to enhance oil/gas well drilling research. The datasets include data samples for formation pressure prediction and kick detection with 28 drilling variables and more than 2000 data samples. Principal component regression is employed to forecast formation pressure, while principal component analysis is utilized to identify kicks for the dataset's technical validation. Notably, the R2 and Residual Predictive Deviation scores for principal component regression are 0.78 and 0.922, respectively.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# ネットワーク・プルーニングが性能と解釈可能性に及ぼす影響の検討

Investigating the Effect of Network Pruning on Performance and Interpretability ( http://arxiv.org/abs/2409.19727v1 )

ライセンス: Link先を確認

Jonathan von Rad, Florian Seuffert,

(参考訳) ディープニューラルネットワーク(DNN)は、タスクに対して過剰にパラメータ化されていることが多く、プルーニングと呼ばれる重みの除去によって大幅に圧縮できる。本研究では、様々なプルーニング手法がGoogLeNetの分類性能と解釈可能性に与える影響を調べる。非構造的プルーニング、構造的プルーニング、および接続スパース性(入力重みのプルーニング)の各手法をネットワークに体系的に適用し、ImageNetの検証セットにおけるネットワークの性能を分析する。また、反復プルーニングやワンショットプルーニングなど、異なる再訓練戦略も比較する。十分な再訓練エポックがあれば、ネットワークの性能は既定のGoogLeNetの性能に近づき、場合によってはそれを上回ることが分かった。解釈可能性の評価には、Zimmermannらが開発したMechanistic Interpretability Score(MIS)を用いる。実験の結果、MISを指標とした場合、解釈可能性とプルーニング率との間に有意な関係は認められなかった。さらに、精度が極めて低いネットワークでも高いMISスコアを達成しうることが観察され、MISは、正しい判断の根拠を理解するといった直感的な解釈可能性の概念とは必ずしも一致しない可能性が示唆される。

Deep Neural Networks (DNNs) are often over-parameterized for their tasks and can be compressed quite drastically by removing weights, a process called pruning. We investigate the impact of different pruning techniques on the classification performance and interpretability of GoogLeNet. We systematically apply unstructured and structured pruning, as well as connection sparsity (pruning of input weights) methods to the network and analyze the outcomes regarding the network's performance on the validation set of ImageNet. We also compare different retraining strategies, such as iterative pruning and one-shot pruning. We find that with sufficient retraining epochs, the performance of the networks can approximate the performance of the default GoogLeNet - and even surpass it in some cases. To assess interpretability, we employ the Mechanistic Interpretability Score (MIS) developed by Zimmermann et al. Our experiments reveal that there is no significant relationship between interpretability and pruning rate when using MIS as a measure. Additionally, we observe that networks with extremely low accuracy can still achieve high MIS scores, suggesting that the MIS may not always align with intuitive notions of interpretability, such as understanding the basis of correct decisions.
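The difference between one-shot and iterative unstructured pruning can be sketched on a flat list of weights. The abstract does not state the pruning criterion, so the magnitude-based criterion used here is an assumption, and the function names are hypothetical; retraining between iterative steps is only indicated by a comment.

```python
def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of weights with the smallest magnitude."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    num_pruned = int(len(weights) * rate)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_pruned])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def one_shot(weights, target_rate):
    """Prune to the target sparsity in a single step."""
    return magnitude_prune(weights, target_rate)


def iterative(weights, target_rate, steps):
    """Prune in `steps` increments up to the target sparsity."""
    for k in range(1, steps + 1):
        weights = magnitude_prune(weights, target_rate * k / steps)
        # Retraining of the surviving weights would happen here; it is
        # omitted in this sketch.
    return weights
```

Without the retraining step the two schedules remove the same weights. They diverge only once surviving weights are updated between pruning rounds, which is the effect the retraining comparison in the abstract is about.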

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# 残存幾何の強化による統一的な勾配ベース機械アンラーニング

Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement ( http://arxiv.org/abs/2409.19732v1 )

ライセンス: Link先を確認

Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang,

(参考訳) 機械アンラーニング(MU)は、深層ニューラルネットワークのプライバシーと信頼性を高めるために登場した。近似MUは大規模モデルに対する実用的な手法である。近似MUに関する我々の検討は、パラメータの近傍内で、厳密なMUに対する出力のKullback-Leiblerダイバージェンスを最小化する最急降下方向を特定することから始まる。得られた方向は、重み付き忘却勾配上昇、保持データに対する微調整の勾配降下、重み顕著性行列の3つの成分に分解される。ユークリッド計量から導かれるこの分解は、既存の勾配ベースMU手法の大半を包含する。しかし、ユークリッド空間に固執すると、出力確率空間の幾何構造が見落とされるため、反復の軌跡が準最適になりうる。そこで、残存データの2階ヘッセ行列を組み込み、残存幾何によって定まる多様体にアンラーニングの更新を埋め込むことを提案する。これは、効果的なアンラーニングが保持すべき性能に干渉するのを防ぐのに役立つ。ただし、大規模モデルに対して2階ヘッセ行列を計算することは現実的でない。ヘッセ行列による調整の利点を効率的に活用するため、最新の顕著なアンラーニング方向を暗黙的に近似する高速・低速パラメータ更新戦略を提案する。特定のモダリティの制約を受けないため、本手法は分類や生成を含むコンピュータビジョンのアンラーニングタスクに広く適用できる。広範な実験により、本手法の有効性と効率が検証された。特に、DiTを用いたImageNetでのクラス忘却に成功し、DDPMを用いたCIFAR-10では、従来手法が数千ステップを要するのに対し、わずか50ステップで1クラスを忘却できた。

Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. Our investigation into approximate MU starts with identifying the steepest descent direction, minimizing the output Kullback-Leibler divergence to exact MU inside a parameters' neighborhood. This probed direction decomposes into three components: weighted forgetting gradient ascent, fine-tuning retaining gradient descent, and a weight saliency matrix. Such decomposition derived from Euclidean metric encompasses most existing gradient-based MU methods. Nevertheless, adhering to Euclidean space may result in sub-optimal iterative trajectories due to the overlooked geometric structure of the output probability space. We suggest embedding the unlearning update into a manifold rendered by the remaining geometry, incorporating second-order Hessian from the remaining data. It helps prevent effective unlearning from interfering with the retained performance. However, computing the second-order Hessian for large-scale models is intractable. To efficiently leverage the benefits of Hessian modulation, we propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction. Free from specific modal constraints, our approach is adaptable across computer vision unlearning tasks, including classification and generation. Extensive experiments validate our efficacy and efficiency. Notably, our method successfully performs class-forgetting on ImageNet using DiT and forgets a class on CIFAR-10 using DDPM in just 50 steps, compared to thousands of steps required by previous methods.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# Pear: ビジュアルパラメータ効率の良いファインチューニングにおけるプルーニングとアダプタの共有

Pear: Pruning and Sharing Adapters in Visual Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2409.19733v1 )

ライセンス: Link先を確認

Yibo Zhong, Yao Zhou,

(参考訳) アダプタは、事前学習済み基盤モデルを微調整する際の計算コストとストレージコストを軽減する手段として広く研究されてきた。しかし、アダプタ自体が冗長性を示すことがあり、不要なストレージオーバーヘッドや性能低下につながる。本稿では、事前学習済み視覚基盤モデルを効率的に微調整するための新しいアダプタプルーニングフレームワーク、Prune and Share(Pear)を提案する。具体的には、一部のアダプタをプルーニングし、プルーニングされずに残ったより重要なアダプタを、アダプタがプルーニングされた位置と共有することで、プルーニング後もこれらの位置で適応を継続できるようにする。さらに、プルーニングされたアダプタの情報を保持する知識チェックポイント戦略を導入し、性能をさらに高める。視覚適応ベンチマークでの実験結果により、他の競合手法と比較した提案手法Pearの有効性と効率が検証された。コードはhttps://github.com/yibozhong/pearで公開されている。

Adapters have been widely explored to alleviate computational and storage costs when fine-tuning pretrained foundation models. However, the adapter itself can exhibit redundancy, leading to unnecessary storage overhead and inferior performance. In this paper, we propose Prune and Share (Pear), a novel adapter-pruning framework for efficient fine-tuning of pretrained visual foundation models. Specifically, we prune certain adapters and share the more important unpruned ones with positions where adapters are pruned, allowing continual adaptation at these positions after pruning. Additionally, a knowledge checkpoint strategy is introduced, which preserves the information of the pruned adapters and further boosts performance. Experimental results on a visual adaptation benchmark validate the effectiveness and efficiency of the proposed Pear compared to other competitive methods. Code is available at https://github.com/yibozhong/pear.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# スクランブルテキスト:合成データを用いたOCR誤り訂正のための言語モデルトレーニング

Scrambled text: training Language Models to correct OCR errors using synthetic data ( http://arxiv.org/abs/2409.19735v1 )

ライセンス: Link先を確認

Jonathan Bourne,

(参考訳) OCRエラーはデジタル化された歴史的アーカイブにおいて一般的であり、その有用性と価値に大きな影響を及ぼす。生成言語モデル(LM)は、破損したテキストが与える文脈とより広い社会文化的文脈を利用して、これらのエラーを修正できる可能性を示している(このプロセスはContext Leveraging OCR Correction (CLOCR-C)と呼ばれる)。しかし、そのようなモデルを微調整するのに十分なトレーニングデータを取得することは困難である。本稿では,LMで生成し文字レベルのマルコフ破損過程を適用した合成データ上で言語モデルを微調整することにより,OCRエラーの訂正能力を大幅に向上できることを示す。合成データで訓練されたモデルは、ベースLMに対して文字誤り率を55%、単語誤り率を32%低減し、実データで訓練されたモデルよりも優れていた。主な発見は、破損の少ないデータでのトレーニングは過度に破損したデータでのトレーニングよりも優れていること、不均一な文字レベルの破損は均一な破損よりも優れていること、固定されたトークン予算のもとでは観測数を増やすよりも観測あたりのトークン数を増やす方が優れていることである。本論文の成果物は,有効なCLOCR-Cモデルを訓練するための8つのヒューリスティックス,19世紀の新聞記事を模した11,000件の合成データセット,および合成破損データを作成するためのPythonライブラリscrambledtextである。

OCR errors are common in digitised historical archives, significantly affecting their usability and value. Generative Language Models (LMs) have shown potential for correcting these errors using the context provided by the corrupted text and the broader socio-cultural context, a process called Context Leveraging OCR Correction (CLOCR-C). However, getting sufficient training data for fine-tuning such models can prove challenging. This paper shows that fine-tuning a language model on synthetic data using an LM and using a character level Markov corruption process can significantly improve the ability to correct OCR errors. Models trained on synthetic data reduce the character error rate by 55% and word error rate by 32% over the base LM and outperform models trained on real data. Key findings include: training on under-corrupted data is better than training on over-corrupted data; non-uniform character level corruption is better than uniform corruption; and more tokens-per-observation outperforms more observations for a fixed token budget. The outputs of this paper are a set of 8 heuristics for training effective CLOCR-C models, a dataset of 11,000 synthetic 19th century newspaper articles, and scrambledtext, a Python library for creating synthetic corrupted data.
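
The character-level Markov corruption and the character error rate mentioned above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: it is not the API of the scrambledtext library, and the function names (`corrupt_text`, `character_error_rate`), the two-state chain, and all probabilities are hypothetical choices made for the example.

```python
import random

def corrupt_text(text, p_start=0.05, p_stay=0.5,
                 alphabet="abcdefghijklmnopqrstuvwxyz", seed=None):
    """Corrupt `text` with a two-state (clean/corrupt) Markov chain.

    Hypothetical sketch: p_start is the chance of entering the corrupt
    state from the clean state, p_stay the chance of remaining in it,
    so errors tend to arrive in bursts (non-uniform corruption).
    """
    rng = random.Random(seed)
    out = []
    corrupting = False
    for ch in text:
        # The transition probability depends on the previous state.
        p = p_stay if corrupting else p_start
        corrupting = rng.random() < p
        if not corrupting:
            out.append(ch)
            continue
        action = rng.choice(("substitute", "delete", "insert"))
        if action == "substitute":
            out.append(rng.choice(alphabet))
        elif action == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))
        # "delete": emit nothing for this character
    return "".join(out)

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)

if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog"
    noisy = corrupt_text(clean, seed=0)
    print(noisy, character_error_rate(clean, noisy))
```

Raising `p_stay` relative to `p_start` produces longer error bursts at a similar overall error rate, which is one simple way to explore uniform versus non-uniform corruption.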

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# 認知症のためのNLPの体系的レビュー: タスク、データセット、機会

A Systematic Review of NLP for Dementia- Tasks, Datasets and Opportunities ( http://arxiv.org/abs/2409.19737v1 )

ライセンス: Link先を確認

Lotem Peled-Cohen, Roi Reichart,

(参考訳) 認知機能低下と言語との密接な関係は、認知症研究におけるNLPと医療コミュニティの長年の協力を後押ししてきた。そこで本研究では,NLPを認知症関連の取り組みに応用した200以上の論文を,医学・技術・NLPに焦点をあてた文献から収集してレビューした。認知症検出,言語的バイオマーカー抽出,介護者支援,患者支援などの主要な研究領域を同定し,全論文の半数が臨床データを用いた認知症検出のみに焦点を当てていることを示す。しかし、人工的に劣化させた言語モデル、合成データ、デジタルツインなど、多くの方向性が未開拓のままである。信頼、科学的厳密性、適用性、コミュニティ間のコラボレーションに関するギャップと機会を強調し、レビューを通じて遭遇した多様なデータセット(録音、筆記、構造化、自発的、合成、臨床、ソーシャルメディアベースなど)を紹介する。このレビューは、医療・NLPコミュニティにおける認知症研究へのより創造的なアプローチを刺激することを目的としている。

The close link between cognitive decline and language has fostered long-standing collaboration between the NLP and medical communities in dementia research. To examine this, we reviewed over 200 papers applying NLP to dementia related efforts, drawing from medical, technological, and NLP-focused literature. We identify key research areas, including dementia detection, linguistic biomarker extraction, caregiver support, and patient assistance, showing that half of all papers focus solely on dementia detection using clinical data. However, many directions remain unexplored: artificially degraded language models, synthetic data, digital twins, and more. We highlight gaps and opportunities around trust, scientific rigor, applicability, and cross-community collaboration, and showcase the diverse datasets encountered throughout our review: recorded, written, structured, spontaneous, synthetic, clinical, social media based, and more. This review aims to inspire more creative approaches to dementia research within the medical and NLP communities.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# ANNによる3量子ビットNMR量子プロセッサにおける多者間エンタングルメントの検出

ANN-Enhanced Detection of Multipartite Entanglement in a Three-Qubit NMR Quantum Processor ( http://arxiv.org/abs/2409.19739v1 )

ライセンス: Link先を確認

Vaishali Gulati, Shivanshu Siyanwal, Arvind, Kavita Dorai,

(参考訳) 確率的局所操作および古典通信(SLOCC)の下で非等価な6つのクラスのうちの1つから引き出された、実験的に生成された3量子ビット純粋状態のエンタングルメントクラスを、人工ニューラルネットワーク(ANN)モデルを用いて同定する。ANNモデルは、状態における真の多者間エンタングルメント(GME)の存在を検出することもできる。データサイエンスの手法を適用して問題の次元を削減し、これは計算が必要な密度行列要素数の削減に対応する。ANNモデルは、まずランダムに生成された状態を含むシミュレーションデータセットで訓練され、その後、標準形にキャストされ核磁気共鳴(NMR)量子プロセッサ上で生成されたノイズの多い実験的な3量子ビット状態でテストおよび検証される。我々は,Support Vector Machines (SVMs) とK-Nearest Neighbor (KNN) アルゴリズムを用いてANNモデルをベンチマークし,ANNに基づくエンタングルメント分類の結果を、3-tangleや相関テンソルなどの既存の3量子ビットSLOCCエンタングルメント分類スキームと比較した。以上の結果から,ANNモデルは,数個の密度行列要素のみの事前知識を入力として,GME検出とSLOCCクラス識別を高精度に行えることが示された。ANNモデルは削減された入力データセットでもうまく機能するため、実験データセットが限られた実際の状況でのエンタングルメント分類に魅力的な方法である。

We use an artificial neural network (ANN) model to identify the entanglement class of an experimentally generated three-qubit pure state drawn from one of the six inequivalent classes under stochastic local operations and classical communication (SLOCC). The ANN model is also able to detect the presence of genuinely multipartite entanglement (GME) in the state. We apply data science techniques to reduce the dimensionality of the problem, which corresponds to a reduction in the number of required density matrix elements to be computed. The ANN model is first trained on a simulated dataset containing randomly generated states, and is later tested and validated on noisy experimental three-qubit states cast in the canonical form and generated on a nuclear magnetic resonance (NMR) quantum processor. We benchmark the ANN model via Support Vector Machines (SVMs) and K-Nearest Neighbor (KNN) algorithms and compare the results of our ANN-based entanglement classification with existing three-qubit SLOCC entanglement classification schemes such as 3-tangle and correlation tensors. Our results demonstrate that the ANN model can perform GME detection and SLOCC class identification with high accuracy, using a priori knowledge of only a few density matrix elements as inputs. Since the ANN model works well with a reduced input dataset, it is an attractive method for entanglement classification in real-life situations with limited experimental data sets.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# Molecular GANがByte-Pairエンコーディングに出会ったとき

When Molecular GAN Meets Byte-Pair Encoding ( http://arxiv.org/abs/2409.19740v1 )

ライセンス: Link先を確認

Huidong Tang, Chen Li, Yasuhiko Morimoto,

(参考訳) GAN(Generative Adversarial Network)のような深層生成モデルは、デノボ分子生成による新規な薬物様候補の発見において重要な役割を担っている。しかし、従来の文字単位のトークナイザは、分子データ中の新規で複雑な部分構造の特定にしばしば苦労する。対照的に、代替のトークン化法は優れた性能を示している。本研究は, バイトレベルのバイトペアエンコーディング・トークナイザを組み込み、強化学習を用いてデノボ分子生成を強化する分子GANを導入する。具体的には、ジェネレータはアクターとして機能してSMILES文字列を生成し、識別器は批評家として機能してその品質を評価する。我々の分子GANは、計算効率の向上を目的とした革新的な報酬機構も統合している。妥当性,一意性,新規性,多様性を評価する実験結果は,詳細な可視化分析によって補完され、我々のGANの有効性を強く実証している。

Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle with identifying novel and complex sub-structures in molecular data. In contrast, alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by detailed visualization analysis, robustly demonstrate the effectiveness of our GAN.
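
To make the contrast between character-wise and byte-pair tokenization concrete, here is a minimal sketch of learning BPE merges on SMILES strings. It is an illustration only, not the tokenizer used in the paper: the function names (`learn_bpe`, `merge_pair`) and the toy corpus are assumptions, and because SMILES strings are ASCII, working on characters here coincides with working on bytes.

```python
from collections import Counter

def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` by one token."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(corpus, num_merges):
    """Greedily learn up to `num_merges` merges from a list of strings.

    Returns the ordered merge list and the corpus tokenised with it.
    Ties between equally frequent pairs are broken lexicographically
    so that the result is deterministic.
    """
    seqs = [list(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # every sequence is already a single token
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, seqs

if __name__ == "__main__":
    # Toy SMILES corpus (hypothetical example data).
    smiles = ["CCO", "CCN", "CCC", "c1ccccc1O", "c1ccccc1N"]
    merges, tokenised = learn_bpe(smiles, num_merges=4)
    print(merges)
    print(tokenised)
```

Frequent fragments such as ring patterns end up as single tokens after a few merges, which is the intuition behind sub-structure-aware tokenization.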

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# Tailored Federated Learning: 方向性規制と知識蒸留の活用

Tailored Federated Learning: Leveraging Direction Regulation & Knowledge Distillation ( http://arxiv.org/abs/2409.19741v1 )

ライセンス: Link先を確認

Huidong Tang, Chen Li, Huachong Yu, Sayaka Kamei, Yasuhiko Morimoto,

(参考訳) フェデレーテッドラーニング(FL)は、特に医療のようなプライバシーに敏感な分野において価値の高い、変革的なトレーニングパラダイムとして登場した。しかし、データ、計算能力、タスクにおけるクライアントの不均一性は重大な課題である。このような課題に対処するため、モデルデルタ正則化、パーソナライズされたモデル、フェデレーテッド知識蒸留、ミックスプーリングを統合するFL最適化アルゴリズムを提案する。モデルデルタ正則化は、サーバ上でモデル更新を集中的に最適化し、最小の通信コストでクライアントを効率的に更新する。タスクの不均一性に効果的に取り組むために、パーソナライズされたモデルとフェデレーテッド知識蒸留の戦略が採用されている。さらに、読み出し操作の感度の変動に対応するためにミックスプーリングが導入されている。実験結果は,モデルデルタ正則化によって達成された顕著な精度と迅速な収束を示す。さらに,フェデレーテッド知識蒸留アルゴリズムは、特に多様なデータを持つシナリオにおいてFL性能を顕著に向上させる。加えて,ミックスプーリングによる読み出し操作はクライアントに具体的なメリットを提供し、提案手法の有効性を示している。

Federated learning (FL) has emerged as a transformative training paradigm, particularly invaluable in privacy-sensitive domains like healthcare. However, client heterogeneity in data, computing power, and tasks poses a significant challenge. To address such a challenge, we propose an FL optimization algorithm that integrates model delta regularization, personalized models, federated knowledge distillation, and mix-pooling. Model delta regularization optimizes model updates centrally on the server, efficiently updating clients with minimal communication costs. Personalized models and federated knowledge distillation strategies are employed to tackle task heterogeneity effectively. Additionally, mix-pooling is introduced to accommodate variations in the sensitivity of readout operations. Experimental results demonstrate the remarkable accuracy and rapid convergence achieved by model delta regularization. Additionally, the federated knowledge distillation algorithm notably improves FL performance, especially in scenarios with diverse data. Moreover, mix-pooling readout operations provide tangible benefits for clients, showing the effectiveness of our proposed methods.

翻訳日:2024-11-05 21:19:41 公開日:2024-09-29

# 可視化のための自然言語生成 - 現状, 課題, 今後の方向性

Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions ( http://arxiv.org/abs/2409.19747v1 )

ライセンス: Link先を確認

Enamul Hoque, Mohammed Saidul Islam,

(参考訳) 自然言語と可視化は、情報を効果的に伝達するうえで重要な役割を果たす、人間のコミュニケーションの2つの相補的なモダリティである。可視化はデータのトレンドやパターン、異常を発見するのに役立つが、自然言語の記述はこれらの洞察を説明するのに役立つ。したがって、テキストと可視化を組み合わせることは、データの中心的なメッセージを効果的に伝えるための一般的なテクニックである。自然言語生成(NLG)の台頭を踏まえ、可視化のための自然言語記述を自動的に作成することへの関心が高まっており、これはチャートのキャプション、チャートに関する質問への回答、データ駆動型のストーリーテリングに利用できる。本調査では, 可視化のためのNLGの現状を体系的に検討し, 問題の分類体系を紹介する。NLGタスクは、研究コミュニティと産業界の双方から大きな注目を集めている、可視化のための自然言語インタフェース(NLI)の領域に該当する。調査の範囲を絞るため、主に可視化のためのテキスト生成に焦点を当てた研究を対象とする。NLG問題と提案された解決策の設計空間を特徴付けるために、5つのWh-questionsを提示する。すなわち、可視化のためのNLGタスクがなぜ(why)、どのように(how)実行されるのか、タスクの入力と出力は何か(what)、そして生成されたテキストがどこで(where)、いつ(when)可視化と統合されるのか、である。我々はこれらの5つのWh-questionsに基づいて、調査対象の論文で使用されている解決策を分類する。最後に、この領域における今後の研究の鍵となる課題と可能性について論じる。

Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is a growing interest in automatically creating natural language descriptions for visualizations, which can be used to caption charts, answer questions about charts, or tell data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow down the scope of the survey, we primarily concentrate on the research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions: why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, and where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# NeuroMax: 相互情報の最大化とグループトピック正規化によるニューラルトピックモデリングの強化

NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization ( http://arxiv.org/abs/2409.19749v1 )

ライセンス: Link先を確認

Duy-Tung Pham, Thien Trang Nguyen Vu, Tung Nguyen, Linh Ngo Van, Duc Anh Nguyen, Thien Huu Nguyen,

(参考訳) ニューラルトピックモデルの最近の進歩は、推論ネットワーク(エンコーダ)と事前学習言語モデル(PLM)の統合と、生成モデル(デコーダ)における単語とトピックの関係のモデリングの2つの主要な方向に集中している。しかし、大きなPLMを使用すると推論コストが大幅に増加し、低い推論時間が求められる状況では実用性が低下する。さらに、トピックと単語の関係とトピック間の相互関係を同時にモデル化することが重要である。本研究では,これらの課題に対処するため,NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization)という新しいフレームワークを提案する。NeuroMaxは、ニューラルトピックモデルにおけるエンコーダから得られたトピック表現と、PLMから派生した表現との相互情報量を最大化する。さらに、NeuroMaxは最適輸送を用いて、トピック間で情報がどのように輸送されるかを分析することで、トピック間の関係を学習する。実験結果から、NeuroMaxは推論時間を短縮し、より一貫性のあるトピックやトピックグループを生成し、より代表的な文書埋め込みを生成し、それにより下流タスクの性能を向上させることが示唆された。

Recent advances in neural topic models have concentrated on two primary directions: the integration of the inference network (encoder) with a pre-trained language model (PLM) and the modeling of the relationship between words and topics in the generative model (decoder). However, the use of large PLMs significantly increases inference costs, making them less practical for situations requiring low inference times. Furthermore, it is crucial to simultaneously model the relationships between topics and words as well as the interrelationships among topics themselves. In this work, we propose a novel framework called NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization) to address these challenges. NeuroMax maximizes the mutual information between the topic representation obtained from the encoder in neural topic models and the representation derived from the PLM. Additionally, NeuroMax employs optimal transport to learn the relationships between topics by analyzing how information is transported among them. Experimental results indicate that NeuroMax reduces inference time, generates more coherent topics and topic groups, and produces more representative document embeddings, thereby enhancing performance on downstream tasks.
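
The abstract does not say which mutual-information estimator NeuroMax uses. One common way to maximize a lower bound on mutual information between two representations of the same document is the InfoNCE contrastive loss, sketched below in plain Python. Everything here is an assumption for illustration: the function name `info_nce`, the temperature value, and the premise that both representations have already been projected to the same dimension.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0  # guard against zero vectors
    return [a / norm for a in v]

def info_nce(topic_reprs, plm_reprs, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired representations.

    topic_reprs[i] and plm_reprs[i] are assumed to describe the same
    document; every other row in the batch serves as a negative.
    Minimising this loss maximises the bound  log(N) - loss  on the
    mutual information between the two views.
    """
    z = [_normalize(v) for v in topic_reprs]
    h = [_normalize(v) for v in plm_reprs]
    n = len(z)
    total = 0.0
    for i in range(n):
        logits = [_dot(z[i], h[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n

if __name__ == "__main__":
    topics = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.2, 0.8]]
    shuffled = [[0.2, 0.8], [0.9, 0.1]]
    print(info_nce(topics, aligned), info_nce(topics, shuffled))
```

Because the PLM is only needed to produce the target representations during training, a loss of this kind is compatible with dropping the PLM at inference time, which matches the motivation stated in the abstract.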

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# AstroMLab 2: AstroLLaMA-2-70Bモデルと天文学特化LLMのベンチマーク

AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy ( http://arxiv.org/abs/2409.19750v1 )

ライセンス: Link先を確認

Rui Pan, Tuan Dung Nguyen, Hardik Arora, Alberto Accomazzi, Tirthankar Ghosal, Yuan-Sen Ting,

(参考訳) ダウンストリームタスクの性能を高めるため,大規模言語モデルのドメイン固有データへの継続事前学習が提案されている。天文学では、これまで天文学に焦点を当てたベンチマークがなかったため、これらの特化LLMの客観的評価が妨げられてきた。本研究は、高品質の天文学MCQをキュレートする最近の取り組みを活用し、天文学における特化LLMを定量的に評価することを目的としている。以前にリリースされた、LLaMA-2-7BをベースとするAstroLLaMAシリーズは,ベースモデルと比較して性能が低いことがわかった。この性能劣化は、arXivの要約テキストなど、高品質なデータを継続事前学習に活用することで部分的に軽減できることを示す。小規模モデルでは破滅的忘却が観察されるものの、70Bモデル上での継続事前学習は大きな改善をもたらしうることが示唆された。しかし、現在の教師付き微調整データセットは依然としてインストラクトモデルの性能を制限している。本研究と合わせて,以前のAstroLLaMAシリーズを基にした新しいモデル群、AstroLLaMA-3-8BとAstroLLaMA-2-70Bを紹介する。

Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLM models. Leveraging a recent initiative to curate high-quality astronomical MCQs, this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model. We demonstrate that this performance degradation can be partially mitigated by utilizing high-quality data for continual pretraining, such as summarized text from arXiv. Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised fine-tuning dataset still constrains the performance of instruct models. In conjunction with this study, we introduce a new set of models, AstroLLaMA-3-8B and AstroLLaMA-2-70B, building upon the previous AstroLLaMA series.

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# 尺度のバランスをとる:二項分類におけるクラス不均衡への対処に関する総合的研究

Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification ( http://arxiv.org/abs/2409.19751v1 )

ライセンス: Link先を確認

Mohamed Abdelhamid, Abhyuday Desai,

(参考訳) 二値分類タスクにおけるクラス不均衡は、機械学習において依然として重要な課題であり、しばしば少数クラスでの性能低下を招く。本研究では,クラス不均衡に対処するために広く使われている3つの戦略、すなわちSMOTE(Synthetic Minority Over-sampling Technique)、クラス重みの調整(Class Weights tuning)、決定閾値キャリブレーション(Decision Threshold Calibration)を包括的に評価する。これらの手法を、15の多様な機械学習モデルと、さまざまなドメインからの30のデータセットにわたって、介入なしのベースラインシナリオと比較し、合計9,000の実験を行った。性能は主にF1スコアで評価したが、F2スコア、適合率、再現率、Brierスコア、PR-AUC、AUCを含む9つの追加指標の結果も記録した。以上の結果から,3つの戦略はいずれも概してベースラインを上回り,決定閾値キャリブレーションが最も一貫して有効な手法であることが示された。しかし,最良の手法はデータセット間で大きくばらついており、特定の問題に対して複数のアプローチをテストすることの重要性が浮き彫りになった。本研究は、不均衡なデータセットを扱う実践者にとって貴重な洞察を提供し、クラス不均衡処理手法を評価する際に、データセット固有の分析の必要性を強調する。

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely-used strategies for handling class imbalance: Synthetic Minority Over-sampling Technique (SMOTE), Class Weights tuning, and Decision Threshold Calibration. We compare these methods against a baseline scenario of no-intervention across 15 diverse machine learning models and 30 datasets from various domains, conducting a total of 9,000 experiments. Performance was primarily assessed using the F1-score, although our study also tracked results on nine additional metrics including F2-score, precision, recall, Brier-score, PR-AUC, and AUC. Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently effective technique. However, we observed substantial variability in the best-performing method across datasets, highlighting the importance of testing multiple approaches for specific problems. This study provides valuable insights for practitioners dealing with imbalanced datasets and emphasizes the need for dataset-specific analysis in evaluating class imbalance handling techniques.
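
Decision threshold calibration, as named above, can be illustrated with a short sketch: instead of cutting predicted probabilities at 0.5, sweep candidate thresholds on held-out data and keep the one with the best F1-score. This is a generic illustration, not the study's exact protocol; the function names (`f1_score`, `calibrate_threshold`) and the toy scores are assumptions.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def calibrate_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1 over the observed scores.

    A sample is predicted positive when its score is >= threshold.
    In practice this should be run on a validation split, not on the
    test data used for the final report.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

if __name__ == "__main__":
    # Toy imbalanced validation set: 2 positives among 10 samples.
    y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    scores = [0.05, 0.1, 0.1, 0.2, 0.15, 0.2, 0.3, 0.35, 0.45, 0.4]
    default_preds = [1 if s >= 0.5 else 0 for s in scores]
    print("F1 at 0.5:", f1_score(y_true, default_preds))
    print("calibrated:", calibrate_threshold(y_true, scores))
```

On this toy data no score reaches 0.5, so the default threshold predicts no positives at all, while a lower calibrated threshold recovers both of them.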

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# 特徴分離を用いた変分オートエンコーダに基づくオフライン署名検証

Offline Signature Verification Based on Feature Disentangling Aided Variational Autoencoder ( http://arxiv.org/abs/2409.19754v1 )

ライセンス: Link先を確認

Hansong Zhang, Jiangjian Guo, Kun Li, Yang Zhang, Yimei Zhao,

(参考訳) オフライン手書き署名検証システムは、手書き署名画像を本物の署名または偽造として認識することで、個人の身元を確認するために使用される。署名検証システムの主なタスクは、署名画像から特徴を抽出することと、分類のための分類器を訓練することである。これらのタスクの課題は2つある。第一に、本物の署名と熟練した偽造は外観が非常によく似ており、クラス間距離が小さくなる。第二に、署名検証モデルを訓練する時点では、熟練した偽造のインスタンスは利用できないことが多い。これらの問題に取り組むため,本論文では新しい署名検証手法を提案する。これは、署名画像から直接特徴を抽出するために変分オートエンコーダ(VAE)を使用する最初のモデルである。特徴をより識別的にするために、特徴分離のための新しい損失関数を導入することで、従来のVAEを改善する。さらに、抽出した特徴に基づく分類にはSVM(Support Vector Machine)を用いる。MCYT-75とGPDS-syntheticの2つの公開データセットで大規模な実験を行い、提案手法は13の代表的なオフライン署名検証手法を有意に上回った。異なるデータセットで達成された改善は、開発したシステムの堅牢性と、実際のアプリケーションにおける大きな可能性を示している。

Offline handwritten signature verification systems are used to verify the identity of individuals by recognizing their handwritten signature images as genuine signatures or forgeries. The main tasks of signature verification systems include extracting features from signature images and training a classifier for classification. The challenges of these tasks are twofold. First, genuine signatures and skilled forgeries are highly similar in appearance, resulting in a small inter-class distance. Second, instances of skilled forgeries are often unavailable when signature verification models are being trained. To tackle these problems, this paper proposes a new signature verification method. It is the first model that employs a variational autoencoder (VAE) to extract features directly from signature images. To make the features more discriminative, it improves the traditional VAEs by introducing a new loss function for feature disentangling. In addition, it relies on SVM (Support Vector Machine) for classification according to the extracted features. Extensive experiments are conducted on two public datasets, MCYT-75 and GPDS-synthetic, where the proposed method significantly outperformed 13 representative offline signature verification methods. The improvement achieved on distinct datasets indicates the robustness and great potential of the developed system in real applications.

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# Bose-Einstein Condensateを用いたLambda-Gravityの探索

Probing Lambda-Gravity with Bose-Einstein Condensate ( http://arxiv.org/abs/2409.19755v1 )

ライセンス: Link先を確認

Hector A. Fernandez-Melendez, Alexander Belyaev, Vahe Gurzadyan, Ivette Fuentes,

(参考訳) 本稿では,捕捉されたボース・アインシュタイン凝縮体(BEC)における量子フォノン励起のダイナミクスを利用し、テーブルトップ実験の規模で動作可能な新しい検出器の概念を用いて,2つの基本重力定数の精密な試験を提案する。この設定では、フォノン励起とBECの基底状態とを混合するトリッター演算により、感度が約2桁向上する。BECは、$\Lambda$重力における重力ポテンシャルの2つの重要な成分、すなわちニュートン的な$GM/r$項と宇宙定数の$\Lambda r^2$項に対して、固有の感度を示す。最先端の実験設計を用いると、重力定数$G$を最大で$10^{-17}$ N m$^2$/kg$^2$の精度で測定できると予測され、これは現在の測定に対して2桁の改善に相当する。さらに、この実験は、地球上での$\Lambda$の最良の上限を$<10^{-31}$ m$^{-2}$に定めうるものであり、宇宙定数の初の実験室ベースの探査となる。加えて、この設定は重力ポテンシャルにおける各項の距離依存的な振る舞いの測定を可能にし、修正重力理論をテストする新しい手段を提供する。

We propose a precise test of two fundamental gravitational constants using a novel detector concept that exploits the dynamics of quantum phononic excitations in a trapped Bose-Einstein condensate (BEC), operable at the scale of table-top experiments. In this setup, the sensitivity is enhanced by approximately two orders of magnitude through the use of a tritter operation, which mixes phononic excitations with the BEC's ground state. The BEC exhibits unique sensitivity to the two key components of the gravitational potential in $\Lambda$-gravity: the Newtonian $GM/r$ term and the cosmological constant $\Lambda r^2$. Using state-of-the-art experimental design, we predict that the gravitational constant $G$ could be measured with an accuracy up to $10^{-17}$ N m$^2$/kg$^2$, representing an improvement by two orders of magnitude over current measurements. Moreover, this experiment could establish the best Earth-based upper limit on $\Lambda$ at $<10^{-31}$ m$^{-2}$, marking the first laboratory-based probe of the cosmological constant. Additionally, the setup allows for the measurement of the distance-dependent behaviour of each term in the gravitational potential, providing a novel means to test modified gravity theories.
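
For orientation, the two terms named above can be written out explicitly. The abstract only gives their scaling ($GM/r$ and $\Lambda r^2$); the signs and the prefactors $c^2/6$ and $c^2/3$ below follow the weak-field form commonly quoted in the $\Lambda$-gravity literature and should be read as an assumption here, not as a formula taken from this paper.

```latex
% Weak-field potential per unit test mass (prefactors assumed, see lead-in)
\Phi(r) = -\frac{GM}{r} - \frac{\Lambda c^{2} r^{2}}{6},
\qquad
a(r) = -\frac{d\Phi}{dr} = -\frac{GM}{r^{2}} + \frac{\Lambda c^{2} r}{3}.

% The attractive Newtonian term and the repulsive \Lambda term cancel at
r_{c} = \left( \frac{3GM}{\Lambda c^{2}} \right)^{1/3}.
```

The opposite signs and different distance dependence of the two acceleration terms are what make it possible, in principle, to separate them by measuring at several distances.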

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# 真の学習型医療システムを実現するためのプライバシ保護フェデレーテッドラーニングの進歩

Advances in Privacy Preserving Federated Learning to Realize a Truly Learning Healthcare System ( http://arxiv.org/abs/2409.19756v1 )

ライセンス: Link先を確認

Ravi Madduri, Zilinghan Li, Tarak Nandi, Kibaek Kim, Minseok Ryu, Alex Rodriguez,

(参考訳) 学習医療システム(LHS)の概念は、患者医療からのマルチモーダルデータを継続的に分析し、将来の医療成果を高める自己改善ネットワークを構想している。しかし、このビジョンを実現することは、データ共有とプライバシ保護において大きな課題に直面している。プライバシ保護フェデレーションラーニング(PPFL)は、患者プライバシを保護しながら、分散データからの協調的な学習を可能にすることによって、これらの課題に対処する可能性を秘めている、変革的で有望なアプローチである。本稿では,医学研究所 (IOM) Roundtable が定義した,真の LHS を実現するために,PPFL を医療エコシステムに統合するビジョンを提案する。

The concept of a learning healthcare system (LHS) envisions a self-improving network where multimodal data from patient care are continuously analyzed to enhance future healthcare outcomes. However, realizing this vision faces significant challenges in data sharing and privacy protection. Privacy-Preserving Federated Learning (PPFL) is a transformative and promising approach that has the potential to address these challenges by enabling collaborative learning from decentralized data while safeguarding patient privacy. This paper proposes a vision for integrating PPFL into the healthcare ecosystem to achieve a truly LHS as defined by the Institute of Medicine (IOM) Roundtable.

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# 三者間因果順序推測ゲームのための共有エンタングルメント

Shared entanglement for three-party causal order guessing game ( http://arxiv.org/abs/2409.19762v1 )

ライセンス: Link先を確認

Ryszard Kukulski, Paulina Lewandowska, Karol Życzkowski,

(参考訳) コミュニケーションタスクの変種では、プレイヤーは特定のタスクを後で計算するためのローカル戦略を選択し、別々に作業する。パーティ間の通信と共有の絡み合いに量子ビットを利用することは、これらの状況におけるパフォーマンスを高めるための認識された方法である。本研究では,アリス,ボブ,チャーリーの3人が,その動きの隠された順序を知りたがるゲームを紹介した。演算の合成順序を識別するために、古典的な設定よりも共有絡み合いと局所演算を用いる量子戦略の利点を示す。また, 量子資源の役割について検討した。

In a variant of communication tasks, players cooperate in choosing their local strategies to compute a given task later, working separately. Utilizing quantum bits for communication and sharing entanglement between parties is a recognized method to enhance performance in these situations. In this work, we introduce a game in which three parties, Alice, Bob and Charlie, would like to discover the hidden order in which they make their moves. We show the advantage of quantum strategies that use shared entanglement and local operations over classical setups for discriminating the composition order of operations. The role of quantum resources in improving the probability of successful discrimination is also investigated.

翻訳日:2024-11-05 17:49:48 公開日:2024-09-29

# 空間的時間的注意を伴うスパイキングトランスフォーマー

Spiking Transformer with Spatial-Temporal Attention ( http://arxiv.org/abs/2409.19764v1 )

ライセンス: Link先を確認

Donghyun Lee, Yuhang Li, Youngeun Kim, Shiting Xiao, Priyadarshini Panda,

(参考訳) スパイキングニューラルネットワーク(SNN)は、疎な二値活性化のため、従来の人工ニューラルネットワーク(ANN)に代わる、魅力的でエネルギー効率のよい選択肢を提供する。トランスフォーマーアーキテクチャの成功を生かし、データセットのサイズと性能をスケールアップするためにスパイキングトランスフォーマーアーキテクチャが検討されている。しかし、既存の研究はスパイキングトランスフォーマーにおける空間的自己注意のみを考慮し、タイムステップにわたる固有の時間的文脈を無視している。本研究では,無視できる程度の追加計算負荷で空間的および時間的情報を自己注意に組み込むように設計された、シンプルで簡潔なアーキテクチャである空間的時間的注意を伴うスパイキングトランスフォーマー(STAtten)を提案する。STAttenは、時間またはトークンのインデックスを分割し、交差的に自己注意を計算することで、空間的時間的情報を効果的に組み込む。まず、系列データセットを用いて、長期の時間的依存を捕捉する空間的時間的注意機構の能力を検証する。さらに、CIFAR10/100、ImageNet、CIFAR10-DVS、N-Caltech101など、さまざまなデータセットに関する広範な実験を通じて、このアプローチを検証する。特に、我々のクロスアテンション機構は、ImageNetデータセットで78.39%の精度を達成した。

Spiking Neural Networks (SNNs) present a compelling and energy-efficient alternative to traditional Artificial Neural Networks (ANNs) due to their sparse binary activation. Leveraging the success of the transformer architecture, the spiking transformer architecture is explored to scale up dataset size and performance. However, existing works only consider the spatial self-attention in spiking transformers, neglecting the inherent temporal context across the timesteps. In this work, we introduce Spiking Transformer with Spatial-Temporal Attention (STAtten), a simple and straightforward architecture designed to integrate spatial and temporal information in self-attention with negligible additional computational load. The STAtten divides the temporal or token index and calculates the self-attention in a cross-manner to effectively incorporate spatial-temporal information. We first verify our spatial-temporal attention mechanism's ability to capture long-term temporal dependencies using sequential datasets. Moreover, we validate our approach through extensive experiments on varied datasets, including CIFAR10/100, ImageNet, CIFAR10-DVS, and N-Caltech101. Notably, our cross-attention mechanism achieves an accuracy of 78.39% on the ImageNet dataset.